# Behavioral Cloning

## Convolutional Neural Network Architecture

I designed a CNN based on the NVIDIA example, which has successfully steered a real-world self-driving car without human intervention. The model takes a 66x200 input image and passes it through five convolutional layers: 24 filters, then 36, then 48, and finally two layers of 64 filters each. The first three convolutional layers use a 5×5 kernel with a 2×2 stride; the last two use a non-strided 3×3 kernel. I use exponential linear units (ELU) as activation functions, because ELUs have smoother derivatives around zero and are therefore expected to be slightly better for predicting continuous values. I use 'valid' padding to prevent the introduction of unwanted edges in the resized data that could influence the network's predictions.

I followed the five convolutional layers with four fully connected layers leading to an output value that determines the steering angle of the car. The fully connected layers are designed to function as a controller for steering. However, as noted in the NVIDIA paper, it is not possible to cleanly distinguish which parts of the network perform feature extraction and which perform the steering control.
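A minimal Keras sketch of this architecture is shown below. The fully connected layer widths (100, 50, 10) follow the NVIDIA paper and are assumptions here, as is the use of the Keras `Sequential` API; the input is assumed to be the already resized and normalized 66x200 RGB image.

```python
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

model = Sequential()
# Three strided 5x5 convolutions with 'valid' padding and ELU activations
model.add(Conv2D(24, (5, 5), strides=(2, 2), padding='valid',
                 activation='elu', input_shape=(66, 200, 3)))
model.add(Conv2D(36, (5, 5), strides=(2, 2), padding='valid', activation='elu'))
model.add(Conv2D(48, (5, 5), strides=(2, 2), padding='valid', activation='elu'))
# Two non-strided 3x3 convolutions
model.add(Conv2D(64, (3, 3), padding='valid', activation='elu'))
model.add(Conv2D(64, (3, 3), padding='valid', activation='elu'))
# Four fully connected layers acting as the steering controller
model.add(Flatten())
model.add(Dense(100, activation='elu'))
model.add(Dense(50, activation='elu'))
model.add(Dense(10, activation='elu'))
model.add(Dense(1))  # single output: the steering angle
```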

I have not used any dropout layers in the architecture below, since I am strictly following the NVIDIA architecture. In my tests I also saw a degradation in performance when I added dropout layers between any of the layers shown below. Instead, overfitting is combated with data augmentation, as described below.

I am using the Adam optimizer, since it has given good results in my previous experiments and does not require manually tuning the learning rate.
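For reference, a sketch of the compilation step, assuming the `model` defined above and mean squared error as the regression loss (the loss choice is an assumption; the text only specifies Adam):

```python
# Adam adapts per-parameter learning rates, so no manual schedule is needed.
# MSE is assumed here as the regression loss for the continuous steering angle.
model.compile(optimizer='adam', loss='mse')
```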

The complete architecture of the model is summarized in the diagram below.

[Model architecture diagram]

## Training process

### Training data

The raw data used for training this model is the Udacity dataset, which consists of 8036 samples. I use a generator to produce additional data on the fly, augmenting the raw samples and feeding the augmented data to the model during training. The augmentation process is described below.
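A minimal sketch of such a training generator follows. The helper names (`load_image`, `preprocess`, `augment`) and the batch size are hypothetical placeholders for the actual pipeline described in the next section.

```python
import numpy as np

def training_generator(samples, batch_size=64):
    """Yield endless batches of augmented images and steering angles.

    `samples` is assumed to be a list of (image_path, steering_angle)
    pairs read from the Udacity driving log; `load_image`, `preprocess`,
    and `augment` are hypothetical helpers standing in for the resizing
    and augmentation steps described below.
    """
    while True:
        batch = [samples[np.random.randint(len(samples))]
                 for _ in range(batch_size)]
        images, angles = [], []
        for path, angle in batch:
            image = preprocess(load_image(path))   # resize to 66x200, normalize
            image, angle = augment(image, angle)   # random shadow, flip, etc.
            images.append(image)
            angles.append(angle)
        yield np.array(images), np.array(angles)
```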

#### Image augmentation and preprocessing

I trained a new network from scratch rather than using transfer learning to exploit the qualities of a pre-trained network. Training a network from scratch requires a large amount of data, and the Udacity training set is small compared to what such a model needs.

Hence we need to generate additional data from the existing training data in order to train the network and produce a reliable model.

I have followed the techniques described in the NVIDIA paper for data augmentation and training.

Below are sample images and their outputs after the different stages of preprocessing.

[Center, left, and right raw images before resizing or preprocessing]

[Center, left, and right images after resizing]

[Center, left, and right images after resizing and adding a random shadow]

[Center, left, and right images after flipping]
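A sketch of the shadow and flip augmentations is below, assuming OpenCV and NumPy. The shadow geometry is one common implementation choice, not necessarily the exact one used here, and flipping is assumed to negate the steering angle so that left and right turns stay consistent.

```python
import cv2
import numpy as np

def add_random_shadow(image):
    """Darken a random slice of the image to simulate a shadow.

    One common implementation: pick a random line across the frame
    and scale down the lightness on one side of it.
    """
    h, w = image.shape[:2]
    x1, x2 = np.random.randint(0, w, 2)  # shadow boundary at top and bottom rows
    hls = cv2.cvtColor(image, cv2.COLOR_RGB2HLS).astype(np.float32)
    mask = np.zeros((h, w), dtype=bool)
    for y in range(h):
        mask[y, : int(x1 + (x2 - x1) * y / h)] = True
    hls[:, :, 1][mask] *= np.random.uniform(0.4, 0.7)  # dim the lightness channel
    return cv2.cvtColor(hls.astype(np.uint8), cv2.COLOR_HLS2RGB)

def random_flip(image, angle):
    """Flip the image horizontally half the time, negating the steering angle."""
    if np.random.rand() < 0.5:
        return cv2.flip(image, 1), -angle
    return image, angle
```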

### Validation data

In addition to the training-data generator, I created a second generator for validation data. It uses the Udacity data directly and performs only the resizing and normalization of the CENTER images, which simulates the data the model will see on the track. Because the validation data receives no augmentation beyond resizing and normalization, the probability that a validation sample matches an augmented training sample is low, so the validation loss gives a reliable indication of overfitting.
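A minimal sketch of the validation generator, under the same assumptions as the training generator above (hypothetical `load_image` and `preprocess` helpers, and `samples` restricted to center-camera pairs):

```python
import numpy as np

def validation_generator(samples, batch_size=64):
    """Yield batches of center-camera images with only resizing and normalization."""
    while True:
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            images = [preprocess(load_image(path)) for path, _ in batch]
            angles = [angle for _, angle in batch]
            yield np.array(images), np.array(angles)
```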

## Modifications to drive.py

## Results and observations

## Future work and scope for improvement

# References

[1] https://images.nvidia.com/content/tegra/automotive/images/2016/solutions/pdf/end-to-end-dl-using-px.pdf

[2] https://chatbotslife.com/using-augmentation-to-mimic-human-driving-496b569760a9#.uxbr7bd26

[3] https://chatbotslife.com/learning-human-driving-behavior-using-nvidias-neural-network-model-and-image-augmentation-80399360efee#.ehk89vxh4