What are some useful tips for choosing and tweaking a convolutional neural network architecture and hyperparameters?
The Deep Learning Book has a separate chapter on this. I recommend going through it, as it covers many things that I learned the hard way, i.e. by actually running multiple experiments.
These are a few things that I try to follow:
- Almost always start from a pre-trained model (generally AlexNet or VGGNet) for the first experiment. Why? Because these models have been trained on a very large dataset, and their initial layers learn features that are generic to any image. If your images are of natural objects, you are better off freezing the convolutional layers and training only the fully connected layers.
- Initially add only a single fully connected layer, and add more layers only if the accuracy doesn’t turn out to be reasonable.
- Always use ReLU as the activation function in the convolutional layers. You can read about the benefits of using ReLU here.
- Use Dropout (0.5 to begin with). Read more on dropout here.
- Use Xavier initialization for the weights of the fully connected layers. Keras implementation here.
- Play around with the learning rate. It has happened many times that a model whose accuracy doesn’t seem to change at a learning rate of 0.01 reaches 90% accuracy once the rate is changed to 1e-4 or 1e-5. This is the most important parameter to tweak during the training process.
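Putting the bullets above together, a minimal tf.keras sketch of this recipe might look as follows. The 10-class head, input shape, and optimizer are illustrative assumptions, not a definitive setup; `weights=None` only keeps the sketch self-contained, whereas in practice you would load `weights="imagenet"` to get the pre-trained features:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Pre-trained convolutional base (use weights="imagenet" in practice;
# weights=None here only so the sketch runs offline).
base = tf.keras.applications.VGG16(weights=None, include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False  # freeze the convolutional layers

# A single fully connected layer to begin with, Dropout(0.5), and
# Glorot (Xavier) initialization — Keras's default for Dense anyway.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax",
                 kernel_initializer="glorot_uniform"),  # hypothetical 10 classes
])

# The learning rate is the first knob to turn if accuracy is stuck.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
```

From here, you would only unfreeze parts of the base or add more Dense layers if this simple head plateaus.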
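The learning-rate point can be seen even on a toy quadratic, far from a real CNN but with the same qualitative behavior (the numbers are contrived for illustration):

```python
# Plain gradient descent on f(w) = w**2, whose gradient is 2*w.
def descend(lr, steps=100, w=1.0):
    """Run gradient descent and return the final |w|."""
    for _ in range(steps):
        w -= lr * 2 * w
    return abs(w)

big = descend(1.5)    # step too large: w oscillates and blows up
small = descend(0.1)  # smaller step: w shrinks toward the minimum at 0
```

The same dynamic is why a model that looks stuck at 0.01 can suddenly train well at 1e-4.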
The last two points are not related to changes in the architecture, but they are quite important from an experimentation point of view.
- Data augmentation — This is a very strong regularization technique where you artificially increase the dataset size by applying transformations to the original dataset. Common transformations are translation, rotation, shear, horizontal flips, etc. Keras has a very useful ImageDataGenerator that does all the work for you. You can also go through the chapter on Regularization for Deep Learning in the book linked above. It covers all the various techniques used and the math behind them.
- Finally, USE Keras. And by USE, I mean really leverage the power Keras provides while experimentation.
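As a small sketch of the augmentation point, Keras's ImageDataGenerator can apply these transformations on the fly; the parameter values and the random 8-image batch below are illustrative, not recommendations:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=15,        # random rotations up to 15 degrees
    width_shift_range=0.1,    # horizontal translation
    height_shift_range=0.1,   # vertical translation
    shear_range=0.1,          # shear transformations
    horizontal_flip=True,     # random horizontal flips
)

# Each epoch the generator yields randomly transformed copies, so the
# model rarely sees the exact same image twice.
images = np.random.rand(8, 64, 64, 3).astype("float32")
labels = np.zeros(8)
batch_x, batch_y = next(datagen.flow(images, labels, batch_size=4))
```

Passing `datagen.flow(...)` to `model.fit` lets the augmentation run for you during training.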
Hope this helped :)
Originally published at www.quora.com.