Effective Image Recognition — From Hyperparameters to Convolutional Neural Networks
Machine Learning has been used widely across many fields in recent years. Computational advances and the availability of affordable GPUs have enabled us to design complex “deep” neural networks with dozens of hidden layers for solving hard problems. As we build sophisticated mathematical models using neural networks, we realize that there is room for fine-tuning a given model before going “deeper” and making it more computationally intensive. In this article, we will look at the problem of Image Recognition, which is a “Classification” task in the Machine Learning literature, and demonstrate how to perform this task effectively.
For this example, we will use the MNIST digits dataset. It contains 60,000 image samples as a training set and 10,000 image samples as a test set of handwritten digits from 0 to 9. Each image is a 28x28 pixel grayscale image. Our goal is to design a neural network that can classify these images into the digits 0–9. We will go through the exercise in three iterations and evaluate the accuracy of the network at each step.
· Building a Deep Neural Network
· Fine Tuning the network
· Convolutional Neural Networks (CNN)
We will be using Keras for this example. Keras is a high-level API written in Python that uses popular frameworks such as TensorFlow, CNTK, or Theano as the underlying backend. We will be using Keras with the TensorFlow backend in this case.
Building a Deep Neural Network
For the first iteration, we build a three-layer deep neural network in which the middle layer is a hidden layer. Let us review the code to understand this better.
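The original snippet is not reproduced here, so what follows is a minimal sketch of such a network in Keras with a TensorFlow backend; the hidden-layer size, epoch count, and batch size are illustrative assumptions rather than the article’s exact values.

```python
# A minimal sketch of the first network; the hidden-layer size, epoch count,
# and batch size are illustrative assumptions, not the article's exact values.
from tensorflow import keras
from tensorflow.keras import layers

# Load MNIST: 60,000 training and 10,000 test images of 28x28 pixels.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Flatten each 28x28 image into a 784-element vector and normalize to 0-1.
x_train = x_train.reshape(60000, 784).astype("float32") / 255.0
x_test = x_test.reshape(10000, 784).astype("float32") / 255.0

# One Hot Encode the labels 0-9.
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Three layers: input, one hidden Sigmoid layer, and a SoftMax output layer.
model = keras.Sequential([
    layers.Dense(128, activation="sigmoid", input_shape=(784,)),  # hidden layer (assumed 128 units)
    layers.Dense(10, activation="softmax"),                       # one probability per digit
])

model.compile(optimizer="sgd",                      # Stochastic Gradient Descent
              loss="categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=10, batch_size=128, validation_split=0.1)
print(model.evaluate(x_test, y_test))               # [test loss, test accuracy]
```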
Here are the key points to note about the code.
· We get the MNIST data using the inbuilt keras.datasets API call.
· The image data is two-dimensional, so the training and test sets are flattened: each 28x28 image is reshaped into a single 784-element vector. The pixel values are also normalized to the 0–1 range so that training behaves well.
· This is a multi-class classification problem where the output can be any digit from 0–9. In such cases, it is customary to use One Hot Encoding so that each output is represented as a binary vector. For example, the digit 3 is encoded as [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].
· We use Stochastic Gradient Descent (SGD) as our optimizer.
· We use Sigmoid as our activation function, as it was the standard option until recently. The choice of activation function is important because it introduces non-linearity into our mathematical model, and complex models are rarely linear.
· In this iteration we use a single hidden layer and a SoftMax function for the output layer. In our multi-class classification problem, the outputs are mutually exclusive. A SoftMax layer outputs a probability for each of the ten digits such that the probabilities add up to 1, and the digit with the highest probability is the prediction; a short numeric illustration follows this list.
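To make the SoftMax behavior concrete, here is a toy illustration with a made-up vector of ten scores; the numbers are arbitrary and only serve to show that the outputs sum to 1 and that the largest probability picks the predicted digit.

```python
# A toy illustration of SoftMax; the ten scores below are made-up values.
import numpy as np

scores = np.array([1.2, 0.3, 0.1, 4.5, 0.0, 0.7, 0.2, 1.1, 0.4, 0.6])  # one raw score per digit 0-9
probs = np.exp(scores) / np.sum(np.exp(scores))

print(round(probs.sum(), 6))  # 1.0 -- the probabilities always add up to one
print(probs.argmax())         # 3  -- the digit with the highest probability is the prediction
```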
This simple setup gives us an accuracy score of 89%. This is a good start. We will look at some steps to make this number progressively better.
Fine Tuning the network
In the next step, let’s make a few modifications to our simple deep neural network. We will revise the hyperparameters of the network to get better prediction accuracy. Hyperparameters are external configuration values that are set for the neural network and cannot be estimated from the training data. They can be tuned to improve the accuracy of the neural network. Some examples include the choice of loss function, learning rate, and activation function. Here are the changes we made to the hyperparameters:
· Used ReLU as the activation function instead of Sigmoid. ReLU has lately been the recommended choice over tanh or Sigmoid. This article provides more detail on the options for activation functions: https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6.
· Used Adam as the optimizer instead of Stochastic Gradient Descent (SGD). Adam has a lower training cost compared to SGD. The detailed analysis and the mathematical foundation behind Adam can be found in the paper at https://arxiv.org/abs/1412.6980.
The other change is in the architecture of the network itself: we add a Dropout layer and an additional hidden layer. Dropout is a mechanism wherein a fraction of neurons is randomly dropped during training to help avoid overfitting. Below is the code snippet after making these changes.
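Since the original snippet is not shown here, the following is a rough sketch of the revised model under the changes described above (ReLU, Adam, an extra hidden layer, and Dropout); the layer widths and the 20% dropout rate are placeholder choices, and the data preparation is the same as in the first sketch.

```python
# A sketch of the revised network; layer widths and the dropout rate are placeholder
# choices. x_train, y_train, x_test, y_test are prepared exactly as in the first sketch.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(512, activation="relu", input_shape=(784,)),  # ReLU instead of Sigmoid
    layers.Dense(512, activation="relu"),                      # the additional hidden layer
    layers.Dropout(0.2),                                       # drop 20% of activations during training
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",                     # Adam instead of SGD
              loss="categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=10, batch_size=128, validation_split=0.1)
print(model.evaluate(x_test, y_test))
```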
With these simple changes, we see a marked improvement in the performance of our network.
The test accuracy has jumped up to 97.6%. This is a significant improvement, but we can still improve the performance on this dataset.
Convolutional Neural Networks (CNN)
Convolutional Neural Networks have been popular in the Image Processing domain because the approach builds on a fundamental idea: in images, neighboring pixels are highly spatially correlated. Instead of flattening the image immediately, convolution layers slide small filters of a fixed length x breadth over the image, and a process known as pooling then keeps the strongest of the extracted features. These convolution and pooling stages are stacked until the network has extracted the important features of the image; the resulting feature maps are then flattened and passed to a dense SoftMax output layer. The high-level architecture is therefore convolution followed by pooling, repeated, and then a flattening step feeding a SoftMax classifier.
Let us review the code for this step.
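Again, in place of the original snippet, here is a small CNN sketch in Keras that follows the convolution-pooling-flatten pattern described above; the filter counts, kernel sizes, dropout rate, and epoch count are assumptions, not the article’s exact configuration.

```python
# A sketch of a small CNN for MNIST; filter counts, kernel sizes, dropout rate,
# and epochs are assumed values.
from tensorflow import keras
from tensorflow.keras import layers

# CNNs work on the 2-D image shape, so reshape to 28x28x1 instead of flattening.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),  # 3x3 filters over the image
    layers.MaxPooling2D((2, 2)),               # pooling keeps the strongest local features
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                          # flatten the feature maps
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),    # class probabilities for digits 0-9
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.1)
print(model.evaluate(x_test, y_test))          # [test loss, test accuracy]
```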
The accuracy with this approach is significantly better at 99.26%. This illustrates why CNNs are so popular for Image Recognition.
We saw that simple changes to the hyperparameters and to the neural network architecture can result in a noticeable improvement in model accuracy. Certain problems have an inherent tendency to yield better results with particular types of neural networks, such as CNNs, RNNs, LSTMs, and GANs. In this post we showed how CNNs are the best fit for solving problems that need image classification.