In this post we will try to develop a practical intuition about convolutions and visualize different steps used in convolutional neural network architectures. The code used for this tutorial can be found here.
This tutorial does not cover backpropagation, sparse connectivity, shared weights and other theoretical aspects that are already covered in other courses and tutorials. Instead, it focuses on giving a practical intuition for how to use TensorFlow to build a convolutional model.
- So what are convolutions?
Convolution is a mathematical operation between two functions that produces a third function, which can be viewed as a modified version of the first. In image processing, it is the process of adding each element of the image matrix to its local neighbors, weighted by the kernel (or filter). For example, given a matrix and a kernel as follows:
and
The discrete convolution operation is defined as
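For an image $I$ and a kernel $K$ (symbols chosen here for illustration), the standard two-dimensional discrete convolution can be written as:

```latex
(I * K)(i, j) = \sum_{m} \sum_{n} I(i - m,\; j - n)\, K(m, n)
```

Many deep learning libraries, including TensorFlow's `tf.nn.conv2d`, actually compute the closely related cross-correlation, $\sum_{m} \sum_{n} I(i + m,\, j + n)\, K(m, n)$, which skips the kernel flip. For kernels that are symmetric under a 180° rotation the two give the same result.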
To see how the kernel slides over the input to compute the output matrix, it helps to look at a visualization:
Source: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
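The sliding-window computation can also be sketched in a few lines of NumPy. This is a minimal illustration rather than the tutorial's TensorFlow code: the function name `convolve2d` and the sample values are assumptions made for this example, and only "valid" positions (where the kernel fits entirely inside the image) are computed.

```python
import numpy as np

def convolve2d(image, kernel):
    """'Valid' 2D convolution: the kernel only visits positions where it fits entirely."""
    kernel = np.flipud(np.fliplr(kernel))  # true convolution flips the kernel
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the current window by the kernel element-wise and sum.
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out

# Illustrative 5x5 binary image and 3x3 kernel (values chosen for this sketch).
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]], dtype=float)
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]], dtype=float)

result = convolve2d(image, kernel)
print(result)
# [[4. 3. 4.]
#  [2. 4. 3.]
#  [2. 3. 4.]]
```

A 5x5 input and a 3x3 kernel give a 3x3 output, because the kernel can be placed at only three positions along each axis.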
In convolutional architectures it is also common to use a pooling layer after each convolution. Pooling layers simplify the output of the preceding convolution layer, either by keeping the most prominent value in each region (max pooling) or by averaging the values in each region (average pooling).
Source: http://deeplearning.stanford.edu/wiki/index.php/File:Pooling_schematic.gif
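As a rough sketch of both pooling variants, the NumPy snippet below pools non-overlapping windows of a small feature map. The helper `pool2d` and the sample values are hypothetical names and data chosen for this illustration.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling; rows/columns that do not fill a window are dropped."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    # Split the map into (size x size) blocks: axes 1 and 3 index inside a block.
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'avg'")

fm = np.array([[1, 3, 2, 4],
               [5, 7, 6, 8],
               [9, 2, 1, 0],
               [3, 4, 5, 6]], dtype=float)

max_pooled = pool2d(fm, mode="max")  # [[7. 8.] [9. 6.]]
avg_pooled = pool2d(fm, mode="avg")  # [[4.  5. ] [4.5 3. ]]
```

With a 2x2 window, each spatial dimension is halved, so the 4x4 map becomes 2x2.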
- Computing convolutions and pooling:
First of all let’s prepare the environment:
We also need a picture to play with:
To visualize the result of each operation, we also need to write some utility functions:
Now that we have a way to visualize every step, let’s create the TensorFlow operations:
The convolve function that we just built allows us to try any filter from the list of filters in GIMP’s documentation. We will try some of them here, but you can modify the kernels or use other kernels from the list to see how the image changes.
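To give a feel for what such kernels do, here is a small NumPy sketch that applies a few commonly used 3x3 kernels to a synthetic image. It is an illustration under stated assumptions, not the tutorial's `convolve` function: `apply_kernel`, the `KERNELS` dictionary and the test image are hypothetical, and zero padding is assumed at the borders. All three kernels are symmetric, so convolution and cross-correlation coincide here.

```python
import numpy as np

# A few commonly used 3x3 kernels (names chosen for this sketch).
KERNELS = {
    "sharpen": np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float),
    "box_blur": np.ones((3, 3)) / 9.0,
    "edge_detect": np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float),
}

def apply_kernel(image, kernel):
    """Zero-padded ('same') filtering of a 2D grayscale image with an odd-sized kernel."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

# Synthetic test image: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

edges = apply_kernel(image, KERNELS["edge_detect"])
blurred = apply_kernel(image, KERNELS["box_blur"])
```

Away from the borders, `edges` is nonzero only in the two columns next to the dark/bright boundary, while `blurred` turns the sharp step into a gradual ramp.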
This gives the following results
This gives the following results
After trying the filters in the example to see how they change the original image, we have not only started to develop an intuition about how these operations work, but also prepared the practical tools to build a convolutional neural network.
CNNs are a family of neural network architectures built from multiple layers of convolutions with nonlinear activation functions (e.g. sigmoid, ReLU or tanh) applied to the results, followed by either further convolution layers or pooling layers, and finally by fully connected layers. In this section we will use the high-level machine learning APIs tf.contrib.learn and tf.contrib.layers to create, train and configure our models.
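The nonlinear activations mentioned above are applied element-wise to the output of a convolution. A minimal NumPy sketch follows; the function names and sample values are illustrative, and NumPy is used instead of TensorFlow to keep the example self-contained.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative values become zero."""
    return np.maximum(x, 0.0)

def sigmoid(x):
    """Logistic function: squashes values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Pretend this is the output of a convolution (values chosen for illustration).
feature_map = np.array([[-2.0, 0.0],
                        [1.5, 3.0]])

activated = {
    "relu": relu(feature_map),
    "sigmoid": sigmoid(feature_map),
    "tanh": np.tanh(feature_map),
}
```

ReLU leaves positive responses untouched and zeroes the rest, whereas sigmoid and tanh squash every value into a bounded range.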
The LeNet model contains the essence of CNNs, which is still used in larger and newer models. LeNet consists of 2 convolutional layers followed by a dense layer.
A convolution layer in the LeNet model consists of a convolution operation followed by a max pooling operation:
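When stacking such layers it helps to keep track of the spatial size of the output. For an input of width $n$, a kernel (or pooling window) of width $k$, padding $p$ and stride $s$ (symbols chosen here for illustration), the standard relation is:

```latex
o = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1
```

For example, a 2x2 max pooling with stride 2 and no padding halves an even-sized input: $o = \lfloor (n - 2)/2 \rfloor + 1 = n/2$.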
We also need to define our fully connected layer, which can optionally include dropout:
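The idea of a dense layer with optional dropout can be sketched in NumPy as follows. This is not the tutorial's tf.contrib code: the function `dense_with_dropout`, its parameters and the ReLU activation are assumptions made for this example, and it uses the common "inverted dropout" scaling so that the expected activation stays the same.

```python
import numpy as np

def dense_with_dropout(x, weights, bias, keep_prob=1.0, rng=None):
    """Fully connected layer with ReLU, followed by inverted dropout when keep_prob < 1."""
    activations = np.maximum(x @ weights + bias, 0.0)
    if keep_prob < 1.0:
        if rng is None:
            rng = np.random.default_rng()
        # Keep each unit with probability keep_prob and rescale the survivors.
        mask = rng.random(activations.shape) < keep_prob
        activations = activations * mask / keep_prob
    return activations

x = np.ones((2, 3))   # batch of 2 examples with 3 features each
w = np.ones((3, 4))   # 3 inputs -> 4 hidden units
b = np.zeros(4)

full = dense_with_dropout(x, w, b)  # every entry is 3.0
dropped = dense_with_dropout(x, w, b, keep_prob=0.5, rng=np.random.default_rng(0))
```

With `keep_prob=0.5`, every entry of `dropped` is either 0.0 (unit dropped) or 6.0 (unit kept and rescaled by 1/0.5). Dropout is normally enabled only during training and disabled (`keep_prob=1.0`) at evaluation time.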
Using the layers defined earlier, we can easily define the LeNet model:
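To see how the tensor shapes evolve through such a model, the sketch below traces them through two convolution + max pooling layers. Every number in it is an assumption for illustration (28x28 input, 5x5 kernels with "same" padding, 2x2 pooling, 32 and 64 filters); the original model's settings may differ.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (n + 2 * padding - kernel) // stride + 1

def lenet_shapes(input_size=28, kernel=5, pool=2, filters=(32, 64)):
    """Trace (height, width, channels) through conv+pool layers with 'same' convolutions."""
    size, shapes = input_size, []
    for n_filters in filters:
        # 'same' convolution: padding of kernel // 2 keeps the spatial size.
        size = conv_output_size(size, kernel, stride=1, padding=kernel // 2)
        # Max pooling with a window and stride of `pool` shrinks the spatial size.
        size = conv_output_size(size, pool, stride=pool)
        shapes.append((size, size, n_filters))
    height, width, channels = shapes[-1]
    flat = height * width * channels  # number of inputs to the dense layer
    return shapes, flat

shapes, flat = lenet_shapes()
print(shapes)  # [(14, 14, 32), (7, 7, 64)]
print(flat)    # 3136
```

Under these assumptions, the dense layer that follows the second pooling step receives a flattened vector of 3136 values per image.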
Finally, after running our model, we can look at the loss and the model graph.
Here is AlexNet’s model graph as well.
The full code used in this tutorial can be found in this GitHub repo.