Playing with convolutions in TensorFlow

In this post we will try to develop a practical intuition about convolutions and visualize different steps used in convolutional neural network architectures. The code used for this tutorial can be found here.

This tutorial does not cover back propagation, sparse connectivity, shared weights and other theoritical aspects that are already covered in other courses and tutorials. Instead it focuses on giving a practical intuition on how to use tensorflow to build a convolutional model.

  • So what are convoltions?

Convolution is a mathematical operation between two functions producing a third convoluted function that is a modefied version of the first function. In the case of image processing, it’s the process of multiplying each element of matrix with its local neighbors, weighted by the kernel (or filter). For example, given a maxtrix and kernel as follow:

and

The discrete convolution operation is defined as

To visualize how convolution slides to calculate the output matrix, it’s good to look at a vizualization:

Source: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution

Source: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution In convolutional architectures it’s also common to use pooling layer after each convolution, these pooling layers generally simplify the information of the convolution layer before, by choosing the most prominent value (max pooling) or averaging the values calculated in by the convolution (average pooling).

Source: http://deeplearning.stanford.edu/wiki/index.php/File:Pooling_schematic.gif

  • Computing convolutions and pooling:

First of all let’s prepare the environment:

1
2
3
4
5
6
1 import numpy as np
2 import matplotlib.pyplot as plt
3 import matplotlib.gridspec as gridspec
4 import tensorflow as tf
5 from PIL import Image
6 import numpy

We need also a picture to play with

1
2
1 img = Image.open('gray_kitten.jpg')
2 plt.imshow(img)

And in order to visualize the result of each operation, we need to write some utils functions

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
 1 def show_image_ops_gray(img, conv_op, sigmoid_op, avg_pool_op, max_pool_op):
 2 gs1 = gridspec.GridSpec(1, 5)
 3 plt.subplot(gs1[0, 0]); plt.axis('off'); plt.imshow(img[:, :], cmap=plt.get_cmap('gray'))
 4 plt.subplot(gs1[0, 1]); plt.axis('off'); plt.imshow(conv_op[0, :, :, 0], cmap=plt.get_cmap('gray'))
 5 plt.subplot(gs1[0, 2]); plt.axis('off'); plt.imshow(sigmoid_op[0, :, :, 0], cmap=plt.get_cmap('gray'))
 6 plt.subplot(gs1[0, 3]); plt.axis('off'); plt.imshow(avg_pool_op[0, :, :, 0], cmap=plt.get_cmap('gray'))
 7 plt.subplot(gs1[0, 4]); plt.axis('off'); plt.imshow(max_pool_op[0, :, :, 0], cmap=plt.get_cmap('gray'))
 8 plt.show()
 9 
10 
11 def show_image_ops_rgb(img, conv_op, sigmoid_op, avg_pool_op, max_pool_op):
12 gs1 = gridspec.GridSpec(1, 5)
13 plt.subplot(gs1[0, 0]); plt.axis('off'); plt.imshow(img[:, :, :])
14 plt.subplot(gs1[0, 1]); plt.axis('off'); plt.imshow(conv_op[0, :, :, :])
15 plt.subplot(gs1[0, 2]); plt.axis('off'); plt.imshow(sigmoid_op[0, :, :, :])
16 plt.subplot(gs1[0, 3]); plt.axis('off'); plt.imshow(avg_pool_op[0, :, :, :])
17 plt.subplot(gs1[0, 4]); plt.axis('off'); plt.imshow(max_pool_op[0, :, :, :])
18 plt.show()
19 
20 def show_shapes(img, conv_op, sigmoid_op, avg_pool_op, max_pool_op):
21 print("""
22 image filters (shape {})
23 conv_op filters (shape {})
24 sigmoid_op filters (shape {})
25 avg_pool_op filters (shape {})
26 max_pool_op filters (shape {})
27 """.format(
28 img.shape, conv_op.shape, sigmoid_op.shape, avg_pool_op.shape, max_pool_op.shape))

Now that we have a way to visualize every step, let’s create the tensorflow operations

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
 1 def convolve(img, kernel, strides=[1, 3, 3, 1], pooling=[1, 3, 3, 1], padding='SAME', rgb=True):
 2 with tf.Graph().as_default():
 3 num_maps = 3
 4 if not rgb:
 5 num_maps = 1 # set number of maps to 1
 6 img = img.convert('L', (0.2989, 0.5870, 0.1140, 0)) # convert to gray scale
 7 
 8 
 9 # reshape image to have a leading 1 dimension
10 img = numpy.asarray(img, dtype='float32') / 256.
11 img_shape = img.shape
12 img_reshaped = img.reshape(1, img_shape[0], img_shape[1], num_maps)
13 
14 x = tf.placeholder('float32', [1, None, None, num_maps])
15 w = tf.get_variable('w', initializer=tf.to_float(kernel))
16 
17 # operations
18 conv = tf.nn.conv2d(x, w, strides=strides, padding=padding)
19 sig = tf.sigmoid(conv)
20 max_pool = tf.nn.max_pool(sig, ksize=[1, 3, 3, 1], strides=[1, 3, 3, 1], padding=padding)
21 avg_pool = tf.nn.avg_pool(sig, ksize=[1, 3, 3, 1], strides=[1, 3, 3, 1], padding=padding)
22 
23 init = tf.initialize_all_variables()
24 with tf.Session() as session:
25 session.run(init)
26 conv_op, sigmoid_op, avg_pool_op, max_pool_op = session.run([conv, sig, avg_pool, max_pool],
27 feed_dict={x: img_reshaped})
28 
29 show_shapes(img, conv_op, sigmoid_op, avg_pool_op, max_pool_op)
30 if rgb:
31 show_image_ops_rgb(img, conv_op, sigmoid_op, avg_pool_op, max_pool_op)
32 else:
33 show_image_ops_gray(img, conv_op, sigmoid_op, avg_pool_op, max_pool_op)

The convolve function that we just built, will allow us to try any filter from this list of filters in the gimp’s documentation. We will try some of them here, but you can modify the kernels or other kernels from the list to see how image changes.

1
2
3
4
5
6
7
1 a = np.zeros([3, 3, 1, 1])
2 a[1, 1, :, :] = 5
3 a[0, 1, :, :] = -1
4 a[1, 0, :, :] = -1
5 a[2, 1, :, :] = -1
6 a[1, 2, :, :] = -1
7 convolve(img, a, rgb=False)

This gives the following results

1
2
3
4
5
1
2
3
4
5
image filters (shape (134, 240))
conv_op filters (shape (1, 45, 80, 1))
sigmoid_op filters (shape (1, 45, 80, 1))
avg_pool_op filters (shape (1, 15, 27, 1))
max_pool_op filters (shape (1, 15, 27, 1))

1
2
3
4
5
6
7
8
9
10
11
 1 a = np.zeros([3, 3, 1, 1])
 2 a[1, 1, :, :] = 0.25
 3 a[0, 1, :, :] = 0.125
 4 a[1, 0, :, :] = 0.125
 5 a[2, 1, :, :] = 0.125
 6 a[1, 2, :, :] = 0.125
 7 a[0, 0, :, :] = 0.0625
 8 a[0, 2, :, :] = 0.0625
 9 a[2, 0, :, :] = 0.0625
10 a[2, 2, :, :] = 0.0625
11 convolve(img, a, rgb=False)

This gives the following results

1
2
3
4
5
1
2
3
4
5
image filters (shape (134, 240))
conv_op filters (shape (1, 45, 80, 1))
sigmoid_op filters (shape (1, 45, 80, 1))
avg_pool_op filters (shape (1, 15, 27, 1))
max_pool_op filters (shape (1, 15, 27, 1))

After trying the filters in the example to see how they change the original image, we not only started to develop an intuition about how these operations work, but also we prepared the practical tools to build a convolutional neural network.

CNNs are a family of neural network architecture built essentially based on multiple layers of convolutions with nonlinear activation functions, e.g sigmoid, relu or tanh applied to the results, followed with either other convolutions layers or pooling layers and finally fully connected layers. In this section we will be using the high-level machine learning API tf.contrib.learn, tf.contrib.layers and tf.contrib.layers to create, train and configure our models.

LeNet model contains the essence of CNNs that are still used in larger and newer models. LeNet consists of 2 convolutional layers followed by a dense layer.

A convolution layer in LeNet model consists of a convolution operation followed by a max pooling operation:

1
2
3
4
5
6
7
8
9
1 def lenet_layer(tensor_in, n_filters, kernel_size, pool_size, activation_fn=tf.nn.tanh,
2 padding='SAME'):
3 conv = tf.contrib.layers.convolution2d(tensor_in,
4 num_outputs=n_filters,
5 kernel_size=kernel_size,
6 activation_fn=activation_fn,
7 padding=padding)
8 pool = tf.nn.max_pool(conv, ksize=pool_size, strides=pool_size, padding=padding)
9 return pool

We need also to define our fully connected layer which could have some dropout:

1
2
3
4
5
6
7
8
9
10
11
12
 1 def dense_layer(tensor_in, layers, activation_fn=tf.nn.tanh, keep_prob=None):
 2 if not keep_prob:
 3 return tf.contrib.layers.stack(
 4 tensor_in, tf.contrib.layers.fully_connected, layers, activation_fn=activation_fn)
 5 
 6 tensor_out = tensor_in
 7 for layer in layers:
 8 tensor_out = tf.contrib.layers.fully_connected(tensor_out, layer,
 9 activation_fn=activation_fn)
10 tensor_out = tf.contrib.layers.dropout(tensor_out, keep_prob=keep_prob)
11 
12 return tensor_out

Using the layers defined earlier we can easily define LeNet model:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
 1 def flatten_convolution(tensor_in):
 2 tendor_in_shape = tensor_in.get_shape()
 3 tensor_in_flat = tf.reshape(tensor_in, [tendor_in_shape[0].value or -1, np.prod(tendor_in_shape[1:]).value])
 4 return tensor_in_flat
 5 
 6 def lenet_model(X, y, image_size=(-1, IMAGE_SIZE, IMAGE_SIZE, 1), pool_size=(1, 2, 2, 1)):
 7 y = tf.one_hot(y, 10, 1, 0)
 8 X = tf.reshape(X, image_size)
 9 
10 with tf.variable_scope('layer1'):
11 """
12 Valid:
13 * input: (?, 28, 28, 1)
14 * filter: (5, 5, 1, 4)
15 * pool: (1, 2, 2, 1)
16 * output: (?, 12, 12, 4)
17 Same:
18 * input: (?, 28, 28, 1)
19 * filter: (5, 5, 1, 4)
20 * pool: (1, 2, 2, 1)
21 * output: (?, 14, 14, 4)
22 """
23 layer1 = lenet_layer(X, 4, [5, 5], pool_size)
24 
25 with tf.variable_scope('layer2'):
26 """
27 VALID:
28 * input: (?, 12, 12, 4)
29 * filter: (5, 5, 4, 6)
30 * pool: (1, 2, 2, 1)
31 * output: (?, 4, 4, 6)
32 * flat_output: (?, 4 * 4 * 6)
33 SAME:
34 * input: (?, 14, 14, 4)
35 * filter: (5, 5, 4, 6)
36 * pool: (1, 2, 2, 1)
37 * output: (?, 7, 7, 6)
38 * flat_output: (?, 7 * 7 * 6)
39 """
40 layer2 = lenet_layer(layer1, 6, [5, 5], pool_size)
41 layer2_flat = flatten_convolution(layer2)
42 
43 result = dense_layer(layer2_flat, [1024], activation_fn=tf.nn.tanh, keep_prob=0.5)
44 prediction, loss = tf.contrib.learn.models.logistic_regression_zero_init(result, y)
45 train_op = tf.contrib.layers.optimize_loss(
46 loss, tf.contrib.framework.get_global_step(), optimizer='Adagrad',
47 learning_rate=0.1)
48 return {'class': tf.argmax(prediction, 1), 'prob': prediction}, loss, train_op

Finally after running our model, we can look at the loss and the model graph

Here’s also AlexNet’s model graph

The full code used in this tutorial can be found in this github repo