Python Deep Learning tutorial: Elman RNN implementation in TensorFlow

In this Python Deep Learning tutorial, an implementation and explanation are given for an Elman RNN. The implementation is done in TensorFlow, which is one of the many Python Deep Learning libraries.

A more modern RNN is the GRU. A GRU has fewer parameters to train and is therefore often faster to train. An implementation of the GRU in TensorFlow can be found here.

So… What is an Elman RNN?

Suppose you have many documents and you would like to find all names in the documents. Then one option is to use an Elman RNN! This type of RNN was originally introduced by Jeffrey Elman [1]. The Elman RNN reads the text word by word (together with a small context window around each word) and tries to predict a label for every word: O if the word is not part of a name, B-NAME if the word is the beginning of a name, and I-NAME if the word is the continuation of a name. One example of such a labelling is the following:

Hi, I am Data Blogger. How are you?

The labelling (or annotation) would then be as follows:

Hi [O], I [O] am [O] Data [B-NAME] Blogger [I-NAME]. How [O] are [O] you [O]?

In fact, the punctuation marks (.,!?) could also be labelled, but tokenizing words is not the topic of this article, so let’s focus on the network architecture.

The Architecture

The Elman Recurrent Neural Network is a recurrent neural network that applies the same recursion step once for every token in the input. In this article, we will implement one that includes a word embedding layer, which converts words into a semantic (vector) representation. These embeddings are fed to the recurrence layer, which takes the previous hidden state ($h_{t-1}$) and the current token ($x_t$) as input and produces probabilities for each of the classes ($p(\text{O})$, $p(\text{B-NAME})$, $p(\text{I-NAME})$) as output. The new hidden state and the next token are then used as input for the next step of the recursion.
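Written as formulas, one recursion step computes the following (in the classical Elman formulation the hidden update is passed through a nonlinearity such as $\tanh$; the implementation below keeps the hidden update linear):

$$h_t = W_x x_t + W_h h_{t-1} + b_h, \qquad s_t = \operatorname{softmax}(W h_t + b), \qquad \hat{y}_t = \operatorname{argmax}_k\, (s_t)_k$$

Here $x_t$ is the concatenation of the embeddings of the words in the context window around position $t$, and $s_t$ contains the three class probabilities for that position.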

A few words on the implementation

Tensorboard

I found it extremely useful to use TensorBoard! With TensorBoard, you can visualize the variables, and by doing so I discovered that I obtained NaN values fairly quickly. This was caused by an exploding gradient. After I lowered the learning rate, things went well: the loss decreased and in the end I obtained satisfying results.
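As a sketch of how you could guard against this (the optimizer choice, learning rate and clipping norm below are my own assumptions, not taken from the original training script), you can lower the learning rate and additionally clip the gradients before applying them:

import tensorflow as tf

# Assumed setup: `model.loss` is the loss tensor of the ElmanRNN class defined below.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)  # small learning rate

# Compute the gradients, clip them to a maximum global norm to avoid exploding
# gradients, and apply the clipped gradients to the trainable variables.
grads_and_vars = optimizer.compute_gradients(model.loss)
grads, variables = zip(*grads_and_vars)
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
train_op = optimizer.apply_gradients(zip(clipped_grads, variables))

# Log the loss so it shows up in TensorBoard.
tf.summary.scalar('loss', model.loss)
summary_op = tf.summary.merge_all()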

Data loaders

I always create classes that load the data. That way, I have a data abstraction, which is useful because almost every project has different data. I can then reuse my models and just plug in another data loader for different projects. Furthermore, make sure that your data formats are fast to read! I now use cPickle for storing and loading my data, and this saved me a lot of time since I need to load many data files for every run. If you have a different approach, please let me know in the comment section below!
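The class and file names below are only an illustration of this pattern (they are not the loader used for the results in this post):

import pickle  # cPickle in Python 2


class PickleDataLoader:
    """Loads a tokenized and labelled dataset that was stored with pickle."""

    def __init__(self, path):
        self.path = path

    def load(self):
        # Each file is assumed to contain a list of (word_indices, label_indices) pairs.
        with open(self.path, 'rb') as handle:
            return pickle.load(handle)


# The model code stays the same; only the loader changes per project.
loader = PickleDataLoader('data/ner_train.pkl')
sentences = loader.load()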

The implementation

So, now that I have told you about the model, I will give you the implementation in Python using TensorFlow. If you have any questions, please ask them in the comment section below! I will first show you the graph obtained from TensorBoard:

I grouped related layers into name scopes. What you can clearly see is that the input layer sends its data to the recurrence layer. This is correct, since the inputs are converted into embeddings and the embeddings are used by the recurrence layer. The outputs of the recurrence layer are then used to calculate the output layer. For backpropagation, the gradients are computed and fed back into the network to update the weights and biases.
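If you want to reproduce this graph view yourself, a minimal sketch (the log directory name is arbitrary) is to write the graph with a summary writer and point TensorBoard at that directory:

import tensorflow as tf

with tf.Session() as session:
    # Writing the (grouped) graph to a log directory makes it visible in TensorBoard:
    #   tensorboard --logdir=logs/elman_rnn
    writer = tf.summary.FileWriter('logs/elman_rnn', session.graph)
    session.run(tf.global_variables_initializer())
    writer.close()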

This is the resulting code for my Elman RNN:

import tensorflow as tf
import numpy as np


class ElmanRNN:
    def __init__(self, num_vocab, num_embedding, num_context, num_hidden, num_classes):
        self.num_vocab = num_vocab
        self.num_embedding = num_embedding
        self.num_context = num_context
        self.num_hidden = num_hidden
        self.num_classes = num_classes

        self.params = {}

        with tf.name_scope('input'):
            with tf.name_scope('input_data'):
                # One row of word indices (a context window) per word in the sentence.
                self.data = tf.placeholder(tf.int32, name='data')

            with tf.name_scope('embedding'):
                # Index 0 of the embedding matrix is reserved for padding, hence the "+ 1".
                self.idxs = tf.reshape(self.data, (-1,)) + 1
                self.emb = tf.Variable(name='emb', initial_value=ElmanRNN.initialize(num_vocab + 1, num_embedding))
                self.params['emb'] = self.emb
                self.input_emb = tf.gather(self.emb, self.idxs)
                # Replace any NaN embeddings by zeros.
                self.input_emb_fix = tf.where(tf.is_nan(self.input_emb), tf.zeros_like(self.input_emb), self.input_emb)

            with tf.name_scope('x'):
                num_inputs = tf.shape(self.data)[0]
                self.wx = tf.Variable(name='wx', initial_value=ElmanRNN.initialize(num_embedding * num_context, num_hidden))
                self.params['wx'] = self.wx
                # Concatenate the embeddings of each context window into one input vector.
                self.x = tf.reshape(self.input_emb_fix, (num_inputs, num_embedding * num_context), name='x')

        with tf.name_scope('recurrence'):
            with tf.name_scope('recurrence_init'):
                # Initial hidden state, class probabilities and predicted class.
                self.h = tf.Variable(name='h', initial_value=np.zeros((1, num_hidden)))
                h_0 = tf.reshape(self.h, (1, num_hidden))
                self.params['h'] = self.h
                s_0 = tf.constant(np.matrix([0.] * num_classes))
                y_0 = tf.constant(0, 'int64')
            with tf.name_scope('hidden'):
                self.wh = tf.Variable(name='wh', initial_value=ElmanRNN.initialize(num_hidden, num_hidden))
                self.bh = tf.Variable(name='bh', initial_value=np.zeros((1, num_hidden)))
                self.params['wh'] = self.wh
                self.params['bh'] = self.bh
            with tf.name_scope('classes'):
                self.w = tf.Variable(name='w', initial_value=ElmanRNN.initialize(num_hidden, num_classes))
                self.b = tf.Variable(name='b', initial_value=np.zeros((1, num_classes)))
                self.params['w'] = self.w
                self.params['b'] = self.b
            # Apply the recurrence step to every input vector (one step per word).
            self.h, self.s, self.y = tf.scan(self.recurrence, self.x, initializer=(h_0, s_0, y_0))

        with tf.name_scope('output'):
            with tf.name_scope('target_data'):
                # One-hot target vector per word.
                self.target = tf.placeholder(tf.float64, name='target')
            with tf.name_scope('probabilities'):
                self.s = tf.squeeze(self.s)
            with tf.name_scope('outcomes'):
                self.y = tf.squeeze(self.y)
            with tf.name_scope('loss'):
                # Cross-entropy loss; the clipping avoids taking log(0).
                self.loss = -tf.reduce_sum(self.target * tf.log(tf.clip_by_value(self.s, 1e-20, 1.0)))

    @staticmethod
    def initialize(*shape):
        # Small uniform random initialization.
        return 0.001 * np.random.uniform(-1., 1., shape)

    def recurrence(self, old_state, x_t):
        h_t, s_t, y_t = old_state
        x = tf.reshape(x_t, (1, self.num_embedding * self.num_context))
        input_layer = tf.matmul(x, self.wx)
        hidden_layer = tf.matmul(h_t, self.wh) + self.bh
        # New hidden state (kept linear here; a tanh or sigmoid could be applied).
        h_t_next = input_layer + hidden_layer
        # Class probabilities and predicted class for the current word.
        s_t_next = tf.nn.softmax(tf.matmul(h_t_next, self.w) + self.b)
        y_t_next = tf.squeeze(tf.argmax(s_t_next, 1))
        return h_t_next, s_t_next, y_t_next
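To actually use the class, you need to turn every sentence into a matrix of context windows and feed it, together with one-hot targets, into a session. A minimal usage sketch (the context-window helper, the hyperparameters and the tiny training step below are my own illustration, not part of the original code) could look like this:

import numpy as np
import tensorflow as tf


def context_windows(word_indices, num_context):
    # Pad with -1 (which the model maps to embedding row 0) and build one
    # window of `num_context` word indices around every word.
    padding = num_context // 2
    padded = [-1] * padding + list(word_indices) + [-1] * padding
    return np.array([padded[i:i + num_context] for i in range(len(word_indices))], dtype=np.int32)


model = ElmanRNN(num_vocab=10000, num_embedding=50, num_context=5, num_hidden=100, num_classes=3)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(model.loss)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    # One sentence of word indices and its one-hot labels (O, B-NAME, I-NAME).
    sentence = [12, 7, 256, 3, 42]
    labels = np.eye(3)[[0, 0, 0, 1, 2]]
    feed = {model.data: context_windows(sentence, 5), model.target: labels}
    _, loss, predictions = session.run([train_op, model.loss, model.y], feed_dict=feed)
    print(loss, predictions)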

Conclusion (TL;DR)

This Python Deep Learning tutorial showed you how to implement an Elman RNN in TensorFlow. If you have any unanswered questions, feel free to ask them in the comments!

References

[1] Elman, Jeffrey L. “Distributed representations, simple recurrent networks, and grammatical structure.” Machine learning 7.2-3 (1991): 195-225.