Recurrent Neural Networks for Beginners

What are Recurrent Neural Networks and how can you use them?

In this post I discuss the basics of Recurrent Neural Networks (RNNs), which are deep learning models that are becoming increasingly popular. I don't intend to get too heavily into the math and proofs behind why these models work; I'm aiming for a more intuitive, abstract understanding.

General Recurrent Neural Network information

Recurrent Neural Networks were created in the 1980s but have only recently gained popularity, thanks to advances in network design and the increased computational power of graphics processing units (GPUs). They're especially useful with sequential data because each neuron or unit can use its internal memory to maintain information about the previous input. This matters for language: "I had washed my house" means something quite different from "I had my house washed". Keeping track of that ordering allows the network to gain a deeper understanding of the statement.

This is important to note because even as a human reading through a sentence, you're picking up the context of each word from the words that came before it.

An RNN has loops in it that allow information to be carried across neurons while reading in input.

In these diagrams x_t is some input, A is a part of the RNN, and h_t is the output. Essentially, you can feed in words from a sentence or even characters from a string as x_t, and the RNN will come up with an h_t.
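To make that recurrence concrete, here is a minimal sketch in plain NumPy of the single step a vanilla RNN applies at each position in the sequence. The weight names W_xh, W_hh, and b_h and the toy dimensions are just my own labels for illustration, not anything from the diagrams.

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # Combine the current input with the previous hidden state,
    # then squash through tanh to produce the new hidden state h_t.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Toy dimensions: 5-dimensional inputs, 8-dimensional hidden state.
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(8, 5))
W_hh = rng.normal(size=(8, 8))
b_h = np.zeros(8)

h = np.zeros(8)                       # initial hidden state
for x_t in rng.normal(size=(4, 5)):   # a sequence of 4 inputs
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h)                              # h_t after reading the whole sequence

The key point is that the same weights are reused at every step, and h carries information forward from one input to the next.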

The goal is to use h_t as output and compare it to your test data (usually a small subset of the original data held out from training). This gives you your error rate. With the error rate in hand, you can use a technique called Backpropagation Through Time (BPTT). BPTT works backwards through the network, across the time steps, and adjusts the weights based on your error rate. This is how the network learns to do better.
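In practice you rarely write BPTT by hand; the frameworks listed at the end of this post handle it when you call fit. As a rough sketch, assuming Keras and a completely made-up toy dataset of random sequences with random binary labels, a training loop looks like this:

import numpy as np
from tensorflow import keras

# Toy data: 100 sequences of length 10, each step a 5-dim vector,
# with one binary label per sequence (random, purely for illustration).
X = np.random.rand(100, 10, 5)
y = np.random.randint(0, 2, size=(100,))

model = keras.Sequential([
    keras.Input(shape=(10, 5)),
    keras.layers.SimpleRNN(16),                  # h_t from the last time step
    keras.layers.Dense(1, activation="sigmoid"),
])

# compile() chooses the loss (your "error rate") and optimizer;
# fit() runs the forward passes and backpropagation through time for you.
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X, y, epochs=3, validation_split=0.2)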

Theoretically, RNNs can handle context from the beginning of the sentence, which allows more accurate predictions of a word at the end of the sentence. In practice this isn't necessarily true for vanilla RNNs. This is a major reason why RNNs faded out of practice for a while, until some great results were achieved using a Long Short-Term Memory (LSTM) unit inside the neural network. Adding an LSTM to the network is like adding a memory unit that can remember context from the very beginning of the input.

These little memory units make RNNs much more accurate, and they are the recent cause of the popularity around this model. They allow context to be remembered across inputs. Two of these units are widely used today, LSTMs and Gated Recurrent Units (GRUs); the latter is more efficient computationally because it has fewer gates, and therefore fewer parameters to store and train.
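In most frameworks, swapping between the two is a one-line change. Here is a small Keras sketch (the layer sizes are arbitrary) that builds the same tiny model with each cell and compares their parameter counts:

from tensorflow import keras

def tiny_model(cell):
    # Same architecture; only the recurrent cell differs.
    return keras.Sequential([
        keras.Input(shape=(None, 50)),   # variable-length sequences of 50-dim vectors
        cell(32),
        keras.layers.Dense(1),
    ])

lstm_model = tiny_model(keras.layers.LSTM)
gru_model = tiny_model(keras.layers.GRU)

# The GRU has fewer gates, so it ends up with fewer weights to train.
print("LSTM parameters:", lstm_model.count_params())
print("GRU parameters:", gru_model.count_params())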

Applications of Recurrent Neural Networks

There are many different applications of RNNs. A great one is in combination with Natural Language Processing (NLP). Many people on the internet have demonstrated RNNs by building amazing language models. These models can take as input a large body of text, such as Shakespeare's poems, and after training they can generate their own Shakespearean poems that are very hard to differentiate from the originals!

Below is some Shakespeare.

This poem was actually written by an RNN. It comes from an awesome article, http://karpathy.github.io/2015/05/21/rnn-effectiveness/, that goes more in depth on char-RNNs.

This particular type of RNN is fed a dataset of text and reads the input in character by character. The amazing thing about these networks, in comparison to feeding in a word at a time, is that the network can create its own unique words that were not in the vocabulary you trained it on.

This diagram, taken from the article referenced above, shows how the model would predict "hello". It gives a good visualization of how these networks take in a word character by character and predict the likelihood of the next character.
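As a rough illustration of that character-by-character setup (not the code from the article), here is a tiny sketch that one-hot encodes the characters of "hello" and trains a small RNN to predict the next character at each step:

import numpy as np
from tensorflow import keras

text = "hello"
chars = sorted(set(text))                      # vocabulary: ['e', 'h', 'l', 'o']
char_to_ix = {c: i for i, c in enumerate(chars)}

def one_hot(c):
    v = np.zeros(len(chars))
    v[char_to_ix[c]] = 1.0
    return v

# Inputs are "h","e","l","l"; targets are the next characters "e","l","l","o".
X = np.array([[one_hot(c) for c in text[:-1]]])    # shape (1, 4, 4)
y = np.array([[char_to_ix[c] for c in text[1:]]])  # shape (1, 4)

model = keras.Sequential([
    keras.Input(shape=(None, len(chars))),
    keras.layers.SimpleRNN(16, return_sequences=True),
    keras.layers.Dense(len(chars), activation="softmax"),  # probability of each next char
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=300, verbose=0)

probs = model.predict(X)[0]
print([chars[i] for i in probs.argmax(axis=-1)])   # ideally ['e', 'l', 'l', 'o']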

Another amazing application of RNNs is machine translation. This method is interesting because it involves training two RNNs simultaneously. In these networks the inputs are pairs of sentences in different languages. For example, you can feed the network an English sentence paired with its French translation. With enough training you can give the network an English sentence and it will translate it to French! This model is called a Sequence to Sequence (seq2seq) model, or Encoder-Decoder model.

This diagram shows how information flows through the Encoder-Decoder model. It uses a word embedding layer to get a better word representation. A word embedding layer usually comes from an algorithm such as GloVe or Word2Vec, which takes a large set of words and produces a weight matrix in which similar words end up close to each other. Using an embedding layer generally makes your RNN more accurate, because it is a better representation of how similar words are, so the network has less to infer.
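A minimal sketch of that encoder-decoder shape in Keras is below. The vocabulary sizes and layer widths are made up, and a randomly initialized Embedding layer stands in for pretrained GloVe or Word2Vec vectors; this is just to show the wiring, not a full translation system.

from tensorflow import keras

src_vocab, tgt_vocab, emb_dim, hidden = 5000, 6000, 128, 256

# Encoder: embed the source sentence and compress it into a final state.
enc_inputs = keras.Input(shape=(None,), dtype="int32")
enc_emb = keras.layers.Embedding(src_vocab, emb_dim)(enc_inputs)
_, enc_h, enc_c = keras.layers.LSTM(hidden, return_state=True)(enc_emb)

# Decoder: start from the encoder's final state and predict the target
# sentence one word at a time.
dec_inputs = keras.Input(shape=(None,), dtype="int32")
dec_emb = keras.layers.Embedding(tgt_vocab, emb_dim)(dec_inputs)
dec_out = keras.layers.LSTM(hidden, return_sequences=True)(dec_emb, initial_state=[enc_h, enc_c])
dec_preds = keras.layers.Dense(tgt_vocab, activation="softmax")(dec_out)

model = keras.Model([enc_inputs, dec_inputs], dec_preds)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.summary()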

Conclusion

Recurrent Neural Networks have become very popular recently, and for very good reason. They're one of the most effective models available for natural language processing. New applications of these models are coming out all the time, and it's exciting to see what researchers come up with.

To play around with some RNNs, check out these awesome libraries:

TensorFlow — Google's machine learning framework. RNN example: https://www.tensorflow.org/versions/r0.10/tutorials/recurrent/index.html

Keras — a high-level machine learning package that runs on top of TensorFlow or Theano: https://keras.io

Torch — Facebook's machine learning framework, written in Lua: http://torch.ch

