Sequence prediction using recurrent neural networks (LSTM) with TensorFlow

This post demonstrates how to predict a sequence of vectors using a recurrent neural network, in particular the LSTM architecture. The complete code used for this post can be found here. Most of the examples I found on the internet apply the LSTM architecture to natural language processing problems, and I couldn't find one where the architecture is used to predict continuous values.


Update: the code is compatible with TensorFlow 1.1.0.

The same model can be built using the LSTM layer from Polyaxon; here's an experiment configuration that achieves the same results as this post:

import polyaxon as plx


def create_experiment(output_dir, X, y, train_steps=1000, num_units=7, output_units=1, num_layers=1):
    """Creates an experiment using LSTM architecture for timeseries regression problem."""
    config = {
        'name': 'time_series',
        'output_dir': output_dir,
        'eval_every_n_steps': 100,
        'train_steps_per_iteration': 100,
        'train_steps': train_steps,
        'run_config': {'save_checkpoints_steps': 100},
        'train_input_data_config': {
            'input_type': plx.configs.InputDataConfig.NUMPY,
            'pipeline_config': {'name': 'train', 'batch_size': 64, 'num_epochs': None,
                                'shuffle': False},
            'x': {'x': X['train']},
            'y': y['train']
        },
        'eval_input_data_config': {
            'input_type': plx.configs.InputDataConfig.NUMPY,
            'pipeline_config': {'name': 'eval', 'batch_size': 32, 'num_epochs': None,
                                'shuffle': False},
            'x': {'x': X['val']},
            'y': y['val']
        },
        'estimator_config': {'output_dir': output_dir},
        'model_config': {
            'module': 'Regressor',
            'loss_config': {'module': 'mean_squared_error'},
            'eval_metrics_config': [{'module': 'streaming_root_mean_squared_error'},
                                    {'module': 'streaming_mean_absolute_error'}],
            'optimizer_config': {'module': 'adagrad', 'learning_rate': 0.1},
            'graph_config': {
                'name': 'regressor',
                'features': ['x'],
                'definition': [
                    (plx.layers.LSTM, {'num_units': num_units, 'num_layers': num_layers}),
                    (plx.layers.FullyConnected, {'num_units': output_units}),
                ]
            }
        }
    }
    experiment_config = plx.configs.ExperimentConfig.read_configs(config)
    return plx.experiments.create_experiment(experiment_config)
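
For completeness, here is a hypothetical usage sketch. The train_and_evaluate() call is my assumption, based on the tf.contrib.learn Experiment interface that Polyaxon wrapped at the time, and generate_data is the helper defined in the original post below:

import numpy as np

# Hypothetical usage sketch -- generate_data is defined in the original post
# below, and I am assuming the returned experiment exposes the standard
# train_and_evaluate() of tf.contrib.learn's Experiment.
X, y = generate_data(np.sin, np.linspace(0, 100, 10000), 7)
experiment = create_experiment('./polyaxon_logs', X, y)
experiment.train_and_evaluate()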

End of the update.


Original Post:

So the task here is to predict a sequence of real numbers based on previous observations. Traditional neural network architectures cannot do this; recurrent neural networks were designed to address exactly this issue, as they can store information about past inputs to predict future events.

In this example we will try to predict a few functions:

  • sin
  • sin and cos at the same time
  • x*sin

First of all, let's build our model, lstm_model. The model is a stack of LSTM cells of possibly different sizes (optionally wrapped with dropout), followed by optional dense layers and a linear regression output.

import tensorflow as tf
from tensorflow.contrib import learn
from tensorflow.python.framework import dtypes


def lstm_model(time_steps, rnn_layers, dense_layers=None):
    """
    Creates a deep model based on:
        * stacked lstm cells
        * optional dense layers
    :param time_steps: the number of time steps the model will be looking at.
    :param rnn_layers: list of int or dict
                       * list of int: the steps used to instantiate the `BasicLSTMCell` cell
                       * list of dict: [{steps: int, keep_prob: float}, ...]
    :param dense_layers: list of nodes for each layer
    :return: the model definition
    """

    def lstm_cells(layers):
        if isinstance(layers[0], dict):
            return [tf.nn.rnn_cell.DropoutWrapper(tf.nn.rnn_cell.BasicLSTMCell(layer['steps'],
                                                                               state_is_tuple=True),
                                                  layer['keep_prob'])
                    if layer.get('keep_prob') else tf.nn.rnn_cell.BasicLSTMCell(layer['steps'],
                                                                                state_is_tuple=True)
                    for layer in layers]
        return [tf.nn.rnn_cell.BasicLSTMCell(steps, state_is_tuple=True) for steps in layers]

    def dnn_layers(input_layers, layers):
        if layers and isinstance(layers, dict):
            return learn.ops.dnn(input_layers,
                                 layers['layers'],
                                 activation=layers.get('activation'),
                                 dropout=layers.get('dropout'))
        elif layers:
            return learn.ops.dnn(input_layers, layers)
        else:
            return input_layers

    def _lstm_model(X, y):
        stacked_lstm = tf.nn.rnn_cell.MultiRNNCell(lstm_cells(rnn_layers), state_is_tuple=True)
        # split the (batch_size, time_steps, num_features) input into a list of
        # `time_steps` tensors of shape (batch_size, num_features)
        x_ = learn.ops.split_squeeze(1, time_steps, X)
        output, layers = tf.nn.rnn(stacked_lstm, x_, dtype=dtypes.float32)
        output = dnn_layers(output[-1], dense_layers)
        return learn.models.linear_regression(output, y)

    return _lstm_model

So our model expects data with a shape corresponding to (batch_size, time_steps of the first lstm cell, num_features). With TIMESTEPS = 5 (defined below) and a single feature, that is (batch_size, 5, 1).

Next we need to prepare the data in a way that can be accepted by our model.

import numpy as np
import pandas as pd


def rnn_data(data, time_steps, labels=False):
    """
    creates new data frame based on previous observation
      * example:
        l = [1, 2, 3, 4, 5]
        time_steps = 2
        -> labels == False [[1, 2], [2, 3], [3, 4]]
        -> labels == True [3, 4, 5]
    """
    rnn_df = []
    for i in range(len(data) - time_steps):
        if labels:
            try:
                rnn_df.append(data.iloc[i + time_steps].as_matrix())
            except AttributeError:
                rnn_df.append(data.iloc[i + time_steps])
        else:
            data_ = data.iloc[i: i + time_steps].as_matrix()
            rnn_df.append(data_ if len(data_.shape) > 1 else [[v] for v in data_])
    return np.array(rnn_df)


def split_data(data, val_size=0.1, test_size=0.1):
    """
    splits data to training, validation and testing parts
    """
    ntest = int(round(len(data) * (1 - test_size)))
    nval = int(round(len(data.iloc[:ntest]) * (1 - val_size)))

    df_train, df_val, df_test = data.iloc[:nval], data.iloc[nval:ntest], data.iloc[ntest:]

    return df_train, df_val, df_test


def prepare_data(data, time_steps, labels=False, val_size=0.1, test_size=0.1):
    """
    Given the number of `time_steps` and some data,
    prepares training, validation and test data for an lstm cell.
    """
    df_train, df_val, df_test = split_data(data, val_size, test_size)
    return (rnn_data(df_train, time_steps, labels=labels),
            rnn_data(df_val, time_steps, labels=labels),
            rnn_data(df_test, time_steps, labels=labels))


def generate_data(fct, x, time_steps, seperate=False):
    """generates data based on a function fct"""
    data = fct(x)
    if not isinstance(data, pd.DataFrame):
        data = pd.DataFrame(data)
    train_x, val_x, test_x = prepare_data(data['a'] if seperate else data, time_steps)
    train_y, val_y, test_y = prepare_data(data['b'] if seperate else data, time_steps, labels=True)
    return dict(train=train_x, val=val_x, test=test_x), dict(train=train_y, val=val_y, test=test_y)

This creates data that allows our model to look time_steps steps back in the past in order to make a prediction. So if, for example, our first cell is a 10 time_steps cell, then for each prediction we want to make we need to feed the cell 10 historical data points, and the y value should correspond to the value immediately following those 10 points, i.e. the eleventh.
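
To make the windowing concrete, here is a quick sanity check of rnn_data on the toy list from its docstring:

import pandas as pd

# Sanity check of the window/label alignment described above.
toy = pd.Series([1, 2, 3, 4, 5])

windows = rnn_data(toy, 2)               # [[[1], [2]], [[2], [3]], [[3], [4]]]
labels = rnn_data(toy, 2, labels=True)   # [3, 4, 5]
print(windows.shape)  # (3, 2, 1) -> (num_windows, time_steps, num_features)
print(labels)         # [3 4 5]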

Let’s first define the hyperparameters

LOG_DIR = './ops_logs'
TIMESTEPS = 5
RNN_LAYERS = [{'steps': TIMESTEPS}, {'steps': TIMESTEPS, 'keep_prob': 0.5}]
DENSE_LAYERS = [2]
TRAINING_STEPS = 130000
BATCH_SIZE = 100
PRINT_STEPS = TRAINING_STEPS / 100
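
Note that, despite the name, 'steps' in RNN_LAYERS is the number of units of each BasicLSTMCell. With this configuration, the lstm_cells helper above builds two 5-unit cells and wraps the second in a DropoutWrapper, roughly equivalent to this sketch:

cells = [
    tf.nn.rnn_cell.BasicLSTMCell(TIMESTEPS, state_is_tuple=True),
    tf.nn.rnn_cell.DropoutWrapper(
        tf.nn.rnn_cell.BasicLSTMCell(TIMESTEPS, state_is_tuple=True),
        0.5),  # the keep_prob from RNN_LAYERS, passed positionally
]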

Now we can create a regressor based on our model

regressor = learn.TensorFlowEstimator(model_fn=lstm_model(TIMESTEPS, RNN_LAYERS, DENSE_LAYERS),
                                      n_classes=0,  # n_classes=0 means regression
                                      verbose=1,
                                      steps=TRAINING_STEPS,
                                      optimizer='Adagrad',
                                      learning_rate=0.03,
                                      batch_size=BATCH_SIZE)

Predicting the sin function

X, y = generate_data(np.sin, np.linspace(0, 100, 10000), TIMESTEPS, seperate=False)
# create a lstm instance and validation monitor
validation_monitor = learn.monitors.ValidationMonitor(X['val'], y['val'],
                                                      every_n_steps=PRINT_STEPS,
                                                      early_stopping_rounds=1000)
regressor.fit(X['train'], y['train'], validation_monitor, logdir=LOG_DIR)

# > last training steps
# Step #9700, epoch #119, avg. train loss: 0.00082, avg. val loss: 0.00084
# Step #9800, epoch #120, avg. train loss: 0.00083, avg. val loss: 0.00082
# Step #9900, epoch #122, avg. train loss: 0.00082, avg. val loss: 0.00082
# Step #10000, epoch #123, avg. train loss: 0.00081, avg. val loss: 0.00081

Predicting the test data

from sklearn.metrics import mean_squared_error

mse = mean_squared_error(regressor.predict(X['test']), y['test'])
print("Error: {}".format(mse))
# 0.000776

Predicting the sin and cos functions together

def sin_cos(x):
    return pd.DataFrame(dict(a=np.sin(x), b=np.cos(x)), index=x)


X, y = generate_data(sin_cos, np.linspace(0, 100, 10000), TIMESTEPS, seperate=False)
# create a lstm instance and validation monitor
validation_monitor = learn.monitors.ValidationMonitor(X['val'], y['val'],
                                                      every_n_steps=PRINT_STEPS,
                                                      early_stopping_rounds=1000)
regressor.fit(X['train'], y['train'], validation_monitor, logdir=LOG_DIR)

# > last training steps
# Step #9500, epoch #117, avg. train loss: 0.00120, avg. val loss: 0.00118
# Step #9600, epoch #118, avg. train loss: 0.00121, avg. val loss: 0.00118
# Step #9700, epoch #119, avg. train loss: 0.00118, avg. val loss: 0.00118
# Step #9800, epoch #120, avg. train loss: 0.00118, avg. val loss: 0.00116
# Step #9900, epoch #122, avg. train loss: 0.00118, avg. val loss: 0.00115
# Step #10000, epoch #123, avg. train loss: 0.00117, avg. val loss: 0.00115

Predicting the test data

mse = mean_squared_error(regressor.predict(X['test']), y['test'])
print("Error: {}".format(mse))
# 0.001144
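
The figure below compares the predictions with the actual test values; it can be reproduced with a minimal matplotlib sketch along these lines (the plotting code is my addition, the original post only shows the resulting figure):

import matplotlib.pyplot as plt

# My addition: visualize predicted vs. actual values on the test split.
predicted = regressor.predict(X['test'])
plt.plot(y['test'][:, 0], label='sin (actual)')
plt.plot(predicted[:, 0], label='sin (predicted)')
plt.plot(y['test'][:, 1], label='cos (actual)')
plt.plot(predicted[:, 1], label='cos (predicted)')
plt.legend()
plt.show()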
[Figure: predicted sin-cos function]

Predicting the x*sin function

def x_sin(x):
    return x * np.sin(x)


X, y = generate_data(x_sin, np.linspace(0, 100, 10000), TIMESTEPS, seperate=False)
# create a lstm instance and validation monitor
validation_monitor = learn.monitors.ValidationMonitor(X['val'], y['val'],
                                                      every_n_steps=PRINT_STEPS,
                                                      early_stopping_rounds=1000)
regressor.fit(X['train'], y['train'], validation_monitor, logdir=LOG_DIR)

# > last training steps
# Step #32500, epoch #401, avg. train loss: 0.48248, avg. val loss: 15.98678
# Step #33800, epoch #417, avg. train loss: 0.47391, avg. val loss: 15.92590
# Step #35100, epoch #433, avg. train loss: 0.45570, avg. val loss: 15.77346
# Step #36400, epoch #449, avg. train loss: 0.45853, avg. val loss: 15.61680
# Step #37700, epoch #465, avg. train loss: 0.44212, avg. val loss: 15.48604
# Step #39000, epoch #481, avg. train loss: 0.43224, avg. val loss: 15.43947

Predicting the test data

mse = mean_squared_error(regressor.predict(X['test']), y['test'])
print("Error: {}".format(mse))
# 61.024454351

[Figure: predicted x*sin function]

Note how much larger the error is here: since split_data takes the validation and test sets from the end of the series, the model has to predict x*sin at amplitudes it never saw during training, whereas sin and cos look the same over any range.

[Figure: model loss]

N.B. I am not completely sure this is the right way to train an LSTM on regression problems; I am still experimenting with the RNN sequence-to-sequence model, and I will update this post or write a new one using it.