The main idea behind a recurrent neural network (RNN) is to capture the relations in a process where the input is a time sequence of consecutive data pieces, so the order of the pieces is strict and the relations are tied to time. An example problem is a piece of software that tries to predict the next data piece from the previous 5 pieces.

Generally the input of an RNN is a sequence of vectors (e.g. one vector for each time step) and the output is also a sequence of vectors, although it may have a different length than the input. Each vector is processed by a cell (only one type of cell is applied), in a strict order and with some delay. As the computation proceeds, each cell processes a vector, optionally produces an output, and computes some further state values which are fed into the next cell. The next cell receives the state values and the next input vector from the sequence, executes its calculations, and so on.
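This state passing can be sketched with a short loop (a minimal Python sketch; run_rnn, cell_step and the shapes are hypothetical and only illustrate the mechanism, they are not part of the library):

```python
import numpy as np

def run_rnn(cell_step, inputs, h0):
    """Unroll a generic recurrent cell over a sequence.

    cell_step(x_t, h_prev) -> h_t is a hypothetical single-step function,
    inputs has shape (timesteps, input_dim), h0 is the initial state.
    """
    h = h0
    outputs = []
    for x_t in inputs:          # strict temporal order
        h = cell_step(x_t, h)   # the state is carried to the next step
        outputs.append(h)
    return np.stack(outputs), h  # all step outputs and the final state
```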

In short, an RNN is a finite sequence of cells, each with the same structure. The input vectors are processed consecutively and some hidden variables are computed and forwarded along the sequence. Some references for further insight:

The Unreasonable Effectiveness of Recurrent Neural Networks
A Beginner’s Guide to Recurrent Networks and LSTMs
Recurrent Neural Networks Tutorial on WILDML

[source]

SimpleRNN

The structure of the so-called simple RNN layer in Keras is the following:

[figure: structure of the SimpleRNN cell]

Here h is the hidden variable, R is the recurrent kernel, W is the kernel, b is the bias and f is the activation function.
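With this notation, and assuming the standard Keras-style SimpleRNN formulation, one cell computes

h_t = f(W x_t + R h_{t-1} + b)

where x_t is the input vector of time step t and h_{t-1} is the hidden variable received from the previous cell.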

Input:

A Data2D type with the shape: (1, timesteps, input dimension, batches).

Output:

A Data2D type with shape: (1, 1, units, batches).

Methods:

The SimpleRNN cell implements the operations shown in the picture and feeds its output to the next cell. The unrolled cells are implemented by a for loop. It can be useful to inspect the Keras implementation as well.
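A minimal NumPy sketch of this loop (illustrative only; the function name, the argument shapes and the use of NumPy are assumptions, not the library's actual code):

```python
import numpy as np

def simple_rnn_forward(x, W, R, b, f=np.tanh):
    """Sketch of the unrolled SimpleRNN forward pass.

    x: (timesteps, input_dim), W: (input_dim, units),
    R: (units, units), b: (units,), f: activation.
    Returns the last hidden state, shape (units,).
    """
    h = np.zeros(W.shape[1])
    for x_t in x:                    # the for loop over the unrolled cells
        h = f(x_t @ W + h @ R + b)   # h_t = f(W x_t + R h_{t-1} + b)
    return h
```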

[source]

LSTM

The structure of the LSTM cell is the following:

[figure: structure of the LSTM cell]

Here g is the recurrent activation, p is the activation, the Ws are the kernels, the Us are the recurrent kernels, h is the hidden variable (which is also the output), and * denotes element-wise multiplication.
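With this notation, and assuming the standard Keras-style LSTM formulation (x_t is the input vector, c_t is the cell state), one cell computes

i_t = g(W_i x_t + U_i h_{t-1} + b_i)
f_t = g(W_f x_t + U_f h_{t-1} + b_f)
o_t = g(W_o x_t + U_o h_{t-1} + b_o)
c_t = f_t * c_{t-1} + i_t * p(W_c x_t + U_c h_{t-1} + b_c)
h_t = o_t * p(c_t)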

Input:

A Data2D type with the shape: (1, timesteps, input dimension, batches).

Output:

A Data2D type with shape: (1, 1, units, batches).

Methods:

The Keras implementation can help as well: see the step function in the LSTM implementation. Basically the products are dot products between matrices.
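A minimal NumPy sketch of one unrolled LSTM pass (illustrative only; the function name, the fused gate layout and the gate order i, f, c, o are assumptions modelled on the Keras defaults):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_forward(x, W, U, b, g=sigmoid, p=np.tanh):
    """Sketch of the unrolled LSTM forward pass.

    x: (timesteps, input_dim), W: (input_dim, 4*units),
    U: (units, 4*units), b: (4*units,).
    g is the recurrent activation, p the activation, * is element-wise.
    Returns the last hidden state h, shape (units,).
    """
    units = U.shape[0]
    h = np.zeros(units)
    c = np.zeros(units)
    for x_t in x:
        z = x_t @ W + h @ U + b                     # all gate pre-activations at once
        i = g(z[:units])                            # input gate
        f = g(z[units:2 * units])                   # forget gate
        c = f * c + i * p(z[2 * units:3 * units])   # new cell state
        o = g(z[3 * units:])                        # output gate
        h = o * p(c)                                # hidden state = output of the step
    return h
```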

[source]

GRU

The structure of the GRU cell is the following:

[figure: structure of the GRU cell]

The meaning of the notation is the same as in the case of the LSTM. 1 - z denotes an element-wise subtraction.
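With this notation, and assuming the standard Keras-style GRU formulation, one cell computes

z_t = g(W_z x_t + U_z h_{t-1} + b_z)
r_t = g(W_r x_t + U_r h_{t-1} + b_r)
h_t = z_t * h_{t-1} + (1 - z_t) * p(W_h x_t + U_h (r_t * h_{t-1}) + b_h)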

Input:

A Data2D type with the shape: (1, timesteps, input dimension, batches).

Output:

A Data2D type with shape: (1, 1, units, batches).

Methods:

The Keras implementation can help as well: see the step function in the GRU implementation. This blog article can also be useful.
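A minimal NumPy sketch of one unrolled GRU pass (illustrative only; the function name, the fused kernel layout and the gate order z, r, h are assumptions modelled on the Keras defaults):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_forward(x, W, U, b, g=sigmoid, p=np.tanh):
    """Sketch of the unrolled GRU forward pass.

    x: (timesteps, input_dim), W: (input_dim, 3*units),
    U: (units, 3*units), b: (3*units,).
    Returns the last hidden state h, shape (units,).
    """
    units = U.shape[0]
    h = np.zeros(units)
    for x_t in x:
        z = g(x_t @ W[:, :units] + h @ U[:, :units] + b[:units])                # update gate
        r = g(x_t @ W[:, units:2 * units] + h @ U[:, units:2 * units]
              + b[units:2 * units])                                             # reset gate
        hh = p(x_t @ W[:, 2 * units:] + (r * h) @ U[:, 2 * units:]
               + b[2 * units:])                                                 # candidate state
        h = z * h + (1.0 - z) * hh                                              # element-wise 1 - z interpolation
    return h
```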