In a previous project I experimented with Recurrent Neural Nets (RNNs), their implementation, and their application to character-level language modelling. In that instance I used a simple RNN architecture: more or less a standard feed-forward network with an optional recurrent connection between layers. This type of network can learn short-term correlations in the input stream, but struggles to learn long-term ones. To address this problem, a special type of RNN called Long Short-Term Memory (LSTM) was proposed in 1997.

This network architecture has more in-built structure than a series of fully-connected layers. I view LSTMs as modelling a “memory latch”. An LSTM cell has one data input, one output, and three “control” inputs. The controls can be labelled input, forget, and read, and they determine how the cell treats the incoming data, its current internal state, and its output. There are several variations of the LSTM cell architecture; a common one is shown in the diagram below.
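Before the diagram, a minimal sketch of a single cell step may help make the gating concrete. This is plain NumPy using the standard sigmoid/tanh gating equations; the weight layout and variable names are illustrative rather than taken from this project's code, and the “read” control here corresponds to what is usually called the output gate.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a layer of LSTM cells.

    x      -- data input at this time step, shape (n_input,)
    h_prev -- previous cell output, shape (n_hidden,)
    c_prev -- previous internal state, shape (n_hidden,)
    W, b   -- stacked gate weights and biases, shapes
              (4 * n_hidden, n_input + n_hidden) and (4 * n_hidden,)
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b

    i = sigmoid(z[0 * n:1 * n])   # input control: how much new data to write
    f = sigmoid(z[1 * n:2 * n])   # forget control: how much of the old state to keep
    o = sigmoid(z[2 * n:3 * n])   # read (output) control: how much of the state to expose
    g = np.tanh(z[3 * n:4 * n])   # candidate values to write into the state

    c = f * c_prev + i * g        # update the internal "memory latch"
    h = o * np.tanh(c)            # cell output
    return h, c

# Illustrative usage with arbitrary sizes.
rng = np.random.default_rng(0)
n_input, n_hidden = 8, 16
W = 0.1 * rng.standard_normal((4 * n_hidden, n_input + n_hidden))
b = np.zeros(4 * n_hidden)
h, c = lstm_step(rng.standard_normal(n_input),
                 np.zeros(n_hidden), np.zeros(n_hidden), W, b)
```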