chainer.functions.lstm

chainer.functions.lstm(c_prev, x)[source]

Long Short-Term Memory units as an activation function.

This function implements LSTM units with forget gates. Let c_prev be the previous cell state and x the input array.

First, the input array x is split into four arrays \(a, i, f, o\) of the same shape along the second axis. This means that the size of x's second axis must be four times that of c_prev's second axis. A shape sketch follows the list below.

The split input arrays correspond to:

  • \(a\) : sources of cell input

  • \(i\) : sources of input gate

  • \(f\) : sources of forget gate

  • \(o\) : sources of output gate
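As an illustration of this shape requirement, here is a minimal NumPy sketch. The reshape-based grouping shown below (four sources per unit along the second axis) is an assumption of this sketch about the internal layout, not part of the public API:

>>> import numpy as np
>>> batch, n_units = 2, 3
>>> c_prev = np.zeros((batch, n_units), np.float32)
>>> x = np.zeros((batch, 4 * n_units), np.float32)
>>> x.shape[1] == 4 * c_prev.shape[1]  # required shape relation
True
>>> r = x.reshape(batch, n_units, 4)   # group the four sources per unit
>>> a, i, f, o = (r[:, :, k] for k in range(4))
>>> a.shape == c_prev.shape
True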

Second, it computes the updated cell state c and the outgoing signal h as:

\[\begin{split}c &= \tanh(a) \sigma(i) + c_{\text{prev}} \sigma(f), \\ h &= \tanh(c) \sigma(o),\end{split}\]

where \(\sigma\) is the elementwise sigmoid function. These are returned as a tuple of two variables.
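A worked NumPy sketch of these two formulas, where the sigmoid helper and the toy arrays are assumptions of this sketch rather than part of the Chainer API:

>>> import numpy as np
>>> def sigmoid(z):
...     return 1.0 / (1.0 + np.exp(-z))
...
>>> c_prev = np.zeros((1, 3), np.float32)
>>> a, i, f, o = (np.ones((1, 3), np.float32) for _ in range(4))
>>> c = np.tanh(a) * sigmoid(i) + c_prev * sigmoid(f)  # updated cell state
>>> h = np.tanh(c) * sigmoid(o)                        # outgoing signal
>>> c.shape == h.shape == c_prev.shape
True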

This function supports variable length inputs. The mini-batch size of the current input must be equal to or smaller than that of the previous one. When the mini-batch size of x is smaller than that of c, this function only updates c[0:len(x)] and leaves the rest of c, c[len(x):], unchanged. So, please sort input sequences in descending order of length before applying the function.
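For instance, the following sketch feeds a batch of three cell states an input whose mini-batch size is two; the resulting shapes are what the description above implies:

>>> import numpy as np
>>> import chainer.functions as F
>>> n_units = 3
>>> c = np.zeros((3, n_units), np.float32)      # states of 3 sequences
>>> x = np.zeros((2, 4 * n_units), np.float32)  # only 2 sequences remain
>>> c_new, h = F.lstm(c, x)
>>> c_new.shape  # full batch is kept; c_new[2:] carries c[2:] over
(3, 3)
>>> h.shape      # outgoing signal only for the active sequences
(2, 3)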

Parameters
  • c_prev (Variable or N-dimensional array) – Variable that holds the previous cell state. The cell state should be a zero array or the output of the previous call of LSTM.

  • x (Variable or N-dimensional array) – Variable that holds the sources of the cell input, input gate, forget gate and output gate. Its second dimension must be four times the size of the cell state's second dimension.

Returns

Two Variable objects c and h, where c is the updated cell state and h is the outgoing signal.

Return type

tuple

See the original paper proposing LSTM with forget gates: Long Short-Term Memory in Recurrent Neural Networks.

See also

LSTM

Example

Assume that y is the current incoming signal, c is the previous cell state, and h is the previous outgoing signal from an lstm function, and that each of y, c and h has n_units channels. The most typical preparation of x is:

>>> import numpy as np
>>> import chainer
>>> import chainer.functions as F
>>> import chainer.links as L
>>> n_units = 100
>>> y = chainer.Variable(np.zeros((1, n_units), np.float32))
>>> h = chainer.Variable(np.zeros((1, n_units), np.float32))
>>> c = chainer.Variable(np.zeros((1, n_units), np.float32))
>>> model = chainer.Chain()
>>> with model.init_scope():
...   model.w = L.Linear(n_units, 4 * n_units)
...   model.v = L.Linear(n_units, 4 * n_units)
>>> x = model.w(y) + model.v(h)
>>> c, h = F.lstm(c, x)

This corresponds to calculating the input array x, or the input sources \(a, i, f, o\), from the current incoming signal y and the previous outgoing signal h. Different parameters are used for different kinds of input sources.
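Building on the example above, the recurrent use of this function over a sequence can be sketched as follows; the sequence length and zero-valued signals are arbitrary choices of this sketch, reusing model and n_units from the example:

>>> seq = np.zeros((5, 1, n_units), np.float32)  # 5 timesteps of incoming signals
>>> c = chainer.Variable(np.zeros((1, n_units), np.float32))
>>> h = chainer.Variable(np.zeros((1, n_units), np.float32))
>>> for y in seq:
...     x = model.w(y) + model.v(h)  # recompute the input array each step
...     c, h = F.lstm(c, x)
...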

Note

We use the naming rule below.

  • incoming signal

    The formal input of the formulation of LSTM (e.g. in NLP, a word vector or the output of a lower RNN layer). The input of chainer.links.LSTM is the incoming signal.

  • input array

    The array which is linearly transformed from the incoming signal and the previous outgoing signal. The input array contains four sources: the sources of the cell input, input gate, forget gate and output gate. The input of chainer.functions.activation.lstm.LSTM is the input array.