# chainer.functions.n_step_lstm

chainer.functions.n_step_lstm(n_layers, dropout_ratio, hx, cx, ws, bs, xs)

Stacked Uni-directional Long Short-Term Memory function.

This function calculates a stacked uni-directional LSTM over sequences. It takes an initial hidden state $$h_0$$, an initial cell state $$c_0$$, an input sequence $$x$$, weight matrices $$W$$, and bias vectors $$b$$, and computes the hidden state $$h_t$$ and the cell state $$c_t$$ for each time $$t$$ from the input $$x_t$$.

$\begin{split}i_t &= \sigma(W_0 x_t + W_4 h_{t-1} + b_0 + b_4) \\ f_t &= \sigma(W_1 x_t + W_5 h_{t-1} + b_1 + b_5) \\ o_t &= \sigma(W_2 x_t + W_6 h_{t-1} + b_2 + b_6) \\ a_t &= \tanh(W_3 x_t + W_7 h_{t-1} + b_3 + b_7) \\ c_t &= f_t \cdot c_{t-1} + i_t \cdot a_t \\ h_t &= o_t \cdot \tanh(c_t)\end{split}$
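
The following is a minimal NumPy sketch of one time step of the recurrence above for a single layer. Every name in it (sigmoid, W, b, x_t, h_prev, c_prev) is local to this illustration rather than part of the API, and the weights are random placeholders.

>>> import numpy as np
>>> sigmoid = lambda z: 1 / (1 + np.exp(-z))
>>> N, I, B = 2, 3, 4  # hidden size, input size, mini-batch size
>>> rng = np.random.RandomState(0)
>>> # W[0..3] multiply the input, so they are (N, I); W[4..7] are (N, N)
>>> W = [rng.rand(N, I if j < 4 else N).astype(np.float32) for j in range(8)]
>>> b = [np.zeros(N, dtype=np.float32) for _ in range(8)]
>>> x_t = rng.rand(B, I).astype(np.float32)
>>> h_prev = np.zeros((B, N), dtype=np.float32)
>>> c_prev = np.zeros((B, N), dtype=np.float32)
>>> i_t = sigmoid(x_t.dot(W[0].T) + h_prev.dot(W[4].T) + b[0] + b[4])
>>> f_t = sigmoid(x_t.dot(W[1].T) + h_prev.dot(W[5].T) + b[1] + b[5])
>>> o_t = sigmoid(x_t.dot(W[2].T) + h_prev.dot(W[6].T) + b[2] + b[6])
>>> a_t = np.tanh(x_t.dot(W[3].T) + h_prev.dot(W[7].T) + b[3] + b[7])
>>> c_t = f_t * c_prev + i_t * a_t
>>> h_t = o_t * np.tanh(c_t)
>>> h_t.shape
(4, 2)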

As the function accepts a sequence, it calculates $$h_t$$ for all $$t$$ with one call. Eight weight matrices and eight bias vectors are required for each layer. So, when $$S$$ layers exist, you need to prepare $$8S$$ weight matrices and $$8S$$ bias vectors.

If the number of layers n_layers is greater than $$1$$, the input of the k-th layer is the hidden state h_t of the (k-1)-th layer. Note that the input of every layer except the first may have a shape different from that of the first layer's input.
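
To make this chaining concrete, here is a hedged NumPy sketch at a single time step; the lstm_step helper and all other names are local to this illustration, not the library's internals, and the dropout applied between layers is omitted.

>>> import numpy as np
>>> def lstm_step(x, h, c, W, b):
...     s = lambda z: 1 / (1 + np.exp(-z))
...     i = s(x.dot(W[0].T) + h.dot(W[4].T) + b[0] + b[4])
...     f = s(x.dot(W[1].T) + h.dot(W[5].T) + b[1] + b[5])
...     o = s(x.dot(W[2].T) + h.dot(W[6].T) + b[2] + b[6])
...     a = np.tanh(x.dot(W[3].T) + h.dot(W[7].T) + b[3] + b[7])
...     c_new = f * c + i * a
...     return o * np.tanh(c_new), c_new
...
>>> I, N, B, n_layers = 3, 2, 4, 2
>>> ws = [[np.ones((N, I if k == 0 and j < 4 else N), np.float32)
...        for j in range(8)] for k in range(n_layers)]
>>> bs = [[np.zeros(N, np.float32) for _ in range(8)] for _ in range(n_layers)]
>>> h = np.zeros((n_layers, B, N), np.float32)
>>> c = np.zeros((n_layers, B, N), np.float32)
>>> layer_in = np.ones((B, I), np.float32)  # x_t enters the first layer only
>>> for k in range(n_layers):
...     h[k], c[k] = lstm_step(layer_in, h[k], c[k], ws[k], bs[k])
...     layer_in = h[k]  # the k-th hidden state becomes the (k+1)-th input
...
>>> layer_in.shape  # output of the last layer
(4, 2)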

Parameters
• n_layers (int) – The number of layers.

• dropout_ratio (float) – Dropout ratio.

• hx (Variable) – Variable holding stacked hidden states. Its shape is (S, B, N) where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of the hidden units.

• cx (Variable) – Variable holding stacked cell states. It has the same shape as hx.

• ws (list of list of Variable) – Weight matrices. ws[i] represents the weights for the i-th layer. Each ws[i] is a list containing eight matrices, where ws[i][j] corresponds to $$W_j$$ in the equation. Only ws[0][j] for 0 <= j < 4 are (N, I)-shaped, as they are multiplied by the input variables; here I is the size of the input and N is the dimension of the hidden units. All other matrices are (N, N)-shaped.

• bs (list of list of Variable) – Bias vectors. bs[i] represents the biases for the i-th layer. Each bs[i] is a list containing eight vectors, where bs[i][j] corresponds to $$b_j$$ in the equation. The shape of each vector is (N,), where N is the dimension of the hidden units.

• xs (list of Variable) – A list of Variable holding input values. Each element xs[t] holds the input values for time t, and its shape is (B_t, I), where B_t is the mini-batch size for time t and I is the size of the input. The sequences must be transposed; transpose_sequence() can be used to transpose a list of Variables each representing a sequence (see the sketch after this parameter list). When the sequences have different lengths, they must be sorted in descending order of length before transposing, so xs must satisfy xs[t].shape[0] >= xs[t + 1].shape[0].
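
As an illustration of that preparation step, the following sketch (toy arrays, names local to this example) shows how transpose_sequence() converts length-sorted sequences into the per-time-step layout expected by xs:

>>> import numpy as np
>>> import chainer.functions as F
>>> seqs = [np.ones((4, 3), np.float32),  # a sequence of length 4
...         np.ones((2, 3), np.float32)]  # a sequence of length 2
>>> xs = F.transpose_sequence(seqs)  # seqs are sorted by descending length
>>> [x.shape for x in xs]
[(2, 3), (2, 3), (1, 3), (1, 3)]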

Returns

This function returns a tuple containing three elements, hy, cy and ys.

• hy is the updated hidden states, with the same shape as hx.

• cy is the updated cell states, with the same shape as cx.

• ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to the input xs[t]. Its shape is (B_t, N), where B_t is the mini-batch size for time t and N is the dimension of the hidden units. Note that B_t equals xs[t].shape[0].

Return type

tuple

Note

All hidden units must share a single dimension N. If you need hidden units whose dimension varies (for example, between layers), use chainer.functions.lstm instead.
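
For reference, a minimal sketch of that per-step primitive: F.lstm() takes the previous cell state and a gate input of shape (B, 4N), and returns the updated cell state and hidden state. The all-ones gates array below is a placeholder standing in for a precomputed projection of the input and previous hidden state.

>>> import numpy as np
>>> import chainer.functions as F
>>> B, N = 3, 2
>>> c_prev = np.zeros((B, N), np.float32)
>>> gates = np.ones((B, 4 * N), np.float32)  # placeholder for W x + U h + b
>>> c, h = F.lstm(c_prev, gates)
>>> h.shape
(3, 2)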

Example

>>> import numpy as np
>>> import chainer.functions as F
>>> batchs = [3, 2, 1]  # support variable length sequences
>>> in_size, out_size, n_layers = 3, 2, 2
>>> dropout_ratio = 0.0
>>> xs = [np.ones((b, in_size)).astype(np.float32) for b in batchs]
>>> [x.shape for x in xs]
[(3, 3), (2, 3), (1, 3)]
>>> h_shape = (n_layers, batchs[0], out_size)
>>> hx = np.ones(h_shape).astype(np.float32)
>>> cx = np.ones(h_shape).astype(np.float32)
>>> w_in = lambda i, j: in_size if i == 0 and j < 4 else out_size
>>> ws = []
>>> bs = []
>>> for n in range(n_layers):
...     ws.append([np.ones((out_size, w_in(n, i))).astype(np.float32) for i in range(8)])
...     bs.append([np.ones((out_size,)).astype(np.float32) for _ in range(8)])
...
>>> ws[0][0].shape  # each of ws[0][:4] has shape (out_size, in_size)
(2, 3)
>>> ws[1][0].shape  # others are (out_size, out_size)
(2, 2)
>>> bs[0][0].shape
(2,)
>>> hy, cy, ys = F.n_step_lstm(
...     n_layers, dropout_ratio, hx, cx, ws, bs, xs)
>>> hy.shape
(2, 3, 2)
>>> cy.shape
(2, 3, 2)
>>> [y.shape for y in ys]
[(3, 2), (2, 2), (1, 2)]