# chainer.functions.n_step_gru

chainer.functions.n_step_gru(n_layers, dropout_ratio, hx, ws, bs, xs)

Stacked Uni-directional Gated Recurrent Unit function.

This function computes a stacked uni-directional GRU over sequences. It takes an initial hidden state $$h_0$$, an input sequence $$x$$, weight matrices $$W$$, and bias vectors $$b$$, and computes the hidden state $$h_t$$ at each time $$t$$ from the input $$x_t$$ and the previous hidden state $$h_{t-1}$$.

$\begin{split}r_t &= \sigma(W_0 x_t + W_3 h_{t-1} + b_0 + b_3) \\ z_t &= \sigma(W_1 x_t + W_4 h_{t-1} + b_1 + b_4) \\ h'_t &= \tanh(W_2 x_t + b_2 + r_t \cdot (W_5 h_{t-1} + b_5)) \\ h_t &= (1 - z_t) \cdot h'_t + z_t \cdot h_{t-1}\end{split}$
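The four equations can be traced with a minimal NumPy sketch of a single time step for one layer. This is not Chainer's implementation: the helper names `sigmoid` and `gru_step` are hypothetical, and the sketch assumes each weight matrix is stored as (output size, input size), so it is applied to a batch as `x @ W.T`.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x, h_prev, ws, bs):
    """One GRU time step for a single layer (illustrative helper).

    x: (B, I) input batch, h_prev: (B, N) previous hidden state.
    ws: six matrices W_0..W_5, bs: six vectors b_0..b_5.
    Assumption: each matrix is stored as (out, in) and applied as x @ W.T.
    """
    # Reset gate r_t
    r = sigmoid(x @ ws[0].T + h_prev @ ws[3].T + bs[0] + bs[3])
    # Update gate z_t
    z = sigmoid(x @ ws[1].T + h_prev @ ws[4].T + bs[1] + bs[4])
    # Candidate state h'_t; the reset gate scales only the recurrent term
    h_cand = np.tanh(x @ ws[2].T + bs[2] + r * (h_prev @ ws[5].T + bs[5]))
    # Blend the candidate with the previous state
    return (1.0 - z) * h_cand + z * h_prev
```

With all weights and biases set to zero, both gates evaluate to 0.5 and the candidate state to 0, so the step returns half of the previous hidden state, which is a quick sanity check on the last equation.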

As the function accepts a sequence, it calculates $$h_t$$ for all $$t$$ with one call. Six weight matrices and six bias vectors are required for each layer. So, when $$S$$ layers exist, you need to prepare $$6S$$ weight matrices and $$6S$$ bias vectors.

If the number of layers n_layers is greater than $$1$$, the input of the k-th layer is the hidden state h_t of the (k-1)-th layer. Note that the inputs to every layer except the first may therefore have a different shape from the inputs to the first layer.
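The parameter bookkeeping for stacked layers can be sketched in plain Python. The helper `gru_param_shapes` is hypothetical, and the sketch assumes each matrix is stored as (output size, input size); it only enumerates shapes and allocates nothing.

```python
def gru_param_shapes(n_layers, in_size, n_units):
    """Return (w_shapes, b_shapes) as nested lists indexed [layer][j]."""
    w_shapes, b_shapes = [], []
    for layer in range(n_layers):
        # Only the first layer sees the raw input; deeper layers receive
        # the hidden state of the layer below, which has n_units features.
        layer_in = in_size if layer == 0 else n_units
        # W_0..W_2 multiply the layer input, W_3..W_5 multiply h_{t-1}.
        w_shapes.append([(n_units, layer_in)] * 3 + [(n_units, n_units)] * 3)
        b_shapes.append([(n_units,)] * 6)
    return w_shapes, b_shapes


w_shapes, b_shapes = gru_param_shapes(n_layers=2, in_size=5, n_units=3)
n_matrices = sum(len(layer) for layer in w_shapes)  # 6 * S
```

For two layers this yields twelve matrices and twelve bias vectors, and only the first three matrices of the first layer depend on the input size.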

Parameters
• n_layers (int) – Number of layers.

• dropout_ratio (float) – Dropout ratio.

• hx (Variable) – Variable holding stacked hidden states. Its shape is (S, B, N), where S is the number of layers (equal to n_layers), B is the mini-batch size, and N is the dimension of the hidden units.

• ws (list of list of Variable) – Weight matrices. ws[i] represents the weights for the i-th layer. Each ws[i] is a list containing six matrices. ws[i][j] corresponds to W_j in the equation. Only ws[0][j] where 0 <= j < 3 have shape (N, I), as they are multiplied with the input variables, where I is the size of the input units. All other matrices have shape (N, N).

• bs (list of list of Variable) – Bias vectors. bs[i] represents the biases for the i-th layer. Each bs[i] is a list containing six vectors. bs[i][j] corresponds to b_j in the equation. The shape of each vector is (N,), where N is the dimension of the hidden units.

• xs (list of Variable) – A list of Variable holding input values. Each element xs[t] holds the input values for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t and I is the size of the input units. Note that this function supports variable-length sequences. When sequences have different lengths, sort them in descending order by length and transpose the sorted sequences. transpose_sequence() transposes a list of Variable holding sequences. As a result, xs needs to satisfy xs[t].shape[0] >= xs[t + 1].shape[0].
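The sort-then-transpose preparation described for xs can be illustrated in plain Python. The helper `transpose_sorted` is a hypothetical stand-in that works on ordinary lists; it is not the library's transpose_sequence(), and it is only meant to show why the batch size shrinks over time.

```python
def transpose_sorted(seqs):
    """Sort sequences by length (longest first) and regroup them by time step."""
    ordered = sorted(seqs, key=len, reverse=True)
    max_len = len(ordered[0]) if ordered else 0
    # Step t keeps only the sequences that are still running at time t,
    # so the per-step batch size B_t can never grow.
    return [[s[t] for s in ordered if len(s) > t] for t in range(max_len)]


# Three sequences of lengths 2, 3, and 1 (items are placeholders for input vectors).
seqs = [["a0", "a1"], ["b0", "b1", "b2"], ["c0"]]
xs = transpose_sorted(seqs)
batch_sizes = [len(step) for step in xs]
```

Here the first time step holds one item from each of the three sequences, the second holds items from the two longest, and the third holds only the longest, which matches the requirement that the batch size is non-increasing in t.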

Returns

This function returns a tuple containing two elements, hy and ys.

• hy is the updated hidden states, whose shape is the same as that of hx.

• ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to the input xs[t]. Its shape is (B_t, N), where B_t is the mini-batch size for time t and N is the size of the hidden units. Note that B_t is the same as the mini-batch size of xs[t].

Return type

tuple
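To make the shapes of hy and ys concrete, here is a NumPy sketch of the whole computation on plain arrays. It is a hypothetical stand-in, not Chainer's code: `n_step_gru_reference` is an invented name, dropout is ignored, matrices are assumed to be stored as (output size, input size), and a sequence that has already ended is assumed to keep its last hidden state in hy.

```python
import numpy as np


def _sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def _gru_step(x, h, w, b):
    # Gate equations from the formula above; matrices assumed stored as (out, in).
    r = _sigmoid(x @ w[0].T + h @ w[3].T + b[0] + b[3])
    z = _sigmoid(x @ w[1].T + h @ w[4].T + b[1] + b[4])
    c = np.tanh(x @ w[2].T + b[2] + r * (h @ w[5].T + b[5]))
    return (1.0 - z) * c + z * h


def n_step_gru_reference(hx, ws, bs, xs):
    """Illustrative stacked GRU on NumPy arrays; returns (hy, ys)."""
    hy = hx.copy()
    layer_inputs = xs
    for layer in range(hx.shape[0]):
        h = hy[layer]  # view into hy, updated in place
        outputs = []
        for x in layer_inputs:
            b_t = x.shape[0]
            # Only the first b_t rows are still active at this time step;
            # rows of finished sequences are left untouched (assumption).
            h[:b_t] = _gru_step(x, h[:b_t], ws[layer], bs[layer])
            outputs.append(h[:b_t].copy())
        # The outputs of this layer are the inputs of the next one.
        layer_inputs = outputs
    return hy, layer_inputs


rng = np.random.default_rng(0)
S, I, N = 2, 4, 3
batch_sizes = [3, 2, 1]  # non-increasing, as required for xs
hx = np.zeros((S, batch_sizes[0], N))
ws = [[rng.standard_normal((N, I if (l == 0 and j < 3) else N)) for j in range(6)]
      for l in range(S)]
bs = [[np.zeros(N) for _ in range(6)] for _ in range(S)]
xs = [rng.standard_normal((b, I)) for b in batch_sizes]
hy, ys = n_step_gru_reference(hx, ws, bs, xs)
```

In this sketch hy has the same shape as hx, and each ys[t] has shape (B_t, N) with B_t taken from xs[t]. A call to the actual function would pass Variable or array inputs in the same nested layout.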