# chainer.functions.n_step_bigru¶

chainer.functions.n_step_bigru(n_layers, dropout_ratio, hx, ws, bs, xs)[source]

Stacked Bi-directional Gated Recurrent Unit function.

This function calculates stacked Bi-directional GRU with sequences. This function gets an initial hidden state $$h_0$$, an input sequence $$x$$, weight matrices $$W$$, and bias vectors $$b$$. This function calculates hidden states $$h_t$$ for each time $$t$$ from input $$x_t$$.

$\begin{split}r^{f}_t &= \sigma(W^{f}_0 x_t + W^{f}_3 h_{t-1} + b^{f}_0 + b^{f}_3) \\ z^{f}_t &= \sigma(W^{f}_1 x_t + W^{f}_4 h_{t-1} + b^{f}_1 + b^{f}_4) \\ h^{f'}_t &= \tanh(W^{f}_2 x_t + b^{f}_2 + r^{f}_t \cdot (W^{f}_5 h_{t-1} + b^{f}_5)) \\ h^{f}_t &= (1 - z^{f}_t) \cdot h^{f'}_t + z^{f}_t \cdot h_{t-1} \\ r^{b}_t &= \sigma(W^{b}_0 x_t + W^{b}_3 h_{t-1} + b^{b}_0 + b^{b}_3) \\ z^{b}_t &= \sigma(W^{b}_1 x_t + W^{b}_4 h_{t-1} + b^{b}_1 + b^{b}_4) \\ h^{b'}_t &= \tanh(W^{b}_2 x_t + b^{b}_2 + r^{b}_t \cdot (W^{b}_5 h_{t-1} + b^{b}_5)) \\ h^{b}_t &= (1 - z^{b}_t) \cdot h^{b'}_t + z^{b}_t \cdot h_{t-1} \\ h_t &= [h^{f}_t; h^{b}_t] \\\end{split}$

where $$W^{f}$$ is weight matrices for forward-GRU, $$W^{b}$$ is weight matrices for backward-GRU.

As the function accepts a sequence, it calculates $$h_t$$ for all $$t$$ with one call. Six weight matrices and six bias vectors are required for each layers. So, when $$S$$ layers exists, you need to prepare $$6S$$ weight matrices and $$6S$$ bias vectors.

If the number of layers n_layers is greather than $$1$$, input of k-th layer is hidden state h_t of k-1-th layer. Note that all input variables except first layer may have different shape from the first layer.

Warning

train and use_cudnn arguments are not supported anymore since v2. Instead, use chainer.using_config('train', train) and chainer.using_config('use_cudnn', use_cudnn) respectively. See chainer.using_config().

Parameters: n_layers (int) – Number of layers. dropout_ratio (float) – Dropout ratio. hx (chainer.Variable) – Variable holding stacked hidden states. Its shape is (2S, B, N) where S is number of layers and is equal to n_layers, B is mini-batch size, and N is dimension of hidden units. ws (list of list of chainer.Variable) – Weight matrices. ws[i] represents weights for i-th layer. Each ws[i] is a list containing six matrices. ws[i][j] is corresponding with W_j in the equation. Only ws[0][j] where 0 <= j < 3 is (I, N) shape as they are multiplied with input variables. All other matrices has (N, N) shape. bs (list of list of chainer.Variable) – Bias vectors. bs[i] represnents biases for i-th layer. Each bs[i] is a list containing six vectors. bs[i][j] is corresponding with b_j in the equation. Shape of each matrix is (N,) where N is dimension of hidden units. xs (list of chainer.Variable) – A list of Variable holding input values. Each element xs[t] holds input value for time t. Its shape is (B_t, I), where B_t is mini-batch size for time t, and I is size of input units. Note that this function supports variable length sequences. When sequneces has different lengths, sort sequences in descending order by length, and transpose the sorted sequence. transpose_sequence() transpose a list of Variable() holding sequence. So xs needs to satisfy xs[t].shape[0] >= xs[t + 1].shape[0]. use_bi_direction (bool) – If True, this function uses Bi-direction GRU. This function returns a tuple containing three elements, hy and ys. hy is an updated hidden states whose shape is same as hx. ys is a list of Variable . Each element ys[t] holds hidden states of the last layer corresponding to an input xs[t]. Its shape is (B_t, N) where B_t is mini-batch size for time t, and N is size of hidden units. Note that B_t is the same value as xs[t]. tuple