# chainerx.n_step_birnn

chainerx.n_step_birnn(n_layers, hx, ws, bs, xs, activation='tanh')

Stacked bi-directional RNN function for sequence inputs. This function computes a stacked bi-directional RNN over sequences. It takes an initial hidden state $$h_0$$, an input sequence $$x$$, weight matrices $$W$$, and bias vectors $$b$$, and computes the hidden state $$h_t$$ for each time $$t$$ from the input $$x_t$$.

$\begin{split}h^{f}_t &= f(W^{f}_0 x_t + W^{f}_1 h^{f}_{t-1} + b^{f}_0 + b^{f}_1), \\ h^{b}_t &= f(W^{b}_0 x_t + W^{b}_1 h^{b}_{t+1} + b^{b}_0 + b^{b}_1), \\ h_t &= [h^{f}_t; h^{b}_t],\end{split}$

where $$f$$ is an activation function. The weight matrices $$W$$ consist of two sets, $$W^{f}$$ and $$W^{b}$$: $$W^{f}$$ holds the weights of the forward RNN and $$W^{b}$$ holds the weights of the backward RNN. $$W^{f}$$ contains $$W^{f}_0$$ for the input sequence and $$W^{f}_1$$ for the hidden state. $$W^{b}$$ contains $$W^{b}_0$$ for the input sequence and $$W^{b}_1$$ for the hidden state. The bias vectors $$b$$ likewise consist of two sets, $$b^{f}$$ and $$b^{b}$$. $$b^{f}$$ contains $$b^{f}_0$$ for the input sequence and $$b^{f}_1$$ for the hidden state. $$b^{b}$$ contains $$b^{b}_0$$ for the input sequence and $$b^{b}_1$$ for the hidden state.

As the function accepts a sequence, it calculates $$h_t$$ for all $$t$$ with one call. Two weight matrices and two bias vectors are required for each direction of each layer. So, when $$S$$ layers exist, you need to prepare $$2S$$ lists of weight matrices and $$2S$$ lists of bias vectors, each list holding two arrays. If the number of layers n_layers is greater than $$1$$, the input of the k-th layer is the hidden state h_t of the (k-1)-th layer. Note that the inputs of every layer except the first may have a different shape from the inputs of the first layer.
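The recurrence above can be sketched in plain NumPy for a single layer and equal-length sequences. This is an illustrative reference, not the ChainerX implementation: the helper name `birnn_layer` is hypothetical, and the weight layout (input weights of shape (N, I), so that `x @ W.T` corresponds to $$W x_t$$) is an assumption of the sketch that may differ from the storage layout the library expects. The backward direction walks the sequence from the last time step to the first, and each output concatenates both directions.

```python
import numpy as np


def birnn_layer(hx, ws, bs, xs, activation="tanh"):
    """Single-layer bi-directional RNN over equal-length sequences (sketch).

    hx: (2, B, N) initial states; hx[0] is forward, hx[1] is backward.
    ws: [[W0_f, W1_f], [W0_b, W1_b]]; W0 is (N, I), W1 is (N, N) (assumed).
    bs: [[b0_f, b1_f], [b0_b, b1_b]]; each bias has shape (N,).
    xs: list of T arrays, each of shape (B, I).
    """
    if activation == "tanh":
        f = np.tanh
    else:
        # ReLU: element-wise max(a, 0)
        f = lambda a: np.maximum(a, 0.0)

    def step(x, h, w, b):
        # One application of f(W_0 x + W_1 h + b_0 + b_1) on a batch of rows.
        return f(x @ w[0].T + h @ w[1].T + b[0] + b[1])

    n_steps = len(xs)
    fwd = [None] * n_steps
    bwd = [None] * n_steps

    h_f = hx[0]
    for t in range(n_steps):  # forward direction: t = 0 .. T-1
        h_f = step(xs[t], h_f, ws[0], bs[0])
        fwd[t] = h_f

    h_b = hx[1]
    for t in reversed(range(n_steps)):  # backward direction: t = T-1 .. 0
        h_b = step(xs[t], h_b, ws[1], bs[1])
        bwd[t] = h_b

    # h_t = [h_t^f; h_t^b], concatenated along the feature axis.
    ys = [np.concatenate([fwd[t], bwd[t]], axis=1) for t in range(n_steps)]
    hy = np.stack([h_f, h_b])
    return hy, ys


rng = np.random.default_rng(0)
B, I, N, T = 3, 4, 5, 6
hx = np.zeros((2, B, N))
ws = [[rng.standard_normal((N, I)), rng.standard_normal((N, N))] for _ in range(2)]
bs = [[np.zeros(N), np.zeros(N)] for _ in range(2)]
xs = [rng.standard_normal((B, I)) for _ in range(T)]

hy, ys = birnn_layer(hx, ws, bs, xs)
print(hy.shape, ys[0].shape)  # (2, 3, 5) (3, 10)
```

In this sketch the final forward state is the forward half of `ys[-1]`, while the final backward state is the backward half of `ys[0]`, because the backward pass finishes at the first time step.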

Parameters
• n_layers (int) – Number of layers.

• hx (array) – Array holding stacked hidden states. Its shape is (2S, B, N) where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of the hidden units. Because the RNN is bi-directional, the length of the first dimension is 2S.

• ws (list of list of array) – Weight matrices. ws[2 * i + di] represents the weights for the i-th layer, where di = 0 for the forward RNN and di = 1 for the backward RNN. Each ws[2 * i + di] is a list containing two matrices. ws[2 * i + di][j] corresponds to W^{f}_j if di = 0 and to W^{b}_j if di = 1 in the equation. Only ws[0][j] and ws[1][j] where 0 <= j < 1 have (I, N) shape, as they are multiplied with the input variables. All other matrices have (N, N) shape.

• bs (list of list of array) – Bias vectors. bs[2 * i + di] represents the biases for the i-th layer, where di = 0 for the forward RNN and di = 1 for the backward RNN. Each bs[2 * i + di] is a list containing two vectors. bs[2 * i + di][j] corresponds to b^{f}_j if di = 0 and to b^{b}_j if di = 1 in the equation. The shape of each vector is (N,) where N is the dimension of the hidden units.

• xs (list of array) – A list of arrays holding input values. Each element xs[t] holds the input values for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t, and I is the size of the input units. Note that this function supports variable-length sequences. When sequences have different lengths, sort them in descending order by length, so that xs satisfies xs[t].shape[0] >= xs[t + 1].shape[0].

• activation (str) – Name of the activation function, either tanh or relu.
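The variable-length layout described for xs can be illustrated with a small NumPy helper. This is a sketch under the assumption that the raw data is a non-empty list of per-sequence arrays of shape (L_i, I); the helper name `to_time_major` is hypothetical and is not part of ChainerX. It sorts the sequences by descending length and regroups them by time step, so that the batch size B_t never grows as t increases.

```python
import numpy as np


def to_time_major(seqs):
    """Regroup per-sequence arrays (L_i, I) into per-time-step arrays (B_t, I).

    Returns the time-major list and the order in which sequences were sorted,
    so that results can be mapped back to the original sequence positions.
    """
    # Indices of the sequences, longest first (stable for equal lengths).
    order = sorted(range(len(seqs)), key=lambda i: -len(seqs[i]))
    sorted_seqs = [seqs[i] for i in order]

    max_len = len(sorted_seqs[0])
    xs = []
    for t in range(max_len):
        # Only sequences that are still running at time t contribute a row.
        rows = [s[t] for s in sorted_seqs if len(s) > t]
        xs.append(np.stack(rows))
    return xs, order


# Three sequences of lengths 2, 4 and 3 with I = 2 input units.
seqs = [np.ones((2, 2)), 2 * np.ones((4, 2)), 3 * np.ones((3, 2))]
xs, order = to_time_major(seqs)
print(order)                      # [1, 2, 0]
print([x.shape[0] for x in xs])   # [3, 3, 2, 1]
```

Row k of every xs[t] belongs to the sequence at position order[k] in the original list, which is what allows the shorter sequences to drop off the end of the batch.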

Returns

This function returns a tuple containing two elements, hy and ys.

• hy holds the updated hidden states. Its shape is the same as that of hx.

• ys is a list of arrays. Each element ys[t] holds the hidden states of the last layer corresponding to the input xs[t]. Its shape is (B_t, 2N), since each element concatenates the forward and backward states, where B_t is the mini-batch size for time t and N is the size of the hidden units. Note that B_t is the same as the mini-batch size of xs[t].

Return type

tuple