chainer.functions.n_step_bigru

chainer.functions.n_step_bigru(n_layers, dropout_ratio, hx, ws, bs, xs)[source]

Stacked Bidirectional Gated Recurrent Unit function.
This function calculates a stacked bidirectional GRU over sequences. It takes an initial hidden state \(h_0\), an input sequence \(x\), weight matrices \(W\), and bias vectors \(b\), and calculates the hidden states \(h_t\) for each time \(t\) from the input \(x_t\).
\[\begin{split}r^{f}_t &= \sigma(W^{f}_0 x_t + W^{f}_3 h_{t-1} + b^{f}_0 + b^{f}_3) \\ z^{f}_t &= \sigma(W^{f}_1 x_t + W^{f}_4 h_{t-1} + b^{f}_1 + b^{f}_4) \\ h^{f'}_t &= \tanh(W^{f}_2 x_t + b^{f}_2 + r^{f}_t \cdot (W^{f}_5 h_{t-1} + b^{f}_5)) \\ h^{f}_t &= (1 - z^{f}_t) \cdot h^{f'}_t + z^{f}_t \cdot h_{t-1} \\ r^{b}_t &= \sigma(W^{b}_0 x_t + W^{b}_3 h_{t-1} + b^{b}_0 + b^{b}_3) \\ z^{b}_t &= \sigma(W^{b}_1 x_t + W^{b}_4 h_{t-1} + b^{b}_1 + b^{b}_4) \\ h^{b'}_t &= \tanh(W^{b}_2 x_t + b^{b}_2 + r^{b}_t \cdot (W^{b}_5 h_{t-1} + b^{b}_5)) \\ h^{b}_t &= (1 - z^{b}_t) \cdot h^{b'}_t + z^{b}_t \cdot h_{t-1} \\ h_t &= [h^{f}_t; h^{b}_t]\end{split}\]

where \(W^{f}\) are the weight matrices for the forward GRU and \(W^{b}\) are the weight matrices for the backward GRU.
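As a concrete illustration, one forward-direction step of the equations above can be sketched in plain NumPy. This is a minimal sketch for understanding the math, not the Chainer implementation; the helper name `gru_step` and the toy sizes are illustrative only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, b):
    """One forward-direction GRU step following the equations above.

    W and b hold the six weight matrices W_0..W_5 and six bias vectors
    b_0..b_5 of one layer (cf. ws[i] / bs[i] in the parameter list).
    W_0..W_2 have shape (N, I); W_3..W_5 have shape (N, N).
    """
    r = sigmoid(W[0] @ x_t + W[3] @ h_prev + b[0] + b[3])  # reset gate
    z = sigmoid(W[1] @ x_t + W[4] @ h_prev + b[1] + b[4])  # update gate
    h_cand = np.tanh(W[2] @ x_t + b[2] + r * (W[5] @ h_prev + b[5]))
    return (1.0 - z) * h_cand + z * h_prev                 # interpolate

# Toy sizes: input dimension I = 3, hidden dimension N = 4.
I, N = 3, 4
rng = np.random.default_rng(0)
W = [rng.standard_normal((N, I if j < 3 else N)) for j in range(6)]
b = [rng.standard_normal(N) for _ in range(6)]
h = gru_step(rng.standard_normal(I), np.zeros(N), W, b)
print(h.shape)  # (4,)
```

The bidirectional hidden state \(h_t = [h^{f}_t; h^{b}_t]\) is then the concatenation of a forward pass and a second, independent backward pass run over the reversed sequence.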
As the function accepts a sequence, it calculates \(h_t\) for all \(t\) with one call. Six weight matrices and six bias vectors are required for each layer. So, when \(S\) layers exist, you need to prepare \(6S\) weight matrices and \(6S\) bias vectors.
If the number of layers n_layers is greater than \(1\), the input of the \(k\)-th layer is the hidden state \(h_t\) of the \((k-1)\)-th layer. Note that all input variables except those of the first layer may have a different shape from the first layer's.

Parameters
n_layers (int) – Number of layers.
dropout_ratio (float) – Dropout ratio.
hx (Variable) – Variable holding stacked hidden states. Its shape is (2S, B, N) where S is the number of layers and is equal to n_layers, B is the mini-batch size, and N is the dimension of the hidden units.
ws (list of list of Variable) – Weight matrices. ws[i] represents the weights for the i-th layer. Each ws[i] is a list containing six matrices. ws[i][j] corresponds to W_j in the equation. Only ws[0][j] where 0 <= j < 3 has shape (N, I), as these matrices are multiplied with the input variables. All other matrices have shape (N, N).
bs (list of list of Variable) – Bias vectors. bs[i] represents the biases for the i-th layer. Each bs[i] is a list containing six vectors. bs[i][j] corresponds to b_j in the equation. The shape of each vector is (N,) where N is the dimension of the hidden units.
xs (list of Variable) – A list of Variable holding input values. Each element xs[t] holds the input value for time t. Its shape is (B_t, I), where B_t is the mini-batch size for time t and I is the size of the input units. Note that this function supports variable-length sequences. When the sequences have different lengths, sort them in descending order by length and transpose the sorted sequences. transpose_sequence() transposes a list of Variable holding sequences. So xs needs to satisfy xs[t].shape[0] >= xs[t + 1].shape[0].
use_bi_direction (bool) – If True, this function uses a bidirectional GRU.
Returns
This function returns a tuple containing two elements, hy and ys. hy is an updated hidden state whose shape is the same as that of hx. ys is a list of Variable. Each element ys[t] holds the hidden states of the last layer corresponding to an input xs[t]. Its shape is (B_t, 2N), where B_t is the mini-batch size for time t and N is the size of the hidden units (forward and backward states are concatenated, per the equation \(h_t = [h^{f}_t; h^{b}_t]\)). Note that B_t is the same value as the B_t of xs[t].
Return type
tuple
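The descending-length sorting requirement on xs can be illustrated with a short sketch. Plain NumPy stands in for chainer.functions.transpose_sequence here, and the sequence lengths are made up for the example:

```python
import numpy as np

# Hypothetical mini-batch of three sequences with different lengths.
seqs = [np.zeros((5, 3), dtype=np.float32),   # length 5
        np.zeros((2, 3), dtype=np.float32),   # length 2
        np.zeros((4, 3), dtype=np.float32)]   # length 4

# Sort in descending order by length, as required.
seqs.sort(key=len, reverse=True)

# Transpose: xs[t] stacks the t-th step of every sequence that is still
# running, mimicking what transpose_sequence() produces.
max_len = len(seqs[0])
xs = [np.stack([s[t] for s in seqs if len(s) > t]) for t in range(max_len)]

# Mini-batch sizes B_t are now non-increasing over time.
assert all(xs[t].shape[0] >= xs[t + 1].shape[0] for t in range(max_len - 1))
```

Each xs[t] then has shape (B_t, I) with B_t shrinking as shorter sequences end, which is exactly the layout this function expects.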