Standard Link implementations¶

Chainer provides many Link implementations in the chainer.links package.

Note

Some of the links are originally defined in the chainer.functions namespace. They are still left in the namespace for backward compatibility, though it is strongly recommended to use them via the chainer.links package.

Learnable connections¶

Bias¶

class chainer.links.Bias(axis=1, shape=None)[source]¶

Broadcasted elementwise summation with learnable parameters.

Computes an elementwise summation as the bias() function does, except that its second input is a learnable bias parameter $b$ held by the link.

Parameters:

axis (int) – The first axis of the first input of the `bias()` function along which its second input is applied.
shape (tuple of ints) – Shape of the learnable bias parameter. If `None`, this link does not have learnable parameters, so an explicit bias must be given as the second input of its `__call__` method.

See also

See bias() for details.

Variables:	b (Variable) – Bias parameter if `shape` is given. Otherwise, no attributes.

__call__(*xs)[source]¶

Applies broadcasted elementwise summation.

Parameters:	xs (list of Variables) – Input variables whose length should be one if the link has a learnable bias parameter, otherwise should be two.
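As a rough illustration of the broadcasting rule, the pure-Python sketch below adds a bias of shape (C,) to a nested-list input of shape (N, C, W) along axis 1. The helper name `add_bias_axis1` is hypothetical and not part of Chainer; it only mimics the arithmetic for this one shape.

```python
def add_bias_axis1(x, b):
    """Add b[c] to every element x[n][c][...] (bias applied along axis 1).

    x is a nested list of shape (N, C, W); b is a list of length C.
    Hypothetical helper for illustration only, not a Chainer API.
    """
    return [[[v + b[c] for v in row] for c, row in enumerate(sample)]
            for sample in x]


x = [[[1.0, 2.0], [3.0, 4.0]]]   # shape (1, 2, 2)
b = [10.0, 20.0]                 # shape (2,), one bias per channel
y = add_bias_axis1(x, b)
```

Each channel receives its own bias value, which is repeated over the batch and trailing axes.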

Bilinear¶

class chainer.links.Bilinear(left_size, right_size, out_size, nobias=False, initialW=None, initial_bias=None)[source]¶

Bilinear layer that performs tensor multiplication.

Bilinear is a primitive link that wraps the bilinear() function. It holds parameters W, V1, V2, and b corresponding to the arguments of bilinear().

Parameters:

left_size (int) – Dimension of input vector $e^1$ ($J$)
right_size (int) – Dimension of input vector $e^2$ ($K$)
out_size (int) – Dimension of output vector $y$ ($L$)
nobias (bool) – If True, parameters V1, V2, and b are omitted.
initialW (3-D numpy array) – Initial value of $W$. Shape of this argument must be (left_size, right_size, out_size). If None, $W$ is initialized by centered Gaussian distribution properly scaled according to the dimension of inputs and outputs. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
initial_bias (tuple) – Initial values of $V^1$, $V^2$ and $b$. The length of this argument must be 3. The elements of this tuple must have the shapes (left_size, out_size), (right_size, out_size), and (out_size,), respectively. If None, $V^1$ and $V^2$ are initialized by scaled centered Gaussian distributions and $b$ is set to $0$. May also be a tuple of callables that take numpy.ndarray or cupy.ndarray and edit its value.

See also

See chainer.functions.bilinear() for details.

Variables:	W (Variable) – Bilinear weight parameter. V1 (Variable) – Linear weight parameter for the first argument. V2 (Variable) – Linear weight parameter for the second argument. b (Variable) – Bias parameter.

__call__(e1, e2)[source]¶

Applies the bilinear function to inputs and the internal parameters.

Parameters:	e1 (Variable) – Left input. e2 (Variable) – Right input.
Returns:	Output variable.
Return type:	Variable
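The following single-sample sketch shows how the four parameters combine, assuming the usual definition $y_l = \sum_{jk} e^1_j e^2_k W_{jkl} + \sum_j e^1_j V^1_{jl} + \sum_k e^2_k V^2_{kl} + b_l$. The function name `bilinear_single` is hypothetical, and nested lists stand in for arrays.

```python
def bilinear_single(e1, e2, W, V1=None, V2=None, b=None):
    """Bilinear map for one sample (illustrative, not the Chainer API).

    W has shape (J, K, L); V1 is (J, L); V2 is (K, L); b is (L,).
    When V1 is None the linear and bias terms are skipped (nobias=True).
    """
    J, K, L = len(W), len(W[0]), len(W[0][0])
    y = [sum(e1[j] * e2[k] * W[j][k][l] for j in range(J) for k in range(K))
         for l in range(L)]
    if V1 is not None:
        y = [y[l]
             + sum(e1[j] * V1[j][l] for j in range(J))
             + sum(e2[k] * V2[k][l] for k in range(K))
             + b[l]
             for l in range(L)]
    return y


e1, e2 = [1.0, 2.0], [3.0, 4.0]
W = [[[1.0], [1.0]], [[1.0], [1.0]]]           # shape (2, 2, 1), all ones
y_nobias = bilinear_single(e1, e2, W)
y_full = bilinear_single(e1, e2, W, V1=[[1.0], [1.0]],
                         V2=[[1.0], [1.0]], b=[0.5])
```

With an all-ones W the bilinear term reduces to the product of the two input sums, which makes the result easy to check by hand.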

Convolution2D¶

class chainer.links.Convolution2D(in_channels, out_channels, ksize, stride=1, pad=0, wscale=1, bias=0, nobias=False, use_cudnn=True, initialW=None, initial_bias=None, deterministic=False)[source]¶

Two-dimensional convolutional layer.

This link wraps the convolution_2d() function and holds the filter weight and bias vector as parameters.

Parameters:

in_channels (int or None) – Number of channels of input arrays. If None, parameter initialization will be deferred until the first forward data pass at which time the size will be determined.
out_channels (int) – Number of channels of output arrays.
ksize (int or pair of ints) – Size of filters (a.k.a. kernels). ksize=k and ksize=(k, k) are equivalent.
stride (int or pair of ints) – Stride of filter applications. stride=s and stride=(s, s) are equivalent.
pad (int or pair of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p) are equivalent.
wscale (float) – Scaling factor of the initial weight.
bias (float) – Initial bias value.
nobias (bool) – If True, then this link does not use the bias term.
use_cudnn (bool) – If True, then this link uses cuDNN if available.
initialW (4-D array) – Initial weight value. If None, then this function uses a Gaussian distribution scaled by wscale to initialize the weight. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
initial_bias (1-D array) – Initial bias value. If None, then this function uses bias to initialize bias. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
deterministic (bool) – The output of this link can be non-deterministic when it uses cuDNN. If this option is True, then it forces cuDNN to use a deterministic algorithm. This option is only available for cuDNN version >= v4.

See also

See chainer.functions.convolution_2d() for the definition of two-dimensional convolution.

Variables:	W (Variable) – Weight parameter. b (Variable) – Bias parameter.

__call__(x)[source]¶

Applies the convolution layer.

Parameters:	x (Variable) – Input image.
Returns:	Output of the convolution.
Return type:	Variable
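To make the int-or-pair convention concrete, here is a small sketch that normalizes `ksize`, `stride` and `pad` to pairs and computes the spatial output size. The helper names are hypothetical, and the size formula is the standard one for an undilated convolution with floor division, stated here as an assumption rather than taken from this page.

```python
def as_pair(v):
    """Return (v, v) for an int, or the given pair as a tuple."""
    if isinstance(v, int):
        return (v, v)
    return tuple(v)


def conv2d_out_hw(in_hw, ksize, stride=1, pad=0):
    """Spatial output size (height, width) of a 2-D convolution."""
    (h, w), (kh, kw) = as_pair(in_hw), as_pair(ksize)
    (sy, sx), (ph, pw) = as_pair(stride), as_pair(pad)
    return ((h + 2 * ph - kh) // sy + 1, (w + 2 * pw - kw) // sx + 1)


same = conv2d_out_hw((28, 28), ksize=3, stride=1, pad=1)
halved = conv2d_out_hw((32, 32), ksize=5, stride=2, pad=2)
```

Passing `ksize=5` or `ksize=(5, 5)` gives the same result, mirroring the equivalence described in the parameter list.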

ConvolutionND¶

class chainer.links.ConvolutionND(ndim, in_channels, out_channels, ksize, stride=1, pad=0, initialW=None, initial_bias=None, use_cudnn=True, cover_all=False)[source]¶

N-dimensional convolution layer.

This link wraps the convolution_nd() function and holds the filter weight and bias vector as parameters.

Parameters:

ndim (int) – Number of spatial dimensions.
in_channels (int) – Number of channels of input arrays.
out_channels (int) – Number of channels of output arrays.
ksize (int or tuple of ints) – Size of filters (a.k.a. kernels). ksize=k and ksize=(k, k, ..., k) are equivalent.
stride (int or tuple of ints) – Stride of filter application. stride=s and stride=(s, s, ..., s) are equivalent.
pad (int or tuple of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p, ..., p) are equivalent.
initialW – Value used to initialize the filter weight. May be an initializer instance or another value that init_weight() helper function can take.
initial_bias – Value used to initialize the bias vector. May be an initializer instance or another value except None that init_weight() helper function can take. If None is given, this link does not use the bias vector.
use_cudnn (bool) – If True, then this link uses cuDNN if available. See convolution_nd() for exact conditions of cuDNN availability.
cover_all (bool) – If True, all spatial locations are convolved into some output pixels. It may make the output size larger. cover_all needs to be False if you want to use cuDNN.

See also

See convolution_nd() for the definition of N-dimensional convolution. See convolution_2d() for the definition of two-dimensional convolution.

Variables:	W (Variable) – Weight parameter. b (Variable) – Bias parameter. If `initial_bias` is `None`, set to `None`.

__call__(x)[source]¶

Applies N-dimensional convolution layer.

Parameters:	x (Variable) – Input image.
Returns:	Output of convolution.
Return type:	Variable
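The effect of `cover_all` on the output size can be sketched as follows. The function `conv_nd_outsize` is a hypothetical helper, and the rounding rule (adding `stride - 1` before the floor division) is an assumption about how "cover all locations" is usually implemented.

```python
def as_tuple(v, n):
    """Expand an int to an n-tuple; pass tuples through unchanged."""
    return (v,) * n if isinstance(v, int) else tuple(v)


def conv_nd_outsize(dims, ksize, stride=1, pad=0, cover_all=False):
    """Output size per spatial dimension of an N-dimensional convolution."""
    n = len(dims)
    ks, ss, ps = as_tuple(ksize, n), as_tuple(stride, n), as_tuple(pad, n)
    out = []
    for d, k, s, p in zip(dims, ks, ss, ps):
        # With cover_all, round up so the last input positions are not dropped.
        extra = s - 1 if cover_all else 0
        out.append((d + 2 * p - k + extra) // s + 1)
    return tuple(out)


plain = conv_nd_outsize((8, 8, 8), ksize=3, stride=2)
covered = conv_nd_outsize((8, 8, 8), ksize=3, stride=2, cover_all=True)
```

For an input of length 8 with `ksize=3` and `stride=2`, the plain convolution places windows at offsets 0, 2 and 4 and never reads the last element; `cover_all=True` adds a fourth window, so the output grows from 3 to 4 along each axis.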

Deconvolution2D¶

class chainer.links.Deconvolution2D(in_channels, out_channels, ksize, stride=1, pad=0, wscale=1, bias=0, nobias=False, outsize=None, use_cudnn=True, initialW=None, initial_bias=None, deterministic=False)[source]¶

Two-dimensional deconvolution function.

This link wraps the deconvolution_2d() function and holds the filter weight and bias vector as parameters.

Parameters:

in_channels (int or None) – Number of channels of input arrays. If None, parameter initialization will be deferred until the first forward data pass at which time the size will be determined.
out_channels (int) – Number of channels of output arrays.
ksize (int or pair of ints) – Size of filters (a.k.a. kernels). ksize=k and ksize=(k, k) are equivalent.
stride (int or pair of ints) – Stride of filter applications. stride=s and stride=(s, s) are equivalent.
pad (int or pair of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p) are equivalent.
wscale (float) – Scaling factor of the initial weight.
bias (float) – Initial bias value.
nobias (bool) – If True, then this function does not use the bias term.
outsize (tuple) – Expected output size of deconvolutional operation. It should be pair of height and width $(out_H, out_W)$. Default value is None and the outsize is estimated by input size, stride and pad.
use_cudnn (bool) – If True, then this function uses cuDNN if available.
initialW (4-D array) – Initial weight value. If None, then this function uses a Gaussian distribution scaled by wscale to initialize the weight. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
initial_bias (1-D array) – Initial bias value. If None, then this function uses bias to initialize bias. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
deterministic (bool) – The output of this link can be non-deterministic when it uses cuDNN. If this option is True, then it forces cuDNN to use a deterministic algorithm. This option is only available for cuDNN version >= v4.

The filter weight has four dimensions $(c_I, c_O, k_H, k_W)$ which indicate the number of input channels, output channels, height and width of the kernels, respectively. The filter weight is initialized with i.i.d. Gaussian random samples, each of which has zero mean and deviation $\sqrt{1/(c_I k_H k_W)}$ by default. The deviation is scaled by wscale if specified.

The bias vector is of size $c_O$. Its elements are initialized by bias argument. If nobias argument is set to True, then this function does not hold the bias parameter.

See also

See chainer.functions.deconvolution_2d() for the definition of two-dimensional deconvolution.
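The default initialization scale and the estimated output size can be sketched as below. Both helper names are hypothetical. The deviation follows the formula given above; the output-size formula is the standard one for transposed convolution and is an assumption about how `outsize` is estimated when it is None.

```python
import math


def default_weight_std(c_in, k_h, k_w, wscale=1.0):
    """Standard deviation sqrt(1 / (c_I * k_H * k_W)), scaled by wscale."""
    return wscale * math.sqrt(1.0 / (c_in * k_h * k_w))


def deconv_outsize(size, k, s, p):
    """Estimated output length along one axis (assumed standard formula)."""
    return s * (size - 1) + k - 2 * p


std = default_weight_std(c_in=4, k_h=2, k_w=2)
out_h = deconv_outsize(16, k=4, s=2, p=1)
```

With `ksize=4`, `stride=2` and `pad=1`, a 16-pixel axis is upsampled to 32 pixels, a common configuration for doubling the resolution.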

DeconvolutionND¶

class chainer.links.DeconvolutionND(ndim, in_channels, out_channels, ksize, stride=1, pad=0, outsize=None, initialW=None, initial_bias=0, use_cudnn=True)[source]¶

N-dimensional deconvolution function.

This link wraps deconvolution_nd() function and holds the filter weight and bias vector as its parameters.

Parameters:

ndim (int) – Number of spatial dimensions.
in_channels (int) – Number of channels of input arrays.
out_channels (int) – Number of channels of output arrays.
ksize (int or tuple of ints) – Size of filters (a.k.a. kernels). ksize=k and ksize=(k, k, ..., k) are equivalent.
stride (int or tuple of ints) – Stride of filter application. stride=s and stride=(s, s, ..., s) are equivalent.
pad (int or tuple of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p, ..., p) are equivalent.
outsize (tuple of ints) – Expected output size of deconvolutional operation. It should be a tuple of ints that represents the output size of each dimension. Default value is None and the outsize is estimated with input size, stride and pad.
initialW – Value used to initialize the filter weight. May be an initializer instance or another value that the init_weight() function can take.
initial_bias – Value used to initialize the bias vector. May be an initializer instance or another value, except None, that the init_weight() function can take. If None is supplied, this link does not use the bias vector.
use_cudnn (bool) – If True, then this link uses cuDNN if available.

See also

deconvolution_nd()

Variables:	W (Variable) – Weight parameter. b (Variable) – Bias parameter. If `initial_bias` is `None`, set to `None`.

DepthwiseConvolution2D¶

class chainer.links.DepthwiseConvolution2D(in_channels, channel_multiplier, ksize, stride=1, pad=0, nobias=False, initialW=None, initial_bias=None)[source]¶

Two-dimensional depthwise convolutional layer.

This link wraps the depthwise_convolution_2d() function and holds the filter weight and bias vector as parameters.

Parameters:

in_channels (int or None) – Number of channels of input arrays. If None, parameter initialization will be deferred until the first forward data pass at which time the size will be determined.
channel_multiplier (int) – Channel multiplier number. The number of output channels equals in_channels * channel_multiplier.
ksize (int or pair of ints) – Size of filters (a.k.a. kernels). ksize=k and ksize=(k, k) are equivalent.
stride (int or pair of ints) – Stride of filter applications. stride=s and stride=(s, s) are equivalent.
pad (int or pair of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p) are equivalent.
nobias (bool) – If True, then this link does not use the bias term.
initialW (4-D array) – Initial weight value. If None, the default initializer is used. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
initial_bias (1-D array) – Initial bias value. If None, the bias is set to 0. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.

Variables:	W (Variable) – Weight parameter. b (Variable) – Bias parameter.

DilatedConvolution2D¶

class chainer.links.DilatedConvolution2D(in_channels, out_channels, ksize, stride=1, pad=0, dilate=1, wscale=1, bias=0, nobias=False, use_cudnn=True, initialW=None, initial_bias=None)[source]¶

Two-dimensional dilated convolutional layer.

This link wraps the dilated_convolution_2d() function and holds the filter weight and bias vector as parameters.

Parameters:

in_channels (int or None) – Number of channels of input arrays. If None, parameter initialization will be deferred until the first forward data pass at which time the size will be determined.
out_channels (int) – Number of channels of output arrays.
ksize (int or pair of ints) – Size of filters (a.k.a. kernels). ksize=k and ksize=(k, k) are equivalent.
stride (int or pair of ints) – Stride of filter applications. stride=s and stride=(s, s) are equivalent.
pad (int or pair of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p) are equivalent.
dilate (int or pair of ints) – Dilation factor of filter applications. dilate=d and dilate=(d, d) are equivalent.
wscale (float) – Scaling factor of the initial weight.
bias (float) – Initial bias value.
nobias (bool) – If True, then this link does not use the bias term.
use_cudnn (bool) – If True, then this link uses cuDNN if available.
initialW (4-D array) – Initial weight value. If None, then this function uses scaled Gaussian distribution to initialize weight. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
initial_bias (1-D array) – Initial bias value. If None, then this function uses bias to initialize bias. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.

See also

See chainer.functions.dilated_convolution_2d() for the definition of two-dimensional dilated convolution.

Variables:	W (Variable) – Weight parameter. b (Variable) – Bias parameter.

__call__(x)[source]¶

Applies the convolution layer.

Parameters:	x (Variable) – Input image.
Returns:	Output of the convolution.
Return type:	Variable
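Dilation enlarges the receptive field of a filter without adding weights. The sketch below computes the effective kernel size and the resulting output length. The helper names are hypothetical, and the formulas are the commonly used ones, given here as an assumption.

```python
def effective_ksize(k, d):
    """Extent covered by a k-tap filter with dilation factor d."""
    return k + (k - 1) * (d - 1)


def dilated_conv_outsize(size, k, s=1, p=0, d=1):
    """Output length along one axis of a dilated convolution."""
    return (size + 2 * p - effective_ksize(k, d)) // s + 1


span = effective_ksize(3, 2)
out = dilated_conv_outsize(32, k=3, s=1, p=2, d=2)
```

A 3-tap filter with `dilate=2` spans 5 input positions, so `pad=2` is needed to keep a 32-pixel axis at 32 pixels; with `dilate=1` the computation reduces to an ordinary convolution.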

EmbedID¶

class chainer.links.EmbedID(in_size, out_size, initialW=None, ignore_label=None)[source]¶

Efficient linear layer for one-hot input.

This is a link that wraps the embed_id() function. This link holds the ID (word) embedding matrix W as a parameter.

Parameters:

in_size (int) – Number of different identifiers (a.k.a. vocabulary size).
out_size (int) – Size of embedding vector.
initialW (2-D array) – Initial weight value. If None, then the matrix is initialized from the standard normal distribution. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
ignore_label (int or None) – If ignore_label is an int value, the embedding returned for every input ID equal to ignore_label is filled with 0.

See also

chainer.functions.embed_id()

Variables:	W (Variable) – Embedding parameter matrix.

__call__(x)[source]¶

Extracts the word embedding of given IDs.

Parameters:	x (Variable) – Batch vectors of IDs.
Returns:	Batch of corresponding embeddings.
Return type:	Variable
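An embedding lookup is equivalent to multiplying a one-hot vector by W, but it only needs to read one row per ID. The sketch below uses a hypothetical `embed_lookup` helper and assumes W has one row per identifier, i.e. shape (in_size, out_size).

```python
def embed_lookup(ids, W, ignore_label=None):
    """Return one embedding row per ID; zeros where the ID is ignore_label."""
    out_size = len(W[0])
    return [[0.0] * out_size if i == ignore_label else list(W[i])
            for i in ids]


W = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]                       # in_size=3, out_size=2
vectors = embed_lookup([2, -1, 0], W, ignore_label=-1)
```

Using a value such as -1 for `ignore_label` is a common way to mark padding positions in a batch of ID sequences.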

GRU¶

class chainer.links.GRU(n_units, n_inputs=None, init=None, inner_init=None, bias_init=0)[source]¶

Stateless Gated Recurrent Unit function (GRU).

GRU function has six parameters $W_r$, $W_z$, $W$, $U_r$, $U_z$, and $U$. All these parameters are $n \times n$ matrices, where $n$ is the dimension of hidden vectors.

Given two inputs, a previous hidden vector $h$ and an input vector $x$, GRU returns the next hidden vector $h'$ defined as

\[\begin{split}r &= \sigma(W_r x + U_r h), \\ z &= \sigma(W_z x + U_z h), \\ \bar{h} &= \tanh(W x + U (r \odot h)), \\ h' &= (1 - z) \odot h + z \odot \bar{h},\end{split}\]

where $\sigma$ is the sigmoid function, and $\odot$ is the element-wise product.

GRU does not hold the value of the hidden vector $h$, so it is stateless. Use StatefulGRU for a stateful GRU.

Parameters:

n_units (int) – Dimension of hidden vector $h$.
n_inputs (int) – Dimension of input vector $x$. If `None`, it is set to the same value as `n_units`.

See:

On the Properties of Neural Machine Translation: Encoder-Decoder Approaches [Cho+, SSST2014].
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling [Chung+NIPS2014 DLWorkshop].

See also

StatefulGRU
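The update equations for $r$, $z$, $\bar{h}$ and $h'$ translate directly into code. The following is a minimal pure-Python sketch of one GRU step for a single sample; `gru_step`, `matvec` and `sigmoid` are hypothetical helpers, not Chainer functions, and matrices are plain nested lists.

```python
import math


def matvec(M, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gru_step(h, x, W_r, U_r, W_z, U_z, W, U):
    """Return the next hidden vector h' from h and x."""
    r = [sigmoid(a + b) for a, b in zip(matvec(W_r, x), matvec(U_r, h))]
    z = [sigmoid(a + b) for a, b in zip(matvec(W_z, x), matvec(U_z, h))]
    rh = [ri * hi for ri, hi in zip(r, h)]                 # r ⊙ h
    h_bar = [math.tanh(a + b)
             for a, b in zip(matvec(W, x), matvec(U, rh))]
    return [(1.0 - zi) * hi + zi * hb
            for zi, hi, hb in zip(z, h, h_bar)]


zeros = [[0.0, 0.0], [0.0, 0.0]]
h_next = gru_step([2.0, -4.0], [1.0, 1.0],
                  zeros, zeros, zeros, zeros, zeros, zeros)
```

With all weights set to zero, both gates equal 0.5 and the candidate state is 0, so the step simply halves the previous hidden vector. Because the caller passes `h` in and receives `h'` back, no state is stored between calls.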

Highway¶

class chainer.links.Highway(in_out_size, nobias=False, activate=<function relu>, init_Wh=None, init_Wt=None, init_bh=None, init_bt=-1)[source]¶

Highway module.

In a highway network, two gates are added to the ordinary non-linear transformation ($H(x) = activate(W_h x + b_h)$). One gate is the transform gate $T(x) = \sigma(W_t x + b_t)$, and the other is the carry gate $C(x)$. For simplicity, the authors defined $C = 1 - T$. The Highway module returns $y$ defined as

\[y = activate(W_h x + b_h) \odot \sigma(W_t x + b_t) + x \odot(1 - \sigma(W_t x + b_t))\]

The output array has the same spatial size as the input. In order to satisfy this, $W_h$ and $W_t$ must be square matrices.

Parameters:

in_out_size (int) – Dimension of input and output vectors.
nobias (bool) – If True, then this function does not use the bias.
activate – Activation function of plain array. $tanh$ is also available.
init_Wh (2-D array) – Initial weight value of plain array. If None, then this function uses Gaussian distribution scaled by w_scale to initialize $W_h$. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
init_bh (1-D array) – Initial bias value of plain array. If None, then this function uses zero vector to initialize $b_h$. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
init_Wt (2-D array) – Initial weight value of transform array. If None, then this function uses Gaussian distribution scaled by w_scale to initialize $W_t$. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
init_bt (1-D array) – Initial bias value of transform array. Default value is -1 vector. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value. Negative value is recommended by the author of the paper. (e.g. -1, -3, ...).

See: Highway Networks.

__call__(x)[source]¶

Computes the output of the Highway module.

Parameters:	x (Variable) – Input variable.
Returns:	Output variable. Its array has the same spatial size and the same minibatch size as the input array.
Return type:	Variable
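The highway formula can be checked with a small single-sample sketch. The names `highway`, `matvec`, `sigmoid` and `relu` below are hypothetical helpers written for this illustration, with nested lists standing in for arrays.

```python
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def relu(a):
    return a if a > 0.0 else 0.0


def highway(x, W_h, b_h, W_t, b_t, activate=relu):
    """y = H(x) * T(x) + x * (1 - T(x)), computed elementwise."""
    H = [activate(a + b) for a, b in zip(matvec(W_h, x), b_h)]
    T = [sigmoid(a + b) for a, b in zip(matvec(W_t, x), b_t)]
    return [h * t + xi * (1.0 - t) for h, t, xi in zip(H, T, x)]


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
y = highway([2.0, -2.0], identity, [0.0, 0.0], zeros, [0.0, 0.0])
```

Here the transform gate is exactly 0.5, so the output is the average of `relu(x)` and `x`. A strongly negative `b_t` pushes the gate towards 0 and makes the module behave like the identity at the start of training, which is the motivation for the negative default of `init_bt`.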

Inception¶

class chainer.links.Inception(in_channels, out1, proj3, out3, proj5, out5, proj_pool, conv_init=None, bias_init=None)[source]¶

Inception module of GoogLeNet.

It applies four different functions to the input array and concatenates their outputs along the channel dimension. Three of them are 2D convolutions of sizes 1x1, 3x3 and 5x5. Convolution paths of 3x3 and 5x5 sizes have 1x1 convolutions (called projections) ahead of them. The other path consists of 1x1 convolution (projection) and 3x3 max pooling.

The output array has the same spatial size as the input. In order to satisfy this, Inception module uses appropriate padding for each convolution and pooling.

See: Going Deeper with Convolutions.

Parameters:

in_channels (int) – Number of channels of input arrays.
out1 (int) – Output size of 1x1 convolution path.
proj3 (int) – Projection size of 3x3 convolution path.
out3 (int) – Output size of 3x3 convolution path.
proj5 (int) – Projection size of 5x5 convolution path.
out5 (int) – Output size of 5x5 convolution path.
proj_pool (int) – Projection size of max pooling path.
conv_init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the convolution matrix weights. May be None to use default initialization.
bias_init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the convolution bias weights. May be None to use default initialization.

__call__(x)[source]¶

Computes the output of the Inception module.

Parameters:	x (Variable) – Input variable.
Returns:	Output variable. Its array has the same spatial size and the same minibatch size as the input array. The channel dimension has size `out1 + out3 + out5 + proj_pool`.
Return type:	Variable
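The output shape follows from two facts stated above: every path preserves the spatial size, and the four paths are concatenated along the channel axis. The sketch below uses hypothetical helper names and assumes stride 1 with padding `(k - 1) // 2` for a k×k filter; the channel sizes are example values only.

```python
def conv_outsize(size, k, s=1, p=0):
    """Output length of a convolution along one axis."""
    return (size + 2 * p - k) // s + 1


def inception_out_shape(in_shape, out1, out3, out5, proj_pool):
    """Shape (N, C, H, W) of the concatenated Inception output."""
    n, _, h, w = in_shape
    for k in (1, 3, 5):
        p = (k - 1) // 2           # assumed size-preserving padding
        assert conv_outsize(h, k, 1, p) == h
        assert conv_outsize(w, k, 1, p) == w
    return (n, out1 + out3 + out5 + proj_pool, h, w)


shape = inception_out_shape((8, 192, 28, 28),
                            out1=64, out3=128, out5=32, proj_pool=32)
```

Note that `proj3` and `proj5` do not appear in the output size: they only set the width of the intermediate 1x1 projections.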

InceptionBN¶

class chainer.links.InceptionBN(in_channels, out1, proj3, out3, proj33, out33, pooltype, proj_pool=None, stride=1, conv_init=None, dtype=<type 'numpy.float32'>)[source]¶

Inception module of the new GoogLeNet with BatchNormalization.

This chain acts like Inception, while InceptionBN uses the BatchNormalization on top of each convolution, the 5x5 convolution path is replaced by two consecutive 3x3 convolution applications, and the pooling method is configurable.

See: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.

Parameters:

in_channels (int) – Number of channels of input arrays.
out1 (int) – Output size of the 1x1 convolution path.
proj3 (int) – Projection size of the single 3x3 convolution path.
out3 (int) – Output size of the single 3x3 convolution path.
proj33 (int) – Projection size of the double 3x3 convolutions path.
out33 (int) – Output size of the double 3x3 convolutions path.
pooltype (str) – Pooling type. It must be either 'max' or 'avg'.
proj_pool (bool) – If True, do projection in the pooling path.
stride (int) – Stride parameter of the last convolution of each path.
conv_init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the convolution matrix weights. May be None to use default initialization.
dtype (numpy.dtype) – Type to use in ~batch_normalization.BatchNormalization.

See also

Inception

Variables:	train (bool) – If `True`, then batch normalization layers are used in training mode. If `False`, they are used in testing mode.

__call__(x, test=None)[source]¶

Computes the output of the InceptionBN module.

Parameters:	x (Variable) – An input variable. test (bool) – If `True`, batch normalization layers run in testing mode; if `test` is omitted, `not self.train` is used as `test`.

Linear¶

class chainer.links.Linear(in_size, out_size, wscale=1, bias=0, nobias=False, initialW=None, initial_bias=None)[source]¶

Linear layer (a.k.a. fully-connected layer).

This is a link that wraps the linear() function, and holds a weight matrix W and optionally a bias vector b as parameters.

The weight matrix W is initialized with i.i.d. Gaussian samples, each of which has zero mean and deviation $\sqrt{1/\text{in_size}}$. The bias vector b is of size out_size. Each element is initialized with the bias value. If nobias argument is set to True, then this link does not hold a bias vector.

Parameters:

in_size (int or None) – Dimension of input vectors. If None, parameter initialization will be deferred until the first forward data pass at which time the size will be determined.
out_size (int) – Dimension of output vectors.
wscale (float) – Scaling factor of the weight matrix.
bias (float) – Initial bias value.
nobias (bool) – If True, then this function does not use the bias.
initialW (2-D array) – Initial weight value. If None, then this function uses a Gaussian distribution scaled by wscale to initialize the weight. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
initial_bias (1-D array) – Initial bias value. If None, then this function uses bias to initialize bias. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.

See also

linear()

Variables:	W (Variable) – Weight parameter. b (Variable) – Bias parameter.

__call__(x)[source]¶

Applies the linear layer.

Parameters:	x (Variable) – Batch of input vectors.
Returns:	Output of the linear layer.
Return type:	Variable
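A minimal sketch of the default initialization and the forward computation is shown below. The helpers `init_linear` and `linear_forward` are hypothetical. The sketch assumes that W is stored with shape (out_size, in_size), so that the layer computes $y = x W^\top + b$, and that `wscale` multiplies the standard deviation.

```python
import math
import random


def init_linear(in_size, out_size, wscale=1.0, bias=0.0, seed=0):
    """Draw W from N(0, (wscale * sqrt(1/in_size))^2); fill b with bias."""
    rng = random.Random(seed)
    std = wscale * math.sqrt(1.0 / in_size)
    W = [[rng.gauss(0.0, std) for _ in range(in_size)]
         for _ in range(out_size)]
    b = [bias] * out_size
    return W, b


def linear_forward(x_batch, W, b):
    """Apply y = x W^T + b to every row of x_batch."""
    return [[sum(w * x for w, x in zip(row, xv)) + bi
             for row, bi in zip(W, b)]
            for xv in x_batch]


W0, b0 = init_linear(in_size=3, out_size=2)
y = linear_forward([[1.0, 2.0, 3.0]],
                   [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
                   [0.5, -0.5])
```

Scaling the deviation by $\sqrt{1/\text{in\_size}}$ keeps the variance of each output roughly independent of the number of inputs.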

LSTM¶

class chainer.links.LSTM(in_size, out_size, **kwargs)[source]¶

Fully-connected LSTM layer.

This is a fully-connected LSTM layer as a chain. Unlike the lstm() function, which is defined as a stateless activation function, this chain holds upward and lateral connections as child links.

It also maintains states, including the cell state and the output at the previous time step. Therefore, it can be used as a stateful LSTM.

This link supports variable length inputs. The mini-batch size of the current input must be equal to or smaller than that of the previous one. The mini-batch size of c and h is determined as that of the first input x. When mini-batch size of i-th input is smaller than that of the previous input, this link only updates c[0:len(x)] and h[0:len(x)] and doesn’t change the rest of c and h. So, please sort input sequences in descending order of lengths before applying the function.

Parameters:

in_size (int or None) – Dimension of input vectors. If None, parameter initialization will be deferred until the first forward data pass at which time the size will be determined.
out_size (int) – Dimensionality of output vectors.
lateral_init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the lateral connections. May be None to use default initialization.
upward_init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the upward connections. May be None to use default initialization.
bias_init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the biases of the cell input, input gate, and output gate of the upward connection. May be a scalar, in which case the bias is initialized with this value. May be None to use default initialization.
forget_bias_init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the biases of the forget gate of the upward connection. May be a scalar, in which case the bias is initialized with this value. May be None to use default initialization.

Variables:

upward (Linear) – Linear layer of upward connections.
lateral (Linear) – Linear layer of lateral connections.
c (Variable) – Cell states of LSTM units.
h (Variable) – Output at the previous time step.

__call__(x)[source]¶

Updates the internal state and returns the LSTM outputs.

Parameters:	x (Variable) – A new batch from the input sequence.
Returns:	Outputs of updated LSTM units.
Return type:	Variable

reset_state()[source]¶

Resets the internal state.

It sets the c and h attributes to None.

set_state(c, h)[source]¶

Sets the internal state.

It sets the c and h attributes.

Parameters:

c (Variable) – New cell states of LSTM units.
h (Variable) – New output at the previous time step.
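The variable-length behaviour described above (only the first `len(x)` rows of c and h are updated when the mini-batch shrinks) can be modelled with a small sketch. The helper `step_state` is hypothetical and works on plain lists, one entry per sequence in the batch; it illustrates the bookkeeping only, not the LSTM arithmetic.

```python
def step_state(state, new_rows):
    """Overwrite the first len(new_rows) rows; keep the remaining rows.

    state is None before the first input, mirroring reset_state().
    """
    if state is None:
        return list(new_rows)
    if len(new_rows) > len(state):
        raise ValueError("mini-batch size must not grow between steps")
    return list(new_rows) + state[len(new_rows):]


h = None
h = step_state(h, ["a0", "b0", "c0"])   # three sequences at t=0
h = step_state(h, ["a1", "b1"])         # sequence c has ended
h = step_state(h, ["a2"])               # sequence b has ended
```

After the loop, each row holds the last output of its own sequence, which only works if longer sequences come first in the batch. This is why the inputs should be sorted in descending order of length.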

MLPConvolution2D¶

NStepBiLSTM¶

class chainer.links.NStepBiLSTM(n_layers, in_size, out_size, dropout, use_cudnn=True)[source]¶

Stacked Bi-directional LSTM for sequences.

This link is a stacked version of Bi-directional LSTM for sequences. It calculates the hidden and cell states of all layers at the end of each sequence, and all hidden states of the last layer for each time step.

Unlike chainer.functions.n_step_bilstm(), this link automatically sorts inputs in descending order by length and transposes the sequences. Users just need to call the link with a list of chainer.Variable holding sequences.

Parameters:

n_layers (int) – Number of layers.
in_size (int) – Dimensionality of input vectors.
out_size (int) – Dimensionality of hidden states and output vectors.
dropout (float) – Dropout ratio.
use_cudnn (bool) – Use cuDNN.

See also

chainer.functions.n_step_bilstm()
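The preprocessing that the NStep links perform on their inputs, sorting by length and transposing into time-major batches, can be sketched in pure Python. The function `sort_and_transpose` is a hypothetical helper; returning the sort order so that results can be mapped back to the original positions is an assumption of this sketch.

```python
def sort_and_transpose(seqs):
    """Sort sequences by descending length and regroup them per time step.

    Returns (order, steps): order[i] is the original index of the i-th
    sorted sequence, and steps[t] holds the t-th element of every
    sequence that is still running at time t.
    """
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    sorted_seqs = [seqs[i] for i in order]
    max_len = len(sorted_seqs[0]) if sorted_seqs else 0
    steps = [[s[t] for s in sorted_seqs if len(s) > t]
             for t in range(max_len)]
    return order, steps


order, steps = sort_and_transpose([[1, 2], [3, 4, 5], [6]])
```

The batch size per time step is non-increasing (3, 2, 1 in this example), which is the layout expected by the underlying n_step functions.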

NStepBiRNNReLU¶

class chainer.links.NStepBiRNNReLU(n_layers, in_size, out_size, dropout, use_cudnn=True)[source]¶

Stacked Bi-directional RNN for sequences.

This link is a stacked version of Bi-directional RNN for sequences. Note that the activation function is relu. It calculates the hidden and cell states of all layers at the end of each sequence, and all hidden states of the last layer for each time step.

Unlike chainer.functions.n_step_birnn(), this link automatically sorts inputs in descending order by length and transposes the sequences. Users just need to call the link with a list of chainer.Variable holding sequences.

Parameters:

n_layers (int) – Number of layers.
in_size (int) – Dimensionality of input vectors.
out_size (int) – Dimensionality of hidden states and output vectors.
dropout (float) – Dropout ratio.
use_cudnn (bool) – Use cuDNN.

See also

chainer.functions.n_step_birnn()

NStepBiRNNTanh¶

class chainer.links.NStepBiRNNTanh(n_layers, in_size, out_size, dropout, use_cudnn=True)[source]¶

Stacked Bi-directional RNN for sequences.

This link is a stacked version of Bi-directional RNN for sequences. Note that the activation function is tanh. It calculates the hidden and cell states of all layers at the end of each sequence, and all hidden states of the last layer for each time step.

Unlike chainer.functions.n_step_birnn(), this link automatically sorts inputs in descending order by length and transposes the sequences. Users just need to call the link with a list of chainer.Variable holding sequences.

Parameters:

n_layers (int) – Number of layers.
in_size (int) – Dimensionality of input vectors.
out_size (int) – Dimensionality of hidden states and output vectors.
dropout (float) – Dropout ratio.
use_cudnn (bool) – Use cuDNN.

See also

chainer.functions.n_step_birnn()

NStepGRU¶

class chainer.links.NStepGRU(n_layers, in_size, out_size, dropout, use_cudnn=True)[source]¶

Stacked Uni-directional GRU for sequences.

This link is a stacked version of Uni-directional GRU for sequences. It calculates the hidden and cell states of all layers at the end of each sequence, and all hidden states of the last layer for each time step.

Unlike chainer.functions.n_step_gru(), this link automatically sorts inputs in descending order by length and transposes the sequences. Users just need to call the link with a list of chainer.Variable holding sequences.

Parameters:

n_layers (int) – Number of layers.
in_size (int) – Dimensionality of input vectors.
out_size (int) – Dimensionality of hidden states and output vectors.
dropout (float) – Dropout ratio.
use_cudnn (bool) – Use cuDNN.

See also

chainer.functions.n_step_gru()

NStepLSTM¶

class chainer.links.NStepLSTM(n_layers, in_size, out_size, dropout, use_cudnn=True)[source]¶

Stacked Uni-directional LSTM for sequences.

This link is a stacked version of Uni-directional LSTM for sequences. It calculates the hidden and cell states of all layers at the end of each sequence, and all hidden states of the last layer for each time step.

Unlike chainer.functions.n_step_lstm(), this link automatically sorts inputs in descending order by length and transposes the sequences. Users just need to call the link with a list of chainer.Variable holding sequences.

Parameters:

n_layers (int) – Number of layers.
in_size (int) – Dimensionality of input vectors.
out_size (int) – Dimensionality of hidden states and output vectors.
dropout (float) – Dropout ratio.
use_cudnn (bool) – Use cuDNN.

See also

chainer.functions.n_step_lstm()

NStepRNNReLU¶

class chainer.links.NStepRNNReLU(n_layers, in_size, out_size, dropout, use_cudnn=True)[source]¶

Stacked Uni-directional RNN for sequences.

This link is a stacked version of Uni-directional RNN for sequences. Note that the activation function is relu. It calculates the hidden and cell states of all layers at the end of each sequence, and all hidden states of the last layer for each time step.

Unlike chainer.functions.n_step_rnn(), this link automatically sorts inputs in descending order by length and transposes the sequences. Users just need to call the link with a list of chainer.Variable holding sequences.

Parameters:

n_layers (int) – Number of layers.
in_size (int) – Dimensionality of input vectors.
out_size (int) – Dimensionality of hidden states and output vectors.
dropout (float) – Dropout ratio.
use_cudnn (bool) – Use cuDNN.

See also

chainer.functions.n_step_rnn()

NStepRNNTanh¶

class chainer.links.NStepRNNTanh(n_layers, in_size, out_size, dropout, use_cudnn=True)[source]¶

Stacked Uni-directional RNN for sequences.

This link is a stacked version of the uni-directional RNN for sequences. Note that the activation function is tanh. It calculates the hidden states of all layers at the end of each sequence, and all hidden states of the last layer for each time step.

Unlike chainer.functions.n_step_rnn(), this link automatically sorts the inputs in descending order by length and transposes the sequences. Users just need to call the link with a list of chainer.Variable holding sequences.

Parameters:	n_layers (int) – Number of layers. in_size (int) – Dimensionality of input vectors. out_size (int) – Dimensionality of hidden states and output vectors. dropout (float) – Dropout ratio. use_cudnn (bool) – Use cuDNN.

See also

chainer.functions.n_step_rnn()

Scale¶

class chainer.links.Scale(axis=1, W_shape=None, bias_term=False, bias_shape=None)[source]¶

Broadcasted elementwise product with learnable parameters.

Computes an elementwise product as the scale() function does, except that its second input is a learnable weight parameter $W$ held by the link.

Parameters:

axis (int) – The first axis of the first input of scale() function along which its second input is applied.
W_shape (tuple of ints) – Shape of the learnable weight parameter. If None, this link does not have a learnable weight parameter, so an explicit weight needs to be given as the second input of its __call__ method.
bias_term (bool) – Whether to also learn a bias (equivalent to Scale link + Bias link).
bias_shape (tuple of ints) – Shape of the learnable bias. If W_shape is None, this should be given to determine the shape. Otherwise, the bias has the same shape as the weight parameter (W_shape) and bias_shape is ignored.

See also

See scale() for details.

Variables:	W (Variable) – Weight parameter if `W_shape` is given. Otherwise, no W attribute. bias (Bias) – Bias term if `bias_term` is `True`. Otherwise, no bias attribute.

__call__(*xs)[source]¶

Applies broadcasted elementwise product.

Parameters:	xs (list of Variables) – Input variables whose length should be one if the link has a learnable weight parameter, otherwise should be two.
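As a rough illustration of the broadcasted product, the following plain-Python sketch handles the common case of a two-dimensional input with axis=1, where the weight has one entry per channel. The helper name `scale_axis1` and the nested-list representation are assumptions made for this example; the link itself works on arrays of any rank.

```python
def scale_axis1(x, w):
    """Multiply a (batch, channels) nested list by a per-channel weight."""
    # With axis=1, w is aligned with the second axis of x and broadcast
    # over the first (batch) axis.
    return [[value * w[j] for j, value in enumerate(row)] for row in x]


x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
w = [10.0, 0.5, -1.0]
y = scale_axis1(x, w)
print(y)  # [[10.0, 1.0, -3.0], [40.0, 2.5, -6.0]]
```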

StatefulGRU¶

class chainer.links.StatefulGRU(in_size, out_size, init=None, inner_init=None, bias_init=0)[source]¶

Stateful Gated Recurrent Unit function (GRU).

Stateful GRU function has six parameters $W_r$, $W_z$, $W$, $U_r$, $U_z$, and $U$. All these parameters are $n \times n$ matrices, where $n$ is the dimension of hidden vectors.

Given input vector $x$, Stateful GRU returns the next hidden vector $h'$ defined as

\[\begin{split}r &=& \sigma(W_r x + U_r h), \\ z &=& \sigma(W_z x + U_z h), \\ \bar{h} &=& \tanh(W x + U (r \odot h)), \\ h' &=& (1 - z) \odot h + z \odot \bar{h},\end{split}\]

where $h$ is current hidden vector.

As the name indicates, StatefulGRU is stateful, meaning that it also holds the next hidden vector $h'$ as a state. Use the GRU link for a stateless version.
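The update equations above can be traced with a small plain-Python sketch. It follows the equations term by term (no bias terms, as in the equations) and uses nested lists for matrices; `gru_step`, `matvec` and `sigmoid` are hypothetical helpers written for this illustration, not part of Chainer.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    """Matrix-vector product for a nested-list matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gru_step(x, h, W_r, U_r, W_z, U_z, W, U):
    """One GRU update: returns the next hidden vector h'."""
    r = [sigmoid(a + b) for a, b in zip(matvec(W_r, x), matvec(U_r, h))]
    z = [sigmoid(a + b) for a, b in zip(matvec(W_z, x), matvec(U_z, h))]
    rh = [ri * hi for ri, hi in zip(r, h)]  # r (elementwise) h
    h_bar = [math.tanh(a + b) for a, b in zip(matvec(W, x), matvec(U, rh))]
    return [(1 - zi) * hi + zi * hb for zi, hi, hb in zip(z, h, h_bar)]


# With all-zero weights both gates equal 0.5 and the candidate is 0,
# so the hidden vector is simply halved.
zeros = [[0.0, 0.0], [0.0, 0.0]]
h_next = gru_step([1.0, 1.0], [1.0, -2.0], zeros, zeros, zeros, zeros, zeros, zeros)
print(h_next)  # [0.5, -1.0]
```

A stateful wrapper would store `h_next` and feed it back as `h` on the next call.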

Parameters:

in_size (int) – Dimension of input vector $x$.
out_size (int) – Dimension of hidden vector $h$.
init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the GRU’s input units ($W$). May be None to use the default initialization.
inner_init – A callable that takes numpy.ndarray or cupy.ndarray and edits its value. It is used for initialization of the GRU’s inner recurrent units ($U$). May be None to use the default initialization.
bias_init – A callable or scalar used to initialize the bias values for both the GRU’s inner and input units. May be None to use the default initialization.

Variables:

h (Variable) – Hidden vector that indicates the state of StatefulGRU.

See also

GRU

StatefulPeepholeLSTM¶

class chainer.links.StatefulPeepholeLSTM(in_size, out_size)[source]¶

Fully-connected LSTM layer with peephole connections.

This is a fully-connected LSTM layer with peephole connections as a chain. Unlike the LSTM link, this chain holds peep_i, peep_f and peep_o as child links besides upward and lateral.

Given an input vector $x$, StatefulPeepholeLSTM returns the next hidden vector $h'$ defined as

\[\begin{split}a &=& \tanh(upward x + lateral h), \\ i &=& \sigma(upward x + lateral h + peep_i c), \\ f &=& \sigma(upward x + lateral h + peep_f c), \\ c' &=& a \odot i + f \odot c, \\ o &=& \sigma(upward x + lateral h + peep_o c'), \\ h' &=& o \odot \tanh(c'),\end{split}\]

where $\sigma$ is the sigmoid function, $\odot$ is the element-wise product, $c$ is the current cell state, $c'$ is the next cell state and $h$ is the current hidden vector.

Parameters:

in_size (int) – Dimension of the input vector $x$.
out_size (int) – Dimension of the hidden vector $h$.

Variables:

upward (Linear) – Linear layer of upward connections.
lateral (Linear) – Linear layer of lateral connections.
peep_i (Linear) – Linear layer of peephole connections to the input gate.
peep_f (Linear) – Linear layer of peephole connections to the forget gate.
peep_o (Linear) – Linear layer of peephole connections to the output gate.
c (Variable) – Cell states of LSTM units.
h (Variable) – Output at the current time step.

__call__(x)[source]¶

Updates the internal state and returns the LSTM outputs.

Parameters:	x (Variable) – A new batch from the input sequence.
Returns:	Outputs of updated LSTM units.
Return type:	Variable

reset_state()[source]¶

Resets the internal states.

It sets None to the c and h attributes.

StatelessLSTM¶

class chainer.links.StatelessLSTM(in_size, out_size, lateral_init=None, upward_init=None, bias_init=0, forget_bias_init=0)[source]¶

Stateless LSTM layer.

This is a fully-connected LSTM layer as a chain. Unlike the lstm() function, this chain holds upward and lateral connections as child links. This link doesn’t keep cell and hidden states.

Parameters:	in_size (int or None) – Dimension of input vectors. If `None`, parameter initialization will be deferred until the first forward data pass at which time the size will be determined. out_size (int) – Dimensionality of output vectors.
Variables:	upward (chainer.links.Linear) – Linear layer of upward connections. lateral (chainer.links.Linear) – Linear layer of lateral connections.

__call__(c, h, x)[source]¶

Returns new cell state and updated output of LSTM.

Parameters:

c (Variable) – Cell states of LSTM units.
h (Variable) – Output at the previous time step.
x (Variable) – A new batch from the input sequence.

Returns:

Returns (c_new, h_new), where c_new is the new cell state and h_new is the updated output of the LSTM units.

Return type:

tuple of chainer.Variable

Activation/loss/normalization functions with parameters¶

BatchNormalization¶

class chainer.links.BatchNormalization(size, decay=0.9, eps=2e-05, dtype=<type 'numpy.float32'>, use_gamma=True, use_beta=True, initial_gamma=None, initial_beta=None, use_cudnn=True)[source]¶

Batch normalization layer on outputs of linear or convolution functions.

This link wraps the batch_normalization() and fixed_batch_normalization() functions.

It runs in three modes: training mode, fine-tuning mode, and testing mode.

In training mode, it normalizes the input by batch statistics. It also maintains approximated population statistics by moving averages, which can be used for instant evaluation in testing mode.

In fine-tuning mode, it accumulates the input to compute population statistics. To compute the population statistics correctly, the user must feed mini-batches covering the whole training dataset in this mode.

In testing mode, it uses pre-computed population statistics to normalize the input variable. The population statistics are approximate if they were computed in training mode, or accurate if they were correctly computed in fine-tuning mode.

Parameters:

size (int or tuple of ints) – Size (or shape) of channel dimensions.
decay (float) – Decay rate of moving average. It is used on training.
eps (float) – Epsilon value for numerical stability.
dtype (numpy.dtype) – Type to use in computing.
use_gamma (bool) – If True, use the scaling parameter. Otherwise, the scale is fixed to 1, which has no effect.
use_beta (bool) – If True, use the shifting parameter. Otherwise, the shift is fixed to 0, which has no effect.
use_cudnn (bool) – If True, then this link uses cuDNN if available.

See: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Variables:

gamma (Variable) – Scaling parameter.
beta (Variable) – Shifting parameter.
avg_mean (Variable) – Population mean.
avg_var (Variable) – Population variance.
N (int) – Count of batches given for fine-tuning.
decay (float) – Decay rate of moving average. It is used on training.
eps (float) – Epsilon value for numerical stability. This value is added to the batch variances.
use_cudnn (bool) – If True, then this link uses cuDNN if available.

__call__(x, test=False, finetune=False)[source]¶

Invokes the forward propagation of BatchNormalization.

BatchNormalization accepts additional arguments, which control the three running modes.

Parameters:

x (Variable) – Input variable.
test (bool) – If True, BatchNormalization runs in testing mode; it normalizes the input using pre-computed statistics.
finetune (bool) – If finetune is True and test is False, BatchNormalization runs in fine-tuning mode; it accumulates the input array to compute population statistics for normalization, and normalizes the input using batch statistics.

If both test and finetune are False, BatchNormalization runs in training mode; it computes moving averages of the mean and variance for evaluation during training, and normalizes the input using batch statistics.
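The difference between training mode and testing mode can be sketched in plain Python for a (batch, channels) input. This is a simplified illustration under stated assumptions, not the link's implementation: `BatchNormSketch` is a hypothetical class, the scaling and shifting parameters are left out, the initial values of the running statistics are arbitrary, fine-tuning mode is not covered, and the exact update rule used by the library may differ in details such as variance correction.

```python
import math


def batch_stats(x):
    """Per-channel mean and biased variance of a (batch, channels) list."""
    n = len(x)
    columns = list(zip(*x))
    mean = [sum(col) / n for col in columns]
    var = [sum((v - m) ** 2 for v in col) / n for col, m in zip(columns, mean)]
    return mean, var


def normalize(x, mean, var, eps):
    return [[(v - m) / math.sqrt(s + eps) for v, m, s in zip(row, mean, var)]
            for row in x]


class BatchNormSketch:
    def __init__(self, size, decay=0.9, eps=2e-5):
        self.decay, self.eps = decay, eps
        self.avg_mean = [0.0] * size  # assumed initial running mean
        self.avg_var = [0.0] * size   # assumed initial running variance

    def __call__(self, x, test=False):
        if test:
            # Testing mode: use the stored population statistics.
            return normalize(x, self.avg_mean, self.avg_var, self.eps)
        # Training mode: normalize by batch statistics and update the
        # moving averages that testing mode relies on later.
        mean, var = batch_stats(x)
        d = self.decay
        self.avg_mean = [d * a + (1 - d) * m for a, m in zip(self.avg_mean, mean)]
        self.avg_var = [d * a + (1 - d) * v for a, v in zip(self.avg_var, var)]
        return normalize(x, mean, var, self.eps)


bn = BatchNormSketch(size=2)
y_train = bn([[1.0, 10.0], [3.0, 30.0]])             # uses batch statistics
y_test = bn([[1.0, 10.0], [3.0, 30.0]], test=True)   # uses moving averages
print(bn.avg_mean, bn.avg_var)
```

After a single training step the moving averages are still far from the batch statistics, which is why the two outputs differ for the same input.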

start_finetuning()[source]¶

Resets the population count for collecting population statistics.

This method can be skipped the first time the fine-tuning mode is used. Otherwise, it should be called before starting the fine-tuning mode again.

LayerNormalization¶

class chainer.links.LayerNormalization(size=None, eps=1e-06, initial_gamma=None, initial_beta=None)[source]¶

Layer normalization layer on outputs of linear functions.

This link implements a “layer normalization” layer, which normalizes the input units by statistics computed along the second axis, then scales and shifts them. If size is None, parameter initialization is deferred until the first forward pass, at which time the size is determined.

Parameters:

size (int) – Size of input units. If None, parameter initialization will be deferred until the first forward data pass at which time the size will be determined.
eps (float) – Epsilon value for numerical stability of normalization.
initial_gamma (Initializer) – Initializer for scaling vector. If None, then the vector is filled by 1. If a scalar, the vector is filled by it. If numpy.ndarray, the vector is set by it.
initial_beta (Initializer) – Initializer for shifting vector. If None, then the vector is filled by 0. If a scalar, the vector is filled by it. If numpy.ndarray, the vector is set by it.

Variables:

gamma (Variable) – Scaling parameter.
beta (Variable) – Shifting parameter.
eps (float) – Epsilon value for numerical stability.

See: Layer Normalization

__call__(x)[source]¶

Apply layer normalization to given input.

Parameters:	x (Variable) – Batch vectors. Shape of this value must be (batch_size, unit_size), e.g., the output of `linear()`.
Returns:	Output of the layer normalization.
Return type:	Variable
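A plain-Python sketch of normalizing along the second axis is given below. It assumes the common formulation in which each row is shifted by its mean and divided by the square root of its variance plus eps; where exactly eps enters is an assumption of this sketch, and `layer_norm` is a hypothetical helper rather than the link's code.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-6):
    """Normalize each row of a (batch_size, unit_size) list, then scale and shift."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        std = math.sqrt(var + eps)
        out.append([g * (v - mean) / std + b
                    for v, g, b in zip(row, gamma, beta)])
    return out


x = [[1.0, 2.0, 3.0],
     [10.0, 20.0, 60.0]]
y = layer_norm(x, gamma=[1.0, 1.0, 1.0], beta=[0.0, 0.0, 0.0])
print(y)
```

Unlike batch normalization, the statistics are computed per example, so each row is normalized independently of the rest of the batch.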

BinaryHierarchicalSoftmax¶

class chainer.links.BinaryHierarchicalSoftmax(in_size, tree)[source]¶

Hierarchical softmax layer over binary tree.

In natural language applications, the vocabulary is often too large for the softmax loss to be practical. Instead, the hierarchical softmax uses a product of sigmoid functions. It costs only $O(\log(n))$ time on average, where $n$ is the vocabulary size.

First, the user needs to prepare a binary tree in which each leaf corresponds to a word in the vocabulary. When a word $x$ is given, exactly one path from the root of the tree to the leaf of the word exists. Let $\mbox{path}(x) = ((e_1, b_1), \dots, (e_m, b_m))$ be the path of $x$, where $e_i$ is the index of the $i$-th internal node, and $b_i \in \{-1, 1\}$ indicates the direction to move at the $i$-th internal node (-1 is left, and 1 is right). Then, the probability of $x$ is given as below:

\[\begin{split}P(x) &= \prod_{(e_i, b_i) \in \mbox{path}(x)}P(b_i | e_i) \\ &= \prod_{(e_i, b_i) \in \mbox{path}(x)}\sigma(b_i x^\top w_{e_i}),\end{split}\]

where $\sigma(\cdot)$ is a sigmoid function, and $w$ is a weight matrix.

This function costs $O(\log(n))$ time as an average length of paths is $O(\log(n))$, and $O(n)$ memory as the number of internal nodes equals $n - 1$.
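The path-product formula can be checked with a small plain-Python sketch. The helpers `build_paths` and `word_probability` are hypothetical, and the way internal nodes are numbered (in visiting order) is an assumption of this example; the link may number them differently. Here `x` is the input vector and `w` holds one weight row per internal node.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def build_paths(tree):
    """Map each leaf word to its list of (internal node id, direction) pairs."""
    paths = {}
    counter = [0]

    def visit(node, path):
        if not isinstance(node, tuple):
            paths[node] = path            # leaf: a word id
            return
        node_id = counter[0]              # number internal nodes in visit order
        counter[0] += 1
        left, right = node
        visit(left, path + [(node_id, -1)])
        visit(right, path + [(node_id, 1)])

    visit(tree, [])
    return paths


def word_probability(path, x, w):
    """Product over the path of sigmoid(b_i * <x, w[e_i]>)."""
    p = 1.0
    for node_id, direction in path:
        score = sum(a * b for a, b in zip(x, w[node_id]))
        p *= sigmoid(direction * score)
    return p


tree = ((1, 2), 3)                        # three words, two internal nodes
paths = build_paths(tree)
x = [0.5, -1.0]
w = [[0.2, 0.1], [-0.3, 0.4]]
probs = {word: word_probability(path, x, w) for word, path in paths.items()}
print(paths)
print(sum(probs.values()))                # the leaf probabilities sum to 1
```

Because sigmoid(s) + sigmoid(-s) = 1 at every internal node, the probabilities over all leaves form a proper distribution without normalizing over the whole vocabulary.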

Parameters:	in_size (int) – Dimension of input vectors. tree – A binary tree made with tuples like ((1, 2), 3).
Variables:	W (Variable) – Weight parameter matrix.

See: Hierarchical Probabilistic Neural Network Language Model [Morin+, AISTATS2005].

__call__(x, t)[source]¶

Computes the loss value for given input and ground truth labels.

Parameters:	x (Variable) – Input to the classifier at each node. t (Variable) – Batch of ground truth labels.
Returns:	Loss value.
Return type:	Variable

static create_huffman_tree(word_counts)[source]¶

Makes a Huffman tree from a dictionary containing word counts.

This method creates a binary Huffman tree, that is required for BinaryHierarchicalSoftmax. For example, {0: 8, 1: 5, 2: 6, 3: 4} is converted to ((3, 1), (2, 0)).

Parameters:	word_counts (dict of int key and int or float values) – Dictionary representing counts of words.
Returns:	Binary Huffman tree with tuples and keys of `word_counts`.
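The construction can be sketched with a priority queue from the standard library. This is an illustrative re-implementation under the assumption that the two least frequent subtrees are merged repeatedly, with an insertion-order id used to break ties; `huffman_tree_sketch` is a hypothetical name, and the library's tie-breaking may differ for other inputs.

```python
import heapq


def huffman_tree_sketch(word_counts):
    """Build a binary Huffman tree of nested tuples from a word-count dict."""
    if not word_counts:
        raise ValueError("Empty vocabulary")
    # Each heap entry is (count, unique id, subtree); the id breaks ties so
    # that subtrees themselves are never compared.
    heap = [(count, uid, word)
            for uid, (word, count) in enumerate(word_counts.items())]
    heapq.heapify(heap)
    while len(heap) >= 2:
        count1, id1, tree1 = heapq.heappop(heap)
        count2, id2, tree2 = heapq.heappop(heap)
        heapq.heappush(heap, (count1 + count2, min(id1, id2), (tree1, tree2)))
    return heap[0][2]


tree = huffman_tree_sketch({0: 8, 1: 5, 2: 6, 3: 4})
print(tree)  # ((3, 1), (2, 0))
```

Frequent words end up closer to the root, which shortens their paths and reduces the average cost of the hierarchical softmax.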

BlackOut¶

class chainer.links.BlackOut(in_size, counts, sample_size)[source]¶

BlackOut loss layer.

See also

black_out() for more detail.

Parameters:	in_size (int) – Dimension of input vectors. counts (int list) – Number of occurrences of each identifier. sample_size (int) – Number of negative samples.
Variables:	W (Variable) – Weight parameter matrix.

CRF1d¶

class chainer.links.CRF1d(n_label)[source]¶

Linear-chain conditional random field loss layer.

This link wraps the crf1d() function. It holds a transition cost matrix as a parameter.

Parameters:	n_label (int) – Number of labels.

See also

crf1d() for more detail.

Variables:	cost (Variable) – Transition cost parameter.

argmax(xs)[source]¶

Computes a state that maximizes a joint probability.

Parameters:	xs (list of Variable) – Input vector for each label.
Returns:	A tuple of `Variable` representing each log-likelihood and a list representing the argmax path.
Return type:	tuple

See also

See crf1d_argmax() for more detail.

SimplifiedDropconnect¶

class chainer.links.SimplifiedDropconnect(in_size, out_size, ratio=0.5, nobias=False, initialW=None, initial_bias=None)[source]¶

Fully-connected layer with simplified dropconnect regularization.

Notice: This implementation cannot be used to reproduce the results of the paper, because it differs from the original. The original version samples from a Gaussian distribution before applying the activation function, whereas the current implementation averages before the activation.

Parameters:

in_size (int) – Dimension of input vectors. If None, parameter initialization will be deferred until the first forward data pass at which time the size will be determined.
out_size (int) – Dimension of output vectors.
nobias (bool) – If True, then this link does not use the bias term.
initialW (3-D array or None) – Initial weight value. If None, the default initializer is used. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.
initial_bias (2-D array, float or None) – Initial bias value. If None, the bias is set to 0. May also be a callable that takes numpy.ndarray or cupy.ndarray and edits its value.

Variables:

W (Variable) – Weight parameter.
b (Variable) – Bias parameter.

See also

simplified_dropconnect()

See also

Li, W., Matthew Z., Sixin Z., Yann L., Rob F. (2013). Regularization of Neural Network using DropConnect. International Conference on Machine Learning. URL

__call__(x, train=True, mask=None)[source]¶

Applies the simplified dropconnect layer.

Parameters:	x (chainer.Variable or `numpy.ndarray` or cupy.ndarray) – Batch of input vectors. Its first dimension `n` is assumed to be the minibatch dimension. train (bool) – If `True`, executes simplified dropconnect. Otherwise, simplified dropconnect link works as a linear unit.

mask (None or chainer.Variable or numpy.ndarray or cupy.ndarray) – If None, a randomized simplified dropconnect mask is generated. Otherwise, the mask must be an (n, M, N)-shaped array, which is used as the dropconnect mask. The main purpose of this option is debugging.

Returns:	Output of the simplified dropconnect layer.
Return type:	Variable

PReLU¶

class chainer.links.PReLU(shape=(), init=0.25)[source]¶

Parametric ReLU function as a link.

Parameters:	shape (tuple of ints) – Shape of the parameter array. init (float) – Initial parameter value.

See the paper for details: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

See also

chainer.functions.prelu()

Variables:	W (Variable) – Coefficient of parametric ReLU.

__call__(x)[source]¶

Applies the parametric ReLU activation function.

Parameters:	x (Variable) – Input variable.
Returns:	Output of the parametric ReLU function.
Return type:	Variable

Maxout¶

class chainer.links.Maxout(in_size, out_size, pool_size, wscale=1, initialW=None, initial_bias=0)[source]¶

Fully-connected maxout layer.

Let M, P and N be an input dimension, a pool size, and an output dimension, respectively. For an input vector $x$ of size M, it computes

\[Y_{i} = \mathrm{max}_{j} (W_{ij\cdot}x + b_{ij}).\]

Here $W$ is a weight tensor of shape (M, P, N), $b$ an optional bias vector of shape (M, P) and $W_{ij\cdot}$ is a sub-vector extracted from $W$ by fixing first and second dimensions to $i$ and $j$, respectively. Minibatch dimension is omitted in the above equation.

As for the actual implementation, this chain has a Linear link with a (M * P, N) weight matrix and an optional M * P dimensional bias vector.
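The maxout equation can be traced with a small plain-Python sketch for a single input vector. The indexing follows the equation: `W[i][j]` is taken to be the weight vector for output unit `i` and pool slot `j`, and `b[i][j]` the matching bias. This layout and the helper name `maxout_sketch` are assumptions of the example, not the link's internal storage, which uses a single Linear link.

```python
def maxout_sketch(x, W, b):
    """Y_i = max_j (W[i][j] . x + b[i][j]) for a single input vector x."""
    return [
        max(sum(wk * xk for wk, xk in zip(W[i][j], x)) + b[i][j]
            for j in range(len(W[i])))
        for i in range(len(W))
    ]


x = [1.0, -2.0]
W = [[[1.0, 0.0], [0.0, 1.0]],    # output unit 0: two pool candidates
     [[-1.0, 0.0], [0.5, 0.5]]]   # output unit 1: two pool candidates
b = [[0.0, 0.0],
     [0.0, 1.0]]
print(maxout_sketch(x, W, b))  # [1.0, 0.5]
```

Each output unit keeps only the largest of its affine candidates, so the layer learns a piecewise linear activation.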

Parameters:

in_size (int) – Dimension of input vectors.
out_size (int) – Dimension of output vectors.
pool_size (int) – Number of channels.
wscale (float) – Scaling factor of the weight matrix.
initialW (3-D array or None) – Initial weight value. If None, then this function uses a Gaussian distribution scaled by wscale to initialize the weight.
initial_bias (2-D array, float or None) – Initial bias value. If it is float, initial bias is filled with this value. If None, bias is omitted.

Variables:

linear (Link) – The Linear link that performs affine transformation.

See also

maxout()

See also

Goodfellow, I., Warde-farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout Networks. In Proceedings of the 30th International Conference on Machine Learning (ICML-13) (pp. 1319-1327). URL

__call__(x)[source]¶

Applies the maxout layer.

Parameters:	x (Variable) – Batch of input vectors.
Returns:	Output of the maxout layer.
Return type:	Variable

NegativeSampling¶

class chainer.links.NegativeSampling(in_size, counts, sample_size, power=0.75)[source]¶

Negative sampling loss layer.

This link wraps the negative_sampling() function. It holds the weight matrix as a parameter. It also builds a sampler internally given a list of word counts.

Parameters:	in_size (int) – Dimension of input vectors. counts (int list) – Number of occurrences of each identifier. sample_size (int) – Number of negative samples. power (float) – Power factor $\alpha$.

See also

negative_sampling() for more detail.

Variables:	W (Variable) – Weight parameter matrix.

__call__(x, t, reduce='sum')[source]¶

Computes the loss value for given input and ground truth labels.

Parameters:	x (Variable) – Input of the weight matrix multiplication. t (Variable) – Batch of ground truth labels. reduce (str) – Reduction option. Its value must be either `'sum'` or `'no'`. Otherwise, `ValueError` is raised.
Returns:	Loss value.
Return type:	Variable
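The role of the counts and the power factor can be illustrated with a short sketch. It assumes the usual word2vec-style scheme in which negative samples are drawn with probability proportional to the count raised to the power $\alpha$; the helper `sampling_distribution` is hypothetical and the link's internal sampler is implemented differently.

```python
import random


def sampling_distribution(counts, power=0.75):
    """Probabilities proportional to count ** power."""
    weights = [c ** power for c in counts]
    total = sum(weights)
    return [w / total for w in weights]


counts = [16, 1, 81]
probs = sampling_distribution(counts)
print(probs)  # roughly 8/36, 1/36 and 27/36

# Draw a few negative samples (word ids) from this distribution.
rng = random.Random(0)
negatives = rng.choices(range(len(counts)), weights=probs, k=5)
print(negatives)
```

A power below 1 flattens the distribution, so rare words are sampled more often than their raw frequency would suggest.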

Machine learning models¶

Classifier¶

class chainer.links.Classifier(predictor, lossfun=<function softmax_cross_entropy>, accfun=<function accuracy>)[source]¶

A simple classifier model.

This is an example of a chain that wraps another chain. It computes the loss and accuracy based on a given input/label pair.

Parameters:

predictor (Link) – Predictor network.
lossfun (function) – Loss function.
accfun (function) – Function that computes accuracy.

Variables:

predictor (Link) – Predictor network.
lossfun (function) – Loss function.
accfun (function) – Function that computes accuracy.
y (Variable) – Prediction for the last minibatch.
loss (Variable) – Loss value for the last minibatch.
accuracy (Variable) – Accuracy for the last minibatch.
compute_accuracy (bool) – If True, compute accuracy on the forward computation. The default value is True.

__call__(*args)[source]¶

Computes the loss value for an input and label pair.

It also computes accuracy and stores it to the attribute.

Parameters:	args (list of chainer.Variable) – Input minibatch.

All elements of args except the last one are features, and the last element holds the ground truth labels. The features are fed to the predictor, and the result is compared with the ground truth labels.

Returns:	Loss value.
Return type:	Variable
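The wrapping pattern described above can be sketched without any framework. `ClassifierSketch` and the toy predictor, loss and accuracy functions below are hypothetical stand-ins chosen for illustration; the real link operates on chainer.Variable objects and uses softmax cross entropy by default.

```python
class ClassifierSketch:
    """Wraps a predictor and computes loss and accuracy for a minibatch."""

    def __init__(self, predictor, lossfun, accfun):
        self.predictor = predictor
        self.lossfun = lossfun
        self.accfun = accfun
        self.compute_accuracy = True
        self.y = self.loss = self.accuracy = None

    def __call__(self, *args):
        # All arguments but the last are features; the last holds the labels.
        *features, t = args
        self.y = self.predictor(*features)
        self.loss = self.lossfun(self.y, t)
        if self.compute_accuracy:
            self.accuracy = self.accfun(self.y, t)
        return self.loss


# Toy stand-ins for a predictor network, a loss and an accuracy function.
def predictor(xs):
    return [1 if x > 0 else 0 for x in xs]


def zero_one_loss(y, t):
    return sum(a != b for a, b in zip(y, t)) / len(t)


def accuracy(y, t):
    return sum(a == b for a, b in zip(y, t)) / len(t)


model = ClassifierSketch(predictor, zero_one_loss, accuracy)
loss = model([0.5, -1.0, 2.0, -0.1], [1, 0, 0, 0])
print(loss, model.accuracy)  # 0.25 0.75
```

Setting `compute_accuracy` to False skips the accuracy computation, which mirrors the attribute listed above.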

Pre-trained models¶

Pre-trained models are mainly used to achieve a good performance with a small dataset, or extract a semantic feature vector. Although CaffeFunction automatically loads a pre-trained model released as a caffemodel, the following link models provide an interface for automatically converting caffemodels, and easily extracting semantic feature vectors.

For example, to extract the feature vectors with VGG16Layers, which is a common pre-trained model in the field of image recognition, users need to write the following few lines:

from chainer.links import VGG16Layers
from PIL import Image

model = VGG16Layers()
img = Image.open("path/to/image.jpg")
feature = model.extract([img], layers=["fc7"])["fc7"]

where fc7 denotes a layer before the last fully-connected layer. Unlike the usual links, these classes automatically load all the parameters from the pre-trained models during initialization.

VGG16Layers¶

class chainer.links.VGG16Layers(pretrained_model='auto')[source]¶

A pre-trained CNN model with 16 layers provided by VGG team [1].

During initialization, this chain model automatically downloads the pre-trained caffemodel, converts it to a chainer model, stores it in your local directory, and initializes all the parameters with it. This model is useful when you want to extract a semantic feature vector from a given image, or fine-tune the model on a different dataset. Note that this pre-trained model is released under the Creative Commons Attribution License.

If you want to manually convert the pre-trained caffemodel to a chainer model that can be specified in the constructor, please use convert_caffemodel_to_npz classmethod instead.

[1]	K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition

Parameters: pretrained_model (str) – The path of the pre-trained chainer model serialized as a .npz file. If this argument is auto, the caffemodel is automatically downloaded from the internet. In this case, the converted chainer model is stored in the $CHAINER_DATASET_ROOT/pfnet/chainer/models directory, where $CHAINER_DATASET_ROOT defaults to $HOME/.chainer/dataset unless you set another value as an environment variable. The converted chainer model is reused automatically from the second time on. If the argument is None, the parameters are not initialized from the pre-trained model but by the default initializer used in the original paper, i.e., chainer.initializers.Normal(scale=0.01).

Variables: available_layers (list of str) – The list of available layer names used by __call__ and extract methods.
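The location of the converted model described above can be worked out with the standard library. This is a sketch of the documented rule only; `model_cache_dir` is a hypothetical helper, not a Chainer function, and it takes the environment as an argument so the behavior is easy to inspect.

```python
import os


def model_cache_dir(environ=None):
    """Directory where converted models are stored, per the rule above."""
    environ = os.environ if environ is None else environ
    # Fall back to $HOME/.chainer/dataset when the variable is not set.
    default_root = os.path.join(os.path.expanduser("~"), ".chainer", "dataset")
    root = environ.get("CHAINER_DATASET_ROOT", default_root)
    return os.path.join(root, "pfnet", "chainer", "models")


print(model_cache_dir({"CHAINER_DATASET_ROOT": "/tmp/data"}))
print(model_cache_dir({}))
```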

__call__(x, layers=['prob'], test=True)[source]¶

Computes all the feature maps specified by layers.

Parameters:	x (Variable) – Input variable. layers (list of str) – The list of layer names you want to extract. test (bool) – If `True`, dropout runs in test mode.
Returns:	A dictionary in which each key is a layer name and each value is the corresponding feature map variable.
Return type:	Dictionary of chainer.Variable

classmethod convert_caffemodel_to_npz(path_caffemodel, path_npz)[source]¶

Converts a pre-trained caffemodel to a chainer model.

Parameters:	path_caffemodel (str) – Path of the pre-trained caffemodel. path_npz (str) – Path of the converted chainer model.

extract(images, layers=['fc7'], size=(224, 224), test=True, volatile=OFF)[source]¶

Extracts all the feature maps of given images.

Unlike calling __call__ directly, this method accepts images as input and automatically transforms them into a proper variable. It can be seen as a shortcut that implicitly calls the prepare and __call__ functions.

Parameters:	images (iterable of PIL.Image or numpy.ndarray) – Input images. layers (list of str) – The list of layer names you want to extract. size (pair of ints) – The resolution of resized images used as an input of CNN. None of the given images are resized if this argument is `None`, but in that case all the images must have the same resolution. test (bool) – If `True`, dropout runs in test mode. volatile (Flag) – Volatility flag used for input variables.
Returns:	A dictionary in which each key is a layer name and each value is the corresponding feature map variable.
Return type:	Dictionary of chainer.Variable

predict(images, oversample=True)[source]¶

Computes all the probabilities of given images.

Parameters:	images (iterable of PIL.Image or numpy.ndarray) – Input images. oversample (bool) – If `True`, it averages results across center, corners, and mirrors. Otherwise, it uses only the center.
Returns:	Output that contains the class probabilities of given images.
Return type:	Variable

chainer.links.model.vision.vgg.prepare(image, size=(224, 224))[source]¶

Converts the given image to the numpy array for VGG models.

Note that you have to call this function before __call__, because the pre-trained VGG model requires the given image to be resized, converted from RGB to BGR, mean-subtracted, and permuted in its dimensions before calling.

Parameters:	image (PIL.Image or numpy.ndarray) – Input image. If an input is `numpy.ndarray`, its shape must be `(height, width)`, `(height, width, channels)`, or `(channels, height, width)`, and the order of the channels must be RGB. size (pair of ints) – Size of converted images. If `None`, the given image is not resized.
Returns:	The converted output array.
Return type:	numpy.ndarray

GoogLeNet¶

class chainer.links.GoogLeNet(pretrained_model='auto')[source]¶

A pre-trained GoogLeNet model provided by BVLC [1].

When you specify the path of the pre-trained chainer model serialized as a .npz file in the constructor, this chain model automatically initializes all the parameters with it. This model would be useful when you want to extract a semantic feature vector per image, or fine-tune the model on a different dataset.

If you want to manually convert the pre-trained caffemodel to a chainer model that can be specified in the constructor, please use convert_caffemodel_to_npz classmethod instead.

GoogLeNet, which is also called Inception-v1, is a convolutional neural network architecture proposed in 2014. This model is relatively lightweight and requires a small memory footprint during training compared with modern architectures such as ResNet. Therefore, if you fine-tune your network based on a model pre-trained on ImageNet and need to train it with a large batch size, GoogLeNet may be useful. On the other hand, if you just want an off-the-shelf classifier, we recommend using ResNet50 or other models, since they are more accurate than GoogLeNet.

[1]	https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet

Parameters: pretrained_model (str) – The path of the pre-trained chainer model serialized as a .npz file. If this argument is auto, the caffemodel is automatically downloaded from the internet. In this case, the converted chainer model is stored in the $CHAINER_DATASET_ROOT/pfnet/chainer/models directory, where $CHAINER_DATASET_ROOT defaults to $HOME/.chainer/dataset unless you set another value as an environment variable. The converted chainer model is reused automatically from the second time on. If the argument is None, the parameters are not initialized from the pre-trained model but by the default initializer used in BVLC, i.e., chainer.initializers.LeCunUniform(scale=1.0). Note that, in Caffe, when weight_filler is specified as the “xavier” type without the variance_norm parameter, the weights are initialized by Uniform(-s, s), where $s = \sqrt{\frac{3}{fan_{in}}}$ and $fan_{in}$ is the number of input units. This corresponds to LeCunUniform in Chainer, not GlorotUniform.

Variables: available_layers (list of str) – The list of available layer names used by __call__ and extract methods.
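The bound $s = \sqrt{3 / fan_{in}}$ mentioned in the parameter description can be illustrated with a few lines of standard-library Python. The helpers `lecun_uniform_bound` and `sample_weights` are hypothetical and only demonstrate the formula; they are not the Chainer initializer.

```python
import math
import random


def lecun_uniform_bound(fan_in):
    """Half-width s of the Uniform(-s, s) distribution."""
    return math.sqrt(3.0 / fan_in)


def sample_weights(fan_in, fan_out, seed=0):
    """Draw a (fan_out, fan_in) nested list of weights from Uniform(-s, s)."""
    rng = random.Random(seed)
    s = lecun_uniform_bound(fan_in)
    return [[rng.uniform(-s, s) for _ in range(fan_in)] for _ in range(fan_out)]


print(lecun_uniform_bound(3))   # 1.0
print(lecun_uniform_bound(12))  # 0.5
weights = sample_weights(fan_in=12, fan_out=4)
```

The bound depends only on the number of input units, which is what distinguishes this scheme from a Glorot-style bound that also involves the number of output units.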

__call__(x, layers=['prob'], train=False)[source]¶

Computes all the feature maps specified by layers.

Parameters:	x (Variable) – Input variable. It should be prepared by the prepare function. layers (list of str) – The list of layer names you want to extract. train (bool) – If `True`, Dropout runs in training mode.
Returns:	A dictionary in which each key is a layer name and each value is the corresponding feature map variable.
Return type:	Dictionary of chainer.Variable

classmethod convert_caffemodel_to_npz(path_caffemodel, path_npz)[source]¶

Converts a pre-trained caffemodel to a chainer model.

Parameters:	path_caffemodel (str) – Path of the pre-trained caffemodel. path_npz (str) – Path of the converted chainer model.

extract(images, layers=['pool5'], size=(224, 224), train=False, volatile=OFF)[source]¶

Extracts all the feature maps of given images.

Unlike calling __call__ directly, this method accepts images as input and automatically transforms them into a proper variable. It can be seen as a shortcut that implicitly calls the prepare and __call__ functions.

Parameters:	images (iterable of PIL.Image or numpy.ndarray) – Input images. layers (list of str) – The list of layer names you want to extract. size (pair of ints) – The resolution of resized images used as an input of CNN. None of the given images are resized if this argument is `None`, but in that case all the images must have the same resolution. train (bool) – If `True`, Dropout runs in training mode. volatile (Flag) – Volatility flag used for input variables.
Returns:	A dictionary in which each key is a layer name and each value is the corresponding feature map variable.
Return type:	Dictionary of chainer.Variable

predict(images, oversample=True)[source]¶

Computes all the probabilities of given images.

Parameters:	images (iterable of PIL.Image or numpy.ndarray) – Input images. oversample (bool) – If `True`, it averages results across center, corners, and mirrors. Otherwise, it uses only the center.
Returns:	Output that contains the class probabilities of given images.
Return type:	Variable
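The `oversample` option can be pictured with a small, library-free sketch. It assumes a 256x256 image and 224x224 crops (sizes chosen for illustration, not taken from this page) and enumerates the center crop plus the four corner crops, each with a mirrored copy. `ten_crop_boxes` is a hypothetical helper, not a Chainer function, and the actual implementation may differ.

```python
def ten_crop_boxes(height, width, crop_h, crop_w):
    """Return (top, left, mirrored) triples for center + 4 corner crops and mirrors.

    Hypothetical helper for illustration; not part of Chainer.
    """
    if crop_h > height or crop_w > width:
        raise ValueError('crop is larger than the image')
    bottom = height - crop_h
    right = width - crop_w
    origins = [
        (bottom // 2, right // 2),  # center
        (0, 0),                     # top-left corner
        (0, right),                 # top-right corner
        (bottom, 0),                # bottom-left corner
        (bottom, right),            # bottom-right corner
    ]
    # Each crop is used twice: as-is and horizontally mirrored.
    return [(top, left, mirrored)
            for mirrored in (False, True)
            for top, left in origins]


# Assumed sizes: a 256x256 image cropped to 224x224.
boxes = ten_crop_boxes(256, 256, 224, 224)
```

Under this sketch, `oversample=False` would correspond to using only the first entry (the center crop), while `oversample=True` would correspond to averaging the predictions over all ten entries.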

chainer.links.model.vision.googlenet.prepare(image, size=(224, 224))[source]¶

Converts the given image to a NumPy array for GoogLeNet.

Note that you have to call this function before __call__ because the pre-trained GoogLeNet model expects its input to be resized, converted from RGB to BGR, mean-subtracted, and dimension-permuted beforehand.

Parameters:	image (PIL.Image or numpy.ndarray) – Input image. If an input is `numpy.ndarray`, its shape must be `(height, width)`, `(height, width, channels)`, or `(channels, height, width)`, and the order of the channels must be RGB. size (pair of ints) – Size of converted images. If `None`, the given image is not resized.
Returns:	The converted output array.
Return type:	numpy.ndarray
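The preprocessing steps named above (channel reordering, mean subtraction, and axis permutation) can be sketched with NumPy alone. This is an illustration under stated assumptions, not the library's implementation: resizing is omitted, the input is assumed to already be an RGB array of shape `(height, width, channels)`, and `MEAN_BGR` holds placeholder values rather than the mean actually used by the pre-trained model. `prepare_sketch` is a name introduced here.

```python
import numpy as np

# Placeholder per-channel mean in BGR order (assumed values for illustration).
MEAN_BGR = np.array([100.0, 110.0, 120.0], dtype=np.float32)


def prepare_sketch(image_rgb):
    """Convert an RGB (H, W, 3) array to a mean-subtracted BGR (3, H, W) array."""
    image = np.asarray(image_rgb, dtype=np.float32)
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError('expected an array of shape (height, width, 3)')
    image = image[:, :, ::-1]          # RGB -> BGR
    image = image - MEAN_BGR           # subtract the mean per channel
    return image.transpose(2, 0, 1)    # (H, W, C) -> (C, H, W)


# A tiny pure-red test image of height 4 and width 5.
rgb = np.zeros((4, 5, 3), dtype=np.uint8)
rgb[..., 0] = 255
out = prepare_sketch(rgb)
```

After the conversion the red channel ends up last along the first axis, since the channel order is reversed before the axes are permuted.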

Residual Networks¶

class chainer.links.model.vision.resnet.ResNetLayers(pretrained_model, n_layers)[source]¶

A pre-trained CNN model provided by MSRA [1].

When you specify the path of the pre-trained chainer model serialized as a .npz file in the constructor, this chain model automatically initializes all the parameters with it. This model would be useful when you want to extract a semantic feature vector per image, or fine-tune the model on a different dataset. Note that unlike VGG16Layers, it does not automatically download a pre-trained caffemodel. This caffemodel can be downloaded at GitHub.

If you want to manually convert the pre-trained caffemodel to a chainer model that can be specified in the constructor, use the convert_caffemodel_to_npz classmethod.

[1]	K. He et al., Deep Residual Learning for Image Recognition

Parameters:

pretrained_model (str) – The path to the pre-trained chainer model serialized as a .npz file. If this argument is specified as auto, it automatically loads and converts the caffemodel from $CHAINER_DATASET_ROOT/pfnet/chainer/models/ResNet-{n_layers}-model.caffemodel, where $CHAINER_DATASET_ROOT is set to $HOME/.chainer/dataset unless you specify another value through the environment variable, and {n_layers} is replaced with the number of layers given as the n_layers argument to this constructor. Note that in this case the converted chainer model is stored in the same directory and automatically reused from the next time on. If this argument is specified as None, the parameters are initialized not by the pre-trained model but by the default initializer used in the original paper, i.e., chainer.initializers.HeNormal(scale=1.0).
n_layers (int) – The number of layers of this model. It should be either 50, 101, or 152.
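The location searched when `pretrained_model` is `auto` follows from the description above and can be sketched as below. `default_caffemodel_path` is a hypothetical helper written for illustration; it only builds the path string and does not load or convert anything, and the library's own lookup may differ in detail.

```python
import os


def default_caffemodel_path(n_layers, environ=None):
    """Build the caffemodel path described above (hypothetical helper)."""
    if n_layers not in (50, 101, 152):
        raise ValueError('n_layers should be either 50, 101, or 152')
    environ = os.environ if environ is None else environ
    # $CHAINER_DATASET_ROOT falls back to $HOME/.chainer/dataset when unset.
    root = environ.get(
        'CHAINER_DATASET_ROOT',
        os.path.join(os.path.expanduser('~'), '.chainer', 'dataset'))
    return os.path.join(
        root, 'pfnet', 'chainer', 'models',
        'ResNet-{}-model.caffemodel'.format(n_layers))


# Example with an explicit (made-up) dataset root.
path = default_caffemodel_path(101, environ={'CHAINER_DATASET_ROOT': '/data'})
```

Passing `environ` explicitly keeps the sketch easy to try without modifying the real environment.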

Variables:

available_layers (list of str) – The list of available layer names used by __call__ and extract methods.

__call__(x, layers=['prob'], test=True)[source]¶

Computes all the feature maps specified by layers.

Parameters:	x (Variable) – Input variable. layers (list of str) – The list of layer names you want to extract. test (bool) – If `True`, BatchNormalization runs in test mode.
Returns:	A dictionary in which each key is a layer name and each value is the corresponding feature map variable.
Return type:	Dictionary of Variable

classmethod convert_caffemodel_to_npz(path_caffemodel, path_npz, n_layers=50)[source]¶

Converts a pre-trained caffemodel to a chainer model.

Parameters:	path_caffemodel (str) – Path of the pre-trained caffemodel. path_npz (str) – Path of the converted chainer model. n_layers (int) – The number of layers of the model to convert.

extract(images, layers=['pool5'], size=(224, 224), test=True, volatile=OFF)[source]¶

Extracts all the feature maps of given images.

Unlike calling __call__ directly, this method accepts images as input and automatically transforms them into a proper variable. In other words, it is a shortcut that implicitly calls the prepare function and then __call__.

Parameters:	images (iterable of PIL.Image or numpy.ndarray) – Input images. layers (list of str) – The list of layer names you want to extract. size (pair of ints) – The resolution to which images are resized before being fed to the CNN. If this argument is `None`, the given images are not resized, in which case they must all have the same resolution. test (bool) – If `True`, BatchNormalization runs in test mode. volatile (Flag) – Volatility flag used for input variables.
Returns:	A dictionary in which each key is a layer name and each value is the corresponding feature map variable.
Return type:	Dictionary of Variable

predict(images, oversample=True)[source]¶

Computes all the probabilities of given images.

Parameters:	images (iterable of PIL.Image or numpy.ndarray) – Input images. oversample (bool) – If `True`, it averages results across center, corners, and mirrors. Otherwise, it uses only the center.
Returns:	Output that contains the class probabilities of given images.
Return type:	Variable

class chainer.links.ResNet50Layers(pretrained_model='auto')[source]¶

A pre-trained CNN model with 50 layers provided by MSRA [1].

When you specify the path of the pre-trained chainer model serialized as a .npz file in the constructor, this chain model automatically initializes all the parameters with it. This model would be useful when you want to extract a semantic feature vector per image, or fine-tune the model on a different dataset. Note that unlike VGG16Layers, it does not automatically download a pre-trained caffemodel. This caffemodel can be downloaded at GitHub.

If you want to manually convert the pre-trained caffemodel to a chainer model that can be specified in the constructor, use the convert_caffemodel_to_npz classmethod.

ResNet50 has 25,557,096 trainable parameters, which is 43% and 58% fewer than ResNet101 and ResNet152, respectively. On the other hand, its top-5 classification accuracy on the ImageNet dataset drops by only 0.7% and 1.1% relative to ResNet101 and ResNet152, respectively. ResNet50 may therefore offer the best balance between accuracy and model size. It should be sufficient for many use cases, but some advanced models for object detection or semantic segmentation use deeper ResNets as their building blocks, so the deeper variants are provided to make reproduction work easier.
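The relative model sizes can be recomputed from the trainable-parameter counts quoted in the ResNet50Layers, ResNet101Layers, and ResNet152Layers entries. The snippet below is plain arithmetic on those three numbers; `PARAMS` and `percent_fewer` are names introduced here for illustration.

```python
# Trainable-parameter counts as quoted in this section.
PARAMS = {
    'ResNet50': 25557096,
    'ResNet101': 44549224,
    'ResNet152': 60192872,
}


def percent_fewer(smaller, larger):
    """Percentage by which `smaller` has fewer parameters than `larger`."""
    return 100.0 * (PARAMS[larger] - PARAMS[smaller]) / PARAMS[larger]


# Print the reduction for each pair of models, rounded to whole percent.
for smaller, larger in [('ResNet50', 'ResNet101'),
                        ('ResNet50', 'ResNet152'),
                        ('ResNet101', 'ResNet152')]:
    print('{} vs {}: {:.0f}% fewer'.format(
        smaller, larger, percent_fewer(smaller, larger)))
```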

[1]	K. He et al., Deep Residual Learning for Image Recognition

Parameters: pretrained_model (str) – The path to the pre-trained chainer model serialized as a .npz file. If this argument is specified as auto, it automatically loads and converts the caffemodel from $CHAINER_DATASET_ROOT/pfnet/chainer/models/ResNet-50-model.caffemodel, where $CHAINER_DATASET_ROOT is set to $HOME/.chainer/dataset unless you specify another value through the environment variable. Note that in this case the converted chainer model is stored in the same directory and automatically reused from the next time on. If this argument is specified as None, the parameters are initialized not by the pre-trained model but by the default initializer used in the original paper, i.e., chainer.initializers.HeNormal(scale=1.0).

Variables: available_layers (list of str) – The list of available layer names used by __call__ and extract methods.

class chainer.links.ResNet101Layers(pretrained_model='auto')[source]¶

A pre-trained CNN model with 101 layers provided by MSRA [1].

When you specify the path of the pre-trained chainer model serialized as a .npz file in the constructor, this chain model automatically initializes all the parameters with it. This model would be useful when you want to extract a semantic feature vector per image, or fine-tune the model on a different dataset. Note that unlike VGG16Layers, it does not automatically download a pre-trained caffemodel. This caffemodel can be downloaded at GitHub.

If you want to manually convert the pre-trained caffemodel to a chainer model that can be specified in the constructor, use the convert_caffemodel_to_npz classmethod.

ResNet101 has 44,549,224 trainable parameters, which is 26% fewer than ResNet152, while its top-5 classification accuracy on the ImageNet dataset drops by about 0.4% relative to ResNet152 (the difference between the 0.7% and 1.1% drops quoted for ResNet50). For many cases, ResNet50 may have the best balance between accuracy and model size.

[1]	K. He et al., Deep Residual Learning for Image Recognition

Parameters: pretrained_model (str) – The path to the pre-trained chainer model serialized as a .npz file. If this argument is specified as auto, it automatically loads and converts the caffemodel from $CHAINER_DATASET_ROOT/pfnet/chainer/models/ResNet-101-model.caffemodel, where $CHAINER_DATASET_ROOT is set to $HOME/.chainer/dataset unless you specify another value through the environment variable. Note that in this case the converted chainer model is stored in the same directory and automatically reused from the next time on. If this argument is specified as None, the parameters are initialized not by the pre-trained model but by the default initializer used in the original paper, i.e., chainer.initializers.HeNormal(scale=1.0).

Variables: available_layers (list of str) – The list of available layer names used by __call__ and extract methods.

class chainer.links.ResNet152Layers(pretrained_model='auto')[source]¶

A pre-trained CNN model with 152 layers provided by MSRA [1].

When you specify the path of the pre-trained chainer model serialized as a .npz file in the constructor, this chain model automatically initializes all the parameters with it. This model would be useful when you want to extract a semantic feature vector per image, or fine-tune the model on a different dataset. Note that unlike VGG16Layers, it does not automatically download a pre-trained caffemodel. This caffemodel can be downloaded at GitHub.

If you want to manually convert the pre-trained caffemodel to a chainer model that can be specified in the constructor, use the convert_caffemodel_to_npz classmethod.

ResNet152 has 60,192,872 trainable parameters. It is the deepest ResNet model and achieved the best result on the ImageNet classification task in ILSVRC 2015.

[1]	K. He et al., Deep Residual Learning for Image Recognition

Parameters: pretrained_model (str) – The path to the pre-trained chainer model serialized as a .npz file. If this argument is specified as auto, it automatically loads and converts the caffemodel from $CHAINER_DATASET_ROOT/pfnet/chainer/models/ResNet-152-model.caffemodel, where $CHAINER_DATASET_ROOT is set to $HOME/.chainer/dataset unless you specify another value through the environment variable. Note that in this case the converted chainer model is stored in the same directory and automatically reused from the next time on. If this argument is specified as None, the parameters are initialized not by the pre-trained model but by the default initializer used in the original paper, i.e., chainer.initializers.HeNormal(scale=1.0).

Variables: available_layers (list of str) – The list of available layer names used by __call__ and extract methods.

chainer.links.model.vision.resnet.prepare(image, size=(224, 224))[source]¶

Converts the given image to a NumPy array for ResNets.

Note that you have to call this function before __call__ because the pre-trained ResNet model expects its input to be resized, converted from RGB to BGR, mean-subtracted, and dimension-permuted beforehand.

Parameters:	image (PIL.Image or numpy.ndarray) – Input image. If an input is `numpy.ndarray`, its shape must be `(height, width)`, `(height, width, channels)`, or `(channels, height, width)`, and the order of the channels must be RGB. size (pair of ints) – Size of converted images. If `None`, the given image is not resized.
Returns:	The converted output array.
Return type:	numpy.ndarray

Deprecated links¶

Parameter¶

class chainer.links.Parameter(array)[source]¶

Link that just holds a parameter and returns it.

Deprecated since version v1.5: The parameters are stored as variables as of v1.5. Use them directly instead.

Parameters:	array – Initial parameter array.
Variables:	W (Variable) – Parameter variable.

__call__(volatile='off')[source]¶

Returns the parameter variable.

Parameters:	volatile (Flag) – The volatility of the returned variable.
Returns:	A copy of the parameter variable with given volatility.
Return type:	Variable