chainer.functions.convolution_2d(x, W, b=None, stride=1, pad=0, cover_all=False, *, dilate=1, groups=1)

Two-dimensional convolution function.

This is an implementation of two-dimensional convolution in ConvNets. It takes three variables: the input image x, the filter weight W, and the bias vector b.

Notation: the following symbols denote the dimensions.

  • \(n\) is the batch size.

  • \(c_I\) and \(c_O\) are the numbers of input and output channels, respectively.

  • \(h_I\) and \(w_I\) are the height and width of the input image, respectively.

  • \(h_K\) and \(w_K\) are the height and width of the filters, respectively.

  • \(h_P\) and \(w_P\) are the height and width of the spatial padding size, respectively.

Then the Convolution2D function computes correlations between filters and patches of size \((h_K, w_K)\) in x. Note that correlation here is equivalent to the inner product between expanded vectors. Patches are extracted at positions shifted by multiples of stride from the first position \((-h_P, -w_P)\) for each spatial axis. The right-most (or bottom-most) patches do not run over the padded spatial size.
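As a small illustration of the inner-product view, the following NumPy sketch extracts one patch from a zero-padded single-channel image and correlates it with a filter. The array sizes and the names `x_padded` and `patch_at` are chosen for this example only and are not part of Chainer.

```python
import numpy as np

# Illustrative sizes only; none of these names come from Chainer.
h_i, w_i = 5, 6      # input height and width
h_k, w_k = 3, 3      # filter height and width
h_p, w_p = 1, 1      # spatial padding
s_y, s_x = 2, 2      # stride

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, (h_i, w_i)).astype(np.float32)
w = rng.uniform(0, 1, (h_k, w_k)).astype(np.float32)

# Zero-pad so that the first patch starts at (-h_p, -w_p) in input coordinates.
x_padded = np.pad(x, ((h_p, h_p), (w_p, w_p)), mode="constant")


def patch_at(i, j):
    """Return the (h_k, w_k) patch that feeds output position (i, j)."""
    top, left = i * s_y, j * s_x
    return x_padded[top:top + h_k, left:left + w_k]


patch = patch_at(1, 2)

# Correlation of one patch with the filter: elementwise product, then sum.
# The filter is not flipped, which is what distinguishes this from a
# convolution in the strict signal-processing sense.
corr = float((patch * w).sum())

# The same value computed as an inner product between flattened vectors.
inner = float(np.dot(patch.ravel(), w.ravel()))
```

Both `corr` and `inner` hold the same number up to floating-point rounding.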

Let \((s_Y, s_X)\) be the stride of filter application. Then, the output size \((h_O, w_O)\) is determined by the following equations:

\[\begin{split}h_O &= (h_I + 2h_P - h_K) / s_Y + 1,\\ w_O &= (w_I + 2w_P - w_K) / s_X + 1.\end{split}\]

If the cover_all option is True, the filter covers all spatial locations. That is, if the last stride of the filter does not reach the end of the spatial locations, an additional stride is applied to the remaining part. In this case, the output size \((h_O, w_O)\) is determined by the following equations:

\[\begin{split}h_O &= (h_I + 2h_P - h_K + s_Y - 1) / s_Y + 1,\\ w_O &= (w_I + 2w_P - w_K + s_X - 1) / s_X + 1.\end{split}\]
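The two sets of equations can be checked with a few lines of Python. This is a minimal sketch that assumes the division is integer (floor) division, which agrees with the shapes in the example at the end of this page; `conv_output_size` is a hypothetical helper name, not a Chainer function.

```python
def conv_output_size(size, k, s, p, cover_all=False):
    """Output length along one spatial axis, using floor division.

    size: input length, k: filter length, s: stride, p: padding.
    """
    if cover_all:
        return (size + 2 * p - k + s - 1) // s + 1
    return (size + 2 * p - k) // s + 1


# Sizes taken from the example at the end of this page.
h_o = conv_output_size(30, k=10, s=5, p=5)
w_o = conv_output_size(40, k=10, s=7, p=5)
w_o_cover = conv_output_size(40, k=10, s=7, p=5, cover_all=True)
print(h_o, w_o, w_o_cover)  # 7 6 7
```

The width grows by one with cover_all because a stride of 7 does not divide the padded range evenly, while the height is unchanged because a stride of 5 does.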

If the bias vector is given, then it is added to all spatial locations of the output of convolution.
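The computation described so far can be sketched as a loop-based NumPy reference. This is not Chainer's implementation: it is an unoptimized illustration that assumes floor division for the output size and leaves out cover_all, dilate, and groups. The name `naive_convolution_2d` is hypothetical.

```python
import numpy as np


def naive_convolution_2d(x, W, b=None, stride=(1, 1), pad=(0, 0)):
    """Loop-based correlation; cover_all, dilate and groups are omitted."""
    n, c_i, h_i, w_i = x.shape
    c_o, _, h_k, w_k = W.shape
    s_y, s_x = stride
    h_p, w_p = pad

    h_o = (h_i + 2 * h_p - h_k) // s_y + 1
    w_o = (w_i + 2 * w_p - w_k) // s_x + 1

    # Zero-pad only the two spatial axes.
    x_padded = np.pad(
        x, ((0, 0), (0, 0), (h_p, h_p), (w_p, w_p)), mode="constant")
    y = np.zeros((n, c_o, h_o, w_o), dtype=x.dtype)

    for i in range(h_o):
        for j in range(w_o):
            top, left = i * s_y, j * s_x
            # Patch of shape (n, c_i, h_k, w_k) for this output position.
            patch = x_padded[:, :, top:top + h_k, left:left + w_k]
            # Inner product over channel and kernel axes, giving (n, c_o).
            y[:, :, i, j] = np.tensordot(
                patch, W, axes=([1, 2, 3], [1, 2, 3]))

    if b is not None:
        # The length-c_o bias is broadcast over batch and spatial axes.
        y += b.reshape(1, c_o, 1, 1)
    return y


x = np.ones((2, 3, 6, 8), dtype=np.float32)
W = np.ones((4, 3, 3, 3), dtype=np.float32)
b = np.arange(4, dtype=np.float32)
y = naive_convolution_2d(x, W, b, stride=(1, 2), pad=(1, 1))
print(y.shape)  # (2, 4, 6, 4)
```

With all-ones inputs, an interior output equals the number of filter taps (3 × 3 × 3 = 27) plus the bias of its channel, while corner outputs are smaller because part of the patch lies in the zero padding.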

The output of this function can be non-deterministic when it uses cuDNN. If chainer.configuration.config.cudnn_deterministic is True and the cuDNN version is >= v3, it forces cuDNN to use a deterministic algorithm.

Convolution links can use a cuDNN feature called autotuning, which selects the most efficient CNN algorithm for fixed-size images and can provide a significant performance boost for fixed neural nets. To enable it, run the computation within chainer.using_config('autotune', True).

When the dilation factor is greater than one, cuDNN is not used unless the version is 6.0 or higher.
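For orientation, a dilation factor d spaces the filter taps d pixels apart, so a filter of length k spans d(k - 1) + 1 input pixels along that axis. The sketch below makes this concrete by inserting zeros between the taps of a small kernel. It is a generic illustration of dilation rather than a description of how Chainer or cuDNN computes it, and `dilate_kernel` is a hypothetical helper.

```python
import numpy as np


def dilate_kernel(w, d_y, d_x):
    """Insert zeros between the taps of a 2-D kernel (illustrative helper)."""
    h_k, w_k = w.shape
    out = np.zeros(
        (d_y * (h_k - 1) + 1, d_x * (w_k - 1) + 1), dtype=w.dtype)
    # Original taps land on every d_y-th row and every d_x-th column.
    out[::d_y, ::d_x] = w
    return out


w = np.arange(1, 10, dtype=np.float32).reshape(3, 3)
w_dilated = dilate_kernel(w, 2, 2)
print(w_dilated.shape)  # (5, 5)
```

Correlating with `w_dilated` at dilation 1 gives the same result as correlating with `w` at dilation 2, since the inserted zeros contribute nothing to the inner product.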

Parameters

  • x (Variable or N-dimensional array) – Input variable of shape \((n, c_I, h_I, w_I)\).

  • W (Variable or N-dimensional array) – Weight variable of shape \((c_O, c_I, h_K, w_K)\).

  • b (None or Variable or N-dimensional array) – Bias variable of length \(c_O\) (optional).

  • stride (int or pair of ints) – Stride of filter applications. stride=s and stride=(s, s) are equivalent.

  • pad (int or pair of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p) are equivalent.

  • cover_all (bool) – If True, all spatial locations are convolved into some output pixels.

  • dilate (int or pair of ints) – Dilation factor of filter applications. dilate=d and dilate=(d, d) are equivalent.

  • groups (int) – Number of channel groups. If this is greater than 1, the input tensor \(W\) is divided into that many blocks, and the convolution is executed independently for each block. The input channel size \(c_I\) and the output channel size \(c_O\) must be exactly divisible by this value.
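The effect of groups can be sketched in NumPy by splitting the channels into blocks and correlating each block independently. The parameter description above does not spell out the per-group weight layout, so this sketch assumes a weight of shape \((c_O, c_I / \mathrm{groups}, h_K, w_K)\); that layout, and the helper names `correlate` and `grouped_correlate`, are assumptions made for illustration.

```python
import numpy as np


def correlate(x, W):
    """Stride-1, unpadded correlation. x: (n, c, h, w), W: (c_o, c, h_k, w_k)."""
    h_k, w_k = W.shape[2:]
    h_o, w_o = x.shape[2] - h_k + 1, x.shape[3] - w_k + 1
    y = np.empty((x.shape[0], W.shape[0], h_o, w_o), dtype=x.dtype)
    for i in range(h_o):
        for j in range(w_o):
            patch = x[:, :, i:i + h_k, j:j + w_k]
            y[:, :, i, j] = np.tensordot(
                patch, W, axes=([1, 2, 3], [1, 2, 3]))
    return y


def grouped_correlate(x, W, groups):
    """Assumes W has shape (c_o, c_i // groups, h_k, w_k)."""
    c_i, c_o = x.shape[1], W.shape[0]
    if c_i % groups or c_o % groups:
        raise ValueError("c_i and c_o must be divisible by groups")
    x_blocks = np.split(x, groups, axis=1)  # each (n, c_i // groups, h, w)
    W_blocks = np.split(W, groups, axis=0)  # each (c_o // groups, ...)
    # Each block is processed on its own; outputs are joined along channels.
    return np.concatenate(
        [correlate(xb, Wb) for xb, Wb in zip(x_blocks, W_blocks)], axis=1)


x = np.ones((1, 4, 5, 5), dtype=np.float32)
W = np.ones((6, 2, 3, 3), dtype=np.float32)
y = grouped_correlate(x, W, groups=2)
print(y.shape)  # (1, 6, 3, 3)
```

Each output channel here only sees the two input channels of its own group, so with all-ones inputs every output value is 2 × 3 × 3 = 18 rather than the 36 an ungrouped filter over four channels would give.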


Returns

Output variable of shape \((n, c_O, h_O, w_O)\).

Return type

Variable

See also

Convolution2D to manage the model parameters W and b.


Example

>>> import numpy as np
>>> import chainer.functions as F
>>> n = 10
>>> c_i, c_o = 3, 1
>>> h_i, w_i = 30, 40
>>> h_k, w_k = 10, 10
>>> h_p, w_p = 5, 5
>>> x = np.random.uniform(0, 1, (n, c_i, h_i, w_i)).astype(np.float32)
>>> x.shape
(10, 3, 30, 40)
>>> W = np.random.uniform(0, 1, (c_o, c_i, h_k, w_k)).astype(np.float32)
>>> W.shape
(1, 3, 10, 10)
>>> b = np.random.uniform(0, 1, (c_o,)).astype(np.float32)
>>> b.shape
(1,)
>>> s_y, s_x = 5, 7
>>> y = F.convolution_2d(x, W, b, stride=(s_y, s_x), pad=(h_p, w_p))
>>> y.shape
(10, 1, 7, 6)
>>> h_o = int((h_i + 2 * h_p - h_k) / s_y + 1)
>>> w_o = int((w_i + 2 * w_p - w_k) / s_x + 1)
>>> y.shape == (n, c_o, h_o, w_o)
True
>>> # With cover_all=True, one extra stride is applied along the width.
>>> y = F.convolution_2d(x, W, b, stride=(s_y, s_x), pad=(h_p, w_p), cover_all=True)
>>> y.shape == (n, c_o, h_o, w_o + 1)
True