chainer.functions.dilated_convolution_2d(x, W, b=None, stride=1, pad=0, dilate=1, cover_all=False)

Two-dimensional dilated convolution function.

This is an implementation of two-dimensional dilated convolution in ConvNets. It takes three variables: the input image x, the filter weight W, and the bias vector b.

Notation: the following symbols denote dimensions.

  • \(n\) is the batch size.
  • \(c_I\) and \(c_O\) are the numbers of input and output channels, respectively.
  • \(h\) and \(w\) are the height and width of the input image, respectively.
  • \(k_H\) and \(k_W\) are the height and width of the filters, respectively.

Parameters:

  • x (Variable) – Input variable of shape \((n, c_I, h, w)\).
  • W (Variable) – Weight variable of shape \((c_O, c_I, k_H, k_W)\).
  • b (Variable) – Bias variable of length \(c_O\) (optional).
  • stride (int or pair of ints) – Stride of filter applications. stride=s and stride=(s, s) are equivalent.
  • pad (int or pair of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p) are equivalent.
  • dilate (int or pair of ints) – Dilation factor of filter applications. dilate=d and dilate=(d, d) are equivalent.
  • cover_all (bool) – If True, all spatial locations are convolved into some output pixels. This may make the output size larger.

Returns:

Output variable.

Return type:

Variable

The two-dimensional dilated convolution function is defined as follows. The DilatedConvolution2D function computes correlations between the filters and patches of size \((k_H, k_W)\) in x. Note that correlation here is equivalent to the inner product between expanded vectors. Within each patch, elements are sampled at intervals of the dilation factor. The patches themselves are extracted at positions shifted by multiples of stride from the first position -pad for each spatial axis. The right-most (or bottom-most) patches do not run over the padded spatial size.
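
The patch extraction described above can be illustrated along a single spatial axis. The following is a minimal sketch, not the library's implementation: the helper name `patch_indices` is hypothetical, and it assumes integer (not paired) values for stride, pad and dilate. It lists, for each patch, the input indices that the patch samples; indices below 0 or at or beyond `size` fall in the padded region.

```python
def patch_indices(size, k, stride=1, pad=0, dilate=1):
    """Hypothetical helper: input indices sampled by each patch on one axis.

    Indices refer to the unpadded input, so values outside [0, size)
    lie in the padding.
    """
    span = (k - 1) * dilate + 1   # extent covered by one dilated patch
    padded_end = size + pad       # one past the last padded position
    patches = []
    start = -pad                  # first patch starts at position -pad
    while start + span <= padded_end:  # do not run over the padded size
        patches.append([start + i * dilate for i in range(k)])
        start += stride           # next patch is shifted by the stride
    return patches


# Axis of length 7, filter size 3, stride 2, pad 1, dilation 2.
for patch in patch_indices(7, k=3, stride=2, pad=1, dilate=2):
    print(patch)
# [-1, 1, 3]
# [1, 3, 5]
# [3, 5, 7]
```

With a dilation factor of 2, consecutive elements of a patch are two positions apart, and consecutive patches start two positions apart because of the stride.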

Let \((s_Y, s_X)\) be the stride of filter application, \((p_H, p_W)\) the spatial padding size, and \((d_Y, d_X)\) the dilation factor of filter application. Then the output size \((h_O, w_O)\) is determined by the following equations, where the division is rounded down to an integer when cover_all is False:

\[\begin{split}h_O &= (h + 2p_H - k_H - (k_H - 1) * (d_Y - 1)) / s_Y + 1,\\ w_O &= (w + 2p_W - k_W - (k_W - 1) * (d_X - 1)) / s_X + 1.\end{split}\]
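
As a quick check of the equations above, the output size along one axis can be computed directly. This is a sketch under stated assumptions: the function name `dilated_conv_outsize` is hypothetical, the division is taken as integer (floor) division, and the `cover_all=True` branch assumes the size is rounded up so that every input position is covered, which the equations above do not spell out.

```python
def dilated_conv_outsize(size, k, stride, pad, dilate, cover_all=False):
    """Hypothetical helper: output length along one spatial axis."""
    # Effective filter extent after dilation: k + (k - 1) * (d - 1).
    dk = k + (k - 1) * (dilate - 1)
    if cover_all:
        # Assumption: round up so that all input positions are covered.
        return (size + 2 * pad - dk + stride - 1) // stride + 1
    return (size + 2 * pad - dk) // stride + 1


# A 3x3 filter with dilation 2 spans 5 pixels, so pad=2 keeps a 32-pixel axis at 32.
print(dilated_conv_outsize(32, k=3, stride=1, pad=2, dilate=2))  # 32
```

For a pair-valued stride, pad or dilate, the same computation would be applied separately to the height and width axes.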

If the bias vector is given, then it is added to all spatial locations of the output of convolution.
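
To make the definition concrete, here is a naive reference sketch of the computation for a single sample, written with plain nested lists. It is illustrative only and is not how the library computes the result: the function name `naive_dilated_conv2d` is hypothetical, stride, pad and dilate are assumed to be single integers, the batch axis is omitted, and padding is assumed to be zero-valued.

```python
def naive_dilated_conv2d(x, W, b=None, stride=1, pad=0, dilate=1):
    """Hypothetical reference: x is [c_I][h][w], W is [c_O][c_I][k_H][k_W]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    c_out, k_h, k_w = len(W), len(W[0][0]), len(W[0][0][0])
    # Output size from the equations above (floor division).
    h_out = (h + 2 * pad - k_h - (k_h - 1) * (dilate - 1)) // stride + 1
    w_out = (w + 2 * pad - k_w - (k_w - 1) * (dilate - 1)) // stride + 1

    def at(c, i, j):
        # Assumption: positions in the padded border read as zero.
        return x[c][i][j] if 0 <= i < h and 0 <= j < w else 0.0

    y = []
    for o in range(c_out):
        bias = b[o] if b is not None else 0.0
        plane = []
        for oy in range(h_out):
            row = []
            for ox in range(w_out):
                acc = bias  # bias is added at every spatial location
                for c in range(c_in):
                    for ky in range(k_h):
                        for kx in range(k_w):
                            iy = oy * stride - pad + ky * dilate
                            ix = ox * stride - pad + kx * dilate
                            acc += W[o][c][ky][kx] * at(c, iy, ix)
                row.append(acc)
            plane.append(row)
        y.append(plane)
    return y


x = [[[1.0] * 5 for _ in range(5)]]   # one channel, 5x5, all ones
W = [[[[1.0] * 3 for _ in range(3)]]]  # one 3x3 filter of ones
print(naive_dilated_conv2d(x, W, b=[0.5], dilate=2))  # [[[9.5]]]
```

With dilation 2 the 3x3 filter spans the whole 5x5 input, so there is a single output pixel: nine sampled ones plus the bias of 0.5.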

See also