chainer.functions.im2col

chainer.functions.im2col(x, ksize, stride=1, pad=0, cover_all=False, dilate=1)

Extract patches from an image based on the filter.

This function rearranges patches of an image and puts them in the channel dimension of the output.

Patches are extracted at positions shifted by multiples of stride from the first position -pad for each spatial axis. The right-most (or bottom-most) patches do not run over the padded spatial size.

Notation: the following symbols are used below.

  • \(n\) is the batch size.
  • \(c\) is the number of the input channels.
  • \(h\) and \(w\) are the height and width of the input image, respectively.
  • \(k_H\) and \(k_W\) are the height and width of the filters, respectively.
  • \(s_Y\) and \(s_X\) are the strides of the filter.
  • \(p_H\) and \(p_W\) are the spatial padding sizes.
  • \(d_Y\) and \(d_X\) are the dilation factors of filter application.

The output size \((h_O, w_O)\) is determined by the following equations when cover_all = False (the division is integer division, rounding down):

\[\begin{split}h_O &= (h + 2p_H - k_H - (k_H - 1) * (d_Y - 1)) / s_Y + 1,\\ w_O &= (w + 2p_W - k_W - (k_W - 1) * (d_X - 1)) / s_X + 1.\end{split}\]

When cover_all = True, the output size is determined by the following equations:

\[\begin{split}h_O &= (h + 2p_H - k_H - (k_H - 1) * (d_Y - 1) + s_Y - 1) / s_Y + 1,\\ w_O &= (w + 2p_W - k_W - (k_W - 1) * (d_X - 1) + s_X - 1) / s_X + 1.\end{split}\]
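The two sets of equations above can be checked with a small pure-Python helper. This is only a sketch: the name im2col_outsize is hypothetical and not part of Chainer, and it assumes the division in the equations is integer (floor) division.

```python
def im2col_outsize(size, k, s, p, cover_all=False, d=1):
    """Output length along one spatial axis (hypothetical helper, not a Chainer API)."""
    # Effective extent of a dilated filter: k + (k - 1) * (d - 1).
    dilated_k = k + (k - 1) * (d - 1)
    # cover_all adds s - 1 before dividing, which rounds the division up.
    extra = s - 1 if cover_all else 0
    return (size + 2 * p - dilated_k + extra) // s + 1


print(im2col_outsize(8, k=3, s=2, p=0))                  # 3
print(im2col_outsize(8, k=3, s=2, p=0, cover_all=True))  # 4
print(im2col_outsize(7, k=3, s=1, p=0, d=2))             # 3
```

With an input of length 8, a filter of size 3 and stride 2, the last input position is not reached by any patch when cover_all = False, so cover_all = True yields one extra output position.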
Parameters:
  • x (Variable) – Input variable of shape \((n, c, h, w)\).
  • ksize (int or pair of ints) – Size of filters (a.k.a. kernels). ksize=k and ksize=(k, k) are equivalent.
  • stride (int or pair of ints) – Stride of filter applications. stride=s and stride=(s, s) are equivalent.
  • pad (int or pair of ints) – Spatial padding width for input arrays. pad=p and pad=(p, p) are equivalent.
  • cover_all (bool) – If True, every spatial location of the padded input is covered by at least one patch. This may make the output size larger.
  • dilate (int or pair of ints) – Dilation factor of filter applications. dilate=d and dilate=(d, d) are equivalent.
Returns:

Output variable whose shape is \((n, c \cdot k_H \cdot k_W, h_O, w_O)\).

Return type:

Variable
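To make the rearrangement concrete, here is a minimal NumPy sketch of the behaviour described above. It is not Chainer's implementation: im2col_sketch and _pair are hypothetical names, padding is assumed to be zero-filled, and the ordering of the output channels (input channel first, then filter row, then filter column) is an assumption of this sketch.

```python
import numpy as np


def _pair(v):
    # Accept either a single int or a pair of ints.
    return (v, v) if isinstance(v, int) else tuple(v)


def im2col_sketch(x, ksize, stride=1, pad=0, cover_all=False, dilate=1):
    """Illustrative re-implementation on a NumPy array of shape (n, c, h, w)."""
    kh, kw = _pair(ksize)
    sy, sx = _pair(stride)
    ph, pw = _pair(pad)
    dy, dx = _pair(dilate)
    n, c, h, w = x.shape

    def outsize(size, k, s, p, d):
        extra = s - 1 if cover_all else 0
        return (size + 2 * p - k - (k - 1) * (d - 1) + extra) // s + 1

    out_h = outsize(h, kh, sy, ph, dy)
    out_w = outsize(w, kw, sx, pw, dx)

    # Zero-pad; the extra s - 1 on the far side leaves room for cover_all.
    img = np.pad(x, ((0, 0), (0, 0), (ph, ph + sy - 1), (pw, pw + sx - 1)),
                 mode="constant")
    col = np.empty((n, c, kh, kw, out_h, out_w), dtype=x.dtype)
    for j in range(kh):
        for i in range(kw):
            y0, x0 = j * dy, i * dx
            # Element (j, i) of every patch, taken at each stride step.
            col[:, :, j, i] = img[:, :, y0:y0 + sy * out_h:sy,
                                  x0:x0 + sx * out_w:sx]
    # Fold the filter axes into the channel axis.
    return col.reshape(n, c * kh * kw, out_h, out_w)


x = np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4)
y = im2col_sketch(x, ksize=2, stride=2)
print(y.shape)        # (1, 4, 2, 2)
print(y[0, :, 0, 0])  # [0. 1. 4. 5.]
```

Each output pixel holds one flattened patch along the channel axis, so y[0, :, 0, 0] is the top-left 2×2 patch of the input.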