chainer.functions.spatial_pyramid_pooling_2d(x, pyramid_height, pooling=None)[source]

Spatial pyramid pooling function.

It outputs a fixed-length vector regardless of input feature map size.

It performs pooling operation to the input 4D-array x with different kernel sizes and padding sizes, and then flattens all dimensions except first dimension of all pooling results, and finally concatenates them along second dimension.

At \(i\)-th pyramid level, the kernel size \((k_h^{(i)}, k_w^{(i)})\) and padding size \((p_h^{(i)}, p_w^{(i)})\) of pooling operation are calculated as below:

\[\begin{split}k_h^{(i)} &= \lceil b_h / 2^i \rceil, \\ k_w^{(i)} &= \lceil b_w / 2^i \rceil, \\ p_h^{(i)} &= (2^i k_h^{(i)} - b_h) / 2, \\ p_w^{(i)} &= (2^i k_w^{(i)} - b_w) / 2,\end{split}\]

where \(\lceil \cdot \rceil\) denotes the ceiling function, and \(b_h, b_w\) are height and width of input variable x, respectively. Note that index of pyramid level \(i\) is zero-based.

See detail in paper: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.

  • x (Variable) – Input variable. The shape of x should be (batchsize, # of channels, height, width).

  • pyramid_height (int) – Number of pyramid levels

  • pooling (str) – Currently, only max is supported, which performs a 2d max pooling operation.


Output variable. The shape of the output variable will be \((batchsize, c \sum_{h=0}^{H-1} 2^{2h}, 1, 1)\), where \(c\) is the number of channels of input variable x and \(H\) is the number of pyramid levels.

Return type