chainer.functions.spatial_transformer_grid(theta, output_shape, **kwargs)

2D Spatial Transformer grid.

This function generates coordinates of the points sampled from an image to perform warping described in Spatial Transformer Networks.

Given a coordinate in the warped image \((x_i^t, y_i^t)\), the point sampled from the source image \((x_i^s, y_i^s)\) is calculated by the following equation.

\[\begin{split}\left(\begin{matrix} x_i^s \\ y_i^s \end{matrix}\right) = \left(\begin{matrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{matrix}\right) \left(\begin{matrix} x_i^t \\ y_i^t \\ 1 \end{matrix}\right)\end{split}\]

Note: cuDNN supports SpatialTransformerGrid from version 5.0.0.
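The equation above can be sketched in plain NumPy to make the shapes concrete. This is an illustrative reference only, not Chainer's implementation: the helper name `affine_grid` is hypothetical, and the sketch assumes that the target coordinates \((x_i^t, y_i^t)\) are evenly spaced over \([-1, 1]\) with both endpoints included.

```python
import numpy as np


def affine_grid(theta, output_shape):
    """Sketch of the affine sampling grid; not Chainer's actual implementation.

    theta: array of shape (n, 2, 3); output_shape: (h_out, w_out).
    Returns an array of shape (n, 2, h_out, w_out).
    """
    theta = np.asarray(theta, dtype=np.float64)
    h_out, w_out = output_shape
    # Assumption: target coordinates are evenly spaced over [-1, 1],
    # endpoints included.
    ys, xs = np.meshgrid(
        np.linspace(-1.0, 1.0, h_out),
        np.linspace(-1.0, 1.0, w_out),
        indexing="ij",
    )
    # Homogeneous target coordinates (x_t, y_t, 1), shape (3, h_out * w_out).
    target = np.stack([xs.ravel(), ys.ravel(), np.ones(h_out * w_out)])
    # Apply each 2x3 matrix: (n, 2, 3) @ (3, h*w) -> (n, 2, h*w).
    source = theta @ target
    return source.reshape(-1, 2, h_out, w_out)


# The identity transform maps every target point to itself.
identity = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
grid = affine_grid(identity, (2, 3))
print(grid.shape)     # (1, 2, 2, 3)
print(grid[0, 0, 0])  # x coordinates of the top row: -1, 0, 1
```

With the identity matrix the grid simply reproduces the target coordinates. A nonzero \(\theta_{13}\) or \(\theta_{23}\) shifts the sampled points along x or y respectively.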

Notation: the dimensions are denoted as follows.

  • \(n\) is the batch size.

  • \(h_O\) and \(w_O\) are the height and the width of the output image.

Parameters

  • theta (Variable or N-dimensional array) – An array of shape \((n, 2, 3)\). This is a batch of \(2 \times 3\) matrices used for the warping described above.

  • output_shape (tuple) – A tuple of 2 elements: \(h_O, w_O\).


Returns

A variable of shape \((n, 2, h_O, w_O)\). In the 2nd dimension, the first element is the coordinate along the x axis, and the second element is the coordinate along the y axis. All the coordinates in the image are scaled to fit the range \([-1, 1]\). This means that the coordinate \((-1, -1)\) corresponds to the upper-left corner of the input image, and \((1, 1)\) to the lower-right corner.

Return type

Variable
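
The returned coordinates are normalized to \([-1, 1]\), so relating them to pixel positions needs a rescaling step. The following is a minimal sketch of one such mapping. The helper `to_pixel` is hypothetical, and the sketch assumes that \(-1\) and \(1\) fall on the centers of the first and last pixels along each axis; other conventions exist, so check the sampler you pair the grid with.

```python
def to_pixel(x, y, height, width):
    """Map normalized (x, y) in [-1, 1] to fractional pixel (column, row).

    Assumption: -1 and 1 land on the centers of the first and last pixels.
    """
    col = (x + 1.0) * (width - 1) / 2.0
    row = (y + 1.0) * (height - 1) / 2.0
    return col, row


# For an image of height 4 and width 6:
print(to_pixel(-1.0, -1.0, 4, 6))  # (0.0, 0.0): upper-left corner
print(to_pixel(1.0, 1.0, 4, 6))    # (5.0, 3.0): lower-right corner
print(to_pixel(0.0, 0.0, 4, 6))    # (2.5, 1.5): image center
```

Fractional results are expected: a sampled point generally falls between pixels, which is why a sampler typically interpolates between neighboring values.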