Function

class chainer.Function[source]

Function of variable(s) to variable(s) that leaves a footprint on the output variables when applied.

All function implementations defined in chainer.functions inherit this class.

The main feature of this class is keeping track of function applications as a backward graph. When a function is applied to Variable objects, the function is copied, its forward() method is called on the data fields of the input variables, and at the same time references are chained from the output variables to the function and from the function to its inputs.

Note

Strictly speaking, when a function is applied to some variable, a special Function object called a splitter is inserted between the variable and the function. The splitter handles multiple function applications on the same variable, so that gradients from different backward paths are accumulated at the variable.

Note

__call__() copies the function instance before the forward computation and chaining. This enables one function object to be reused for multiple function applications, where the different calls must use different references to the function object. Note that the copy is shallow, so implementations of Function must take care of any member attributes shared across forward and backward computations.

Example

Let x be an instance of Variable and f an instance of Function taking only one argument. Then the line

>>> y = f(x)

computes a new variable y and creates backward references. The backward references are set up as shown in the following diagram:

x <--- (splitter) <--- x' <--- f' <--- y

where a prime “’” indicates a copy of the original object. If another application of the function occurs, as in

>>> z = f(x)

then the splitter acts as a branching point, as shown in the following new diagram:

                    |--- x'  <--- f'  <--- y
x <--- (splitter) <-+
                    |--- x'' <--- f'' <--- z

Note that the splitter is inserted implicitly, and the user does not need to take any special care of it; just remember that such branching is correctly managed by chainer.
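
For illustration only, these references can also be inspected programmatically. A minimal sketch, assuming f and x are as in the example above and using the creator attribute of Variable together with the inputs attribute documented below:

>>> y = f(x)
>>> y.creator is f            # False, since __call__() applied a copy of f
False
>>> inputs = y.creator.inputs # variables leading back toward x via the splitter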

Every function implementation should provide forward_cpu(), forward_gpu(), backward_cpu() and backward_gpu(). Alternatively, one can provide forward() and backward() instead of separate methods. Backward methods have default implementations that just return None, which indicates that the function is non-differentiable.
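
For illustration (not part of chainer.functions), a minimal CPU-only sketch of such an implementation; the hypothetical Square class below implements only the CPU pair, and the GPU pair would mirror it on GPUArray objects:

from chainer import Function

class Square(Function):
    """Hypothetical elementwise square: y = x * x (CPU-only sketch)."""

    def forward_cpu(self, inputs):
        x, = inputs
        return x * x,                  # the return value is always a tuple

    def backward_cpu(self, inputs, grad_outputs):
        x, = inputs
        gy, = grad_outputs
        return 2 * x * gy,             # one gradient array per input, as a tuple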

Function implementations are classified into two types: parameterized ones and non-parameterized ones. A parameterized function holds parameter arrays and corresponding gradient arrays. Implementations can choose any way to keep these arrays, but it is recommended to keep them as attributes to easily migrate between CPU and GPU. A parameterized function must provide accessors to these arrays, called parameters() and gradients().

inputs

A tuple or list of input variables.

outputs

A tuple or list of output variables.

parameter_names

A tuple or list of names of parameter attributes. It is set to an empty tuple by default. This attribute is used by the default implementation of the parameters() property to gather the collection of parameter arrays. Implementations of parameterized functions should override this field as an attribute or a property; otherwise, they should override the parameters() property.

gradient_names

A tuple or list of names of gradient attributes. The details are the same as for parameter_names.
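
For illustration, a hedged sketch of a hypothetical parameterized function that keeps its parameter W and gradient gW as ndarray attributes (as recommended above) and lists them in these two attributes:

import numpy

from chainer import Function

class Scale(Function):
    """Hypothetical parameterized function computing y = W * x elementwise."""

    parameter_names = ('W',)
    gradient_names = ('gW',)

    def __init__(self, shape):
        self.W = numpy.ones(shape, dtype=numpy.float32)
        self.gW = numpy.zeros_like(self.W)

    def forward_cpu(self, inputs):
        x, = inputs
        return self.W * x,

    def backward_cpu(self, inputs, grad_outputs):
        x, = inputs
        gy, = grad_outputs
        self.gW += x * gy              # accumulate the parameter gradient
        return self.W * gy,            # gradient with respect to the input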

__call__(*inputs)[source]

Applies forward propagation to input variables and chains backward references.

The basic behavior is also described in the documentation of the Function class. This function first copies itself to avoid conflicts over multiple invocations.

Note

If the data attributes of the input variables reside on a GPU device, the appropriate device is selected before the forward() method is called, so in most cases the implementor does not need to take care of device selection.

Parameters:inputs – Tuple of input Variable objects. All input variables must have the same volatile flag.
Returns:One Variable object or a tuple of multiple Variable objects.
backward(inputs, grad_outputs)[source]

Applies backprop to output gradient arrays.

It delegates the procedure to backward_cpu() or backward_gpu() by default. Which one is selected is determined by the types of the input arrays and the output gradient arrays. Implementations of Function must implement either the CPU/GPU methods or this method if the function is intended to be backpropagated through.

Parameters:
  • inputs – Tuple of input arrays.
  • grad_outputs – Tuple of output gradient arrays.
Returns:

Tuple of input gradient arrays. Some or all of them can be None if the function is not differentiable with respect to the corresponding inputs.

Return type:

tuple

Warning

Implementations of Function must take care that the return value is a tuple even if only one array is returned.
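
To illustrate the None convention, here is a hedged sketch of a hypothetical two-input function whose second input (a mask) receives no gradient:

from chainer import Function

class MaskedIdentity(Function):
    """Hypothetical: y = x * mask, with no gradient propagated to mask."""

    def forward(self, inputs):
        x, mask = inputs
        return x * mask,

    def backward(self, inputs, grad_outputs):
        x, mask = inputs
        gy, = grad_outputs
        # One entry per input; None marks the non-differentiable mask input.
        # The return value is a tuple even though only one gradient is meaningful.
        return gy * mask, None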

backward_cpu(inputs, grad_outputs)[source]

Applies backprop to output gradient arrays on CPU.

Parameters:
  • inputs – Tuple of input ndarray object(s).
  • grad_outputs – Tuple of output gradient ndarray object(s).
Returns:

Tuple of input gradient ndarray object(s). Some or all of them can be None if the function is not differentiable with respect to the corresponding inputs.

Return type:

tuple

Warning

Implementations of Function must take care that the return value is a tuple even if only one array is returned.

backward_gpu(inputs, grad_outputs)[source]

Applies backprop to output gradient arrays on GPU.

Parameters:
  • inputs – Tuple of input GPUArray object(s).
  • grad_outputs – Tuple of output gradient GPUArray object(s).
Returns:

Tuple of input gradient GPUArray object(s). Some or all of them can be None if the function is not differentiable with respect to the corresponding inputs.

Return type:

tuple

Warning

Implementations of Function must take care that the return value is a tuple even if only one array is returned.

forward(inputs)[source]

Applies forward propagation to input arrays.

It delegates the procedure to forward_cpu() or forward_gpu() by default. Which one is selected is determined by the types of the input arrays. Implementations of Function must implement either the CPU/GPU methods or this method.

Parameters:inputs – Tuple of input array(s).
Returns:Tuple of output array(s).

Warning

Implementations of Function must take care that the return value is a tuple even if only one array is returned.
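
For illustration, a hedged sketch of a hypothetical function that overrides forward() (and backward()) once for both array types, relying only on elementwise operations that numpy.ndarray and GPUArray both support:

from chainer import Function

class AddConstant(Function):
    """Hypothetical: y = x + c for a fixed Python scalar c."""

    def __init__(self, c):
        self.c = c

    def forward(self, inputs):
        x, = inputs
        # Elementwise addition with a scalar works on both CPU and GPU arrays,
        # so a single forward() replaces separate forward_cpu()/forward_gpu().
        return x + self.c,

    def backward(self, inputs, grad_outputs):
        # d(x + c)/dx = 1, so the input gradient equals the output gradient.
        return grad_outputs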

forward_cpu(inputs)[source]

Applies forward propagation to input arrays on CPU.

Parameters:inputs – Tuple of ndarray object(s).
Returns:Tuple of ndarray object(s).
Return type:tuple

Warning

Implementations of Function must take care that the return value is a tuple even if only one array is returned.

forward_gpu(inputs)[source]

Applies forward propagation to input arrays on GPU.

Parameters:inputs – Tuple of GPUArray object(s).
Returns:Tuple of GPUArray object(s).
Return type:tuple

Warning

Implementations of Function must take care that the return value is a tuple even if only one array is returned.

gradients

A tuple of gradient arrays.

The default implementation collects gradient arrays based on the gradient_names attribute.

parameters

A tuple of parameter arrays.

The default implementation collects parameter arrays based on the parameter_names attribute.
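
As a usage sketch, assuming the hypothetical Scale function from the parameterized example above:

>>> f = Scale((3,))
>>> params = f.parameters   # (f.W,), gathered via the parameter_names attribute
>>> grads = f.gradients     # (f.gW,), gathered via the gradient_names attribute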

to_cpu()[source]

Migrates the function to CPU and returns self.

The default implementation moves all fields of type pycuda.gpuarray.GPUArray onto CPU.

Returns:self.
to_gpu(device=None)[source]

Migrates the function to GPU and returns self.

The default implementation moves all fields of type ndarray onto GPU.

Parameters:device (int or pycuda.driver.Device or None) – Device ID of the GPU to which the function will be migrated. If this is None, the current device is used.
Returns:self.
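
As a usage sketch, again assuming the hypothetical Scale function from above and a working CUDA setup:

>>> f = Scale((3,))
>>> f = f.to_gpu(0)   # W and gW become pycuda.gpuarray.GPUArray objects on device 0
>>> f = f.to_cpu()    # they are copied back to numpy.ndarray objects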
unchain()[source]

Purges in/out variables and removes this function from the backward graph.

This method is called from the Variable.unchain_backward() method.