chainer.FunctionNode

class chainer.FunctionNode[source]

Function node of the computational graph.

FunctionNode is a class representing a node in a computational graph. The node corresponds to an application of a differentiable function to input variables.

When a differentiable function is applied to Variable objects, it creates an instance of a FunctionNode implementation and calls its apply() method. The apply() method basically does the following three things.

  1. Adding an edge from the function node to the variable node corresponding to each input. The node of each input is extracted by Variable.node.

  2. Computing the output arrays of the function.

  3. Creating a Variable object for each output array and adding an edge from the node of the variable to the function node.

The output variables are then returned.

Example

Let x be an instance of Variable and f be an instance of FunctionNode taking only one argument. Then the following code

>>> import numpy, chainer
>>> x = chainer.Variable(numpy.zeros(10))
>>> f = chainer.functions.math.identity.Identity()
>>> y = f.apply((x,))[0]

computes a new variable y and creates backward references. The backward references are actually set as per the following diagram:

x.node <--- f <--- y.node

If an application of another function g occurs as

>>> g = chainer.functions.math.identity.Identity()
>>> z = g.apply((x,))[0]

then the graph grows with a branch:

         |--- f <--- y.node
x.node <-+
         |--- g <--- z.node

Note that the branching is correctly managed on backward computation, i.e. the gradients from f and g are accumulated to the gradient of x.

Every function-node implementation should provide forward() and backward(). Instead of overriding forward(), one can also implement forward_cpu() and forward_gpu() when the implementations for CPU and GPU arrays are totally different.

Note that the input and output variables are inaccessible from backward() by default. If it needs access to these variables, the forward() method (or its CPU/GPU variants) has to call retain_inputs() and retain_outputs() appropriately. The retained input/output variables can be accessed from backward() by calling get_retained_inputs() and get_retained_outputs().
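
For example, a minimal sketch of a new-style function node that squares its input, retains the input in forward(), and reuses it in backward() might look like the following (SquareNode is a made-up name used only for illustration):

>>> import numpy, chainer
>>> class SquareNode(chainer.FunctionNode):
...     def forward(self, inputs):
...         x, = inputs
...         self.retain_inputs((0,))  # keep x for the backward pass
...         return x * x,
...     def backward(self, target_input_indexes, grad_outputs):
...         x, = self.get_retained_inputs()  # a Variable, so the result stays differentiable
...         gy, = grad_outputs
...         return 2 * x * gy,
>>> x = chainer.Variable(numpy.arange(3, dtype=numpy.float32))
>>> y = SquareNode().apply((x,))[0]
>>> y.grad = numpy.ones(3, dtype=numpy.float32)
>>> y.backward()
>>> x.grad
array([0., 2., 4.], dtype=float32)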

Note

There are two types of differentiable functions in Chainer (since v3). The first type is a function implemented using a subclass of Function, which is called an old-style differentiable function. The second type is a function implemented using a subclass of FunctionNode, which is called a new-style differentiable function. There are several advantages to using new-style differentiable functions.

  • The new-style differentiable function supports differentiable backpropagation. The gradients backpropagated through new-style differentiable functions are themselves differentiable, so automatic higher-order differentiation is available (see the example after this note).

  • The backpropagation of the new-style differentiable function can be more computationally efficient because the interface allows an implementation to omit the computation of unneeded input gradients.

Note that the new-style differentiable function is the standard way of defining a function node of the computational graph in Chainer; old-style differentiable functions are implemented as wrappers of the new-style differentiable functions.
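
As a concrete illustration of higher-order differentiation through new-style functions, chainer.grad can be applied repeatedly when the first call keeps the backward graph (the sketch below uses the built-in F.sin, which is implemented as a new-style function):

>>> import numpy, chainer
>>> import chainer.functions as F
>>> x = chainer.Variable(numpy.array(1.0, dtype=numpy.float32))
>>> y = F.sin(x)
>>> gx, = chainer.grad([y], [x], enable_double_backprop=True)
>>> ggx, = chainer.grad([gx], [x])  # second derivative: -sin(x)
>>> numpy.allclose(ggx.array, -numpy.sin(1.0))
True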

Variables
  • inputs – A tuple of the input VariableNode objects.

  • outputs – A tuple of weak references to the output VariableNode objects.

  • rank (int) – An ordinal following the topological order of the computational graph.

  • stack – Stack trace retrieved at the forward computation. The stack trace is available only in the debug mode.

New in version 3.0.0.

Methods

__call__(*args, **kwargs)[source]

Call self as a function.

add_hook(hook, name=None)[source]

Registers a function hook.

Parameters
  • hook (FunctionHook) – Function hook to be registered.

  • name (str) – Name of the function hook. The name must be unique among function hooks registered to this function. If None, the default name of the function hook is used.
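
For example, a built-in hook such as chainer.function_hooks.TimerHook can be registered to a single function node (a hedged sketch; the hook then observes only this node's forward and backward computations):

>>> import chainer
>>> f = chainer.functions.math.identity.Identity()
>>> f.add_hook(chainer.function_hooks.TimerHook(), name='timer')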

apply(inputs)[source]

Computes output variables and grows the computational graph.

Basic behavior is expressed in the documentation of FunctionNode.

Note

If the data attributes of the input variables exist on a GPU device, that device is made current before calling forward(), so implementers do not need to take care of device selection in most cases.

Parameters

inputs – Tuple of input variables. Each element can be either Variable or N-dimensional array. If the element is an ndarray, it is automatically wrapped with Variable.

Returns

A tuple of output Variable objects.
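
For example, a raw array can be passed in place of a Variable and is wrapped automatically (reusing the Identity node from the example above):

>>> import numpy, chainer
>>> f = chainer.functions.math.identity.Identity()
>>> y, = f.apply((numpy.zeros(3),))
>>> isinstance(y, chainer.Variable)
True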

backward(target_input_indexes, grad_outputs)[source]

Computes gradients w.r.t. specified inputs given output gradients.

This method is used to compute one step of the backpropagation corresponding to the forward computation of this function node. Given the gradients w.r.t. output variables, this method computes the gradients w.r.t. specified input variables. Note that this method does not need to compute any input gradients not specified by target_input_indexes.

Unlike Function.backward(), gradients are given as Variable objects and this method itself has to return input gradients as Variable objects. It enables the function node to return the input gradients with the full computational history, in which case it supports differentiable backpropagation or higher-order differentiation.

The default implementation returns None for every requested input gradient, which means the function is not differentiable.

Parameters
  • target_input_indexes (tuple of int) – Sorted indices of the input variables w.r.t. which the gradients are required. It is guaranteed that this tuple contains at least one element.

  • grad_outputs (tuple of Variables) – Gradients w.r.t. the output variables. If the gradient w.r.t. an output variable is not given, the corresponding element is None.

Returns

Tuple of variables that represent the gradients w.r.t. specified input variables. The length of the tuple can be the same as either len(target_input_indexes) or the number of inputs. In the latter case, the elements not specified by target_input_indexes will be discarded.

See also

backward_accumulate() provides an alternative interface that allows you to implement the backward computation fused with the gradient accumulation.
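
For example, a two-input function can compute only the gradients requested in target_input_indexes (a hedged sketch; MulNode is a made-up name):

>>> import chainer
>>> class MulNode(chainer.FunctionNode):
...     def forward(self, inputs):
...         a, b = inputs
...         self.retain_inputs((0, 1))  # both inputs are needed in backward()
...         return a * b,
...     def backward(self, target_input_indexes, grad_outputs):
...         a, b = self.get_retained_inputs()
...         gy, = grad_outputs
...         # return gradients only for the requested inputs
...         return tuple(gy * b if i == 0 else gy * a
...                      for i in target_input_indexes)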

backward_accumulate(target_input_indexes, grad_outputs, grad_inputs)[source]

Computes gradients w.r.t. specified inputs and accumulates them.

This method provides a way to fuse the backward computation and the gradient accumulation in the case where multiple functions are applied to the same variable.

Users have to override either this method or backward(). It is often simpler to implement backward(), and doing so is recommended if you do not need to provide efficient gradient accumulation.

Parameters
  • target_input_indexes (tuple of int) – Sorted indices of the input variables w.r.t. which the gradients are required. It is guaranteed that this tuple contains at least one element.

  • grad_outputs (tuple of Variable) – Gradients w.r.t. the output variables. If the gradient w.r.t. an output variable is not given, the corresponding element is None.

  • grad_inputs (tuple of Variable) – Gradients w.r.t. the input variables specified by target_input_indexes. These values are computed by other computation paths. If there is no gradient value existing for the variable, the corresponding element is None. See also the note below.

Returns

Tuple of variables that represent the gradients w.r.t. specified input variables. Unlike backward(), the length of the tuple must be the same as that of target_input_indexes.

Note

Gradient variables in grad_outputs are distinct, even if a variable is passed to multiple input arguments of the function. This is an implementation-detail convention to avoid the complication of correctly accumulating gradients in such a case.

Usually, only the first position of grad_inputs corresponding to these input arguments may contain the gradient variable corresponding to that input variable, and other entries are set to None. This is not the case with the lazy_grad_sum feature. This behavior might be changed in a future version.
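
A typical override follows this general pattern: compute the new gradients and add the already-accumulated ones from grad_inputs (a simplified sketch that assumes backward() returns exactly len(target_input_indexes) gradients):

>>> import chainer
>>> class MyNode(chainer.FunctionNode):  # hypothetical
...     def backward_accumulate(self, target_input_indexes,
...                             grad_outputs, grad_inputs):
...         gxs = self.backward(target_input_indexes, grad_outputs)
...         accumulated = []
...         for gx, g_prev in zip(gxs, grad_inputs):
...             if gx is None:
...                 accumulated.append(g_prev)
...             elif g_prev is None:
...                 accumulated.append(gx)
...             else:
...                 accumulated.append(gx + g_prev)
...         return tuple(accumulated)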

check_layout_forward(inputs)[source]

check_type_forward(in_types)[source]

Checks types of input data before forward propagation.

This method is called before forward() and validates the types of input variables using the type checking utilities.

Parameters

in_types (TypeInfoTuple) – The type information of input variables for forward().
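
For example, a unary function expecting a float32 array of at least one dimension could check its input like this (using the chainer.utils.type_check utilities):

>>> import numpy, chainer
>>> from chainer.utils import type_check
>>> class MyFunc(chainer.FunctionNode):  # hypothetical
...     def check_type_forward(self, in_types):
...         type_check.expect(in_types.size() == 1)
...         x_type, = in_types
...         type_check.expect(
...             x_type.dtype == numpy.float32,
...             x_type.ndim >= 1,
...         )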

delete_hook(name)[source]

Unregisters the function hook.

Parameters

name (str) – The name of the function hook to be unregistered.

forward(inputs)[source]

Computes the output arrays from the input arrays.

It delegates the procedure to forward_cpu() or forward_gpu() by default. Which of them this method selects is determined by the type of input arrays. Implementations of FunctionNode must implement either CPU/GPU methods or this method.

Parameters

inputs – Tuple of input array(s).

Returns

Tuple of output array(s).

Warning

Implementations of FunctionNode must take care that the return value is a tuple even if it contains only one array.
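
A device-agnostic forward() can be written with chainer.backend.get_array_module(), which returns numpy or cupy depending on the input arrays; note the trailing comma that makes the return value a one-element tuple (a minimal sketch; TanhNode is a made-up name):

>>> import chainer
>>> class TanhNode(chainer.FunctionNode):
...     def forward(self, inputs):
...         xp = chainer.backend.get_array_module(*inputs)
...         x, = inputs
...         return xp.tanh(x),  # trailing comma: a one-element tuple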

forward_chainerx(inputs)[source]

Computes the output arrays from the input ChainerX arrays.

This method may check the input arrays and other attributes to see if the computation can be done using ChainerX implementation. If it’s not supported, chainer.Fallback should be returned instead of output arrays. In that case, computation using conventional Python implementation will be performed.

Parameters

inputs – Tuple of input array(s).

Returns

Tuple of output array(s) or chainer.Fallback.
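
A common pattern is to check whether the ChainerX path can handle the inputs and fall back otherwise (a hedged sketch; the float32-only restriction is just an assumed example and ExpChxNode is a made-up name):

>>> import chainer, chainerx
>>> class ExpChxNode(chainer.FunctionNode):
...     def forward_chainerx(self, inputs):
...         x, = inputs
...         if x.dtype != chainerx.float32:  # assumed unsupported case
...             return chainer.Fallback
...         return chainerx.exp(x),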

forward_cpu(inputs)[source]

Computes the output arrays from the input NumPy arrays.

Parameters

inputs – Tuple of input numpy.ndarray objects.

Returns

Tuple of output arrays. Each element can be a NumPy or CuPy array.

Warning

Implementations of FunctionNode must take care that the return value is a tuple even if it contains only one array.

forward_gpu(inputs)[source]

Computes the output arrays from the input CuPy arrays.

Parameters

inputs – Tuple of input cupy.ndarray objects.

Returns

Tuple of output arrays. Each element can be a NumPy or CuPy array.

Warning

Implementations of FunctionNode must take care that the return value is a tuple even if it contains only one array.

get_retained_inputs()[source]

Returns a tuple of retained input variables.

This method is used to retrieve the input variables retained in forward().

Returns

A tuple of retained input variables, if available; otherwise, None.

get_retained_outputs()[source]

Returns a tuple of retained output variables.

This method is used to retrieve the output variables retained in forward().

Returns

A tuple of retained output variables, if available; otherwise, None.

Note

This method does a tricky thing to support the case of an output node garbage-collected before this method is called; in this case, this method creates a fresh variable node that acts as an output node of the function node.

retain_inputs(indexes)[source]

Lets specified input variable nodes keep data arrays.

By calling this method from forward(), the function node can specify which inputs are required for backprop. The input variables with retained arrays can then be obtained by calling get_retained_inputs() from inside backward().

Unlike Function, the function node DOES NOT keep input arrays by default. If you want to keep some or all input arrays, do not forget to call this method.

Note that this method must not be called outside of forward().

Parameters

indexes (iterable of int) – Indexes of input variables that the function will require for backprop.

retain_outputs(indexes)[source]

Lets specified output variable nodes keep data arrays.

By calling this method from forward(), the function node can specify which outputs are required for backprop. If this method is not called, no output variables will be marked to keep their data array at the point of returning from apply(). The output variables with retained arrays can then be obtained by calling get_retained_outputs() from inside backward().

Note

It is recommended to use this method if the function requires some or all output arrays in backprop. The function can also use output arrays just by keeping references to them directly, although it might affect the performance of later function applications on the output variables.

Note that this method must not be called outside of forward().

Parameters

indexes (iterable of int) – Indexes of output variables that the function will require for backprop.
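
For example, an exponential function can retain its output and reuse it in backward(), since the derivative of exp(x) is exp(x) itself (a hedged sketch; ExpNode is a made-up name):

>>> import numpy, chainer
>>> class ExpNode(chainer.FunctionNode):
...     def forward_cpu(self, inputs):
...         x, = inputs
...         y = numpy.exp(x)
...         self.retain_outputs((0,))  # keep y for the backward pass
...         return y,
...     def backward(self, target_input_indexes, grad_outputs):
...         y, = self.get_retained_outputs()
...         gy, = grad_outputs
...         return gy * y,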

unchain()[source]

Purges in/out nodes and this function node itself from the graph.

__eq__(value, /)

Return self==value.

__ne__(value, /)

Return self!=value.

__lt__(value, /)

Return self<value.

__le__(value, /)

Return self<=value.

__gt__(value, /)

Return self>value.

__ge__(value, /)

Return self>=value.

Attributes

chainerx_device = None
input_layouts
inputs = None
is_elementwise = False
label

Short text that represents the function.

The default implementation returns its type name. Each function should override it to give more information.

lazy_grad_sum = False
local_function_hooks

Ordered dictionary of registered function hooks.

Contrary to chainer.thread_local.function_hooks, which registers its elements to all functions, function hooks in this property are specific to this function.

output_data

A tuple of the retained output arrays.

This property is mainly used by Function. Users basically do not have to use this property; use get_retained_outputs() instead.

output_layouts
outputs = None
rank = 0
stack = None