Variables and Derivatives
In the example code of this tutorial, we assume for simplicity that the following symbols are already imported.
import numpy as np
import chainer
from chainer.backends import cuda
from chainer import Function, gradient_check, report, training, utils, Variable
from chainer import datasets, iterators, optimizers, serializers
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
from chainer.training import extensions
As described previously, Chainer uses the “Define-by-Run” scheme, so forward computation itself defines the network.
In order to start forward computation, we have to set the input array to a chainer.Variable object.
Here we start with a simple ndarray with only one element:
>>> x_data = np.array([5], dtype=np.float32)
>>> x = Variable(x_data)
A Variable object supports basic arithmetic operators. In order to compute \(y = x^2 - 2x + 1\), just write:
>>> y = x**2 - 2 * x + 1
The resulting y is also a Variable object, whose value can be extracted by accessing the array attribute:
>>> y.array
array([16.], dtype=float32)
Variable has two attributes to represent the underlying array: array and data.
There is no difference between the two; both refer to exactly the same object.
However, it is not recommended to use .data because it might be confused with the numpy.ndarray.data attribute.
What y holds is not only the result value.
It also holds the history of computation (or computational graph), which enables us to compute its derivative.
This is done by calling its backward() method:
>>> y.backward()
This runs error backpropagation (a.k.a. backprop or reverse-mode automatic differentiation).
Then, the gradient is computed and stored in the grad attribute of the input variable x:
>>> x.grad
array([8.], dtype=float32)
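To make the idea of a recorded computation history more concrete, here is a minimal sketch of reverse-mode automatic differentiation for scalars in plain Python. The `Var` class and its fields are hypothetical names invented for this illustration; this is not how Chainer's Variable is implemented, only a toy model of the same principle.

```python
# Illustrative sketch only: a hypothetical scalar `Var` class, not Chainer's Variable.
class Var:
    def __init__(self, value, parents=()):
        self.value = value        # result of the forward computation
        self.parents = parents    # recorded history: (input Var, local derivative) pairs
        self.grad = 0.0           # accumulated d(output)/d(self)

    @staticmethod
    def _wrap(other):
        # Promote plain numbers to constant nodes.
        return other if isinstance(other, Var) else Var(other)

    def __add__(self, other):
        other = Var._wrap(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        other = Var._wrap(other)
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        other = Var._wrap(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__  # lets `2 * x` work as well as `x * 2`

    def backward(self):
        # Post-order traversal; reversed, every node comes before its inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # seed: d(output)/d(output)
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += node.grad * local  # chain rule


x = Var(5.0)
y = x * x - 2 * x + 1   # the forward pass records the graph as a side effect
y.backward()
print(y.value, x.grad)  # prints: 16.0 8.0
```

The forward expression is ordinary Python arithmetic, and the graph exists only because each operator call stored its inputs. This is the essence of the Define-by-Run scheme described above.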
Also we can compute gradients of intermediate variables.
Note that Chainer, by default, releases the gradient arrays of intermediate variables for memory efficiency.
In order to preserve gradient information, pass the
retain_grad argument to the backward method:
>>> z = 2*x
>>> y = x**2 - z + 1
>>> y.backward(retain_grad=True)
>>> z.grad
array([-1.], dtype=float32)
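The value -1 for the gradient of z can be checked by hand with the chain rule. The following sketch, which assumes x still holds 5 as in the earlier example, treats z as an independent input of y and estimates the partial derivatives with central finite differences. The helper name `y_of` and the step size `h` are choices made for this illustration only.

```python
def y_of(x, z):
    # Same expression as above, with z treated as a separate input.
    return x ** 2 - z + 1

x0 = 5.0          # assumed input value, matching the earlier example
z0 = 2 * x0
h = 1e-3          # finite-difference step (an arbitrary small value)

# Partial derivative of y with respect to the intermediate variable z.
dy_dz = (y_of(x0, z0 + h) - y_of(x0, z0 - h)) / (2 * h)

# Partial derivative of y with respect to x while holding z fixed.
dy_dx_direct = (y_of(x0 + h, z0) - y_of(x0 - h, z0)) / (2 * h)

# Total derivative: direct path plus the path through z = 2*x.
dz_dx = 2.0
dy_dx_total = dy_dx_direct + dy_dz * dz_dx

print(round(dy_dz, 6), round(dy_dx_total, 6))
```

The estimate for dy/dz is close to -1, and the total derivative with respect to x is close to 8, which agrees with the gradient of the equivalent expression x**2 - 2*x + 1 at x = 5.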
All these computations can be generalized to a multi-element array input.
While the gradient of a single-element output is automatically initialized to [1], to start backward computation from a variable holding a multi-element array, we must set the initial error manually.
This is done simply by setting the
grad attribute of the output variable:
>>> x = Variable(np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32))
>>> y = x**2 - 2*x + 1
>>> y.grad = np.ones((2, 3), dtype=np.float32)
>>> y.backward()
>>> x.grad
array([[ 0.,  2.,  4.],
       [ 6.,  8., 10.]], dtype=float32)
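One way to read the initial error is as a per-element weight on the output. For an elementwise function such as this one, the input gradient is the initial error multiplied by the local derivative 2x - 2. The sketch below reproduces the numbers above with plain Python lists so that it runs without any dependency; the names `gy` and `gx` are chosen for this illustration.

```python
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
gy = [[1.0] * 3 for _ in range(2)]  # initial error: all ones, same shape as y

# Elementwise y = x**2 - 2*x + 1, so dy/dx = 2*x - 2 for each element.
# The input gradient is the initial error times that local derivative.
gx = [[g * (2 * v - 2) for g, v in zip(g_row, x_row)]
      for g_row, x_row in zip(gy, x)]

print(gx)  # prints: [[0.0, 2.0, 4.0], [6.0, 8.0, 10.0]]
```

With all ones as the initial error, the result is the derivative of the sum of all elements of y. A different initial error would weight each output element accordingly.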
Variable also supports higher-order derivatives (a.k.a. double backpropagation).
Let’s see a simple example.
First calculate the first-order derivative.
Note that enable_double_backprop=True is passed to y.backward().
>>> x = chainer.Variable(np.array([[0, 2, 3], [4, 5, 6]], dtype=np.float32))
>>> y = x ** 3
>>> y.grad = np.ones((2, 3), dtype=np.float32)
>>> y.backward(enable_double_backprop=True)
>>> x.grad_var
variable([[  0.,  12.,  27.],
          [ 48.,  75., 108.]])
>>> assert x.grad_var.array is x.grad
>>> assert (x.grad == (3 * x**2).array).all()
chainer.Variable.grad_var is a Variable for chainer.Variable.grad (which is an ndarray).
By passing enable_double_backprop=True to backward(), a computational graph for the backward calculation is recorded.
So, you can start backpropagation from
x.grad_var to calculate the second-order derivative.
>>> gx = x.grad_var
>>> x.cleargrad()
>>> gx.grad = np.ones((2, 3), dtype=np.float32)
>>> gx.backward()
>>> x.grad
array([[ 0., 12., 18.],
       [24., 30., 36.]], dtype=float32)
>>> assert (x.grad == (6 * x).array).all()
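As a sanity check independent of Chainer, both derivatives of x**3 can be estimated numerically. The sketch below uses central finite differences in plain Python; the step size `h` and the tolerance are arbitrary choices for this illustration, and the results are approximations rather than the exact values that backpropagation produces.

```python
def f(x):
    return x ** 3

h = 1e-3                                  # finite-difference step (arbitrary)
xs = [0.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # same input values as the example

# First derivative, expected to be close to 3 * x**2.
first = [(f(x + h) - f(x - h)) / (2 * h) for x in xs]

# Second derivative, expected to be close to 6 * x.
second = [(f(x + h) - 2 * f(x) + f(x - h)) / h ** 2 for x in xs]

for x, d1, d2 in zip(xs, first, second):
    assert abs(d1 - 3 * x ** 2) < 1e-4
    assert abs(d2 - 6 * x) < 1e-4
```

The two lists match the arrays obtained above from x.grad after the first and second backward passes, up to the finite-difference error.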