# Variables and Derivatives

In the example code of this tutorial, we assume for simplicity that the following symbols are already imported.

```
import math
import numpy as np
import chainer
from chainer import backend
from chainer import backends
from chainer.backends import cuda
from chainer import Function, FunctionNode, gradient_check, report, training, utils, Variable
from chainer import datasets, initializers, iterators, optimizers, serializers
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
from chainer.training import extensions
```

As described previously, Chainer uses the “Define-by-Run” scheme, so forward computation itself *defines* the network.
To start the forward computation, we wrap the input array in a `chainer.Variable` object.
Here we start with a simple `ndarray` with only one element:

```
>>> x_data = np.array([5], dtype=np.float32)
>>> x = Variable(x_data)
```

A Variable object supports basic arithmetic operators. In order to compute \(y = x^2 - 2x + 1\), just write:

```
>>> y = x**2 - 2 * x + 1
```

The resulting `y` is also a Variable object, whose value can be extracted by accessing the `array` attribute:

```
>>> y.array
array([16.], dtype=float32)
```

Note

`Variable` has two attributes to represent the underlying array: `array` and `data`.
There is no difference between the two; both refer to exactly the same object.
However, using `.data` is not recommended because it can be confused with the `numpy.ndarray.data` attribute.

What `y` holds is not only the result value.
It also holds the history of computation (or computational graph), which enables us to compute its derivative.
This is done by calling its `backward()` method:

```
>>> y.backward()
```

This runs *error backpropagation* (a.k.a. *backprop* or *reverse-mode automatic differentiation*).
Then, the gradient is computed and stored in the `grad` attribute of the input variable `x`:

```
>>> x.grad
array([8.], dtype=float32)
```
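As a sanity check that does not depend on Chainer, the value above can be compared with the analytic derivative \(2x - 2\) and with a central finite difference. The helper names below (`f`, `analytic_grad`, `central_difference`) are hypothetical and only serve this illustration.

```python
def f(x):
    # Same function as in the tutorial: y = x^2 - 2x + 1
    return x**2 - 2 * x + 1


def analytic_grad(x):
    # Derivative worked out by hand: dy/dx = 2x - 2
    return 2 * x - 2


def central_difference(func, x, eps=1e-4):
    # Numerical estimate of the derivative at x
    return (func(x + eps) - func(x - eps)) / (2 * eps)


x0 = 5.0
numeric = central_difference(f, x0)
print(analytic_grad(x0))  # 8.0, matching x.grad above
print(abs(numeric - analytic_grad(x0)) < 1e-6)  # True
```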

We can also compute the gradients of intermediate variables.
Note that Chainer, by default, releases the gradient arrays of intermediate variables for memory efficiency.
To preserve this gradient information, pass `retain_grad=True` to the `backward()` method:

```
>>> z = 2*x
>>> y = x**2 - z + 1
>>> y.backward(retain_grad=True)
>>> z.grad
array([-1.], dtype=float32)
```
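To make the idea of a recorded computation history more concrete, here is a minimal sketch of reverse-mode differentiation on scalars. The `Var` class is hypothetical and is not how Chainer implements `Variable`; it only illustrates how each result can remember its inputs so that gradients flow back through intermediate values such as `z`.

```python
class Var:
    """Toy scalar variable that records how it was computed (not Chainer's Variable)."""

    def __init__(self, value, parents=()):
        self.value = float(value)
        self.grad = 0.0
        # Each entry is (parent, local derivative of self w.r.t. that parent).
        self.parents = parents

    @staticmethod
    def _wrap(other):
        return other if isinstance(other, Var) else Var(other)

    def __add__(self, other):
        other = Var._wrap(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        other = Var._wrap(other)
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        other = Var._wrap(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def backward(self):
        # Collect nodes so that every node appears after all of its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0  # initial error for a scalar output
        # Walk from the output back to the inputs, applying the chain rule.
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += node.grad * local


x = Var(5.0)
z = 2 * x          # intermediate variable
y = x * x - z + 1  # same function as above, with x**2 written as x * x
y.backward()
print(y.value, z.grad, x.grad)  # 16.0 -1.0 8.0
```

Unlike Chainer's default behaviour, this toy keeps every intermediate gradient, which is roughly what `retain_grad=True` asks for.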

All these computations can be generalized to a multi-element array input.
While the gradient of an output variable holding a single-element array is automatically initialized to `[1]`, to start backward computation from a variable holding a multi-element array, we must set the *initial error* manually.
This is done simply by setting the `grad` attribute of the output variable:

```
>>> x = Variable(np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32))
>>> y = x**2 - 2*x + 1
>>> y.grad = np.ones((2, 3), dtype=np.float32)
>>> y.backward()
>>> x.grad
array([[ 0.,  2.,  4.],
       [ 6.,  8., 10.]], dtype=float32)
```
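The role of the initial error can be illustrated with plain NumPy. Because the function here is applied elementwise, the backward rule amounts to multiplying the initial error by the local derivative \(2x - 2\), element by element. The sketch below assumes that rule and uses a hypothetical non-uniform initial error (`gy_weighted`) to show how the input gradient would scale; it does not call Chainer.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
local_derivative = 2 * x - 2  # elementwise dy/dx for y = x**2 - 2*x + 1

# Initial error of all ones reproduces the gradient shown above.
gy_ones = np.ones((2, 3), dtype=np.float32)
gx_ones = gy_ones * local_derivative

# A hypothetical initial error that selects and weights individual outputs.
gy_weighted = np.array([[0, 1, 0], [0, 0, 2]], dtype=np.float32)
gx_weighted = gy_weighted * local_derivative

print(gx_ones)
print(gx_weighted)
```

With the all-ones initial error, the result equals the gradient of the sum of all elements of `y`; with other values it is the gradient of the correspondingly weighted sum.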

Note

Many functions taking `Variable` object(s) are defined in the `chainer.functions` module.
You can combine them to realize complicated functions with automatic backward computation.

Note

Instead of using `backward()`, you can also calculate gradients of any variables in a computational graph w.r.t. any other variables in the graph using the `chainer.grad()` function.

## Higher-Order Derivatives

`Variable` also supports higher-order derivatives (a.k.a. double backpropagation).

Let’s see a simple example.
First calculate the first-order derivative.
Note that `enable_double_backprop=True` is passed to `y.backward()`.

```
>>> x = chainer.Variable(np.array([[0, 2, 3], [4, 5, 6]], dtype=np.float32))
>>> y = x ** 3
>>> y.grad = np.ones((2, 3), dtype=np.float32)
>>> y.backward(enable_double_backprop=True)
>>> x.grad_var
variable([[  0.,  12.,  27.],
          [ 48.,  75., 108.]])
>>> assert x.grad_var.array is x.grad
>>> assert (x.grad == (3 * x**2).array).all()
```

`chainer.Variable.grad_var` is the `Variable` counterpart of `chainer.Variable.grad` (which is an `ndarray`).
By passing `enable_double_backprop=True` to `backward()`, a computational graph for the backward calculation is recorded.
So, you can start backpropagation from `x.grad_var` to calculate the second-order derivative.

```
>>> gx = x.grad_var
>>> x.cleargrad()
>>> gx.grad = np.ones((2, 3), dtype=np.float32)
>>> gx.backward()
>>> x.grad
array([[ 0., 12., 18.],
       [24., 30., 36.]], dtype=float32)
>>> assert (x.grad == (6 * x).array).all()
```
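The second-order result can also be cross-checked numerically without Chainer. The sketch below treats the first derivative \(3x^2\) as an ordinary function (the hypothetical helper `first_derivative`) and differentiates it once more with a central finite difference, which should agree with \(6x\).

```python
import numpy as np

# Same input values as above, in float64 to keep the finite difference accurate.
x = np.array([[0, 2, 3], [4, 5, 6]], dtype=np.float64)


def first_derivative(v):
    # d(v**3)/dv, i.e. the values held by x.grad_var in the example above
    return 3 * v**2


eps = 1e-3
second_numeric = (first_derivative(x + eps) - first_derivative(x - eps)) / (2 * eps)

print(np.allclose(second_numeric, 6 * x))  # True
```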