Test backward procedure of a given function.

This function automatically checks the backward process of a given function to ensure that the computed gradients are approximately correct. For example, assuming you’ve defined a FunctionNode class MyFunc that takes two arguments and returns one value, you can wrap it in an ordinary function and check its gradient computations as follows:

def func(x1, x2):
    y, = MyFunc().apply((x1, x2))
    return y

x1_data = xp.array(...)
x2_data = xp.array(...)
gy_data = xp.array(...)
check_backward(func, (x1_data, x2_data), gy_data)


This function creates Variable objects from x_data and calls func with these Variables to get the result as a Variable. It then sets the y_grad array to the grad attribute of the result and calls the backward method to obtain the gradients of the inputs. To check the correctness of these gradients, the function calls numerical_grad() to compute the gradients numerically and compares the two sets of gradients with chainer.testing.assert_allclose().
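
The following is a rough hand-written sketch of that procedure for the two-input example above, to show the moving parts; unlike check_backward itself, it computes the full numerical gradient rather than the directional derivative described next, and the tolerances are illustrative:

import chainer
from chainer import gradient_check, testing

# Backprop gradients: wrap the inputs, run forward, seed the output
# gradient, and run backward.
x1 = chainer.Variable(x1_data)
x2 = chainer.Variable(x2_data)
y = func(x1, x2)
y.grad = gy_data
y.backward()

# Numerical gradients: numerical_grad perturbs the arrays in its
# inputs argument in place, so f must recompute the forward pass on
# every call.
def f():
    return func(chainer.Variable(x1_data),
                chainer.Variable(x2_data)).array,

gx1, gx2 = gradient_check.numerical_grad(f, (x1_data, x2_data), (gy_data,))
testing.assert_allclose(x1.grad, gx1, atol=1e-4, rtol=1e-4)
testing.assert_allclose(x2.grad, gx2, atol=1e-4, rtol=1e-4)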

To reduce computation time, it uses a directional derivative along a random vector. A function $$g: \mathbb{R} \rightarrow \mathbb{R}^n$$ is defined as $$g(\delta) = f(x + \delta r)$$, where $$\delta \in \mathbb{R}$$, $$r \in \mathbb{R}^n$$ is a random vector, and $$f$$ is the function you want to test. Its derivative is

$$g'(\delta) = f'(x + \delta r) \cdot r.$$

Therefore, $$g'(0) = f'(x) \cdot r$$. We can thus check the correctness of the backpropagation of $$f$$ indirectly by comparing the numerically computed derivative of $$g$$ at $$0$$ with $$f'(x) \cdot r$$, where $$f'(x)$$ is computed by backprop. If $$r$$ is drawn from a uniform distribution, we can conclude with high probability that the gradient of $$f$$ itself is correct.
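
As a minimal numerical illustration of this identity, take the toy elementwise function f(x) = x ** 2, whose backprop gradient is gy * 2 * x (all names here are illustrative):

import numpy as np

x = np.random.rand(5)
r = np.random.uniform(-1, 1, size=5)   # random direction
gy = np.random.rand(5)                 # upstream output gradient
delta = 1e-3

def f(v):
    return v ** 2

# Numerical derivative of g at 0 via central differences; this
# approximates the Jacobian-vector product J r.
g_prime_0 = (f(x + delta * r) - f(x - delta * r)) / (2 * delta)

# Backprop computes the vector-Jacobian product gy^T J. Dotting both
# quantities with the remaining vector gives the same scalar:
#   gy . (J r) == (gy^T J) . r
gx = gy * 2 * x                        # backprop gradient of f(x) = x ** 2
assert np.isclose(gy @ g_prime_0, gx @ r)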

If the function is non-differentiable with respect to some inputs, you can control the check with the no_grads argument. check_backward computes numerical gradients only for the inputs that correspond to False in no_grads, and asserts that backprop leaves the gradient None for the inputs that correspond to True. By default, no_grads is a tuple of truth values indicating whether each input (x1_data and/or x2_data in this example) is an integer variable.
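
For example, if the second input of func were an integer index array with no defined gradient (a hypothetical setup), you could skip its numerical check explicitly:

check_backward(func, (x1_data, indices_data), gy_data,
               no_grads=[False, True])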

You can simplify the test when MyFunc takes only one argument:

check_backward(func, x1_data, gy_data)


If MyFunc is a loss function that returns a zero-dimensional array, pass None as gy_data. In this case, check_backward sets the grad attribute of the result to 1:

check_backward(my_loss_func,
               (x1_data, x2_data), None)


If MyFunc returns multiple outputs, pass the gradients for all outputs as a tuple:

gy1_data = xp.array(...)
gy2_data = xp.array(...)
check_backward(func, x1_data, (gy1_data, gy2_data))


You can also test a Link. To check the gradients of the link’s parameters, pass a tuple of those parameters as the params argument (here the link is assumed to have parameters W and b, as a Linear link does):

check_backward(my_link, (x1_data, x2_data), gy_data,
               params=(my_link.W, my_link.b))


Note that params are not ndarrays, but Variables.

Function objects are also acceptable as the func argument:

check_backward(lambda x1, x2: f(x1, x2),
               (x1_data, x2_data), gy_data)


Note

func is called many times to compute numerical gradients for all inputs. This function does not work correctly when func behaves randomly, since different calls then return different gradients.
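
If the randomness comes from training-time behavior such as dropout, one workaround is to run the check in test mode so that the forward pass becomes deterministic; a sketch, assuming the standard chainer.using_config mechanism:

import chainer
import chainer.functions as F

def deterministic_func(x):
    # In test mode dropout is the identity, so repeated calls return
    # the same value and the numerical gradient is well defined.
    with chainer.using_config('train', False):
        return F.dropout(x, ratio=0.5)

check_backward(deterministic_func, x1_data, gy_data)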

Parameters
• func (callable) – A function that takes Variables and returns Variables. func must return either a tuple of Variables or a single Variable. You can use a Function, FunctionNode, or Link object, or any other function satisfying the condition.

• x_data (ndarray or tuple of ndarrays) – A set of ndarrays to be passed to func. If x_data is one ndarray object, it is treated as (x_data,).

• y_grad (ndarray or tuple of ndarrays or None) – A set of ndarrays representing the gradients of the return values of func. If y_grad is a single ndarray object, it is treated as (y_grad,). If func is a loss function, y_grad should be set to None.

• params (Variable or tuple of Variables) – A set of Variables whose gradients are checked. When func is a Link object, set its parameters as params. If params is a single Variable object, it is treated as (params,).

• eps (float) – Epsilon value to be passed to numerical_grad().

• atol (float) – Absolute tolerance to be passed to chainer.testing.assert_allclose().

• rtol (float) – Relative tolerance to be passed to chainer.testing.assert_allclose().

• no_grads (list of bool) – Flags to skip variables in the gradient assertion. It should have the same length as x_data.

• dtype (dtype) – x_data, y_grad, and params are cast to this dtype when calculating numerical gradients. Only float types and None are allowed.

• detect_nondifferentiable (bool) – If True, the check for non-differentiable inputs is enabled. If func is non-differentiable at x_data, check_backward raises NondifferentiableError.
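
Putting several of these parameters together, a typical call might look like the following (the eps, tolerance, and dtype values are illustrative, not recommendations):

import numpy

check_backward(func, (x1_data, x2_data), gy_data,
               eps=1e-3, atol=1e-4, rtol=1e-4,
               dtype=numpy.float64,
               detect_nondifferentiable=True)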