# Upgrade Guide from v1 to v2¶

This document provides detailed information of differences between Chainer v1 and v2. You will know by reading it which part of your code is required (or recommended) to be fixed when you upgrade Chainer from v1 to v2.

## CuPy¶

### CuPy has been separated from Chainer into a separate package¶

CuPy, which was originally a part of Chainer, has been separated into a different Python package since Chainer v2. It changes the way to set up Chainer with CUDA support. In particular, you have to separately install cupy package to enable CUDA support. See Installation Guide for the recommended installation steps.

Fortunately, there is no need of updating your source code to catch up with this change.

## Global configurations¶

### Training mode is configured by a thread-local flag¶

In Chainer v2, the concept of training mode is added. It is represented by a thread-local flag chainer.config.train, which is a part of the unified configuration. When chainer.config.train is True, functions of Chainer run in the training mode, and otherwise they run in the test mode. For example, BatchNormalization and dropout() behave differently in each mode.

In Chainer v1, such a behavior was configured by the train or test argument of each function. This train/test argument has been removed in Chainer v2. If your code is using the train or test argument, you have to update it. In most cases, what you have to do is just removing the train / test argument from any function calls.

Example

Consider the following model definition and the code to call it in test mode written for Chainer v1.

# Chainer v1
import chainer.functions as F

...

def __call__(self, x, train=True):
return f(F.dropout(x, train=train))

m = MyModel(...)
y = m(x, train=False)


In Chainer v2, it should be updated into the following code:

# Chainer v2
import chainer.functions as F

...

def __call__(self, x):
return f(F.dropout(x))

m = MyModel(...)
with chainer.using_config('train', False):
y = m(x)


### Configurations are added and replace some of existing global flags¶

There are many global settings moved to the unified configuration other than the training mode. Following is the complete list of the configuration entries that have corresponding features in Chainer v1.

chainer.config.cudnn_deterministic
It is corresponding to the deterministic argument of some convolution functions in Chainer v1. This argument has been removed since Chainer v2. If you are using this argument, you have to use the chainer.config.cudnn_deterministic flag to change the behavior of the convolution functions.
chainer.config.debug
It is corresponding to the debug mode in Chainer v1, which was configured by set_debug() and extracted by is_debug(). These functions are also available in Chainer v2, so you basically do not need to update the code related to the debug mode.
chainer.config.enable_backprop
It is corresponding to the backprop mode in Chainer v1. The functions no_backprop_mode() and force_backprop_mode() are still available in Chainer v2, which automatically turns on/off the enable_backprop flag. One important difference from Chainer v1 is that the volatile flag is removed from Variable. Therefore, there are more situations that you need to modify the enable_backprop flag.
chainer.config.keep_graph_on_report
This flag configures whether or not to keep the computational graph alive for a reported variable. In Chainer v2, when a Variable object is reported by report(), a copy of the variable isolated from the computational graph is created and stored by default. Setting True to this flag, you can change this behavior and then the original Variable object is stored as is. See When a variable is reported, the variable is copied with the graph purged for the details.
chainer.config.train
It is corresponding to the train or test argument of some functions in Chainer v1. This argument has been removed since Chainer v2. If you are using this argument, you have to use the chainer.config.train flag instead. See Training mode is configured by a thread-local flag for more details.
chainer.config.type_check
It is corresponding to the Function.type_check_enable flag. If your code touches this flag, you have to use chainer.config.type_check instead. Note that the environment variable CHAINER_TYPE_CHECK is still available in Chainer v2, so if you are only using the environment variable, there is no need of updating your code.
chainer.config.use_cudnn
It is corresponding to the use_cudnn argument of many functions that have cuDNN implementations. This argument has been removed since Chainer v2. If you are using this argument, you have to use the chainer.config.use_cudnn flag instead. Note that this flag is ternary, not binary. See Configuring Chainer for more details.

These configurations can be modified in two ways.

• Simply subsituting a new value to an entry, like chainer.config.train = False.

• Using the chainer.using_config context manager. It can be used with the with statement of Python as follows:

with chainer.using_config('train', False):
do something  # this code runs with chainer.config.train == False


It recovers the original configuration after quitting the with block.

The chainer.config manages the thread-local configuration. You can also set the global configuration by modifying chainer.global_config. Note that the global configuration is used only if the entry of the thread-local configuration is not explicitly set up.

## Variable¶

### Volatile flag is removed¶

The Variable.volatile flag has been removed since Chainer v2.

Instead, the configuration chainer.config.enable_backprop can be used to enable/disable the automatic differentiation feature. If it is True, Chainer always creates a computational graph on the forward propagation, which corresponds to passing non-volatile variables in Chainer v1. Otherwise, Chainer does not create a graph, which corresponds to passing volatile variables in Chainer v1. The biggest difference is that enable_backprop is a thread-local flag, whereas volatile was a flag local to each Variable object. Note that enable_backprop flag has already existed in Chainer v1, which took effect only if all the inputs to the function have volatile == 'auto'.

The chainer.config.enable_backprop flag can be modified directly or by using using_config(). See Configuring Chainer for details. There is also a convenience function, no_backprop_mode(), to turn off the flag.

If you are using the Variable.volatile flag, you have to stop setting this flag (it will not take effect), and set the enable_backprop flag instead.

Example

Let model be your model, and consider the following code that calls it in volatile mode.

# Chainer v1
x_data = ...   # ndarray
x = chainer.Variable(x_data, volatile=True)
y = model(x)


In Chainer v2, it should be updated as follows.

# Chainer v2
x_data = ...   # ndarray
x = chainer.Variable(x_data)
with chainer.no_backprop_mode():
y = model(x)


### Variable is not a part of a computational graph anymore¶

The Variable class has been separated into two distinct classes, the Variable class and the VariableNode class, since Chainer v2. Every class:Variable object owns its own VariableNode object. A computational graph consists of Function objects and VariableNode objects. When one applies a Function to a Variable, the VariableNode object of the variable is extracted and set to one of the inputs of the function.

Note that the underlying data array of the variable is till held by the Variable object. It allows each Function implementation to release unneeded arrays from the computational graph, resulting in greatly reduced memory consumption.

This change does not affect most users’ code. If you are directly traversing the computational graph by yourself or modifying the graph ad-hoc, you may have to update your code. In most cases, it is enough to just change Variable into VariableNode in the code traversing the computational graph.

### Parameter has to be an instance of Parameter class¶

Chainer v2 has a subclass of Variable called Parameter. This class has an interface convenient on setting up a parameter variable registered to Link.

You basically do not need to update your code because Link.add_param() creates a Parameter object in Chainer v2. There is a new recommended way of registering parameters to a link in Chainer v2, though. See here for the recommended way of parameter registration.

### Small changes to Variable¶

There are some changes on the interface and specification of methods.

• len(variable) returns the length of the first axis of the underlying array in Chainer v2. This is equivalent to len(variable.data). It is different from the behavior of Chainer v1, in which len returned the total number of elements in the underlying array.
• repr(variable) returns a NumPy-like text representation of the underlying array in Chainer v2. In Chainer v1, it just returns a string that shows the name of the variable.

## Function¶

### The force_tuple option of split_axis is True by default¶

In Chainer v2, the force_tuple argument of functions.split_axis() is set to True by default. Therefore, it always returns a tuple regardless of the number of sections made after the split. It was False by default in Chainer v1.

### Type check APIs are updated to enable lazy building of the error messages¶

In Chainer v2, the type check APIs are updated so that the overhead of checking types is greatly reduced. In order to achieve the overhead reduction, some APIs are changed.

If you have custom Function implementations that do type checking, you have to update your code. The following list shows which part has to be updated.

• Use utils.type_check.eval() instead of Expr.eval.
• Use utils.type_check.make_variable() to create a utils.type_check.Variable object instead of directly constructing it by yourself.
• Stop using .name attribute of any expression.

Background of this change: In Chainer v1, the type checking APIs build an abstract syntax tree (AST) based on each expression that tests some condition. The AST is used to emit a kind error message. However, building an AST requires constructions of many Python objects, which adds large Python overheads. In Chainer v2, the Function.type_check_forward() method is called once or twice. At the first call, the type checking APIs run in light-weight mode, where it does not build an AST and just checks the condition. The second call is made only if there is a test that fails, where it builds an AST. This change makes the ordinary path of running the type checking much faster, while keeping the kindful error messages.

### Methods to release unneeded arrays are added¶

As is written above, Chainer v2 introduced a new mechanism to reduce the memory consumption of each Function implementation. In many cases, a Function implementation does not need some input arrays in its backward computation. A new method called Function.retain_inputs() can be used to specify which input arrays are actually needed. This method must not be called from the outside of Function.forward().

Example

For example, consider the following simple addition function.

class AddFunction(chainer.Function):
def forward(self, inputs):
return inputs[0] + inputs[1],



It can be seen that the backward computation of this function does not use any of the inputs. Then, specifying an empty tuple of indexes to retain_inputs() will reduce the memory overhead.

class AddFunction(chainer.Function):
def forward(self, inputs):
self.retain_inputs(())  # does not retain both inputs
return inputs[0] + inputs[1],



In some cases, the function can (or have to) use the output arrays instead of the inputs in its backward computation. In Chainer v1, we have written code that store the output arrays to attributes of the Function object and reuse them in the backward() method. In Chainer v2, it is recommended to use Function.retain_outputs() to declare which outputs are required in the backward computation. The retained output arrays can be accessed via Function.output_data.

Note

The existing Function implementations that store the output arrays to its attributes will run correctly in Chainer v2. There is no any memory overhead right now. It is recommended to use retain_outputs(), though, so that we can incorporate more memory optimization in the future.

Example

For example, consider the following simple implementation of the tanh function.

class TanhFunction(chainer.Function):
def forward(self, inputs):
xp = chainer.cuda.get_array_module(inputs[0])
self.y = xp.tanh(inputs[0])
return self.y,

one = self.y.dtype.type(1)  # avoid type promotion
return grad_outputs[0] * (one - self.y * self.y),


We can use retain_outputs() instead of preserving the output array by ourselves as follows.

class TanhFunction(chainer.Function):
def forward(self, inputs):
self.retain_outputs((0,))
xp = chainer.cuda.get_array_module(inputs[0])
return xp.tanh(inputs[0]),

y = self.output_data[0]
one = y.dtype.type(1)  # avoid type promotion
return grad_outputs[0] * (one - y * y)


## Optimizer¶

### Deprecated methods of Optimizer are removed¶

The following methods are removed from Optimizer. These methods have been already deprecated in the past versions. If you are using these methods, you have to update your code.

### GradientMethod is redesigned to allow parameter-specific update rules¶

In Chainer v2, the new class UpdateRule is used to define an update rule specific to each Parameter object. The UpdateRule is set to each Parameter object, and is used at each update step. This object implements an update formula using the data and gradient arrays.

Each UpdateRule object has enabled flag, which configures if the update rule should be applied to that parameter on update. By setting the flag to False, you can freeze the parameter. There is also a convenient method Link.enable_update() and Link.disable_update(), which configure the flag of each parameter under the link hierarchy. In other frameworks, a similar feature is called layer freezing. In Chainer v2, this is officially supported by these methods.

Each UpdateRule object can also hold its own hook functions similar to Optimizer. The built-in hook functions except for GradientClipping can also be used as a hook function of UpdateRule.

In most cases, you do not have to update your code because each optimizer automatically sets up an appropriate UpdaterRule object to each parameter.

If you are using a custom gradient-based optimizer implementation, you need to update the implementation. The following list shows what you have to do.

You can see live examples in the optimizer implementations provided by Chainer.

## Serializer¶

### None is serializable¶

In Chainer v2, all serializers start supporting None value to be serialized and deserialized. Users’ code can rely on this feature, i.e., it can serialize and deserialize None value with any given serializer. This change only affects your code if it provides its own serializer implementations.

## Trainer and Extension¶

### Updater and Evaluator pass raw data arrays to the loss function¶

In Chainer v2, Updater and Evaluator pass raw data arrays to the loss function without wrapping them with Variable. You might need to update your code so that the loss function (in most cases, the model’s __call__ ) accepts raw arrays.

Note that raw arrays can be directly passed to any Function; they are automatically wrapped by Variable. For example, if the input is directly passed to a Function object (or any function under chainer.functions), you do not need to update the code.

Example

Consider the following code that obtains the shape of the input via Variable.data.

# Chainer v1
def __call__(self, x):
shape = x.data.shape  # valid if x is Variable, invalid if x is ndarray
...


It should be updated so that the link also accepts a raw array as the input. In this case, we have Variable.shape which is equivalent to data.shape, so you can simply write as follows.

# Chainer v2
def __call__(self, x):
shape = x.shape  # valid regardless of x being Variable or ndarray
...


### trigger option is removed from snapshot and snapshot_object¶

In Chainer v2, the trigger option is removed from the snapshot() and snapshot_object() extensions. The effect of the option was duplicated with the trigger option of Trainer.extend. If you are passing the trigger argument to these extensions, you have to update your code. The update can be done by passing the value to the corresponding Trainer.extend.

Example

Assume that trainer is an instance of Trainer, and consider that you were adding a snapshot() extension as follows.

# Chainer v1
trainer.extend(chainer.training.extensions.snapshot(trigger=(1000, 'iteration')))


It should be updated as follows (note that this code also works with Chainer v1).

# Chainer v1/v2
trainer.extend(chainer.training.extensions.snapshot(), trigger=(1000, 'iteration'))


### Extension.invoke_before_training is removed¶

In Chainer v2, The attribute invoke_before_training of Extension is removed. Instead, the Extension.initialize method is added. This method is called by Trainer.run before entering the training loop.

In Chainer v1, the extension is just called before entering the training loop when invoke_before_training is True. If you have a custom extension that has invoke_before_training=True**, you have to update the code.** What you have to do is to remove the invoke_before_training flag and override initialize() method. If you are using the make_extension() decorator, you can set the initialize function by passing the initializer argument to make_extension().

### The dump_graph extension dumps the valid graph only at its first invocation¶

In Chainer v2, the dump_graph() extension dumps the valid computational graph only at its first invocation. If you want to dump the graph more than once, you have to fix the code. The easiest fix is setting the chainer.config.keep_graph_on_report flag to True. Note that this fix will cancel the improvement on the memory consumption made in Chainer v2. More memory-efficient fix is to dump the graph without using an extension, e.g. by customizing the loss function or the updater.

Here is the background of this change. In Chainer v2, the Reporter copies reported variables with purging the computational graph by default. On the other hand, the dump_graph() extension requires the computational graph reachable from the reported variable. In order to make the graph available, the dump_graph() extension turns on the chainer.config.keep_graph_on_report flag at its initializer (i.e., it turns on the graph before entering the training loop). Since we also wanted to achieve the memory efficiency, the dump_graph() extension turns off the flag after dumping the graph at its first invocation (strictly speaking, it recovers the original value). As a result, the computational graph is not available from the second invocation.

Since the dump_graph() recovers the original flag value at its invocation, you can keep the graph dumped more than once by changing the original flag value.

## Reporter¶

### When a variable is reported, the variable is copied with the graph purged¶

In Chainer v2, when a Variable object is reported using report() function (or directly using Reporter), a copy of the variable is made without preserving the computational graph. If your code depends on the reachability of the computational graph from the reported variable, you have to update your code. The easiest way to update your code is setting chainer.config.keep_graph_on_report to True, then Chainer will keep the computational graph reachable from the reported variable.

The possible examples that are affected by this change are as follows (not exhaustive).

• A custom extension that runs backprop from a reported variable. It is definitely an example of assuming the reachability of the computational graph from the reported variable.
• An extension that visualizes the computational graph from a reported variable. If you are writing such an extension by yourself, you have to turn on the keep_graph_on_report flag. The dump_graph() extension is another example, for which see the above item for the details.

This change is made for the memory performance reason; with this change, the memory used by the computational graph for training is immediately released before invoking extensions. Therefore, changing the behavior by overwriting chainer.config.keep_graph_on_report may increase the memory consumption. It may cause an out-of-memory error if the computational graph of the loss function consumes almost all the memory available in your environment and there is an extension that uses a certain amount of memory (e.g. Evaluator).

## Other utilities¶

### Some obsolete classes and functions are removed¶

The following classes and functions are removed in Chainer v2.

• chainer.Flag
• chainer.FunctionSet (Use Chain or ChainList instead)
• chainer.cuda.init (It did nothing except for calling check_cuda_available())
• chainer.cuda.empty (Use cupy.empty())
• chainer.cuda.empty_like (Use cupy.empty_like())
• chainer.cuda.full (Use cupy.full())
• chainer.cuda.full_like (Use cupy.full_like())
• chainer.cuda.ones (Use cupy.ones())
• chainer.cuda.ones_like (Use cupy.ones_like())
• chainer.cuda.zeros (Use cupy.zeros())
• chainer.cuda.zeros_like (Use cupy.zeros_like())