Upgrade Guide from v1 to v2
This document describes the differences between Chainer v1 and v2 in detail. It shows which parts of your code must (or should) be changed when you upgrade Chainer from v1 to v2.
CuPy, which was originally a part of Chainer, has been separated into a different Python package since Chainer v2.
This changes how you set up Chainer with CUDA support.
In particular, you have to install the cupy package separately to enable CUDA support.
See Installation for the recommended installation steps.
Fortunately, you do not need to update your source code for this change.
In Chainer v2, the concept of training mode is added.
It is represented by a thread-local flag, chainer.config.train, which is a part of the unified configuration.
When chainer.config.train is True, functions of Chainer run in the training mode; otherwise they run in the test mode.
For example, dropout() behaves differently in each mode.
In Chainer v1, this behavior was configured by the train or test argument of each function.
This train/test argument has been removed in Chainer v2.
If your code uses the train or test argument, you have to update it.
In most cases, all you have to do is remove the train/test argument from the function calls.
Consider the following model definition and the code to call it in test mode written for Chainer v1.

    # Chainer v1
    import chainer
    import chainer.functions as F

    class MyModel(chainer.Link):
        ...

        def __call__(self, x, train=True):
            return f(F.dropout(x, train=train))

    m = MyModel(...)
    y = m(x, train=False)

In Chainer v2, it should be updated into the following code:

    # Chainer v2
    import chainer
    import chainer.functions as F

    class MyModel(chainer.Link):
        ...

        def __call__(self, x):
            return f(F.dropout(x))

    m = MyModel(...)
    with chainer.using_config('train', False):
        y = m(x)
Many global settings other than the training mode have been moved to the unified configuration. The following is the complete list of the configuration entries that have corresponding features in Chainer v1.
chainer.config.cudnn_deterministic: It corresponds to the deterministic argument of some convolution functions in Chainer v1. This argument has been removed in Chainer v2. If you are using this argument, you have to use the chainer.config.cudnn_deterministic flag to change the behavior of the convolution functions.
chainer.config.debug: It corresponds to the debug mode in Chainer v1, which was configured by set_debug() and read by is_debug(). These functions are also available in Chainer v2, so you basically do not need to update the code related to the debug mode.
chainer.config.enable_backprop: It corresponds to the backprop mode in Chainer v1. The functions no_backprop_mode() and force_backprop_mode() are still available in Chainer v2, and they automatically turn the enable_backprop flag on or off. One important difference from Chainer v1 is that the volatile flag is removed from Variable. Therefore, there are more situations in which you need to modify the enable_backprop flag.
chainer.config.keep_graph_on_report: This flag configures whether or not to keep the computational graph alive for a reported variable. In Chainer v2, when a Variable object is reported by report(), a copy of the variable isolated from the computational graph is created and stored by default. By setting this flag to True, you can change this behavior so that the original Variable object is stored as is. See "When a variable is reported, the variable is copied with the graph purged" for details.
chainer.config.train: It corresponds to the train or test argument of some functions in Chainer v1. This argument has been removed in Chainer v2. If you are using this argument, you have to use the chainer.config.train flag instead. See "Training mode is configured by a thread-local flag" for more details.
chainer.config.type_check: It corresponds to the Function.type_check_enable flag. If your code touches this flag, you have to use chainer.config.type_check instead. Note that the environment variable CHAINER_TYPE_CHECK is still available in Chainer v2, so if you are only using the environment variable, you do not need to update your code.
chainer.config.use_cudnn: It corresponds to the use_cudnn argument of many functions that have cuDNN implementations. This argument has been removed in Chainer v2. If you are using this argument, you have to use the chainer.config.use_cudnn flag instead. Note that this flag is ternary, not binary. See Configuring Chainer for more details.
These configurations can be modified in two ways.
Simply assign a new value to an entry, like chainer.config.train = False.
Use the chainer.using_config context manager. It can be used with the with statement of Python as follows:

    with chainer.using_config('train', False):
        do_something()  # placeholder; runs with chainer.config.train == False

It restores the original configuration when the with block exits.
chainer.config manages the thread-local configuration.
You can also set the global configuration by modifying chainer.global_config.
Note that the global configuration is used only if the corresponding entry of the thread-local configuration has not been set explicitly.
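The lookup order described above (thread-local entry first, global entry as the fallback) can be modeled in a few lines of plain Python. The following is only a simplified sketch under that assumption, not Chainer's actual implementation; the names GlobalConfig, LocalConfig and the three-argument using_config are hypothetical stand-ins chosen for illustration.

```python
import contextlib
import threading


class GlobalConfig:
    """Process-wide defaults (hypothetical stand-in for a global config)."""

    def __init__(self):
        self.train = True


class LocalConfig:
    """Per-thread entries that fall back to the global defaults."""

    def __init__(self, global_config):
        # Bypass our own __setattr__ for the two internal attributes.
        object.__setattr__(self, '_global', global_config)
        object.__setattr__(self, '_local', threading.local())

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for config entries.
        entries = self._local.__dict__
        if name in entries:
            return entries[name]
        return getattr(self._global, name)

    def __setattr__(self, name, value):
        setattr(self._local, name, value)

    def __delattr__(self, name):
        delattr(self._local, name)


@contextlib.contextmanager
def using_config(config, name, value):
    """Temporarily set a thread-local entry, then restore the old state."""
    entries = config._local.__dict__
    had_entry = name in entries
    old_value = entries.get(name)
    setattr(config, name, value)
    try:
        yield
    finally:
        if had_entry:
            setattr(config, name, old_value)
        else:
            delattr(config, name)  # fall back to the global value again


global_config = GlobalConfig()
config = LocalConfig(global_config)
seen_in_thread = []


def worker():
    # Runs in another thread, so it does not see the override below.
    seen_in_thread.append(config.train)


with using_config(config, 'train', False):
    inside = config.train  # the thread-local override is visible here
    t = threading.Thread(target=worker)
    t.start()
    t.join()
after = config.train  # the override is gone; the global value is used
```

In this sketch, a value set inside the with block affects only the current thread, which is why the worker thread still reads the global default.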
The Variable.volatile flag has been removed in Chainer v2.
Instead, the configuration chainer.config.enable_backprop can be used to enable/disable the automatic differentiation feature.
If it is True, Chainer always creates a computational graph on the forward propagation, which corresponds to passing non-volatile variables in Chainer v1.
Otherwise, Chainer does not create a graph, which corresponds to passing volatile variables in Chainer v1.
The biggest difference is that enable_backprop is a thread-local flag, whereas volatile was a flag local to each Variable object.
Note that the enable_backprop flag already existed in Chainer v1, where it took effect only if all the inputs to the function had volatile == 'auto'.
The chainer.config.enable_backprop flag can be modified directly or by using chainer.using_config.
See Configuring Chainer for details.
There is also a convenience function, no_backprop_mode(), to turn off the flag.
If you are using the Variable.volatile flag, you have to stop setting this flag (it will not take effect), and set the enable_backprop flag instead.
Let model be your model, and consider the following code that calls it in volatile mode.

    # Chainer v1
    import chainer

    x_data = ...  # ndarray
    x = chainer.Variable(x_data, volatile=True)
    y = model(x)

In Chainer v2, it should be updated as follows.

    # Chainer v2
    import chainer

    x_data = ...  # ndarray
    x = chainer.Variable(x_data)
    with chainer.no_backprop_mode():
        y = model(x)
The Variable class has been separated into two distinct classes, the Variable class and the VariableNode class, since Chainer v2.
Every Variable object owns its own VariableNode object.
A computational graph consists of Function objects and VariableNode objects.
When one applies a Function to a Variable, the VariableNode object of the variable is extracted and set as one of the inputs of the function.
Note that the underlying data array of the variable is still held by the Variable object.
This allows each Function implementation to release unneeded arrays from the computational graph, resulting in greatly reduced memory consumption.
This change does not affect most users' code.
If you traverse the computational graph directly or modify the graph ad hoc, you may have to update your code.
In most cases, it is enough to change Variable into VariableNode in the code traversing the computational graph.
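To make the split concrete, here is a toy model in plain Python. It is a sketch for illustration only and does not use Chainer: the classes below merely borrow the names Variable and VariableNode, and the attribute names (node, creator, inputs) and the Add class are assumptions of this sketch. The point it shows is that graph-traversal code walks over node objects, while the array stays on the user-side variable.

```python
class VariableNode:
    """Graph-side handle of a variable; it does not hold the array."""

    def __init__(self, name):
        self.name = name
        self.creator = None  # the function that produced this node, if any


class Variable:
    """User-side object; owns the array and its own graph node."""

    def __init__(self, data, name):
        self.data = data
        self.node = VariableNode(name)


class Add:
    """Toy function that sums its inputs and records the graph."""

    def __init__(self, name):
        self.name = name
        self.inputs = ()

    def __call__(self, *variables):
        # Only the nodes are stored in the graph, not the variables.
        self.inputs = tuple(v.node for v in variables)
        out = Variable(sum(v.data for v in variables), self.name + '_out')
        out.node.creator = self
        return out


def traverse(variable):
    """Collect the names of all nodes reachable backwards from a variable."""
    names = []
    stack = [variable.node]
    while stack:
        node = stack.pop()
        names.append(node.name)
        if node.creator is not None:
            stack.extend(node.creator.inputs)
    return names


x = Variable(1.0, 'x')
y = Variable(2.0, 'y')
z = Add('add')(x, y)
w = Add('add2')(z, x)
visited = traverse(w)
```

Code that previously expected Variable objects while walking the graph would, in this model, receive VariableNode objects instead, which is the kind of change the text above refers to.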
In most cases you do not need to update your code, because Link.add_param() creates a Parameter object in Chainer v2.
However, Chainer v2 introduces a new recommended way of registering parameters to a link.
See the documentation on parameter registration for the recommended way.
There are some changes to the interface and specification of methods.
len(variable) returns the length of the first axis of the underlying array in Chainer v2. This is equivalent to len(variable.data). It differs from Chainer v1, in which len returned the total number of elements in the underlying array.
repr(variable) returns a NumPy-like text representation of the underlying array in Chainer v2. In Chainer v1, it just returned a string that shows the name of the variable.
In Chainer v2, the force_tuple argument of functions.split_axis() is set to True by default.
Therefore, it always returns a tuple regardless of the number of sections made after the split.
It was False by default in Chainer v1.
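The practical effect of the default shows up when a split produces a single section. The following plain-Python sketch mimics that behavior on a list; split_sections is a hypothetical helper written for this illustration, not a Chainer function, and it assumes that the v1-style default unwrapped a single section.

```python
def split_sections(values, sections, force_tuple=True):
    """Split a list into equal parts (toy stand-in for a split function)."""
    if len(values) % sections != 0:
        raise ValueError('values cannot be divided evenly')
    size = len(values) // sections
    parts = tuple(values[i * size:(i + 1) * size] for i in range(sections))
    if not force_tuple and len(parts) == 1:
        return parts[0]  # v1-style default: a single section is unwrapped
    return parts


two_parts = split_sections([1, 2, 3, 4], 2)
one_part_v2 = split_sections([1, 2], 1)                     # always a tuple
one_part_v1 = split_sections([1, 2], 1, force_tuple=False)  # bare section
```

With the v2-style default, code such as a for loop over the result works the same way for any number of sections.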
In Chainer v2, the type check APIs are updated so that the overhead of type checking is greatly reduced. To achieve this reduction, some APIs have changed.
If you have custom Function implementations that do type checking, you have to update your code. The following list shows which parts have to be updated.
Use utils.type_check.make_variable() to create a utils.type_check.Variable object instead of constructing it directly yourself.
Stop using the .name attribute of any expression.
Background of this change:
In Chainer v1, the type checking APIs build an abstract syntax tree (AST) for each expression that tests some condition.
The AST is used to emit a helpful error message.
However, building an AST requires constructing many Python objects, which adds a large Python overhead.
In Chainer v2, the Function.type_check_forward() method is called once or twice.
At the first call, the type checking APIs run in light-weight mode, where they do not build an AST and just check the conditions.
The second call is made only if a test fails in the first call, and in this call the APIs build an AST.
This change makes the ordinary path of type checking much faster, while keeping the helpful error messages.
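The two-pass idea can be sketched in plain Python as follows. This is a simplified model written for illustration, not Chainer's type check implementation; all names in it (LightValue, TracedValue, TracedEq, expect, run_type_check, InvalidType) are hypothetical. The first pass evaluates conditions to plain booleans, and only a failure triggers a second pass that builds expression objects for the message.

```python
class InvalidType(Exception):
    """Raised when a type check fails."""


class LightValue:
    """First pass: comparisons evaluate immediately to a plain bool."""

    def __init__(self, name, value):
        self.value = value  # the name is not needed in this mode

    def __eq__(self, other):
        return self.value == other


class TracedValue:
    """Second pass: comparisons build a small expression object."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __eq__(self, other):
        return TracedEq(self, other)


class TracedEq:
    """Records both sides of `lhs == rhs` so that a message can be built."""

    def __init__(self, lhs, rhs):
        self.lhs = lhs
        self.rhs = rhs

    def __bool__(self):
        return self.lhs.value == self.rhs

    def __str__(self):
        return '{} == {}'.format(self.lhs.name, self.rhs)


def expect(*conditions):
    """Raise InvalidType for the first condition that does not hold."""
    for cond in conditions:
        if not cond:
            # In light mode `cond` is a bool, so the message is not useful.
            raise InvalidType('Expect: {}'.format(cond))


def run_type_check(check, **inputs):
    """Run `check` cheaply; re-run it in traced mode only on failure."""
    try:
        check(**{k: LightValue(k, v) for k, v in inputs.items()})
    except InvalidType:
        # Slow path: build expression objects to get a descriptive message.
        check(**{k: TracedValue(k, v) for k, v in inputs.items()})


def check_ndim(ndim):
    expect(ndim == 2)


run_type_check(check_ndim, ndim=2)  # passes without building any expression
try:
    run_type_check(check_ndim, ndim=3)
except InvalidType as e:
    message = str(e)
```

The successful path creates only one small wrapper per input, while the descriptive message is produced only when a check has already failed.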
As described above, Chainer v2 introduced a new mechanism to reduce the memory consumption of each Function implementation.
In many cases, a Function implementation does not need some input arrays in its backward computation.
A new method called Function.retain_inputs() can be used to specify which input arrays are actually needed.
This method must not be called from outside of Function.forward().
For example, consider the following simple addition function.

    class AddFunction(chainer.Function):
        def forward(self, inputs):
            return inputs[0] + inputs[1],

        def backward(self, inputs, grad_outputs):
            return grad_outputs[0], grad_outputs[0]

It can be seen that the backward computation of this function does not use any of the inputs.
In that case, passing an empty tuple of indexes to retain_inputs() reduces the memory overhead.

    class AddFunction(chainer.Function):
        def forward(self, inputs):
            self.retain_inputs(())  # retain neither of the inputs
            return inputs[0] + inputs[1],

        def backward(self, inputs, grad_outputs):
            return grad_outputs[0], grad_outputs[0]
In some cases, the function can (or has to) use the output arrays instead of the inputs in its backward computation.
In Chainer v1, we wrote code that stored the output arrays in attributes of the Function object and reused them in the backward() method.
In Chainer v2, it is recommended that you use Function.retain_outputs() to declare which outputs are required in the backward computation.
The retained output arrays can be accessed via Function.output_data.
Existing Function implementations that store the output arrays in their attributes will run correctly in Chainer v2.
Doing so currently incurs no additional memory overhead.
It is recommended that you use retain_outputs(), though, so that we can incorporate more memory optimizations in the future.
For example, consider the following simple implementation of the tanh function.

    class TanhFunction(chainer.Function):
        def forward(self, inputs):
            xp = chainer.cuda.get_array_module(inputs[0])
            self.y = xp.tanh(inputs[0])
            return self.y,

        def backward(self, inputs, grad_outputs):
            one = self.y.dtype.type(1)  # avoid type promotion
            return grad_outputs[0] * (one - self.y * self.y),

We can use retain_outputs() instead of preserving the output array ourselves, as follows.

    class TanhFunction(chainer.Function):
        def forward(self, inputs):
            self.retain_outputs((0,))  # keep the first output for backward
            xp = chainer.cuda.get_array_module(inputs[0])
            return xp.tanh(inputs[0]),

        def backward(self, inputs, grad_outputs):
            y = self.output_data[0]
            one = y.dtype.type(1)  # avoid type promotion
            return grad_outputs[0] * (one - y * y),
The following methods are removed from
These methods were already deprecated in earlier versions.
If you are using these methods, you have to update your code.
In Chainer v2, the new class UpdateRule is used to define an update rule specific to each Parameter object.
An UpdateRule is set to each Parameter object, and is used at each update step.
This object implements an update formula using the data and gradient arrays.
Each UpdateRule object has an enabled flag, which configures whether the update rule should be applied to that parameter on update.
By setting the flag to False, you can freeze the parameter.
There are also the convenience methods Link.enable_update() and Link.disable_update(), which configure the flag of each parameter under the link hierarchy.
In other frameworks, a similar feature is called layer freezing.
In Chainer v2, this is officially supported by these methods.
In most cases, you do not have to update your code because each optimizer automatically sets up an appropriate UpdateRule object for each parameter.
If you are using a custom gradient-based optimizer implementation, you need to update the implementation. The following list shows what you have to do.
Write a subclass of UpdateRule that implements the update rule.
Rewrite your GradientMethod implementation. The new implementation only has to set up the update rule for each parameter in the target link.
You can find working examples in the optimizer implementations provided by Chainer.
In Chainer v2, all serializers support serializing and deserializing None values.
Users' code can rely on this feature, i.e., it can serialize and deserialize a None value with any given serializer.
This change affects your code only if it provides its own serializer implementations.
In Chainer v2, Updater and Evaluator pass raw data arrays to the loss function without wrapping them with Variable.
You might need to update your code so that the loss function (in most cases, the model's __call__) accepts raw arrays.
Note that raw arrays can be directly passed to any Function; they are automatically wrapped by Variable.
For example, if the input is directly passed to a Function object (or any function under chainer.functions), you do not need to update the code.
Consider the following code that obtains the shape of the input via its data attribute.

    # Chainer v1
    class MyLink(chainer.Link):
        def __call__(self, x):
            shape = x.data.shape  # valid if x is Variable, invalid if x is ndarray
            ...

It should be updated so that the link also accepts a raw array as the input.
In this case, Variable.shape is equivalent to data.shape, so you can simply write as follows.

    # Chainer v2
    class MyLink(chainer.Link):
        def __call__(self, x):
            shape = x.shape  # valid whether x is a Variable or an ndarray
            ...
In Chainer v2, the trigger option is removed from the snapshot() and snapshot_object() extensions.
The option duplicated the effect of the trigger option of Trainer.extend().
If you are passing the trigger argument to these extensions, you have to update your code.
The update can be done by passing the value to the corresponding Trainer.extend() call instead. For example, consider the following Chainer v1 code.

    # Chainer v1
    trainer.extend(chainer.training.extensions.snapshot(trigger=(1000, 'iteration')))

It should be updated as follows (note that this code also works with Chainer v1).

    # Chainer v1/v2
    trainer.extend(chainer.training.extensions.snapshot(), trigger=(1000, 'iteration'))
In Chainer v1, the extension was simply called before entering the training loop when invoke_before_training was True.
If you have a custom extension that has invoke_before_training=True, you have to update the code.
What you have to do is remove the invoke_before_training flag and override the initialize() method.
If you are using the make_extension() decorator, you can set the initialize function by passing the initializer argument to make_extension().
In Chainer v2, the dump_graph() extension dumps the valid computational graph only at its first invocation.
If you want to dump the graph more than once, you have to change your code.
The easiest fix is setting the chainer.config.keep_graph_on_report flag to True.
Note that this fix cancels the improvement in memory consumption made in Chainer v2.
A more memory-efficient fix is to dump the graph without using an extension, e.g. by customizing the loss function or the updater.
Here is the background of this change.
In Chainer v2, the Reporter copies reported variables and purges the computational graph from the copies by default.
On the other hand, the dump_graph() extension requires the computational graph to be reachable from the reported variable.
In order to make the graph available, the dump_graph() extension turns on the chainer.config.keep_graph_on_report flag in its initializer (i.e., it turns on the flag before entering the training loop).
Since we also wanted to keep the memory efficiency, the dump_graph() extension turns off the flag after dumping the graph at its first invocation (strictly speaking, it restores the original value).
As a result, the computational graph is not available from the second invocation onward.
Since dump_graph() restores the original flag value at its invocation, you can dump the graph more than once by changing the original flag value.
In Chainer v2, when a Variable object is reported using the report() function (or directly using Reporter), a copy of the variable is made without preserving the computational graph.
If your code depends on the reachability of the computational graph from the reported variable, you have to update your code.
The easiest way to update your code is to set chainer.config.keep_graph_on_report to True; Chainer will then keep the computational graph reachable from the reported variable.
Examples that are affected by this change include the following (not exhaustive).
A custom extension that runs backprop from a reported variable. This clearly assumes that the computational graph is reachable from the reported variable.
An extension that visualizes the computational graph from a reported variable. If you are writing such an extension yourself, you have to turn on the keep_graph_on_report flag. The dump_graph() extension is another example; see the above item for details.
This change was made to improve memory performance; with this change, the memory used by the computational graph for training is released before extensions are invoked.
Therefore, changing the behavior by overwriting chainer.config.keep_graph_on_report may increase the memory consumption.
It may cause an out-of-memory error if the computational graph of the loss function consumes almost all the memory available in your environment and there is an extension that uses a certain amount of memory (e.g.
The following classes and functions are removed in Chainer v2.
chainer.cuda.init(It did nothing except for calling