Decorator to mark a Chain’s __call__() as a static sub-graph.
This decorator marks the define-by-run code inside the __call__() method of a Chain instance as corresponding to a static computation graph or sub-graph. Such a chain will be referred to as a ‘static chain’. This allows various “static graph” optimizations to be performed, which can result in significant speedups for some models.
When this decorator is used, the chain’s define-by-run code executes during the first iteration as usual. While it executes, however, a trace is also performed to incrementally build a corresponding static schedule. This static schedule contains only the subset of the computations inside the define-by-run code that actually need to run every iteration. Specifically, it contains the code inside any called functions that are annotated with the @static_code decorator, which includes all Chainer built-in functions as well as any user-defined functions that use @static_code. Then, starting from the second iteration, calling the static chain executes its static schedule instead of its define-by-run code.
However, the user must also be careful of the following:
- The user is responsible for applying this decorator correctly. The framework does not check that the define-by-run code corresponds to a static graph. The graph can be different between training and evaluation mode (such as when dropout and/or batch normalization are used), but should otherwise be static.
- When chainer.config.enable_backprop is enabled, if a backward pass is not performed each iteration, then the user code must call chain.schedule_manager.end_forward() on the static chain each iteration.
- Static graphs allow tradeoffs between computation and memory usage. For example, the minimize_cache_size argument will typically result in higher memory usage when set to False because all cached schedules are retained.
- When this feature is enabled, only the Chainer function and/or link calls inside the chain’s __call__() method will be included in the static schedule by default. Any other code that the user puts in __call__(), such as a print statement or code to increment a counter, will not automatically get added. We will refer to such code other than Chainer function/link calls as “side-effect” code. Since side-effect code is not included in the static schedule by default, it will only ever execute once, during the first iteration. There is a way to force side-effect code to be included in the static schedule, however: the user can wrap such code inside a function that is decorated with @static_code to ensure that it gets added to the static schedule. For an example of this, refer to the documentation.
- This feature is experimental and advanced optimizations such as kernel fusion and various memory optimizations are not implemented yet.
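The trace-and-replay behavior and the side-effect caveat described above can be illustrated with a small pure-Python model. This is only a sketch and not Chainer's implementation: the `static_code` and `static_graph` functions below are toy stand-ins that share nothing but their names with the real decorators, and `Doubler`, `scale_inplace`, and `_trace_stack` are hypothetical names introduced for the illustration.

```python
import functools

# Toy model only: schedules currently being recorded (innermost last).
_trace_stack = []


def static_code(func):
    """Toy stand-in: record each call in the schedule while a trace is active."""
    @functools.wraps(func)
    def wrapper(*args):
        if _trace_stack:
            _trace_stack[-1].append((func, args))
        return func(*args)
    return wrapper


def static_graph(method):
    """Toy stand-in: trace on the first call, replay the schedule afterwards."""
    @functools.wraps(method)
    def wrapper(self, x):
        if not hasattr(self, '_schedule'):
            # First iteration: run the define-by-run code and trace it.
            self._static_x = list(x)  # statically allocated input buffer
            _trace_stack.append([])
            try:
                self._static_out = method(self, self._static_x)
            finally:
                schedule = _trace_stack.pop()
            self._schedule = schedule
        else:
            # Later iterations: copy the input in, then run only recorded calls.
            self._static_x[:] = x
            for func, args in self._schedule:
                func(*args)
        return list(self._static_out)
    return wrapper


@static_code
def scale_inplace(buf, factor):
    """Multiply every element of buf by factor, in place."""
    for i in range(len(buf)):
        buf[i] *= factor


class Doubler:
    def __init__(self):
        self.python_calls = 0

    @static_graph
    def __call__(self, x):
        self.python_calls += 1  # side-effect code: not part of the schedule
        scale_inplace(x, 2)     # recorded in the schedule
        return x


chain = Doubler()
first = chain([1, 2])   # traced define-by-run execution
second = chain([3, 4])  # replayed from the recorded schedule
print(first, second, chain.python_calls)
```

In this model the counter stays at 1 after two calls, because the body of `__call__()` runs only during the traced first iteration, while `scale_inplace` runs on every call because it was recorded.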
This decorator should only be applied to define-by-run code that actually corresponds to a static subgraph. Refer to the documentation for additional details and examples of correct usage. This decorator should be applied to each of the largest static subgraphs in the model; it can also be applied to a static subgraph that is not the largest subgraph, but that could result in reduced performance. Marking a chain as static is currently not allowed if it is contained within another chain that is also marked as static. For example, suppose a static graph A contains a static sub-graph B. Then only the chain corresponding to A should be marked as static, and the chain corresponding to B should not be.
The behavior of a static chain depends on the training mode flag, chainer.config.train. If it is True, then a static chain that is called multiple times will try to use a distinct static schedule object (that is, call a distinct instance of a FunctionNode that implements that static schedule) on each call. The same schedule instance cannot be reused until the forward pass has completed, which is signaled by performing a backward pass through the model. It is therefore important that the backward pass be performed after each forward pass during training. Since this is usually the case, most usages of a static chain will not require any modifications to existing code other than applying this decorator. However, if you would like to perform multiple forward passes during training before performing a backward pass, then you must call chain.schedule_manager.end_forward() after the end of each forward pass.
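One way to picture the reuse rule is as a pool of schedule instances that is released when the forward pass ends. The sketch below is a toy model under that assumption, not Chainer's schedule manager: `ToyScheduleManager` and `get_schedule` are hypothetical names, only the method name `end_forward()` is taken from the text, and the model ignores details such as schedules being specific to the input shapes.

```python
class ToyScheduleManager:
    """Toy model of schedule reuse; not Chainer's actual implementation."""

    def __init__(self):
        self.schedules = []  # every schedule instance created so far
        self.in_use = 0      # instances handed out in the current forward pass

    def get_schedule(self, train):
        if not train:
            # Test mode: always reuse an existing schedule when there is one.
            if not self.schedules:
                self.schedules.append(object())
            return self.schedules[0]
        # Training mode: each call in one forward pass gets its own instance.
        if self.in_use == len(self.schedules):
            self.schedules.append(object())
        schedule = self.schedules[self.in_use]
        self.in_use += 1
        return schedule

    def end_forward(self):
        """Mark the forward pass as finished so instances can be reused."""
        self.in_use = 0


manager = ToyScheduleManager()

# The same chain is called twice within one training forward pass.
a = manager.get_schedule(train=True)
b = manager.get_schedule(train=True)

# No backward pass is performed here, so the end of the pass is signaled by hand.
manager.end_forward()
c = manager.get_schedule(train=True)

print(a is b, c is a, len(manager.schedules))
```

In this model, omitting the `end_forward()` call would make the third request allocate a third instance, which mirrors why skipping both the backward pass and `end_forward()` during training is a problem.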
If test mode is active (chainer.config.train is False), then it is not necessary to inform the chain at the end of each forward pass, because in test mode a static chain always attempts to reuse existing static schedule objects. The same static schedule can be reused during a single forward pass because it is not necessary to compute gradients. It is also possible to disable static optimizations while in test mode by setting the decorator argument force_test_define_by_run=True.
Note: If either chainer.config.enable_backprop or chainer.config.train is set to False, then cached static schedules will be reused when possible to reduce memory usage.
- Double-backpropagation is not enabled by default. It can be enabled by supplying the keyword argument enable_double_backprop=True to this decorator. Note: this feature has not been tested yet.
- Restrictions on input arguments and return values of a static chain:
- Recall that, unlike for a function, there are no restrictions on the arguments to a chain. However, there are currently some restrictions when a static chain is used. Specifically, each argument to a static chain must be a variable, list, or tuple. In the case of a list or tuple, each element must itself be a variable, list, or tuple. Lists and tuples can be nested to an arbitrary depth. No other object types are allowed. In addition, keyword arguments are not allowed. The return value of a static chain must be a variable, list, or tuple in which each element of the list or tuple is also a variable, list, or tuple.
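The structural rule for arguments and return values can be written as a short recursive check. This is a sketch for illustration only: `Variable` here is a hypothetical stand-in class so that the snippet runs without Chainer, and `is_valid_static_arg` is not a Chainer function.

```python
class Variable:
    """Hypothetical stand-in for chainer.Variable so the sketch runs alone."""

    def __init__(self, data):
        self.data = data


def is_valid_static_arg(obj):
    """Return True if obj is a Variable or a (nested) list/tuple of them."""
    if isinstance(obj, Variable):
        return True
    if isinstance(obj, (list, tuple)):
        return all(is_valid_static_arg(item) for item in obj)
    return False


x = Variable([1.0, 2.0])
print(is_valid_static_arg(x))                 # True
print(is_valid_static_arg([x, (x, [x, x])]))  # True
print(is_valid_static_arg([x, 3.0]))          # False
print(is_valid_static_arg({'x': x}))          # False
```

The last two cases fail because a bare float and a dict are neither variables nor lists or tuples, which is the kind of argument a static chain does not accept.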
This decorator can be supplied with the following optional keyword arguments. This is an experimental feature, and the API and arguments might change.
- force_test_define_by_run (bool) – If True, disable static graph optimizations during test mode (that is, when chainer.config.train is False). This may be needed in order for some existing RNN links such as LSTM to work correctly, since some existing links do not correspond to a static graph in some cases. The default is False.
- minimize_cache_size (bool) – If True, minimize the number of cached static schedules in order to reduce memory usage. For example, if the mini-batch size changes or the training mode changes, the schedules will need to be recomputed, but memory is also saved by not retaining all cached schedules. The default value is True.
- verbosity_level (int) – Depending on the value, print additional information. 0: warnings only (the default value). 1: show only information that is collected during the first iteration and when a new static schedule is created. 2: detailed debugging information, possibly showing new information every iteration.
- enable_double_backprop (bool) – If True, enable double-backprop. The default value is False (not enabled).
Returns: Wrapped __call__() method with static chain support.
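The two ways of applying a decorator that takes optional keyword arguments, bare and with arguments, can be sketched as follows. The decorator `toy_static_graph` is a hypothetical stand-in that only records its options and performs no optimization; the option names and default values are the ones listed above, and `PlainChain` and `VerboseChain` are made-up classes for the illustration.

```python
import functools

# Option names and defaults as listed in the argument descriptions above.
DEFAULTS = {
    'force_test_define_by_run': False,
    'minimize_cache_size': True,
    'verbosity_level': 0,
    'enable_double_backprop': False,
}


def toy_static_graph(*args, **kwargs):
    """Hypothetical decorator that only records its options on the method."""
    unknown = set(kwargs) - set(DEFAULTS)
    if unknown:
        raise TypeError('unknown keyword argument(s): %s' % sorted(unknown))
    options = dict(DEFAULTS, **kwargs)

    def decorate(method):
        @functools.wraps(method)
        def wrapper(*call_args, **call_kwargs):
            return method(*call_args, **call_kwargs)
        wrapper.options = options
        return wrapper

    # Bare form: @toy_static_graph
    if len(args) == 1 and callable(args[0]) and not kwargs:
        return decorate(args[0])
    # Keyword form: @toy_static_graph(verbosity_level=2)
    return decorate


class PlainChain:
    @toy_static_graph
    def __call__(self, x):
        return x


class VerboseChain:
    @toy_static_graph(verbosity_level=2, force_test_define_by_run=True)
    def __call__(self, x):
        return x


print(PlainChain.__call__.options['verbosity_level'])
print(VerboseChain.__call__.options['verbosity_level'])
```

Options that are not passed keep their defaults, so `VerboseChain` in this sketch still has `minimize_cache_size` set to True.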