chainer.Variable

class chainer.Variable(data=None, *, name=None, grad=None, requires_grad=True)
Array with a structure to keep track of computation.
Every variable holds a data array of type either numpy.ndarray or cupy.ndarray.
A variable object holds a data array and a VariableNode object of a computational graph. If the variable is constructed by the user, the node is a root and does not hold any parent. If the variable is constructed by a FunctionNode object (i.e., by calling functions under chainer.functions or user-defined functions), or by using operators (see the list below), the node holds a reference to its parent called creator_node. This reference is used in backpropagation to backtrack the graph.
Users can disable (resp. enable) this chaining behavior by calling no_backprop_mode() (resp. force_backprop_mode()). In the former context, a variable never creates a computational graph, whereas in the latter context, it is forced to create one.
Note
The following operators are defined for variable(s).
- Indexing: a[slices] (__getitem__())
- Addition: a + b (__add__(), __radd__())
- Subtraction: a - b (__sub__(), __rsub__())
- Multiplication: a * b (__mul__(), __rmul__())
- Division: a / b (__div__(), __rdiv__(), __truediv__(), __rtruediv__())
- Floor Division: a // b (__floordiv__(), __rfloordiv__())
- Exponentiation: a ** b (__pow__(), __rpow__())
- Matrix Multiplication: a @ b (__matmul__(), __rmatmul__())
- Negation (Arithmetic): - a (__neg__())
- Absolute value: abs(a) (__abs__())
Warning
The volatile argument is not supported anymore since v2. Instead, use chainer.no_backprop_mode().
Parameters:
- data (N-dimensional array) – Initial data array.
- name (str) – Name of the variable.
- grad (N-dimensional array) – Initial gradient array.
- requires_grad (bool) – Boolean indicating whether grad will be set in backward calculation.
Methods

__getitem__(slices)
Extract elements from array with specified shape, axes and offsets.
Parameters:
- x (Variable or N-dimensional array) – A variable to be sliced.
- slices (int, slice, Ellipsis, None, integer array-like, boolean array-like or tuple of them) – An object to specify the selection of elements.
Returns: A Variable object which contains sliced array of x.
Note
It only supports types that are supported by CUDA’s atomicAdd when an integer array is included in slices. The supported types are numpy.float32, numpy.int32, numpy.uint32, numpy.uint64 and numpy.ulonglong.
Note
It does not support slices that contain multiple boolean arrays.
Note
See NumPy documentation for details of indexing.
Example
>>> import numpy as np
>>> import chainer.functions as F
>>> x = np.arange(12).reshape((2, 2, 3))
>>> x
array([[[ 0,  1,  2],
        [ 3,  4,  5]],
<BLANKLINE>
       [[ 6,  7,  8],
        [ 9, 10, 11]]])
>>> F.get_item(x, 0)
variable([[0, 1, 2],
          [3, 4, 5]])
>>> F.get_item(x, (0, 0, slice(0, 2, 1)))  # equals x[0, 0, 0:2:1]
variable([0, 1])
>>> F.get_item(x, (Ellipsis, 2))  # equals x[..., 2]
variable([[ 2,  5],
          [ 8, 11]])
>>> F.get_item(x, (1, np.newaxis, 1, 0))  # equals x[1, None, 1, 0]
variable([9])

__len__()
Returns the size of the first dimension of the data array.
Returns: Size of the first dimension of the data array.
Return type: int

addgrad(var)
Accumulates the gradient array from given source variable.
This method adds the gradient of a given variable to the gradient of this variable. The accumulation is even done across the host and different devices. If this variable has uninitialized data/grad arrays, this method initializes it with the shape of the given variable and then accumulates the gradient.
Parameters: var (Variable) – Source variable.

backward(retain_grad=False, enable_double_backprop=False, loss_scale=None)
Runs error backpropagation (a.k.a. backprop) from this variable.
On backprop, FunctionNode.backward() is called on each FunctionNode object appearing in the backward graph starting from this variable. The backward graph is represented by backward references from variable nodes to their creators, and from function nodes to their input variable nodes. The backprop stops at all root nodes. Some function nodes set None as gradients of some inputs, where further backprop does not take place at such inputs.
This method uses grad as the initial error array. Users can manually set a gradient array before calling this method. If the shape of data is () (i.e., it is scalar) and grad is None, then this method automatically supplies 1.0 as the initial error. This is useful when starting backprop from some scalar loss value.
From v3, this method supports differentiable backprop (a.k.a. double backprop, grad of grads). To enable it, pass enable_double_backprop=True.
Parameters:
- retain_grad (bool) – If True, the gradient arrays of all intermediate variables are kept. Otherwise, grad of the intermediate variables are set to None at an appropriate time, which may reduce the maximum memory consumption. In most cases of training some models, the purpose of backprop is to compute gradients of parameters, not of all variables, and therefore it is recommended to set this flag to False.
- enable_double_backprop (bool) – (Added in v3.0) If True, the computational trace of the whole backpropagation procedure is recorded to the computational graph so that one can further do backpropagation from the resulting gradients. Note that enabling it results in larger memory consumption needed to store the gradients w.r.t. intermediate variables that are required for the second gradient computation.
- loss_scale (float) – Loss scaling factor. Loss scaling is a useful technique to mitigate the vanishing gradient issue that tends to happen when a low-precision data type like float16 is used during training. If you set a loss scaling factor, gradients of loss values are multiplied by the factor before backprop starts. The factor is propagated to all gradients in the computational graph along the backprop. The gradients of parameters are divided by the factor just before the parameters are updated.

copydata(var)
Copies the data array from given source variable.
This method copies the data array from given variable to this variable. The copy is done even if the arrays reside on different devices, including across the host and a GPU device. If this variable has an uninitialized data array, this method initializes it by the data array of the given variable. Similarly, if the given variable has an uninitialized data array, this method initializes it by the data array of this variable (self). If both are uninitialized, this method does nothing.
Parameters: var (Variable) – Source variable.

reshape(*shape)
Returns a variable of a different shape and the same content.
See also chainer.functions.reshape() for full documentation.

set_creator(gen_func)
Notifies the variable that the given function is its creator.
Parameters: gen_func (Function) – Function object that creates this variable as one of its outputs.

set_creator_node(fnode)
Notifies the variable that the given node is its creator.
Parameters: fnode (FunctionNode) – Function node that has this variable as an output.

to_gpu(device=None)
Copies the data and gradient arrays to specified GPU.
Parameters: device – Target device specifier. If omitted, the current device is used.

to_intel64()
Copies the data and gradient arrays to intel64 specific mdarray.
If the array is not suited for intel64, it will be converted to numpy.ndarray.

transpose(*axes)
Permute the dimensions of an input variable without copy.
See also chainer.functions.transpose() for full documentation.

unchain()
Deletes the reference to the creator of this variable.
This method deletes the reference to the creator from the corresponding variable node. Unlike unchain_backward(), it does not backtrack the graph.
This method is equivalent to self.creator_node = None.

unchain_backward()
Deletes references between variable nodes and functions backward.
After this method completes, intermediate variable nodes and functions that are not referenced from anywhere are deallocated by reference count GC. This variable itself also deletes the reference to its creator function from the node, i.e., the node becomes a root in the computational graph. This means that backprop after unchaining stops at this variable. This behavior is useful for implementing truncated BPTT.

zerograd()
Initializes the gradient array by zeros.
Note that the gradient variable is unchained from the computational graph by this method because this operation breaks the backprop validity.
Deprecated since version v1.15: Use cleargrad() instead.

__floordiv__(rhs)
Elementwise floor division.
Returns: Output variable.
Return type: Variable

__rfloordiv__(rhs)
Elementwise floor division.
Returns: Output variable.
Return type: Variable
Attributes

T
Transposition of this variable.

array
The underlying data array.
It is either a numpy.ndarray or cupy.ndarray object, or None if the variable is in an uninitialized state.

creator
Function implementation that created this variable.
When this variable has been created by an old-style function (i.e., it is implemented as a subclass of Function), this property returns that Function object.
When this variable has been created by a new-style function (i.e., it is implemented as a subclass of the FunctionNode class), this property returns that node object.

creator_node
FunctionNode object that created this variable.
This property has a setter to which None can be set. Setting None to this property is equivalent to calling unchain(); it purges the variable from the function that created this variable.
The setter also accepts the original FunctionNode object that created this variable. For example, you can set None to this property once and then set the original value again.
Note
Setting an irrelevant FunctionNode() object does not emit any error immediately, whereas the behavior is undefined. Do not set a FunctionNode() object that did not create this variable object.

data
The underlying data array (equivalent to array).
Note that using this attribute directly is discouraged; use array instead. Using array, you can find an error earlier when your code mixes up Variable and ndarray because ndarray does not have an attribute .array while it has .data.

dtype

grad
Gradient array of this variable.
Note that this property returns the underlying array of the gradient variable instead of the gradient variable itself; to get/set gradient variable, use grad_var instead.

grad_var
Gradient variable.

label
Short text that represents the variable.

name

ndim

node

rank

requires_grad
Indicates whether grad will be set in backward calculation.

shape

size