chainer.Parameter

class chainer.Parameter(initializer: Union[chainer.types.AbstractInitializer, numpy.generic, bytes, str, memoryview, numbers.Number, numpy.ndarray, None] = None, shape: Union[int, Sequence[int], None] = None, name: Optional[str] = None)[source]

Parameter variable that can be registered to a link.

Parameter is a subclass of Variable. It behaves almost the same as a regular variable, except that a parameter can be registered to a Link object simply by assigning it to an attribute of the link within an init_scope() context.

Parameter also supports initialization by an initializer. It can have two initializers: one for the data array, and the other for the gradient array. The initializer only specifies how to fill the elements of these arrays; the shape information is supplied at the point of initialization.

When a link that the parameter has been registered to is passed to a GradientMethod, an update rule is set on the parameter. This update rule specifies how to update the data array of the parameter using its gradient array.

Parameters
  • initializer (~chainer.Initializer or N-dimensional array) – Initializer of the data array. If shape is given, this initializer is immediately used to initialize the data array. Otherwise, if it is an array, it is immediately used as the data array, and otherwise the data array is left uninitialized and will be initialized by this initializer in initialize(). It can also be a scalar, in which case the data array will be filled by this scalar. Note that float32 is used in this case.

  • shape (int or tuple of int or None) – Shape of the parameter. If it is None, the initialization is deferred to the call of initialize().

  • name (str) – Name of the parameter.

Variables
  • initializer – Initializer of the data array. It is used for initializing the data array of an uninitialized variable.

  • update_rule – UpdateRule instance that updates this variable as a parameter. This argument is set to update_rule.

Methods

__getitem__(slices)[source]

Extract elements from array with specified shape, axes and offsets.

Parameters
  • x (Variable or N-dimensional array) – A variable to be sliced.

  • slices (int, slice, Ellipsis, None, integer array-like, boolean array-like or tuple of them) – An object to specify the selection of elements.

Returns

A Variable object which contains sliced array of x.

Note

It only supports types that are supported by CUDA’s atomicAdd when an integer array is included in slices. The supported types are numpy.float32, numpy.int32, numpy.uint32, numpy.uint64 and numpy.ulonglong.

Note

It does not support slices that contain multiple boolean arrays.

Note

See NumPy documentation for details of indexing.

Example

>>> x = np.arange(12).reshape((2, 2, 3))
>>> x
array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])
>>> F.get_item(x, 0)
variable([[0, 1, 2],
          [3, 4, 5]])
>>> F.get_item(x, (0, 0, slice(0, 2, 1)))  # equals x[0, 0, 0:2:1]
variable([0, 1])
>>> F.get_item(x, (Ellipsis, 2))  # equals x[..., 2]
variable([[ 2,  5],
          [ 8, 11]])
>>> F.get_item(x, (1, np.newaxis, 1, 0))  # equals x[1, None, 1, 0]
variable([9])
__len__()[source]

Returns the first dimension of the data array.

Returns

Size of the first dimension of the data array.

Return type

int

__copy__()[source]
addgrad(var)[source]

Accumulates the gradient array from given source variable.

This method adds the gradient of a given variable to the gradient of this variable. The accumulation is even done across the host and different devices. If this variable has uninitialized data/grad arrays, this method initializes it with the shape of the given variable and then accumulates the gradient.

Parameters

var (Variable) – Source variable.

backward(retain_grad=False, enable_double_backprop=False, loss_scale=None)[source]

Runs error backpropagation (a.k.a. backprop) from this variable.

On backprop, FunctionNode.backward() is called on each FunctionNode object appearing in the backward graph starting from this variable. The backward graph is represented by backward references from variable nodes to their creators, and from function nodes to their input variable nodes. The backprop stops at all root nodes. Some function nodes set None as gradients of some inputs, where further backprop does not take place at such inputs.

This method uses grad as the initial error array. Users can manually set a gradient array before calling this method. If the shape of data is () (i.e., it is scalar) and grad is None, then this method automatically supplies 1.0 as the initial error. This is useful when starting backprop from a scalar loss value.

From v3, this method supports differentiable backprop (a.k.a. double backprop, grad of grads). To enable it, pass enable_double_backprop=True.

Parameters
  • retain_grad (bool) –

    If True, the gradient arrays of all intermediate variables are kept. Otherwise, the grad of each intermediate variable is set to None at an appropriate time, which may reduce the maximum memory consumption.

    In most cases of training some models, the purpose of backprop is to compute gradients of parameters, not of all variables, and therefore it is recommended that this flag be set to False.

  • enable_double_backprop (bool) – (Added in v3.0) If True, the computational trace of the whole backpropagation procedure is recorded in the computational graph so that one can backpropagate further from the resulting gradients. Note that enabling it increases memory consumption, because the gradients w.r.t. intermediate variables that are required for the second gradient computation must be stored.

  • loss_scale (float) – Loss scaling factor. Loss scaling is a useful technique to mitigate the vanishing gradient issue that tends to happen when a low-precision data type such as float16 is used during training. If you set a loss scaling factor, the gradients of loss values are multiplied by the factor before backprop starts. The factor is propagated to all gradients in the computational graph along the backprop. The gradients of parameters are divided by the factor just before the parameters are updated.

cleargrad()[source]

Clears the gradient array.

copydata(var)[source]

Copies the data array from given source variable.

This method copies the data array from given variable to this variable. The copy is done even if the arrays reside on different devices, including across the host and a GPU device. If this variable has an uninitialized data array, this method initializes it by the data array of the given variable. Similarly, if the given variable has an uninitialized data array, this method initializes it by the data array of this variable (self). If both are uninitialized, this method does nothing.

Parameters

var (Variable) – Source variable.

debug_print()[source]

Displays a summary of the stored data and location of the Variable.

from_chx()[source]

Converts the array and gradient to non-ChainerX arrays without copy.

This method converts the underlying ChainerX array and gradient residing in either a native or cuda device to NumPy or CuPy arrays respectively, on their same physical device. It does nothing if the array held by the Variable object is not a ChainerX array. The new array is a view of the original one.

Raises an error if such a conversion is not supported for the device.

initialize(shape)[source]

Initializes the uninitialized variable.

An uninitialized variable is a variable created with the data array set to None. This method creates and initializes the data array. The shape of the variable can be left unknown until this method is called.

Parameters

shape (tuple of int) – Shape of the data array.

item()[source]

Converts the variable with one element to a Python scalar.

This will incur host-device synchronization.

Returns

The element of the array.

Return type

int or float

mean(axis=None, *, weights=None, keepdims=False)[source]

Calculate weighted average of array elements over a given axis.

See also

chainer.functions.average() for full documentation.

reshape(*shape)[source]

Returns a variable of a different shape and the same content.

See also

chainer.functions.reshape() for full documentation.

retain_data()[source]

Lets the corresponding variable node keep the underlying array.

set_creator(gen_func)[source]

Notifies the variable that the given function is its creator.

Parameters

gen_func (Function) – Function object that creates this variable as one of its outputs.

set_creator_node(fnode)[source]

Notifies the variable that the given node is its creator.

Parameters

fnode (FunctionNode) – Function node that has this variable as an output.

summary()[source]
to_chx()[source]

Converts the array and gradient to ChainerX arrays without copy.

This method converts the underlying array and gradient to chainerx.ndarray on the same physical device. It does nothing if the array held by the Variable object is already a ChainerX array. The new array is a view of the original one.

to_cpu()[source]

Copies the data and gradient arrays to CPU.

to_device(device)[source]

Copies the data and gradient arrays to specified device.

Parameters

device – Target device specifier. See get_device() for available values.

to_gpu(device=None)[source]

Copies the data and gradient arrays to specified GPU.

Parameters

device – Target device specifier. If omitted, the current device is used.

to_intel64()[source]

Copies the data and gradient arrays to intel64 specific mdarray.

If the array is not suited for intel64, it will be converted to numpy.ndarray.

transpose(*axes)[source]

Permute the dimensions of an input variable without copy.

See also

chainer.functions.transpose() for full documentation.

unchain()[source]

Deletes the reference to the creator of this variable.

This method deletes the reference to the creator from the corresponding variable node. Unlike unchain_backward(), it does not backtrack the graph.

This method is equivalent to self.creator_node = None.

unchain_backward()[source]

Deletes references between variable nodes and functions backward.

After this method completes, intermediate variable nodes and functions that are not referenced from anywhere are deallocated by reference count GC. Also this variable itself deletes the reference to its creator function from the node, i.e. the node becomes root in the computation graph. It indicates that backprop after unchaining stops at this variable. This behavior is useful to implement truncated BPTT.

update()[source]

Updates the data array using the gradient and the update rule.

This method updates the parameter using the attached update rule.

zerograd()[source]

Initializes the gradient array by zeros.

Note that the gradient variable is unchained from the computational graph by this method, because this operation breaks the backprop validity.

Deprecated since version v1.15: Use the more efficient cleargrad() instead.

__eq__(other)[source]

This operator is not supported in Variables.

__ne__(other)[source]

This operator is not supported in Variables.

__lt__(other)[source]

This operator is not supported in Variables.

__le__(other)[source]

This operator is not supported in Variables.

__gt__(other)[source]

This operator is not supported in Variables.

__ge__(other)[source]

This operator is not supported in Variables.

__nonzero__()[source]

This operator is not supported in Variables.

__bool__()[source]

This operator is not supported in Variables.

__neg__()[source]

Element-wise negation.

Returns

Output variable.

Return type

Variable

__abs__()[source]

Element-wise absolute value.

Returns

Output variable.

Return type

Variable

__add__()[source]

Element-wise addition.

Returns

Output variable.

Return type

Variable

__radd__()[source]

Element-wise addition.

Returns

Output variable.

Return type

Variable

__sub__(rhs)[source]

Element-wise subtraction.

Returns

Output variable.

Return type

Variable

__rsub__(rhs)[source]

Element-wise subtraction.

Returns

Output variable.

Return type

Variable

__mul__(rhs)[source]

Element-wise multiplication.

Returns

Output variable.

Return type

Variable

__rmul__(rhs)[source]

Element-wise multiplication.

Returns

Output variable.

Return type

Variable

__div__(rhs)[source]

Element-wise division.

Returns

Output variable.

Return type

Variable

__truediv__(rhs)[source]

Element-wise division.

Returns

Output variable.

Return type

Variable

__rdiv__(rhs)[source]

Element-wise division.

Returns

Output variable.

Return type

Variable

__rtruediv__(rhs)[source]

Element-wise division.

Returns

Output variable.

Return type

Variable

__floordiv__(rhs)[source]

Element-wise floor division.

Returns

Output variable.

Return type

Variable

__rfloordiv__(rhs)[source]

Element-wise floor division.

Returns

Output variable.

Return type

Variable

__pow__(rhs)[source]

Element-wise power function.

Returns

Output variable.

Return type

Variable

__rpow__(rhs)[source]

Element-wise power function.

Returns

Output variable.

Return type

Variable

__matmul__(rhs)[source]

Matrix multiplication.

Returns

Output variable.

Return type

Variable

__rmatmul__(rhs)[source]

Matrix multiplication.

Returns

Output variable.

Return type

Variable

Attributes

T

Transposition of this variable.

array

The underlying data array.

It is either a numpy.ndarray or cupy.ndarray object, or None if the variable is in an uninitialized state.

chx_array

A view of the raw ChainerX array.

In contrast to Variable.array, which is always disconnected, the array represented by this attribute may be connected to the computational graph.

It is a view, so it has a distinct gradient from the original array.

If this attribute is queried on a Variable with a non-ChainerX array, ValueError will be raised.

creator

Function implementation that created this variable.

When this variable has been created by an old-style function (i.e., it is implemented as a subclass of Function), this property returns that Function object.

When this variable has been created by a new-style function (i.e., it is implemented as a subclass of FunctionNode class), this property returns that node object.

creator_node

FunctionNode object that created this variable.

This property has a setter to which None can be set. Setting None to this property is equivalent to calling unchain(); it purges the variable from the function that created this variable.

The setter also accepts the original FunctionNode object that created this variable. For example, you can set this property to None once and then set the original value again.

Note

Setting an irrelevant FunctionNode() object does not raise any error immediately, but the behavior is undefined. Do not set a FunctionNode() object that did not create this variable object.

data

The underlying data array (equivalent to array).

Note that using this attribute directly is discouraged; use array instead. Using array, you can find an error earlier when your code mixes up Variable and ndarray because ndarray does not have an attribute .array while it has .data.

device

Device on which the data array of this variable resides.

dtype
grad

Gradient array of this variable.

Note that this property returns the underlying array of the gradient variable instead of the gradient variable itself; to get/set gradient variable, use grad_var instead.

If the underlying array is a chainerx.ndarray and requires_grad is false, trying to access the gradient will result in an error.

grad_var

Gradient variable.

initializer = None
label

Short text that represents the variable.

name
ndim
node
rank
requires_grad

It indicates that grad will be set in backward calculation.

shape
size
xp

Array module for the data array of this variable.