There are some non-obvious limitations in ChainerX:
ChainerX only supports a limited set of dtypes.
Operations with mixed dtypes are not supported. You need to explicitly convert dtypes beforehand, e.g. with astype().
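The explicit-conversion pattern can be sketched as follows. NumPy is used here purely for illustration, since its astype() mirrors the one on chainerx ndarrays; this is not ChainerX itself:

```python
import numpy as np

# Mixed-dtype operands: convert explicitly before combining.
# (Illustrated with NumPy; chainerx ndarrays expose a similar astype().)
a = np.arange(3, dtype=np.int32)
b = np.ones(3, dtype=np.float32)

# Convert the integer array to the floating dtype first,
# instead of relying on implicit promotion.
c = a.astype(np.float32) + b
print(c.dtype)  # float32
```

In ChainerX, omitting the astype() call on mixed-dtype operands raises an error rather than promoting silently.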
True division of Python, where 2 / 3 returns 0.66… rather than 0, is not supported yet. Given an ndarray a of an integer dtype, a / a does not return an array of float64, but returns an array of the same integer dtype.
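For contrast, NumPy does apply Python's true-division semantics to integer arrays; the snippet below shows the promotion that ChainerX does not yet perform (NumPy is used for illustration only):

```python
import numpy as np

a = np.array([2], dtype=np.int32)
b = np.array([3], dtype=np.int32)

# NumPy promotes integer true division to float64:
q = a / b
print(q.dtype)  # float64

# In ChainerX, dividing integer ndarrays currently keeps the
# integer dtype instead, so the result would be truncated toward 0.
```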
Only a limited set of Chainer functions are well tested with the ChainerX integration.
The ChainerX CUDA backend requires cuDNN. See the installation guide for details.
Since ChainerX arrays can have a computational graph of their own, some operations are prohibited for safety:
Unless an array is free from the computational graph, in-place modification of its data is prohibited.
a = chainerx.zeros((2,), chainerx.float32)
a.require_grad()  # install the computational graph on `a`.
a += 1  # ! error
The reason for this limitation is that backward operations may depend on the value of a; if it were altered in place, the backward gradients could be unexpectedly affected.
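A toy illustration of why this matters, in plain Python with NumPy (this is not ChainerX internals, and the forward/backward helpers are made up for the sketch): for f(a) = a * a the gradient is 2 * a, so the backward pass must see the original value of a. Mutating a in place between forward and backward corrupts the saved value and yields a wrong gradient.

```python
import numpy as np

def forward(a):
    # f(a) = a * a; backward needs `a` itself to compute the gradient 2*a.
    return a * a, a  # (output, tensor saved for backward)

def backward(saved_a):
    return 2 * saved_a

a = np.array([3.0])
y, saved = forward(a)

a += 1  # in-place mutation also changes `saved` (shared memory)

grad = backward(saved)
print(grad)  # [8.] -- but the correct gradient at the original a=3 is 6.0
```

This shared-memory hazard is exactly what ChainerX guards against by rejecting in-place modification of graph-connected arrays.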
You may circumvent this limitation by making a disconnected view:
# A memory-shared view of `a` which is disconnected from the computational graph of `a`.
b = a.as_grad_stopped()
b += 1
Note, however, that this operation is inherently dangerous. You should be very careful to ensure that the modification does not affect backward computations.
Note also that this may be restricted further in the future, so that even in-place modification of a disconnected view is allowed only when it is actually safe.
If an array is wrapped in a chainer.Variable with requires_grad=True (which is the default), you won’t be able to re-assign the array:
a = chainerx.zeros((2,), chainerx.float32)
b = chainerx.zeros((2,), chainerx.float32)
var = chainer.Variable(a)
var.array = b  # ! error
You may circumvent this by using in-place assignment on var.array:

var.array[:] = b
This workaround may also be dangerous, just as with the previous limitation.
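The difference between rebinding and in-place assignment can be sketched with plain NumPy (illustration only, not chainer.Variable itself): slice assignment copies the new values into the existing buffer, so everything that references that buffer observes the change.

```python
import numpy as np

a = np.zeros(2, dtype=np.float32)
b = np.ones(2, dtype=np.float32)

view = a[:]   # stands in for internal references to the wrapped array

a[:] = b      # in-place: writes b's values into a's existing buffer
print(view)   # existing references see the new values

# By contrast, `a = b` would only rebind the local name, leaving the
# buffer (and `view`) untouched -- which is why re-assigning
# Variable.array is rejected while slice assignment works.
```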