CUDA utilities

Device, context and memory management on CuPy.

Chainer uses CuPy (with a very thin wrapper) to exploit the speed of GPU computation. The following modules and classes defined in CuPy are imported into the chainer.cuda module for convenience (refer to this table when reading Chainer's source code).

imported name            original name
----------------------   ----------------
chainer.cuda.cupy        cupy
chainer.cuda.ndarray     cupy.ndarray
chainer.cuda.cupy.cuda   cupy.cuda
chainer.cuda.Device      cupy.cuda.Device
chainer.cuda.Event       cupy.cuda.Event
chainer.cuda.Stream      cupy.cuda.Stream
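
As a quick illustration (assuming a CUDA-enabled installation), these aliases let code reach CuPy entirely through chainer.cuda:

    from chainer import cuda

    # chainer.cuda.cupy is the cupy module itself, and
    # chainer.cuda.ndarray is cupy.ndarray.
    x = cuda.cupy.asarray([1, 2, 3])
    assert isinstance(x, cuda.ndarray)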

Chainer replaces the default allocator of CuPy with its own memory pool implementation. This lets device memory be reused across multiple forward/backward computations, and temporary arrays be reused across consecutive elementwise operations.
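
The replacement happens automatically when Chainer is imported; conceptually it is roughly equivalent to the following sketch (the actual implementation details may differ):

    import cupy

    # Route all CuPy device allocations through a memory pool so that
    # freed blocks are cached and reused instead of being returned to
    # the GPU driver on every deallocation.
    pool = cupy.cuda.MemoryPool()
    cupy.cuda.set_allocator(pool.malloc)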

Devices

chainer.cuda.get_device Gets the device from a device object, an ID integer or an array object.
chainer.cuda.get_device_from_id Gets the device from an ID integer.
chainer.cuda.get_device_from_array Gets the device from a list of CuPy arrays or a single CuPy array.
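
A minimal usage sketch of the device utilities above (assuming at least one CUDA device is available); the returned device works as a context manager that switches the current device:

    from chainer import cuda

    # Make GPU 0 the current device while allocating.
    with cuda.get_device_from_id(0):
        x = cuda.cupy.arange(5)

    # Recover the device an array lives on.
    assert cuda.get_device_from_array(x).id == 0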

CuPy array allocation and copy

chainer.cuda.copy Copies a cupy.ndarray object using the default stream.
chainer.cuda.to_cpu Copies the given GPU array to the host CPU.
chainer.cuda.to_gpu Copies the given CPU array to the specified device.
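
For example, a round trip between host and device (assuming GPU 0 is available):

    import numpy as np
    from chainer import cuda

    x_cpu = np.arange(6, dtype=np.float32).reshape(2, 3)
    x_gpu = cuda.to_gpu(x_cpu, device=0)  # host -> device 0
    y_cpu = cuda.to_cpu(x_gpu)            # device -> host
    assert (x_cpu == y_cpu).all()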

Kernel definition utilities

chainer.cuda.memoize Decorates a function to memoize its result for each combination of arguments and device.
chainer.cuda.clear_memo Clears the memoized results for all functions decorated by memoize.
chainer.cuda.elementwise Creates an elementwise kernel function.
chainer.cuda.reduce Creates a global reduction kernel function.
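
A short sketch of elementwise; the compiled kernel is memoized per device, so repeated calls with the same source reuse the compiled binary. The kernel name 'squared_diff' here is just an illustrative choice:

    from chainer import cuda

    # Elementwise squared difference, written in CUDA C.
    squared_diff = cuda.elementwise(
        'float32 x, float32 y',    # input arguments
        'float32 z',               # output arguments
        'z = (x - y) * (x - y)',   # per-element operation
        'squared_diff')            # kernel name

    a = cuda.cupy.arange(5, dtype='f')
    b = cuda.cupy.arange(5, dtype='f')[::-1]
    c = squared_diff(a, b)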

CPU/GPU generic code support

chainer.cuda.get_array_module Returns the appropriate array module (numpy or cupy) for the given arguments.
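
This makes it easy to write functions that run unchanged on NumPy and CuPy arrays; a minimal sketch:

    import numpy as np
    from chainer import cuda

    def squared_norm(x):
        # xp is numpy for CPU arrays and cupy for GPU arrays.
        xp = cuda.get_array_module(x)
        return xp.sum(x * x)

    squared_norm(np.arange(3.0))           # runs on the CPU
    # squared_norm(cuda.cupy.arange(3.0))  # runs on the GPU, if available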

cuDNN support

chainer.cuda.set_max_workspace_size Sets the maximum workspace size for cuDNN.
chainer.cuda.get_max_workspace_size Gets the maximum workspace size for cuDNN.
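
For example, to let cuDNN use a larger scratch buffer when selecting convolution algorithms (sizes are in bytes; the 64 MiB value below is illustrative, not a recommendation):

    from chainer import cuda

    cuda.set_max_workspace_size(64 * 1024 * 1024)  # 64 MiB
    assert cuda.get_max_workspace_size() == 64 * 1024 * 1024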