Utilities

CUDA utilities

Device, context and memory management on top of PyCUDA and scikits.cuda.

Chainer uses PyCUDA facilities (with a very thin wrapper) to exploit the speed of GPU computation. The following modules and classes are imported into the cuda module for convenience (refer to this table when reading Chainer's source code).

imported name             original name
chainer.cuda.cublas       scikits.cuda.cublas
chainer.cuda.cumath       pycuda.cumath
chainer.cuda.curandom     pycuda.curandom
chainer.cuda.culinalg     scikits.cuda.linalg
chainer.cuda.cumisc       scikits.cuda.misc
chainer.cuda.gpuarray     pycuda.gpuarray
chainer.cuda.Context      pycuda.driver.Context
chainer.cuda.Device       pycuda.driver.Device
chainer.cuda.Event        pycuda.driver.Event
chainer.cuda.GPUArray     pycuda.gpuarray.GPUArray
chainer.cuda.Stream       pycuda.driver.Stream

Chainer provides thin wrappers of GPUArray allocation routines, which use mem_alloc() as the allocator. This allocator uses a per-device instance of DeviceMemoryPool, which enables device memory to be reused across multiple forward/backward computations. mem_alloc() also attaches an additional attribute called device to the allocated memory, which indicates the device the memory is allocated on. Functions of the cuda module use this attribute to select the appropriate device in each manipulation routine.
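
The benefit of pooling can be illustrated with a minimal pure-Python sketch (a conceptual analogue only; SimpleMemoryPool is not PyCUDA's actual DeviceMemoryPool): freed blocks are kept per size and handed back on the next same-size request, so repeated forward/backward passes avoid fresh device allocations.

```python
# Conceptual sketch of a size-bucketed memory pool (pure Python stand-in;
# not PyCUDA's actual DeviceMemoryPool implementation).
class SimpleMemoryPool:
    def __init__(self):
        self.free_blocks = {}  # size in bytes -> list of reusable blocks

    def allocate(self, nbytes):
        blocks = self.free_blocks.get(nbytes)
        if blocks:
            return blocks.pop()       # reuse a previously freed block
        return bytearray(nbytes)      # stand-in for a fresh device allocation

    def free(self, block):
        self.free_blocks.setdefault(len(block), []).append(block)

pool = SimpleMemoryPool()
a = pool.allocate(1024)
pool.free(a)
b = pool.allocate(1024)  # the freed block is handed back, not reallocated
assert b is a
```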

Initialization and global states

chainer.cuda.init(device=None)[source]

Initializes CUDA global state.

Chainer maintains a CUDA context, a CUBLAS context, a random number generator and a device memory pool for each GPU device and for each process (the main process or a process forked by multiprocessing) as global states. When called for the first time in a process, this function initializes these global states.

Warning

This function also initializes PyCUDA and scikits.cuda. Since these packages do not support forking after initialization, do not call this function before forking the process.

This function also registers shutdown() with the atexit slot.

It also initializes the random number generator. Users can set a fixed seed with the CHAINER_SEED environment variable.

Parameters:device (int or Device or None) – Device ID to initialize on.
chainer.cuda.shutdown()[source]

Finalizes CUDA global state.

This function is automatically called by atexit. Multiple calls are allowed, so users can manually call this function if necessary.

chainer.cuda.mem_alloc(nbytes)[source]

Allocates device memory of the given size from the memory pool.

This function chooses the memory pool corresponding to the current device.

Parameters:nbytes (int) – The size of memory in bytes.
Returns:Allocated memory with additional device attribute. This attribute is used to determine on which GPU the memory resides.
Return type:pycuda.tools.PooledDeviceAllocation

Devices and contexts

chainer.cuda.get_device(arg=None)[source]

Gets the device from the ID arg or from the given GPUArray.

Parameters:arg – Value to specify a GPU device.
Returns:Device object specified by given arg.

The device is selected according to the following rules:

Type of arg   Return value
None          Current device
int           Device of ID arg
Device        arg
GPUArray      Device the given array was allocated on
ndarray       None
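
The dispatch table above can be sketched in plain Python with stand-in classes (Device, GPUArray, CURRENT_DEVICE and get_device_sketch are placeholders for illustration, not the real PyCUDA objects or the actual implementation):

```python
import numpy

# Stand-in classes; the real ones are pycuda.driver.Device and
# pycuda.gpuarray.GPUArray, and the current device comes from the CUDA context.
class Device:
    def __init__(self, device_id):
        self.device_id = device_id

class GPUArray:
    def __init__(self, device):
        self.device = device

CURRENT_DEVICE = Device(0)

def get_device_sketch(arg=None):
    if arg is None:
        return CURRENT_DEVICE        # None -> current device
    if isinstance(arg, Device):
        return arg                   # Device -> arg itself
    if isinstance(arg, GPUArray):
        return arg.device            # GPUArray -> device it was allocated on
    if isinstance(arg, numpy.ndarray):
        return None                  # CPU arrays carry no device
    return Device(arg)               # int -> device of that ID

assert get_device_sketch() is CURRENT_DEVICE
assert get_device_sketch(1).device_id == 1
assert get_device_sketch(numpy.zeros(3)) is None
```
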
chainer.cuda.use_device(arg, pop=True)[source]

Switches the CUDA context to use given device.

Parameters:
  • arg – Argument of get_device().
  • pop (bool) – If True, pop the current context from context stack.
chainer.cuda.using_device(*args)[source]

Returns DeviceUser object of the first GPUArray argument.

If none of the arguments specifies a GPU device, then it returns a dummy DeviceUser object which is inactive.

Parameters:*args – Objects based on which an appropriate device should be selected.
Returns:Device user instance of selected argument.
Return type:DeviceUser

Example

Suppose arrays is a list of arrays, each of type ndarray or GPUArray. Then the following code invokes do_something_on within an appropriate device context:

with using_device(*arrays):
    do_something_on(arrays)
class chainer.cuda.DeviceUser(arg)[source]

RAII-style CUDA context switcher.

Parameters:arg – Argument of get_device().
device

~pycuda.driver.Device

Selected device.

chainer.cuda.get_context(arg=None)[source]

Gets the context corresponding to the specified device.

Parameters:arg – Argument of get_device().
Returns:Context object corresponding to the specified device.
Return type:Context
chainer.cuda.get_cublas_handle()[source]

Gets CUBLAS handle for the current device.

Returns:CUBLAS handle.
chainer.cuda.using_cumisc(handle=None)[source]

Temporarily uses Chainer's CUBLAS handle in scikits.cuda.misc.

The usage is similar to using_device().

Parameters:handle – CUBLAS handle. If None is specified, the CUBLAS handle for the current device is used.
Returns:Misc user object.
Return type:CumiscUser
class chainer.cuda.CumiscUser(handle)[source]

RAII-style switcher of scikits.cuda.misc default CUBLAS handle.

GPUArray allocation and copy

chainer.cuda.copy(array, out=None, out_device=None)[source]

Copies a GPUArray using the default stream.

This function can copy the device array to a destination array on another device.

Parameters:
  • array (GPUArray) – Array to be copied.
  • out (GPUArray) – Destination array. If it is not None, then out_device argument is ignored.
  • out_device – Destination device specifier. Actual device object is obtained by passing this value to get_device().
Returns:

Copied array.

If out is not specified, then the array is allocated on the device specified by out_device argument.

Return type:

GPUArray

chainer.cuda.copy_async(array, out=None, out_device=None, stream=None)[source]

Copies a GPUArray using the given stream.

This function can copy the device array to a destination array on another device.

Parameters:
  • array (GPUArray) – Array to be copied.
  • out (GPUArray) – Destination array. If it is not None, then out_device argument is ignored.
  • out_device – Destination device specifier. Actual device object is obtained by passing this value to get_device().
  • stream (Stream) – CUDA stream.
Returns:

Copied array.

If out is not specified, then the array is allocated on the device specified by out_device argument.

Return type:

GPUArray

Warning

Currently, copy_async between different devices raises an exception, since PyCUDA drops the definition of pycuda.driver.memcpy_peer_async().

chainer.cuda.empty(shape, dtype=<type 'numpy.float32'>)[source]

Creates an uninitialized GPUArray.

Parameters:
  • shape (tuple of ints) – The shape of array.
  • dtype (numpy.dtype) – Element type.
Returns:

Uninitialized GPU array allocated by memory pool.

Return type:

GPUArray

chainer.cuda.empty_like(array)[source]

Alias to pycuda.gpuarray.empty_like().

chainer.cuda.full(shape, fill_value, dtype=<type 'numpy.float32'>, stream=None)[source]

Creates a constant-filled GPUArray.

Parameters:
  • shape (tuple of ints) – The shape of array.
  • fill_value – Constant to fill the array by.
  • dtype (numpy.dtype) – Element type.
  • stream (Stream) – CUDA stream.
Returns:

Constant-filled GPU array allocated by memory pool.

Return type:

GPUArray

chainer.cuda.full_like(array, fill_value, stream=None)[source]

Creates a constant-filled GPUArray like the given array.

Parameters:
  • array (GPUArray) – Base array.
  • fill_value – Constant value to fill the array by.
  • stream (Stream) – CUDA stream.
Returns:

Constant-filled array.

Return type:

GPUArray

chainer.cuda.zeros(shape, dtype=<type 'numpy.float32'>, stream=None)[source]

Creates a zero-filled GPUArray.

This function is equivalent to full(shape, 0, dtype, stream).

chainer.cuda.zeros_like(array, stream=None)[source]

Creates a zero-filled GPUArray like the given array.

This function is equivalent to full_like(array, 0, stream).

chainer.cuda.ones(shape, dtype=<type 'numpy.float32'>, stream=None)[source]

Creates a one-filled GPUArray.

This function is equivalent to full(shape, 1, dtype, stream).

chainer.cuda.ones_like(array, stream=None)[source]

Creates a one-filled GPUArray like the given array.

This function is equivalent to full_like(array, 1, stream).
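
These allocation helpers mirror the NumPy API. The NumPy calls below (CPU-side, so they run without a GPU) illustrate the semantics that the GPUArray versions reproduce:

```python
import numpy as np

# CPU-side NumPy equivalents of the GPUArray allocation helpers.
x = np.full((2, 3), 7, dtype=np.float32)   # cf. cuda.full
z = np.zeros((2, 3), dtype=np.float32)     # cf. cuda.zeros == full(shape, 0, ...)
o = np.ones_like(x)                        # cf. cuda.ones_like == full_like(x, 1)

assert (x == 7).all() and x.dtype == np.float32
assert (z == 0).all() and (o == 1).all()
```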

chainer.cuda.to_cpu(array)[source]

Copies the given GPU array to the host CPU.

Parameters:array – Array to be sent to the CPU.
Returns:Array on CPU.

If the given array is already on the CPU, then this function just returns array without performing any copy.

Return type:ndarray
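
The pass-through behavior can be sketched as follows (to_cpu_sketch is an illustrative stand-in, not the actual implementation; the GPUArray branch assumes PyCUDA's GPUArray.get(), which copies device memory to the host):

```python
import numpy

def to_cpu_sketch(array):
    # ndarray passes through untouched; a GPUArray would be copied to host.
    if isinstance(array, numpy.ndarray):
        return array        # already on CPU: same object, no copy
    return array.get()      # GPUArray.get() copies device memory to the host

x = numpy.arange(3)
assert to_cpu_sketch(x) is x  # identity, not a copy
```
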
chainer.cuda.to_cpu_async(array, stream=None)[source]

Copies the given GPU array asynchronously to the host CPU.

Parameters:
  • array – Array to be sent to the CPU.
  • stream (Stream) – CUDA stream.
Returns:

Array on CPU.

If the given array is already on the CPU, then this function just returns array without performing any copy.

Return type:

ndarray

chainer.cuda.to_gpu(array, device=None)[source]

Copies the given CPU array to the specified device.

Parameters:
  • array – Array to be sent to GPU.
  • device – Device specifier.
Returns:

Array on GPU.

If array is already on the GPU, then this function just returns array without performing any copy. Note that this function does not move a GPUArray to the specified device.

Return type:

GPUArray

chainer.cuda.to_gpu_async(array, stream=None)[source]

Copies the given CPU array asynchronously to the current device.

Parameters:
  • array – Array to be sent to GPU. If it is ndarray, then its memory must be pagelocked.
  • stream (Stream) – CUDA stream.
Returns:

Array on GPU.

If the given array is already on the GPU, then this function just returns array without performing any copy.

Return type:

GPUArray

Random number generators

chainer.cuda.get_generator(device=None)[source]

Gets the random number generator for the given device.

Parameters:device – Device specifier (an argument of get_device()).
Returns:Random number generator.
Return type:pycuda.curandom.XORWOWRandomNumberGenerator
chainer.cuda.seed(s=None, device=None)[source]

Resets the random number generator of the specified device by the given seed.

Parameters:
  • s (int or None) – Seed value. If it is None, the generator is initialized without a fixed seed.
  • device – Device specifier (i.e. argument of get_device()).

Kernel definition utilities

chainer.cuda.elementwise(arguments, operation, name, keep=False, options=None, preamble='', loop_prep='', after_loop='')[source]

Creates an elementwise kernel function.

This function uses pycuda.tools.context_dependent_memoize() to cache the resulting kernel object, i.e. the resulting kernel object is cached for each set of arguments and each CUDA context.

The arguments are the same as those for pycuda.elementwise.ElementwiseKernel(), except that the name argument is mandatory.

chainer.cuda.reduce(arguments, map_expr, reduce_expr, neutral, name, dtype_out=<type 'numpy.float32'>, keep=False, options=None, preamble='')[source]

Creates a global reduction kernel function.

This function uses pycuda.tools.context_dependent_memoize() to cache the resulting kernel object, i.e. the resulting kernel object is cached for each set of arguments and each CUDA context.

The arguments are the same as those for pycuda.reduction.ReductionKernel(), except that their order is different and the name argument is mandatory.

Interprocess communication on GPU

class chainer.cuda.IPCEvent[source]

Event object for interprocess synchronization on GPU.

class chainer.cuda.IPCArrayHandle(array)[source]

Converter between GPUArray and its Inter-Process Communication handle.

It holds an IPC memory handle with shape and dtype information. The instance can be pickled, which means it can be passed through an IPC channel, e.g. a Pipe or a Queue. Another process can extract the shared GPUArray by calling get(). The extracted array can also be re-converted into another IPCArrayHandle.

Gradient checking utilities

chainer.gradient_check.assert_allclose(x, y, atol=1e-05, rtol=0.0001, verbose=True)[source]

Asserts that all corresponding elements of x and y are close within the given tolerances; it fails if any pair of elements differs too much.

This function can handle both CPU and GPU arrays simultaneously.

Parameters:
  • x – Left-hand-side array.
  • y – Right-hand-side array.
  • atol (float) – Absolute tolerance.
  • rtol (float) – Relative tolerance.
  • verbose (bool) – If True, it outputs verbose messages on error.
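
On CPU arrays the check behaves like NumPy's assert_allclose; the following NumPy-only example illustrates the atol/rtol semantics (note this calls the NumPy function, not the Chainer wrapper):

```python
import numpy
from numpy.testing import assert_allclose  # CPU analogue of the check

x = numpy.array([1.0, 2.0, 3.0])
assert_allclose(x + 5e-6, x, atol=1e-5, rtol=1e-4)  # within tolerance: passes

try:
    assert_allclose(x + 1.0, x, atol=1e-5, rtol=1e-4)
except AssertionError:
    pass  # a discrepancy of 1.0 exceeds both tolerances
else:
    raise RuntimeError('expected the check to fail')
```
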
chainer.gradient_check.numerical_grad(f, inputs, grad_outputs, eps=0.001)[source]

Computes numerical gradient by finite differences.

This function is used to implement gradient checks. For usage examples, see the unit tests of chainer.functions.

Parameters:
  • f (function) – Python function with no arguments that runs forward computation and returns the result.
  • inputs (tuple of arrays) – Tuple of arrays that should be treated as inputs. Each element is slightly perturbed to compute the numerical gradient by finite differences.
  • grad_outputs (tuple of arrays) – Tuple of arrays that are treated as output gradients.
  • eps (float) – Epsilon value of finite differences.
Returns:

Numerical gradient arrays corresponding to inputs.

Return type:

tuple
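
A central-difference sketch of this interface in plain NumPy (numerical_grad_sketch is illustrative only; the actual implementation may differ, e.g. in its handling of GPU arrays):

```python
import numpy as np

def numerical_grad_sketch(f, inputs, grad_outputs, eps=1e-3):
    # Central differences: perturb each input element by +/- eps, re-run f,
    # and weight the output differences by the given output gradients.
    grads = []
    for x in inputs:
        gx = np.zeros_like(x)
        flat_x, flat_g = x.ravel(), gx.ravel()  # views into x and gx
        for i in range(flat_x.size):
            orig = flat_x[i]
            flat_x[i] = orig + eps
            y_plus = f()
            flat_x[i] = orig - eps
            y_minus = f()
            flat_x[i] = orig  # restore the input
            flat_g[i] = sum(((a - b) * gy).sum()
                            for a, b, gy in zip(y_plus, y_minus, grad_outputs)
                            ) / (2 * eps)
        grads.append(gx)
    return tuple(grads)

x = np.array([1.0, 2.0, 3.0])
gx, = numerical_grad_sketch(lambda: (x ** 2,), (x,), (np.ones(3),))
assert np.allclose(gx, 2 * x, atol=1e-4)  # d/dx of x^2 is 2x
```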