Distributed Deep Learning with ChainerMN
ChainerMN enables multi-node distributed deep learning with the following features:
- Scalable — it makes full use of the latest technologies such as NVIDIA NCCL and CUDA-Aware MPI,
- Flexible — even dynamic neural networks can be trained in parallel thanks to Chainer’s flexibility, and
- Easy — minimal changes to existing user code are required.
This blog post provides our benchmark results using up to 128 GPUs.
ChainerMN can be used in both intra-node (i.e., multiple GPUs within a single node) and inter-node settings. For inter-node settings, we strongly recommend using a high-speed interconnect such as InfiniBand.
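To give a sense of what "minimal changes" can look like, here is a rough sketch of the additions a typical single-process Chainer training script might need. The helper name `make_distributed` and its parameters are hypothetical and only for illustration; the `chainermn` calls inside it (`create_communicator`, `create_multi_node_optimizer`, `scatter_dataset`) are the library's entry points as we understand them, and the `"naive"` communicator is assumed here because it does not require GPUs.

```python
def make_distributed(optimizer, train_dataset, communicator_name="naive"):
    """Hypothetical helper: adapt an existing optimizer and dataset for ChainerMN.

    `optimizer` is an already set-up Chainer optimizer. `train_dataset` is the
    full dataset on the root process (rank 0) and may be None on other ranks.
    """
    # Imported inside the function so the script can still be loaded on
    # machines without an MPI installation.
    import chainermn

    # One communicator per process; it wraps the underlying MPI communicator.
    comm = chainermn.create_communicator(communicator_name)

    # The rank of this process within its own node is commonly used as the
    # GPU id. Moving the model to that device is left to the caller.
    device = comm.intra_rank

    # The wrapped optimizer combines gradients across all workers on each update.
    optimizer = chainermn.create_multi_node_optimizer(optimizer, comm)

    # Split the root process's dataset so every worker receives its own shard.
    train_dataset = chainermn.scatter_dataset(train_dataset, comm, shuffle=True)

    return comm, device, optimizer, train_dataset
```

The rest of the training loop would stay as it was. A script adapted this way is usually started through MPI, for example with `mpiexec -n 4 python train.py`, where `train.py` is a placeholder for your own script and `-n` is the total number of worker processes.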
- Model Parallel
- API Reference