Comparison with Other Frameworks

A table for quick comparison

This table compares Chainer with other actively developed deep learning frameworks. Content is current as of July 2017.

|  |  | Chainer | PyTorch | TensorFlow | Theano-based | Caffe1/Caffe2 | Torch7 | MXNet | DyNet | PaddlePaddle | DL4J | CNTK | neon | Knet.jl | Darknet | Thinc |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Basics | Language | Python | Python | Python | Python | Python/C++/MATLAB | LuaJIT | Python/others | Python/C++ | Python/C++ | Java | BrainScript/Python/C++ | Python | Julia | C | Python |
|  | Approach | define-by-run | define-by-run | symbolic autograd | symbolic autograd | static | static/manual grads | symbolic autograd/manual grads/define-by-run [1] | define-by-run | symbolic autograd | static/manual grads/symbolic autograd [2] | static/symbolic autograd | static/symbolic autograd [3] | define-by-run | static | callback-based define-by-run |
|  | CPU backend package | NumPy | TH | Eigen | NumPy |  | TH | mshadow | Eigen |  | ND4J |  | NumPy | Julia |  | NumPy |
|  | GPU backend package | CuPy | THC | Eigen | libgpuarray |  | THC | mshadow | Eigen |  | ND4J |  | neon | KnetArrays |  | CuPy |
|  | Primary sponsor | Preferred Networks | Facebook | Google | MILA | Facebook | Facebook | Amazon/Apache | CMU | Baidu | Skymind | Microsoft | Intel Nervana | Koç University | Joe Redmon | Explosion AI |
| NNs | CNNs | full | full | full | full | full | full | full | partial | full | full | full | full | partial | full | none |
|  | RNNs | full | full | full | full | partial | full | full | full | full | full | full | partial | partial | partial | partial |
|  | Reverse-mode autograd | Y | Y | Y | Y |  | torch-autograd | Y | Y | Y |  | Y | ngraph | Y |  | with closures |
|  | Forward-mode autograd |  |  | tensorflow-forward-ad | Y |  |  |  |  |  |  |  |  |  |  |  |
|  | Higher-order grads | Y [4] | Y | Y | Y |  |  |  |  |  |  |  |  | Y |  |  |
|  | Variable-length loops | native | native | while_loop | scan | RNNs only | native | 2017 | native | RNNs only | none | dynamic axis | none | native | none | native |
|  | Different architectures per batch | native | native | fold |  |  | torch-autograd | MinPy | native |  |  |  |  | native |  | native |
| Performance | cuDNN support | full | full | partial | partial | full | full | full | partial | full | partial | full | N/A [5] |  | partial |  |
|  | CPU/GPU generic backend | Y | Y |  |  |  | Y | Y | Y | Y | Y | Y | Y | Y |  | Y |
|  | Multi-GPU data parallelism | Y | Y | Y | Y | Y | Y | Y |  | Y | Y | Y | Y | Y | Y |  |
|  | Multi-GPU model parallelism | Y | Y | Y | Y | Y | Y | Y |  | Y |  | Y | Y |  |  |  |
|  | Multiprocessing [6] | full | partial |  |  |  |  |  | full |  |  |  |  |  |  |  |
|  | Distributed training | ChainerMN | THD | Y |  | 2017 | torch-distlearn | Y |  | Y | Spark | Y | Y |  |  |  |
| Misc | Runtime debugging | debug mode, typechecking, pdb | pdb | tfdbg |  |  |  | Monitor | pdb |  | Java debuggers | cntk.debugging |  | Gallium.jl | gdb | pdb |
|  | Trainer abstraction | native | tnt |  | Blocks, Lasagne, Keras | native | torchnet |  |  | native | native | native | native |  |  | native |
|  | Reporter abstraction | native | tnt | native |  |  | torchnet | native |  |  | native | native |  |  |  |  |
|  | Web interface | ChainerUI, tensorboardX | tensorboardX, visdom | TensorBoard |  |  |  |  |  |  | DL4J-UI |  | Nervana Cloud |  |  |  |
|  | Graph compilation engine |  | 2017 | XLA |  | 2017 |  | NNVM |  |  |  |  | ngraph |  |  |  |
[1] Define-by-run is in development as of June 2017 and tracked in dmlc/mxnet#5705. It is also possible using the much slower MinPy extension.
[2] Symbolic autograd is in development as of June 2017 and tracked in deeplearning4j/nd4j#1750.
[3] Symbolic autograd is available only with the ngraph backend (experimental).
[4] Some functions do not support higher-order differentiation. See chainer/chainer#4449.
[5] Nervana provides kernels that are meant to compete with cuDNN.
[6] Multiprocessing provides a significant performance improvement only for frameworks that use Python at runtime.
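
To make footnote [6] a little more concrete: because the training loop of a define-by-run framework is itself Python, one common use of multiprocessing in Chainer is to move batch preparation into worker processes so that it does not compete with the model code for the GIL. The following is only a minimal sketch; the dataset and sizes are made up for illustration, and it assumes chainer.iterators.MultiprocessIterator with default settings.

```python
# Hypothetical toy dataset; the point here is the iterator, not the data.
import numpy as np
from chainer.datasets import TupleDataset
from chainer.iterators import MultiprocessIterator

x = np.random.rand(1000, 32).astype(np.float32)
t = np.random.randint(0, 10, size=1000).astype(np.int32)
dataset = TupleDataset(x, t)

# Two worker processes prefetch and assemble batches in the background,
# outside the main process that runs the (Python) training loop.
it = MultiprocessIterator(dataset, batch_size=64, n_processes=2)
batch = next(it)   # a list of (x, t) pairs built by the workers
it.finalize()      # shut the worker processes down explicitly
```

In a real script this would sit inside the usual training loop and, on platforms that spawn rather than fork processes, under an `if __name__ == '__main__':` guard.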

Benchmarks

Benchmarks for convolutional networks can be found at convnet-benchmarks, and some NLP benchmarks are at dynet-benchmark. Chainer wraps the latest available cuDNN kernels for CNNs and RNNs, so the performance of most common networks that use these kernels is typically similar to that of other modern frameworks. Because the define-by-run approach executes the user's Python code directly on every iteration, networks with heavy Python-side control flow or very small tensor sizes may run slower than in static-graph frameworks, where graph execution happens outside the Python interpreter; the toy sketch below illustrates where this overhead comes from.
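
The following is a minimal, hypothetical define-by-run sketch (the `ToyRNN` model, its names, and its sizes are made up for this example). The Python loop in the forward computation runs anew on every call, which is what makes per-batch dynamic structure easy, but it also means the interpreter cost is paid on every iteration and can dominate when each kernel is tiny.

```python
# Toy define-by-run model: the graph is recorded as ordinary Python executes.
import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

class ToyRNN(chainer.Chain):
    def __init__(self, n_units):
        super(ToyRNN, self).__init__()
        self.n_units = n_units
        with self.init_scope():
            self.l = L.Linear(n_units, n_units)

    def __call__(self, xs):
        h = chainer.Variable(np.zeros((1, self.n_units), dtype=np.float32))
        for x in xs:                   # plain Python control flow at runtime
            h = F.tanh(self.l(x) + h)  # the graph grows as this line executes
        return F.sum(h)

model = ToyRNN(4)
xs = [np.random.rand(1, 4).astype(np.float32) for _ in range(5)]
loss = model(xs)        # the length of xs can change on every call
loss.backward()         # backprop through the graph recorded above
```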