Environment Variables

CHAINERMN_FORCE_ABORT_ON_EXCEPTIONS

If this variable is set to a non-empty value, ChainerMN installs a global hook as Python's sys.excepthook that calls MPI_Abort() whenever an unhandled exception occurs. See "MPI process hangs after an unhandled Python exception" for background.

ChainerMN issue #236 may also help you understand the problem.
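The rule above is that any non-empty value enables the hook, while an unset or empty variable leaves it disabled. The following is a minimal sketch of that check, not ChainerMN's own code. The function name `install_hook_if_requested` and its `install` callback are hypothetical, introduced only for illustration.

```python
from typing import Callable, Mapping


def install_hook_if_requested(
    environ: Mapping[str, str],
    install: Callable[[], None],
    var: str = "CHAINERMN_FORCE_ABORT_ON_EXCEPTIONS",
) -> bool:
    """Hypothetical helper: call `install` only if `var` is non-empty.

    Returns True when the hook was installed.
    """
    # Both "unset" and "set to an empty string" count as disabled;
    # any other value (e.g. "1" or "yes") enables the hook.
    if environ.get(var, ""):
        install()
        return True
    return False
```

In practice you would pass `os.environ` as `environ`. The variable is then typically exported in the shell that launches the MPI job so that every process sees the same value.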

Execution Control

chainermn.global_except_hook.add_hook()

Add a global hook function that captures all unhandled exceptions.

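The function calls MPI_Abort() to force all processes to abort. This is useful when you run your training script on a cloud platform, where the remaining processes would otherwise hang after one of them fails, leaving the job running indefinitely.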
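The mechanism can be sketched as a replacement for `sys.excepthook` that first prints the traceback and then aborts. This is an illustrative sketch under stated assumptions, not the implementation of `chainermn.global_except_hook.add_hook()`. The names `make_abort_hook` and `add_abort_hook` are hypothetical. The abort action is injected as a callable so the sketch runs without MPI. In a real mpi4py program, one plausible choice is `lambda: MPI.COMM_WORLD.Abort(1)`.

```python
import sys
import traceback
from types import TracebackType
from typing import Callable, Optional, Type

ExceptHook = Callable[
    [Type[BaseException], BaseException, Optional[TracebackType]], None
]


def make_abort_hook(abort: Callable[[], None]) -> ExceptHook:
    """Hypothetical: build an excepthook that reports the error, then aborts."""

    def hook(exc_type, exc_value, exc_tb):
        # Print the traceback first so the failing process's error is visible.
        traceback.print_exception(exc_type, exc_value, exc_tb, file=sys.stderr)
        sys.stderr.flush()
        # Then tear down the whole job instead of letting the other
        # processes wait forever in a collective operation.
        abort()

    return hook


def add_abort_hook(abort: Callable[[], None]) -> None:
    """Hypothetical analogue of add_hook(): install the hook globally."""
    sys.excepthook = make_abort_hook(abort)
```

With ChainerMN itself, you would simply call `chainermn.global_except_hook.add_hook()` once near the start of the training script instead of writing such a hook by hand.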