Iterator examples

Chainer provides several iterators that implement typical strategies for creating mini-batches by iterating over datasets. SerialIterator is the simplest one; it extracts mini-batches in the main thread. MultiprocessIterator is a parallelized version of SerialIterator: it maintains worker subprocesses that load the next mini-batch in parallel.

SerialIterator

class chainer.iterators.SerialIterator(dataset, batch_size, repeat=True, shuffle=True)

Dataset iterator that serially reads the examples.

This is a simple implementation of Iterator that just visits each example in either the order of indexes or a shuffled order.

To avoid unintentional performance degradation, the shuffle option is set to True by default. For validation, it is better to set it to False when the underlying dataset supports fast slicing. If the order of examples has an important meaning and the updater depends on the original order, this option should be set to False.

This iterator saves -1 instead of None in snapshots since some serializers do not support None.
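One way to picture the -1 convention is the sketch below. It assumes a hypothetical serializer that, like some snapshot formats, accepts only numbers; `NumericOnlySerializer`, `SketchIterator`, and the `previous_epoch_detail` key are illustrative names, not Chainer's API. The iterator stores -1 as its "not yet available" marker and translates it back to None only when the value is read, so the snapshot never has to contain None.

```python
class NumericOnlySerializer:
    """Hypothetical snapshot writer that, like some real formats, rejects None."""

    def __init__(self):
        self.store = {}

    def save(self, key, value):
        if not isinstance(value, (int, float)):
            raise TypeError(f"cannot serialize {value!r} under {key!r}")
        self.store[key] = value

    def load(self, key):
        return self.store[key]


class SketchIterator:
    """Hypothetical iterator state; -1 stands for 'not available yet'."""

    def __init__(self):
        # Stored as -1 rather than None so that a numeric-only serializer accepts it.
        self._previous_epoch_detail = -1

    @property
    def previous_epoch_detail(self):
        # Callers still see None until a real value has been recorded.
        if self._previous_epoch_detail < 0:
            return None
        return self._previous_epoch_detail

    def save(self, serializer):
        serializer.save("previous_epoch_detail", self._previous_epoch_detail)

    def load(self, serializer):
        self._previous_epoch_detail = serializer.load("previous_epoch_detail")


s = NumericOnlySerializer()
it = SketchIterator()
it.save(s)  # succeeds; saving None directly would raise TypeError

restored = SketchIterator()
restored._previous_epoch_detail = 0.5  # pretend some progress was made
restored.load(s)                       # overwritten by the saved sentinel
print(restored.previous_epoch_detail)  # None
```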

Parameters:

dataset – Dataset to iterate.
batch_size (int) – Number of examples within each batch.
repeat (bool) – If True, it infinitely loops over the dataset. Otherwise, it stops iteration at the end of the first epoch.
shuffle (bool) – If True, the order of examples is shuffled at the beginning of each epoch. Otherwise, examples are extracted in the order of indexes.
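The effect of the repeat and shuffle options on the order of examples can be sketched in plain Python. The generator `serial_batches` and its `seed` argument are hypothetical and this is a simplified model, not Chainer's implementation (which, for instance, also tracks epoch counters for snapshots). In this sketch, with `repeat=True` a batch that straddles an epoch boundary is topped up from the next epoch's order, and with `repeat=False` the final batch may be shorter than `batch_size`.

```python
import random
from itertools import islice


def serial_batches(dataset, batch_size, repeat=True, shuffle=True, seed=None):
    """Hypothetical simplified model of serial mini-batch extraction."""
    rng = random.Random(seed)
    n = len(dataset)
    if n == 0:
        return  # nothing to iterate; avoids an endless loop when repeat=True

    def epoch_indices():
        while True:
            order = list(range(n))
            if shuffle:
                rng.shuffle(order)  # fresh permutation at the start of each epoch
            yield from order
            if not repeat:
                return  # stop after the first epoch

    batch = []
    for i in epoch_indices():
        batch.append(dataset[i])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # only reachable with repeat=False: the final, shorter batch
        yield batch


data = list(range(5))
print(list(serial_batches(data, 2, repeat=False, shuffle=False)))
# [[0, 1], [2, 3], [4]]
print(list(islice(serial_batches(data, 2, repeat=True, shuffle=False), 3)))
# [[0, 1], [2, 3], [4, 0]]
```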

MultiprocessIterator

class chainer.iterators.MultiprocessIterator(dataset, batch_size, repeat=True, shuffle=True, n_processes=None, n_prefetch=1, shared_mem=None)

Dataset iterator that loads examples in parallel.

This is an implementation of Iterator that loads examples with worker processes. It uses the standard multiprocessing module to parallelize the loading. The dataset is sent to the worker processes in the standard way using pickle.
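Because the dataset travels to the workers via pickle, it has to be picklable. A quick way to check a dataset object is sketched below; `TransformedDataset`, `double`, and `can_pickle` are hypothetical names. Module-level functions and classes pickle by reference, while a lambda stored as an attribute makes the whole object unpicklable.

```python
import pickle


class TransformedDataset:
    """Hypothetical dataset that applies a transform to each stored item."""

    def __init__(self, items, transform):
        self.items = list(items)
        self.transform = transform

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        return self.transform(self.items[i])


def double(x):
    return 2 * x


def can_pickle(obj):
    # Try the same serialization that would be used to ship obj to a worker.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


print(can_pickle(TransformedDataset(range(3), double)))           # True
print(can_pickle(TransformedDataset(range(3), lambda x: 2 * x)))  # False
```

Replacing such a lambda with a module-level function (as `double` does here) is usually enough to make the dataset shippable to worker processes.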

Note that this iterator effectively prefetches the examples for the next batch asynchronously after the current batch is returned.

This iterator saves -1 instead of None in snapshots since some serializers do not support None.

Parameters:

dataset (Dataset) – Dataset to iterate.
batch_size (int) – Number of examples within each batch.
repeat (bool) – If True, it infinitely loops over the dataset. Otherwise, it stops iteration at the end of the first epoch.
shuffle (bool) – If True, the order of examples is shuffled at the beginning of each epoch. Otherwise, examples are extracted in the order of indexes.
n_processes (int) – Number of worker processes. The number of CPUs is used by default.
n_prefetch (int) – Number of batches to prefetch.
shared_mem (int) – Size of the shared memory used per example, in bytes. If None, the size is adjusted automatically.
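The prefetching behaviour and the roles of n_processes and n_prefetch can be sketched as follows. To keep the example self-contained and portable, it swaps Chainer's worker processes for a thread pool from `concurrent.futures`; `prefetching_batches`, `load_batch`, and the `n_workers` argument (standing in for n_processes, defaulting to the CPU count) are hypothetical. It models only one epoch in index order: when a batch is handed out, up to `n_prefetch` later batches are already being loaded.

```python
import os
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def load_batch(dataset, indices):
    # Hypothetical loader; a real one might read and decode files here.
    return [dataset[i] for i in indices]


def prefetching_batches(dataset, batch_size, n_workers=None, n_prefetch=1):
    """Yield batches in index order while up to n_prefetch later batches load.

    Simplified sketch: one epoch, no shuffling, threads instead of processes.
    """
    n_workers = n_workers or os.cpu_count() or 1  # default: number of CPUs
    starts = iter(range(0, len(dataset), batch_size))
    pending = deque()  # futures for batches already submitted, in order

    with ThreadPoolExecutor(max_workers=n_workers) as pool:

        def submit_next():
            start = next(starts, None)
            if start is None:
                return False  # no batches left in this epoch
            indices = range(start, min(start + batch_size, len(dataset)))
            pending.append(pool.submit(load_batch, dataset, indices))
            return True

        while True:
            if not pending and not submit_next():
                break
            current = pending.popleft()
            # Start loading later batches before handing out the current one.
            while len(pending) < n_prefetch and submit_next():
                pass
            yield current.result()


for batch in prefetching_batches(list(range(7)), 3):
    print(batch)
# [0, 1, 2]
# [3, 4, 5]
# [6]
```

Threads keep the sketch free of pickling and start-method concerns, but Python's global interpreter lock limits them to overlapping I/O-bound loading; CPU-heavy decoding is one reason to use worker processes instead.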