The most basic dataset implementation is an array.
Both NumPy and CuPy arrays can be used directly as datasets.
In many cases, though, plain arrays are not enough to write the training procedure. To cover most such cases, Chainer provides many built-in dataset implementations.
These built-in datasets are divided into two groups.
One is a group of general datasets.
Most of them are wrappers around other datasets that introduce some structure (e.g., tuple or dict) to each data point.
The other one is a group of concrete, popular datasets.
These concrete examples use the downloading utilities in the chainer.dataset module to cache downloaded and converted datasets.
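As a rough illustration of what a wrapper that introduces a tuple structure does, the sketch below pairs equally long sequences so that each data point becomes a tuple. The class name `PairedDataset` is hypothetical and this is not Chainer's implementation; it only assumes that a dataset supports `len()` and integer indexing. In Chainer itself this role is played by `chainer.datasets.TupleDataset`.

```python
class PairedDataset:
    """Hypothetical wrapper: the i-th data point is a tuple of the i-th items."""

    def __init__(self, *datasets):
        if not datasets:
            raise ValueError('at least one dataset is required')
        length = len(datasets[0])
        if any(len(d) != length for d in datasets):
            raise ValueError('all datasets must have the same length')
        self._datasets = datasets
        self._length = length

    def __len__(self):
        return self._length

    def __getitem__(self, i):
        # Collect the i-th item of every wrapped dataset into one tuple.
        return tuple(d[i] for d in self._datasets)


# Stand-in data: three feature vectors and their labels.
features = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]]
labels = [0, 1, 0]
paired = PairedDataset(features, labels)
```

Indexing `paired[1]` then yields the feature vector and the label together, which is the shape a training loop usually expects for supervised learning.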
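General datasets are further divided into three types.
The first one consists of TupleDataset and DictDataset, both of which combine other datasets and introduce a tuple or dict structure to each data point.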
The second one is SubDataset, which represents a subset of an existing dataset. It can be used to split a dataset for hold-out validation or cross validation. Convenience functions for making random splits are also provided.
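The idea of a subset view and a random hold-out split can be sketched in plain Python as follows. The names `SubsetView` and `split_random` are hypothetical stand-ins, not Chainer's code; the sketch assumes the base dataset supports `len()` and integer indexing, and that both halves share a single permutation so they never overlap. In Chainer, `chainer.datasets.split_dataset_random(dataset, first_size, seed=None)` provides a comparable split.

```python
import random


class SubsetView:
    """Hypothetical read-only view of dataset[start:finish], optionally permuted."""

    def __init__(self, dataset, start, finish, order=None):
        if not 0 <= start <= finish <= len(dataset):
            raise ValueError('invalid subset range')
        self._dataset = dataset
        self._start = start
        self._finish = finish
        self._order = order  # optional permutation of the base indices

    def __len__(self):
        return self._finish - self._start

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError('subset index out of range')
        index = self._start + i
        if self._order is not None:
            index = self._order[index]
        return self._dataset[index]


def split_random(dataset, first_size, seed=None):
    """Split a dataset into two disjoint subsets using one shared permutation."""
    order = list(range(len(dataset)))
    random.Random(seed).shuffle(order)
    first = SubsetView(dataset, 0, first_size, order)
    second = SubsetView(dataset, first_size, len(dataset), order)
    return first, second


data = list(range(10))
train, valid = split_random(data, 8, seed=0)
```

Because neither view copies the data, the split is cheap even for large datasets. Cross validation follows the same pattern, with the held-out range moved across the permutation once per fold.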
The third one is TransformDataset, which wraps a dataset and applies a function to each data point fetched from the underlying dataset.
It can be used to modify the behavior of a dataset that is already prepared.
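A minimal sketch of this wrapping pattern, in plain Python, is shown below. `TransformedView` and `scale` are hypothetical names chosen for illustration and are not Chainer's implementation; the sketch assumes each data point is an (input, label) pair and that the transform is applied lazily, at indexing time. The Chainer counterpart is `chainer.datasets.TransformDataset(dataset, transform)`.

```python
class TransformedView:
    """Hypothetical wrapper that applies `transform` to each fetched data point."""

    def __init__(self, dataset, transform):
        self._dataset = dataset
        self._transform = transform

    def __len__(self):
        return len(self._dataset)

    def __getitem__(self, i):
        # The base dataset is left untouched; the transform runs on access.
        return self._transform(self._dataset[i])


def scale(example):
    """Rescale 8-bit pixel values to the range [0, 1], keeping the label."""
    pixels, label = example
    return [p / 255.0 for p in pixels], label


raw = [([0, 255], 1), ([51, 102], 0)]
scaled = TransformedView(raw, scale)
```

Applying the function on access rather than up front means a randomized transform (for example, data augmentation) can produce a different result each time the same index is read.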
||Subset of a base dataset.|
||Splits a dataset into two subsets.|
||Splits a dataset into two subsets randomly.|
||Creates a set of training/test splits for cross validation.|
||Creates a set of training/test splits for cross validation randomly.|
||Dataset that indexes the base dataset and transforms the data.|
||Dataset of images built from a list of paths to image files.|
||Gets the MNIST dataset.|
||Gets the CIFAR-10 dataset.|
||Gets the CIFAR-100 dataset.|
||Gets the Penn Tree Bank dataset as long word sequences.|
||Gets the Penn Tree Bank word vocabulary.|