chainer.dataset.TabularDataset

class chainer.dataset.TabularDataset[source]

An abstract class that represents a tabular dataset.

This class represents a tabular dataset. In a tabular dataset, all examples have the same number of elements. For example, all examples of the dataset below have three elements (a[i], b[i], and c[i]).

i    a     b     c
0    a[0]  b[0]  c[0]
1    a[1]  b[1]  c[1]
2    a[2]  b[2]  c[2]
3    a[3]  b[3]  c[3]

Since an example can be represented as either a tuple ((a[i], b[i], c[i])) or a dict ({'a': a[i], 'b': b[i], 'c': c[i]}), this class uses mode to indicate which representation is used. If there is only one column, an example can also be represented by a bare value (a[i]). In this case, mode is None.
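
The three representations can be pictured with plain Python values. The snippet below is only an illustration of the idea and does not use Chainer; the lists a, b, and c are made-up stand-ins for the columns of the table above.

```python
# Made-up column data standing in for a[i], b[i], and c[i].
a = [0, 3, 6, 9]
b = [1, 4, 7, 10]
c = [2, 5, 8, 11]

i = 1
as_tuple = (a[i], b[i], c[i])                # representation when mode is tuple
as_dict = {'a': a[i], 'b': b[i], 'c': c[i]}  # representation when mode is dict
as_value = a[i]                              # single-column case, mode is None

print(as_tuple)  # (3, 4, 5)
print(as_value)  # 3
```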

A subclass should implement __len__(), keys, mode, and get_examples().

>>> import numpy as np
>>>
>>> from chainer import dataset
>>>
>>> class MyDataset(dataset.TabularDataset):
...
...     def __len__(self):
...         return 4
...
...     @property
...     def keys(self):
...         return ('a', 'b', 'c')
...
...     @property
...     def mode(self):
...         return tuple
...
...     def get_examples(self, indices, key_indices):
...         data = np.arange(12).reshape((4, 3))
...         if indices is not None:
...             data = data[indices]
...         if key_indices is not None:
...             data = data[:, list(key_indices)]
...         return tuple(data.transpose())
...
>>> dataset = MyDataset()
>>> len(dataset)
4
>>> dataset.keys
('a', 'b', 'c')
>>> dataset.astuple()[0]
(0, 1, 2)
>>> sorted(dataset.asdict()[0].items())
[('a', 0), ('b', 1), ('c', 2)]
>>>
>>> view = dataset.slice[[3, 2], ('c', 0)]
>>> len(view)
2
>>> view.keys
('c', 'a')
>>> view.astuple()[1]
(8, 6)
>>> sorted(view.asdict()[1].items())
[('a', 6), ('c', 8)]

Methods

__getitem__(index)[source]

Returns an example or a sequence of examples.

It implements the standard Python indexing and one-dimensional integer array indexing. It uses the get_example() method by default, but it may be overridden by the implementation to, for example, improve the slicing performance.

Parameters

index (int, slice, list or numpy.ndarray) – An index of an example or indexes of examples.

Returns

If index is int, returns an example created by get_example. If index is either slice or one-dimensional list or numpy.ndarray, returns a list of examples created by get_example.

Example

>>> import numpy
>>> from chainer import dataset
>>> class SimpleDataset(dataset.DatasetMixin):
...     def __init__(self, values):
...         self.values = values
...     def __len__(self):
...         return len(self.values)
...     def get_example(self, i):
...         return self.values[i]
...
>>> ds = SimpleDataset([0, 1, 2, 3, 4, 5])
>>> ds[1]   # Access by int
1
>>> ds[1:3]  # Access by slice
[1, 2]
>>> ds[[4, 0]]  # Access by one-dimensional integer list
[4, 0]
>>> index = numpy.arange(3)
>>> ds[index]  # Access by one-dimensional integer numpy.ndarray
[0, 1, 2]

__len__()[source]

Returns the number of data points.

__iter__()[source]

asdict()[source]

Return a view with dict mode.

Returns

A view whose mode is dict.

astuple()[source]

Return a view with tuple mode.

Returns

A view whose mode is tuple.

concat(*datasets)[source]

Stack datasets along rows.

Parameters

datasets (iterable of TabularDataset) – Datasets to be concatenated. All datasets must have the same keys.

Returns

A concatenated dataset.
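
Stacking along rows can be pictured on plain column-major data. The sketch below models a table as a (keys, columns) pair. concat_rows is a hypothetical helper written for illustration, not a Chainer function, and rejecting mismatched keys with ValueError is an assumption of this sketch.

```python
def concat_rows(*tables):
    """Stack (keys, columns) tables along rows. Illustrative helper only."""
    keys = tables[0][0]
    for table_keys, _ in tables:
        if table_keys != keys:
            # Assumption of this sketch: mismatched keys are rejected.
            raise ValueError('all tables must have the same keys')
    # For each column position, chain that column from every table.
    columns = tuple(
        [value for _, cols in tables for value in cols[k]]
        for k in range(len(keys)))
    return keys, columns


first = (('a', 'b'), ([0, 1], [10, 11]))
second = (('a', 'b'), ([2, 3, 4], [12, 13, 14]))

keys, columns = concat_rows(first, second)
print(keys)     # ('a', 'b')
print(columns)  # ([0, 1, 2, 3, 4], [10, 11, 12, 13, 14])
```

The result has the same keys as the inputs and as many rows as all inputs combined.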

convert(data)[source]

Convert fetched data.

This method takes data fetched by fetch() and preprocesses it before it is passed to a model. By default, each column is converted into an ndarray. This behaviour can be overridden with with_converter(). If the dataset was constructed by concat() or join(), the converter of the first dataset is used.

Parameters

data (tuple or dict) – Data from fetch().

Returns

A tuple or dict. Each value is an ndarray.

fetch()[source]

Fetch data.

This method fetches all data of the dataset/view. Note that it returns column-major data, i.e. ([a[0], ..., a[3]], ..., [c[0], ..., c[3]]), {'a': [a[0], ..., a[3]], ..., 'c': [c[0], ..., c[3]]}, or [a[0], ..., a[3]].

Returns

If mode is tuple, this method returns a tuple of lists/arrays. If mode is dict, this method returns a dict of lists/arrays. If mode is None, this method returns a single list/array.
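
The difference between row-major examples and the column-major data described here can be shown in plain Python. This is a sketch of the layout only, not of how Chainer fetches data; rows and keys are made-up values.

```python
# Row-major view: one tuple per example.
rows = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11)]
keys = ('a', 'b', 'c')

# Column-major view: one list per column.
columns = tuple(list(column) for column in zip(*rows))

fetched_as_tuple = columns                  # layout when mode is tuple
fetched_as_dict = dict(zip(keys, columns))  # layout when mode is dict
fetched_as_single = columns[0]              # layout for a single column

print(fetched_as_tuple)   # ([0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11])
print(fetched_as_single)  # [0, 3, 6, 9]
```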

get_example(i)[source]

Returns the i-th example.

Implementations should override it. It should raise IndexError if the index is invalid.

Parameters

i (int) – The index of the example.

Returns

The i-th example.

get_examples(indices, key_indices)[source]

Return a part of data.

Parameters
  • indices (list of ints or slice) – Indices of requested rows. If this argument is None, it indicates all rows.

  • key_indices (tuple of ints) – Indices of requested columns. If this argument is None, it indicates all columns.

Returns

tuple of lists/arrays
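
As a rough illustration of this contract, the function below selects rows and columns from column-major data held in plain lists. It is a standalone sketch, not the Chainer implementation; the free function and its columns argument are assumptions made so the example runs without the library.

```python
def get_examples(columns, indices, key_indices):
    """Select rows and columns from a tuple of column lists."""
    if key_indices is None:
        # None means "all columns".
        key_indices = range(len(columns))
    selected = []
    for k in key_indices:
        column = columns[k]
        if indices is None:
            # None means "all rows".
            column = list(column)
        elif isinstance(indices, slice):
            column = column[indices]
        else:
            column = [column[i] for i in indices]
        selected.append(column)
    return tuple(selected)


columns = ([0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11])

everything = get_examples(columns, None, None)
subset = get_examples(columns, [3, 2], (2, 0))
sliced = get_examples(columns, slice(1, 3), (1,))

print(subset)  # ([11, 8], [9, 6])
print(sliced)  # ([4, 7],)
```

The return value stays column-major: one list per requested column, in the order given by key_indices.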

join(*datasets)[source]

Stack datasets along columns.

Parameters

datasets (iterable of TabularDataset) – Datasets to be joined. All datasets must have the same length.

Returns

A joined dataset.
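
Stacking along columns can be pictured in the same plain-Python terms. The sketch below models a table as a (keys, columns) pair. join_columns is a hypothetical helper, not a Chainer function, and raising ValueError on a length mismatch is an assumption of this sketch.

```python
def join_columns(*tables):
    """Stack (keys, columns) tables along columns. Illustrative helper only."""
    length = len(tables[0][1][0])
    keys = ()
    columns = ()
    for table_keys, table_columns in tables:
        if any(len(column) != length for column in table_columns):
            # Assumption of this sketch: mismatched lengths are rejected.
            raise ValueError('all tables must have the same length')
        keys += table_keys
        columns += table_columns
    return keys, columns


left = (('a', 'b'), ([0, 1, 2], [10, 11, 12]))
right = (('c',), ([20, 21, 22],))

keys, columns = join_columns(left, right)
print(keys)     # ('a', 'b', 'c')
print(columns)  # ([0, 1, 2], [10, 11, 12], [20, 21, 22])
```

The result keeps the number of rows and gains the columns of every input.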

transform(keys, transform)[source]

Apply a transform to each example.

Parameters
  • keys (tuple of strs) – The keys of transformed examples.

  • transform (callable) – A callable that takes an example and returns a transformed example. The mode of the transformed dataset is determined by the transformed examples.

Returns

A transformed dataset.

transform_batch(keys, transform_batch)[source]

Apply a transform to examples.

Parameters
  • keys (tuple of strs) – The keys of transformed examples.

  • transform_batch (callable) – A callable that takes examples and returns transformed examples. The mode of the transformed dataset is determined by the transformed examples.

Returns

A transformed dataset.
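
The difference between a per-example transform and a batch transform can be sketched without Chainer. In this illustration the helpers apply_per_example and apply_per_batch are hypothetical, and passing the columns as positional arguments is an assumption of the sketch rather than a documented calling convention.

```python
def apply_per_example(columns, transform):
    """Call transform once per row, then regroup the results by column."""
    out_rows = [transform(*row) for row in zip(*columns)]
    return tuple(list(column) for column in zip(*out_rows))


def apply_per_batch(columns, transform_batch):
    """Call transform_batch once with whole columns."""
    return tuple(list(column) for column in transform_batch(*columns))


def add_and_double(a, b):
    # Works on the values of a single example.
    return a + b, a * 2


def add_and_double_batch(a, b):
    # Works on whole columns at once.
    return [x + y for x, y in zip(a, b)], [x * 2 for x in a]


columns = ([0, 3, 6, 9], [1, 4, 7, 10])

per_example = apply_per_example(columns, add_and_double)
per_batch = apply_per_batch(columns, add_and_double_batch)

print(per_example)  # ([1, 7, 13, 19], [0, 6, 12, 18])
```

Both routes produce the same columns here; the batch form simply gives the callable a chance to process many examples in one call.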

with_converter(converter)[source]

Override the behaviour of convert().

This method returns a dataset whose convert() uses the given converter instead of the default conversion.

Parameters

converter (callable) – A new converter.

Returns

A dataset with the new converter.

__eq__(value, /)

Return self==value.

__ne__(value, /)

Return self!=value.

__lt__(value, /)

Return self<value.

__le__(value, /)

Return self<=value.

__gt__(value, /)

Return self>value.

__ge__(value, /)

Return self>=value.

Attributes

keys

Names of columns.

A tuple of strings that indicate the names of columns.

mode

Mode of representation.

This indicates the type of value returned by fetch() and __getitem__(). tuple, dict, and None are supported.

slice

Get a slice of dataset.

Parameters
  • indices (list/array of ints/bools or slice) – Requested rows.

  • keys (tuple of ints/strs or int or str) – Requested columns.

Returns

A view of specified range.
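
The accepted forms of indices and keys can be normalised as in the following plain-Python sketch. resolve_rows and resolve_keys are hypothetical helpers written for illustration and are not part of Chainer; they only show how each accepted form maps to row and column positions.

```python
def resolve_rows(length, indices):
    """Turn a slice, a boolean mask, or a list of ints into row numbers."""
    if isinstance(indices, slice):
        return list(range(*indices.indices(length)))
    indices = list(indices)
    if indices and all(isinstance(i, bool) for i in indices):
        # A boolean mask keeps the rows whose flag is True.
        return [row for row, keep in enumerate(indices) if keep]
    return indices


def resolve_keys(all_keys, keys):
    """Turn column names and/or positions into column positions."""
    if not isinstance(keys, tuple):
        keys = (keys,)
    return tuple(
        key if isinstance(key, int) else all_keys.index(key)
        for key in keys)


all_keys = ('a', 'b', 'c')

rows = resolve_rows(4, [3, 2])
masked = resolve_rows(4, [True, False, False, True])
sliced = resolve_rows(4, slice(1, None))
cols = resolve_keys(all_keys, ('c', 0))
selected_keys = tuple(all_keys[k] for k in cols)

print(rows, masked, sliced)  # [3, 2] [0, 3] [1, 2, 3]
print(cols, selected_keys)   # (2, 0) ('c', 'a')
```

The last line mirrors the class example above, where slicing with ('c', 0) yields a view whose keys are ('c', 'a').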