chainer.dataset.TabularDataset
- class chainer.dataset.TabularDataset
An abstract class that represents a tabular dataset.
This class represents a tabular dataset. In a tabular dataset, all examples have the same number of elements. For example, every example of the dataset below has three elements (a[i], b[i], and c[i]).

|   | a    | b    | c    |
|---|------|------|------|
| 0 | a[0] | b[0] | c[0] |
| 1 | a[1] | b[1] | c[1] |
| 2 | a[2] | b[2] | c[2] |
| 3 | a[3] | b[3] | c[3] |

Since an example can be represented both as a tuple, (a[i], b[i], c[i]), and as a dict, {'a': a[i], 'b': b[i], 'c': c[i]}, this class uses mode to indicate which representation is used. If there is only one column, an example can also be represented by a bare value (a[i]); in this case, mode is None.

A subclass should implement __len__(), keys, mode and get_examples().

    >>> import numpy as np
    >>>
    >>> from chainer import dataset
    >>>
    >>> class MyDataset(dataset.TabularDataset):
    ...
    ...     def __len__(self):
    ...         return 4
    ...
    ...     @property
    ...     def keys(self):
    ...         return ('a', 'b', 'c')
    ...
    ...     @property
    ...     def mode(self):
    ...         return tuple
    ...
    ...     def get_examples(self, indices, key_indices):
    ...         data = np.arange(12).reshape((4, 3))
    ...         if indices is not None:
    ...             data = data[indices]
    ...         if key_indices is not None:
    ...             data = data[:, list(key_indices)]
    ...         return tuple(data.transpose())
    ...
    >>> dataset = MyDataset()
    >>> len(dataset)
    4
    >>> dataset.keys
    ('a', 'b', 'c')
    >>> dataset.astuple()[0]
    (0, 1, 2)
    >>> sorted(dataset.asdict()[0].items())
    [('a', 0), ('b', 1), ('c', 2)]
    >>>
    >>> view = dataset.slice[[3, 2], ('c', 0)]
    >>> len(view)
    2
    >>> view.keys
    ('c', 'a')
    >>> view.astuple()[1]
    (8, 6)
    >>> sorted(view.asdict()[1].items())
    [('a', 6), ('c', 8)]
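How mode decides the shape of a single example can be sketched in plain Python. The helper make_example below is hypothetical and not part of Chainer; it only shows how one row (a[i], b[i], c[i]) would be packaged as a tuple, a dict, or a bare value, under the assumption that mode is one of tuple, dict or None as described above.

```python
def make_example(values, keys, mode):
    """Package one row of column values according to mode.

    Hypothetical helper for illustration only; not part of Chainer.
    values: the row in column order, e.g. (a[i], b[i], c[i]).
    keys: the column names, e.g. ('a', 'b', 'c').
    mode: tuple, dict or None (None requires exactly one column).
    """
    values = tuple(values)
    if len(values) != len(keys):
        raise ValueError('values and keys must have the same length')
    if mode is tuple:
        return values
    if mode is dict:
        return dict(zip(keys, values))
    if mode is None:
        if len(keys) != 1:
            raise ValueError('mode None requires exactly one column')
        return values[0]
    raise ValueError('mode must be tuple, dict or None')


print(make_example((0, 1, 2), ('a', 'b', 'c'), tuple))  # (0, 1, 2)
print(make_example((0, 1, 2), ('a', 'b', 'c'), dict))   # {'a': 0, 'b': 1, 'c': 2}
print(make_example((5,), ('a',), None))                 # 5
```

The same row yields different Python objects depending only on mode, which is why a dataset must declare its mode alongside its keys.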
Methods
- __getitem__(index)
Returns an example or a sequence of examples.
It implements the standard Python indexing and one-dimensional integer array indexing. It uses the get_example() method by default, but implementations may override __getitem__() to, for example, improve slicing performance.
- Parameters
index (int, slice, list or numpy.ndarray) – An index of an example or indices of examples.
- Returns
If index is an int, returns an example created by get_example(). If index is a slice, a one-dimensional list or a one-dimensional numpy.ndarray, returns a list of examples created by get_example().
Example
    >>> import numpy
    >>> from chainer import dataset
    >>> class SimpleDataset(dataset.DatasetMixin):
    ...     def __init__(self, values):
    ...         self.values = values
    ...     def __len__(self):
    ...         return len(self.values)
    ...     def get_example(self, i):
    ...         return self.values[i]
    ...
    >>> ds = SimpleDataset([0, 1, 2, 3, 4, 5])
    >>> ds[1]  # Access by int
    1
    >>> ds[1:3]  # Access by slice
    [1, 2]
    >>> ds[[4, 0]]  # Access by one-dimensional integer list
    [4, 0]
    >>> index = numpy.arange(3)
    >>> ds[index]  # Access by one-dimensional integer numpy.ndarray
    [0, 1, 2]
- concat(*datasets)
Stack datasets along rows.
- Parameters
datasets (iterable of TabularDataset) – Datasets to be concatenated. All datasets must have the same keys.
- Returns
A concatenated dataset.
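Stacking along rows keeps the keys and makes the length the sum of the input lengths. The sketch below shows that on column-major data stored as plain dicts of lists; concat_columns is a hypothetical helper, not Chainer's implementation, and requiring the same key order (not just the same key set) is an assumption of this sketch.

```python
def concat_columns(*tables):
    """Stack column-major tables along rows.

    Hypothetical sketch, not Chainer code. Each table is a dict that maps
    a key to a list holding that column. Like concat(), it requires all
    tables to share the same keys; this sketch also requires the same order.
    """
    if not tables:
        raise ValueError('at least one table is required')
    keys = tuple(tables[0])
    for table in tables[1:]:
        if tuple(table) != keys:
            raise ValueError('all tables must have the same keys')
    # Rows of later tables are appended below rows of earlier ones.
    return {key: [value for table in tables for value in table[key]]
            for key in keys}


first = {'a': [0, 3], 'b': [1, 4]}
second = {'a': [6], 'b': [7]}
print(concat_columns(first, second))  # {'a': [0, 3, 6], 'b': [1, 4, 7]}
```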
- convert(data)
Convert fetched data.
This method takes data fetched by fetch() and pre-processes them before they are passed to models. By default, each column is converted into an ndarray. This behaviour can be overridden by with_converter(). If the dataset is constructed by concat() or join(), the converter of the first dataset is used.
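The default behaviour works column by column, whichever of the three fetched forms the data takes. The stdlib-only sketch below mirrors that structure; convert_columns and its column_fn parameter are hypothetical, and tuple stands in for the ndarray conversion Chainer performs so the example runs without NumPy.

```python
def convert_columns(data, mode, column_fn=tuple):
    """Apply column_fn to every column of fetched, column-major data.

    Hypothetical sketch, not Chainer code. data is a tuple of columns
    (mode tuple), a dict of columns (mode dict) or a single column
    (mode None). Chainer's default turns each column into an ndarray;
    this stdlib-only sketch uses tuple as a stand-in.
    """
    if mode is tuple:
        return tuple(column_fn(column) for column in data)
    if mode is dict:
        return {key: column_fn(column) for key, column in data.items()}
    if mode is None:
        return column_fn(data)
    raise ValueError('mode must be tuple, dict or None')


print(convert_columns(([0, 3], [1, 4]), tuple))  # ((0, 3), (1, 4))
print(convert_columns({'a': [0, 3]}, dict, sum))  # {'a': 3}
print(convert_columns([0, 3], None))              # (0, 3)
```

Passing mode explicitly avoids guessing whether a tuple is a tuple of columns or a single column stored as a tuple.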
- fetch()
Fetch data.
This method fetches all data of the dataset or view. Note that this method returns column-major data, i.e. ([a[0], ..., a[3]], ..., [c[0], ..., c[3]]) when mode is tuple, {'a': [a[0], ..., a[3]], ..., 'c': [c[0], ..., c[3]]} when mode is dict, or [a[0], ..., a[3]] when mode is None.
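Column-major means one list per column rather than one tuple per example. The sketch below transposes row-major examples into the three forms listed above; to_column_major is a hypothetical helper, and the rows reuse the values of MyDataset from the class example.

```python
def to_column_major(rows, keys, mode):
    """Turn row-major examples into the column-major form fetch() returns.

    Hypothetical sketch, not Chainer code. rows is a list of examples,
    each a sequence of values in key order.
    """
    columns = tuple([row[j] for row in rows] for j in range(len(keys)))
    if mode is tuple:
        return columns                   # ([a...], [b...], [c...])
    if mode is dict:
        return dict(zip(keys, columns))  # {'a': [...], 'b': [...], ...}
    if mode is None:
        if len(keys) != 1:
            raise ValueError('mode None requires exactly one column')
        return columns[0]                # [a...]
    raise ValueError('mode must be tuple, dict or None')


rows = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11)]
print(to_column_major(rows, ('a', 'b', 'c'), tuple))
# ([0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11])
```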
- get_example(i)
Returns the i-th example.
Implementations should override it. It should raise IndexError if the index is invalid.
- Parameters
i (int) – The index of the example.
- Returns
The i-th example.
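In a tabular dataset a single example can be read through the batched get_examples(indices, key_indices) call used in the class example. The class below is a hypothetical, stdlib-only stand-in (not a TabularDataset subclass and not Chainer's code) that checks the index explicitly, raises IndexError when it is out of range, and then unpacks a one-row batch into a tuple-mode example. Unlike the NumPy-based example, its get_examples accepts only a list of integers.

```python
class TinyTable:
    """Hypothetical two-column, tuple-mode table; not Chainer code."""

    keys = ('a', 'b')

    def __init__(self, a, b):
        if len(a) != len(b):
            raise ValueError('columns must have the same length')
        self._columns = (list(a), list(b))

    def __len__(self):
        return len(self._columns[0])

    def get_examples(self, indices, key_indices):
        # Column-major access; None selects all rows or all columns.
        # This sketch accepts only a list of integers for indices.
        columns = self._columns
        if key_indices is not None:
            columns = tuple(columns[k] for k in key_indices)
        if indices is None:
            return tuple(list(column) for column in columns)
        return tuple([column[i] for i in indices] for column in columns)

    def get_example(self, i):
        # Reject invalid indices explicitly (Python-style negatives allowed).
        if not -len(self) <= i < len(self):
            raise IndexError('index {} is out of range'.format(i))
        # Fetch a one-row batch and unpack it into a single tuple example.
        return tuple(column[0] for column in self.get_examples([i], None))


table = TinyTable([0, 2, 4], [1, 3, 5])
print(table.get_example(1))   # (2, 3)
print(table.get_example(-1))  # (4, 5)
```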
- join(*datasets)
Stack datasets along columns.
- Parameters
datasets (iterable of TabularDataset) – Datasets to be joined. All datasets must have the same length.
- Returns
A joined dataset.
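Stacking along columns keeps the length and puts the keys of all inputs side by side. The sketch below does this on dicts of lists; join_columns is hypothetical, and rejecting duplicate keys is a choice made by this sketch rather than something stated above.

```python
def join_columns(*tables):
    """Stack column-major tables along columns.

    Hypothetical sketch, not Chainer code. Each table is a dict that maps
    a key to a list holding that column. Like join(), it requires all
    tables to have the same length; rejecting duplicate keys is this
    sketch's own choice.
    """
    if not tables:
        raise ValueError('at least one table is required')
    lengths = {len(column) for table in tables for column in table.values()}
    if len(lengths) > 1:
        raise ValueError('all tables must have the same length')
    joined = {}
    for table in tables:
        for key, column in table.items():
            if key in joined:
                raise ValueError('duplicate key: {}'.format(key))
            joined[key] = list(column)
    return joined


left = {'a': [0, 3]}
right = {'b': [1, 4], 'c': [2, 5]}
print(join_columns(left, right))  # {'a': [0, 3], 'b': [1, 4], 'c': [2, 5]}
```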
- transform(keys, transform)
Apply a transform to each example.
- Parameters
keys (tuple of strs) – The keys of transformed examples.
transform (callable) – A callable that takes an example and returns a transformed example. The mode of the transformed dataset is determined by the transformed examples.
- Returns
A transformed dataset.
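The rule that the new mode follows from what transform returns can be illustrated eagerly on a list of examples. transform_examples is a hypothetical helper, not Chainer's implementation (which need not run eagerly); it infers tuple, dict or None from the first output and checks every output against keys.

```python
def transform_examples(examples, keys, transform):
    """Apply transform to each example eagerly and infer the new mode.

    Hypothetical sketch, not Chainer code. The mode is inferred from the
    first output: tuple, dict, or None for a bare value. Every output is
    then checked against keys.
    """
    outputs = [transform(example) for example in examples]
    mode = None
    if outputs:
        if isinstance(outputs[0], tuple):
            mode = tuple
        elif isinstance(outputs[0], dict):
            mode = dict
    for out in outputs:
        if mode is tuple:
            ok = isinstance(out, tuple) and len(out) == len(keys)
        elif mode is dict:
            ok = isinstance(out, dict) and set(out) == set(keys)
        else:
            ok = len(keys) == 1 and not isinstance(out, (tuple, dict))
        if not ok:
            raise ValueError('output {!r} does not match keys {!r}'.format(out, keys))
    return outputs, mode


rows = [(0, 1, 2), (3, 4, 5)]
outputs, mode = transform_examples(
    rows, ('ab', 'c'), lambda ex: {'ab': ex[0] + ex[1], 'c': ex[2]})
print(outputs)       # [{'ab': 1, 'c': 2}, {'ab': 7, 'c': 5}]
print(mode is dict)  # True
```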
- transform_batch(keys, transform_batch)
Apply a transform to examples.
- Parameters
keys (tuple of strs) – The keys of transformed examples.
transform_batch (callable) – A callable that takes a batch of examples and returns a batch of transformed examples. The mode of the transformed dataset is determined by the transformed examples.
- Returns
A transformed dataset.
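Unlike transform(), a batch transform sees whole columns at once, which lets it use vectorized operations. The sketch below assumes the batch arrives in column-major tuple form; transform_batch_columns and add_a_and_b are hypothetical names, and the mode is again inferred from the returned object.

```python
def transform_batch_columns(columns, keys, transform_batch):
    """Apply transform_batch to a whole column-major batch at once.

    Hypothetical sketch, not Chainer code. transform_batch returns a
    tuple of columns (mode tuple), a dict of columns (mode dict) or a
    single column (mode None); the mode is inferred from that output.
    """
    out = transform_batch(columns)
    if isinstance(out, tuple):
        mode, out_columns = tuple, out
    elif isinstance(out, dict):
        if set(out) != set(keys):
            raise ValueError('dict output must have exactly the given keys')
        mode, out_columns = dict, tuple(out[key] for key in keys)
    else:
        mode, out_columns = None, (out,)
    if len(out_columns) != len(keys):
        raise ValueError('expected one output column per key')
    # All output columns must describe the same number of examples.
    if len({len(column) for column in out_columns}) > 1:
        raise ValueError('output columns must have the same length')
    return out, mode


batch = ([0, 3], [1, 4], [2, 5])  # columns a, b and c


def add_a_and_b(batch):
    a, b, c = batch
    return ([x + y for x, y in zip(a, b)], c)


out, mode = transform_batch_columns(batch, ('ab', 'c'), add_a_and_b)
print(out)            # ([1, 7], [2, 5])
print(mode is tuple)  # True
```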
- with_converter(converter)
Override the behaviour of convert().
This method returns a dataset whose convert() is replaced by converter.
- Parameters
converter (callable) – A new converter.
- Returns
A dataset with the new converter.
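Because with_converter() returns a dataset rather than modifying the original, a wrapper that replaces only convert() is one way to picture it. Both classes below are hypothetical and stdlib-only; how Chainer actually passes fetched data to the converter (for example, whether columns are unpacked into separate arguments) is not stated in this section, so this sketch simply passes the data unchanged.

```python
class Columns:
    """Hypothetical two-column, tuple-mode dataset; not Chainer code."""

    def __init__(self, a, b):
        self._data = (list(a), list(b))

    def __len__(self):
        return len(self._data[0])

    def fetch(self):
        return self._data

    def convert(self, data):
        # Stand-in default: copy each column into a tuple.
        return tuple(tuple(column) for column in data)


class WithConverter:
    """Wrap a dataset and replace only its convert() method."""

    def __init__(self, dataset, converter):
        self._dataset = dataset
        self._converter = converter

    def __len__(self):
        return len(self._dataset)

    def __getattr__(self, name):
        # Called only for names missing on the wrapper, e.g. fetch.
        return getattr(self._dataset, name)

    def convert(self, data):
        # This sketch passes the fetched data to the converter unchanged.
        return self._converter(data)


base = Columns([0, 3], [1, 4])
summed = WithConverter(base, lambda data: tuple(sum(c) for c in data))
print(base.convert(base.fetch()))      # ((0, 3), (1, 4))
print(summed.convert(summed.fetch()))  # (3, 5)
```

The original base object keeps its default convert(), so both the plain and the converted view can be used side by side.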
- __eq__(value, /)
Return self==value.
- __ne__(value, /)
Return self!=value.
- __lt__(value, /)
Return self<value.
- __le__(value, /)
Return self<=value.
- __gt__(value, /)
Return self>value.
- __ge__(value, /)
Return self>=value.
Attributes
- keys
Names of columns.
A tuple of strings that indicate the names of columns.