chainer.dataset.TabularDataset
- class chainer.dataset.TabularDataset
An abstract class that represents a tabular dataset.
In a tabular dataset, all examples have the same number of elements. For example, all examples of the dataset below have three elements (a[i], b[i], and c[i]).

     a     b     c
0    a[0]  b[0]  c[0]
1    a[1]  b[1]  c[1]
2    a[2]  b[2]  c[2]
3    a[3]  b[3]  c[3]

Since an example can be represented as either a tuple or a dict ((a[i], b[i], c[i]) or {'a': a[i], 'b': b[i], 'c': c[i]}), this class uses mode to indicate which representation is used. If there is only one column, an example can also be represented by a single value (a[i]). In this case, mode is None.
A subclass should implement __len__(), keys, mode, and get_examples().
>>> import numpy as np
>>>
>>> from chainer import dataset
>>>
>>> class MyDataset(dataset.TabularDataset):
...
...     def __len__(self):
...         return 4
...
...     @property
...     def keys(self):
...         return ('a', 'b', 'c')
...
...     @property
...     def mode(self):
...         return tuple
...
...     def get_examples(self, indices, key_indices):
...         data = np.arange(12).reshape((4, 3))
...         if indices is not None:
...             data = data[indices]
...         if key_indices is not None:
...             data = data[:, list(key_indices)]
...         return tuple(data.transpose())
...
>>> dataset = MyDataset()
>>> len(dataset)
4
>>> dataset.keys
('a', 'b', 'c')
>>> dataset.astuple()[0]
(0, 1, 2)
>>> sorted(dataset.asdict()[0].items())
[('a', 0), ('b', 1), ('c', 2)]
>>>
>>> view = dataset.slice[[3, 2], ('c', 0)]
>>> len(view)
2
>>> view.keys
('c', 'a')
>>> view.astuple()[1]
(8, 6)
>>> sorted(view.asdict()[1].items())
[('a', 6), ('c', 8)]
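The three representations selected by mode can be illustrated without Chainer. The sketch below is a plain-Python approximation; make_example is a hypothetical helper, not part of the Chainer API, and it only shows how one row of column-major data maps to a tuple, a dict, or a bare value.

```python
# Hypothetical illustration; `make_example` is not a Chainer function.
def make_example(columns, keys, i, mode):
    """Build the i-th example from column-major data in the given mode."""
    values = [column[i] for column in columns]
    if mode is tuple:
        return tuple(values)
    if mode is dict:
        return dict(zip(keys, values))
    if mode is None:
        # A bare value only makes sense for a single-column dataset.
        if len(values) != 1:
            raise ValueError('mode=None requires exactly one column')
        return values[0]
    raise ValueError('mode must be tuple, dict, or None')


# Same numbers as the MyDataset example: row i holds (3i, 3i + 1, 3i + 2).
columns = ([0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11])
keys = ('a', 'b', 'c')

tuple_example = make_example(columns, keys, 1, tuple)
dict_example = make_example(columns, keys, 1, dict)
value_example = make_example(columns[:1], keys[:1], 1, None)
```

Here tuple_example and dict_example describe the same row in two forms, while value_example is what a one-column dataset would yield.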
Methods
- __getitem__(index)
Returns an example or a sequence of examples.
It implements the standard Python indexing and one-dimensional integer array indexing. It uses the get_example() method by default, but it may be overridden by the implementation to, for example, improve the slicing performance.
- Parameters
index (int, slice, list or numpy.ndarray) – An index of an example or indexes of examples.
- Returns
If index is an int, returns an example created by get_example. If index is a slice, a one-dimensional list, or a numpy.ndarray, returns a list of examples created by get_example.
Example
>>> import numpy
>>> from chainer import dataset
>>> class SimpleDataset(dataset.DatasetMixin):
...     def __init__(self, values):
...         self.values = values
...     def __len__(self):
...         return len(self.values)
...     def get_example(self, i):
...         return self.values[i]
...
>>> ds = SimpleDataset([0, 1, 2, 3, 4, 5])
>>> ds[1]   # Access by int
1
>>> ds[1:3]  # Access by slice
[1, 2]
>>> ds[[4, 0]]  # Access by one-dimensional integer list
[4, 0]
>>> index = numpy.arange(3)
>>> ds[index]  # Access by one-dimensional integer numpy.ndarray
[0, 1, 2]
- concat(*datasets)
Stack datasets along rows.
- Parameters
datasets (iterable of TabularDataset) – Datasets to be concatenated. All datasets must have the same keys.
- Returns
A concatenated dataset.
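As a rough picture of what stacking along rows means, the following plain-Python sketch concatenates dict-of-columns tables. concat_rows is a hypothetical helper, not a Chainer function, and the strict key check is an assumption modelled on the requirement stated above.

```python
# Hypothetical sketch; `concat_rows` is not a Chainer function.
def concat_rows(keys, *tables):
    """Stack dict-of-columns tables along rows; all must share `keys`."""
    for table in tables:
        if tuple(table) != tuple(keys):
            raise ValueError('all datasets must have the same keys')
    # Each output column is the input columns laid end to end.
    return {key: [v for table in tables for v in table[key]] for key in keys}


top = {'a': [0, 1], 'b': [10, 11]}
bottom = {'a': [2], 'b': [12]}
stacked = concat_rows(('a', 'b'), top, bottom)
```

The number of columns is unchanged; only the number of rows grows.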
- convert(data)
Convert fetched data.
This method takes data fetched by fetch() and pre-processes it before passing it to models. The default behaviour is to convert each column into an ndarray. This behaviour can be overridden by with_converter(). If the dataset is constructed by concat() or join(), the converter of the first dataset is used.
- fetch()
Fetch data.
This method fetches all data of the dataset/view. Note that this method returns column-major data, i.e. ([a[0], ..., a[3]], ..., [c[0], ..., c[3]]), {'a': [a[0], ..., a[3]], ..., 'c': [c[0], ..., c[3]]}, or [a[0], ..., a[3]].
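To make the column-major layout concrete, this plain-Python sketch (no Chainer involved; the variable names are illustrative only) turns a list of row-wise examples into the three shapes listed above.

```python
# Row-major data: one tuple per example.
rows = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11)]
keys = ('a', 'b', 'c')

# Column-major, tuple form: one list per column.
columns_tuple = tuple(list(column) for column in zip(*rows))

# Column-major, dict form: the same lists, addressed by column name.
columns_dict = dict(zip(keys, columns_tuple))

# Single-column form: just the list of values of that column.
single_column = columns_tuple[0]
```

Indexing is therefore columns_tuple[column][row], the reverse of the rows[row][column] order used for individual examples.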
- get_example(i)
Returns the i-th example.
Implementations should override it. It should raise IndexError if the index is invalid.
- Parameters
i (int) – The index of the example.
- Returns
The i-th example.
- join(*datasets)
Stack datasets along columns.
- Parameters
datasets (iterable of TabularDataset) – Datasets to be joined. All datasets must have the same length.
- Returns
A joined dataset.
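Stacking along columns can be pictured the same way. join_columns below is a hypothetical plain-Python helper, not a Chainer function; the length check mirrors the requirement stated above, while rejecting duplicate keys is an assumption of this sketch.

```python
# Hypothetical sketch; `join_columns` is not a Chainer function.
def join_columns(*tables):
    """Stack dict-of-columns tables side by side; lengths must match."""
    lengths = {len(column) for table in tables for column in table.values()}
    if len(lengths) > 1:
        raise ValueError('all datasets must have the same length')
    joined = {}
    for table in tables:
        for key, column in table.items():
            # Assumption of this sketch: column names must be unique.
            if key in joined:
                raise ValueError('duplicate key: {}'.format(key))
            joined[key] = list(column)
    return joined


left = {'a': [0, 1, 2]}
right = {'b': [10, 11, 12], 'c': [20, 21, 22]}
joined = join_columns(left, right)
```

The number of rows is unchanged; only the set of columns grows.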
- transform(keys, transform)
Apply a transform to each example.
- Parameters
keys (tuple of strs) – The keys of transformed examples.
transform (callable) – A callable that takes an example and returns a transformed example. The mode of the transformed dataset is determined by the transformed examples.
- Returns
A transformed dataset.
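The statement that mode follows from the transformed examples can be sketched in plain Python. infer_mode and transform_rows are hypothetical helpers, not Chainer functions, and passing each example to the callable as a single argument is an assumption of this sketch.

```python
# Hypothetical helpers for illustration; they are not Chainer functions.
def infer_mode(example):
    """Guess the representation from the type of a transformed example."""
    if isinstance(example, tuple):
        return tuple
    if isinstance(example, dict):
        return dict
    return None  # a bare value, i.e. a single column


def transform_rows(rows, transform):
    """Apply `transform` to every example and report the resulting mode."""
    transformed = [transform(row) for row in rows]
    mode = infer_mode(transformed[0]) if transformed else None
    return transformed, mode


def to_total(row):
    # Returning a dict switches the representation to dict.
    return {'total': row[0] + row[1]}


def to_product(row):
    # Returning a bare value yields a single-column result.
    return row[0] * row[1]


rows = [(0, 1), (2, 3), (4, 5)]
as_dict, dict_mode = transform_rows(rows, to_total)
as_value, value_mode = transform_rows(rows, to_product)
```

The input rows are tuples in both calls; only the return type of the callable decides the output representation.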
- transform_batch(keys, transform_batch)
Apply a transform to examples.
- Parameters
keys (tuple of strs) – The keys of transformed examples.
transform_batch (callable) – A callable that takes examples and returns transformed examples. The mode of the transformed dataset is determined by the transformed examples.
- Returns
A transformed dataset.
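A batch transform is useful when the result depends on several examples at once, which a per-example callable cannot see. The sketch below is plain Python; center_batch is a hypothetical callable, and receiving the batch as a tuple of columns is an assumption, not something this page specifies.

```python
# Hypothetical batch transform; the column-major input is an assumption.
def center_batch(columns):
    """Centre the first column on the mean of the current batch."""
    values, labels = columns
    mean = sum(values) / len(values)
    return [v - mean for v in values], list(labels)


columns = ([1.0, 2.0, 3.0], ['x', 'y', 'z'])
centered, labels = center_batch(columns)
```

The mean is computed over the whole batch, so the same input value can map to different outputs depending on which examples accompany it.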
- with_converter(converter)
Override the behaviour of convert().
This method returns a dataset whose convert() uses the given converter instead of the default one.
- Parameters
converter (callable) – A new converter.
- Returns
A dataset with the new converter.
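One plausible way a custom converter could be applied to fetched data is sketched below in plain Python. apply_converter and scale_converter are hypothetical names, and the calling convention (columns passed positionally for tuple data, by keyword for dict data, as a single argument otherwise) is an assumption that should be checked against the Chainer source before relying on it.

```python
# Hypothetical dispatch; the calling convention is an assumption.
def apply_converter(converter, data):
    """Pass column-major data to `converter` according to its shape."""
    if isinstance(data, tuple):
        return converter(*data)
    if isinstance(data, dict):
        return converter(**data)
    return converter(data)


def scale_converter(a, b):
    """Example converter: scale column `a` to [0, 1], keep `b` as is."""
    return [x / 255.0 for x in a], list(b)


from_tuple = apply_converter(scale_converter, ([0, 255], [1, 0]))
from_dict = apply_converter(scale_converter, {'a': [0, 255], 'b': [1, 0]})
```

Under this convention a single converter written with named parameters works for both tuple-mode and dict-mode data.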
- __eq__(value, /)
Return self==value.
- __ne__(value, /)
Return self!=value.
- __lt__(value, /)
Return self<value.
- __le__(value, /)
Return self<=value.
- __gt__(value, /)
Return self>value.
- __ge__(value, /)
Return self>=value.
Attributes
- keys
Names of columns.
A tuple of strings that indicate the names of columns.