chainer.links.BinaryHierarchicalSoftmax¶

class chainer.links.BinaryHierarchicalSoftmax(in_size, tree)[source]¶

Hierarchical softmax layer over binary tree.

In natural language applications, the vocabulary is often too large for an ordinary softmax loss to be practical. Hierarchical softmax instead uses a product of sigmoid functions, which costs only \(O(\log(n))\) time on average, where \(n\) is the vocabulary size.

First, the user needs to prepare a binary tree in which each leaf corresponds to a word in the vocabulary. Given a word \(x\), there is exactly one path from the root of the tree to the leaf of that word. Let \(\mbox{path}(x) = ((e_1, b_1), \dots, (e_m, b_m))\) be the path of \(x\), where \(e_i\) is the index of the \(i\)-th internal node on the path, and \(b_i \in \{-1, 1\}\) indicates the direction to move at the \(i\)-th internal node (-1 is left, and 1 is right). Then the probability of \(x\) is given by:

\[\begin{split}P(x) &= \prod_{(e_i, b_i) \in \mbox{path}(x)}P(b_i | e_i) \\ &= \prod_{(e_i, b_i) \in \mbox{path}(x)}\sigma(b_i x^\top w_{e_i}),\end{split}\]

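where \(\sigma(\cdot)\) is the sigmoid function and \(w\) is the weight matrix, with \(w_{e_i}\) the weight vector of internal node \(e_i\). On the right-hand side, \(x\) in \(x^\top w_{e_i}\) denotes the input vector passed to the link (see __call__), not the word index.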

This function costs \(O(\log(n))\) time, because the average path length is \(O(\log(n))\), and \(O(n)\) memory, because the number of internal nodes equals \(n - 1\).
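As a rough illustration of the formula above, the following pure-Python sketch evaluates \(P(x)\) for every leaf of a three-word tree. The paths, weight vectors, and input vector are made-up values, and the helper names (`sigmoid`, `leaf_probability`) are hypothetical; this is not Chainer's implementation.

```python
import math


def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def leaf_probability(path, weights, x):
    """Product of sigmoid(b * <x, w_e>) over the (e, b) pairs of a path."""
    prob = 1.0
    for node, direction in path:
        score = sum(xi * wi for xi, wi in zip(x, weights[node]))
        prob *= sigmoid(direction * score)
    return prob


# Assumed toy tree ((0, 1), 2): internal node 0 is the root and internal
# node 1 is its left child. -1 means "go left", 1 means "go right".
paths = {
    0: [(0, -1), (1, -1)],
    1: [(0, -1), (1, 1)],
    2: [(0, 1)],
}
# One made-up weight vector per internal node, and a made-up input vector.
weights = [[0.5, -0.2], [0.1, 0.3]]
x = [1.0, 2.0]

probs = {word: leaf_probability(p, weights, x) for word, p in paths.items()}
total = sum(probs.values())
```

Because \(\sigma(z) + \sigma(-z) = 1\), the two branches at every internal node split the probability mass between them, so `total` equals 1 up to rounding error.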

Parameters:

in_size (int) – Dimension of input vectors.
tree – A binary tree made with tuples like ((1, 2), 3).
Variables:	W (Variable) – Weight parameter matrix.
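To make the tuple representation concrete, here is a small sketch that walks a tree such as ((1, 2), 3) and records, for each leaf, the sequence of (internal node index, direction) pairs used in the probability formula. The function name `tree_to_paths` and the pre-order numbering of internal nodes are assumptions made for illustration; the numbering Chainer uses internally may differ.

```python
def tree_to_paths(tree):
    """Map each leaf to its list of (internal_node_index, direction) pairs.

    Internal nodes are numbered in pre-order (an assumption of this sketch).
    Direction -1 means the left child, 1 means the right child.
    """
    paths = {}
    next_index = 0

    def visit(node, prefix):
        nonlocal next_index
        if not isinstance(node, tuple):
            paths[node] = prefix  # reached a leaf (a word ID)
            return
        if len(node) != 2:
            raise ValueError('every internal node must have two children')
        index = next_index
        next_index += 1
        left, right = node
        visit(left, prefix + [(index, -1)])
        visit(right, prefix + [(index, 1)])

    visit(tree, [])
    return paths, next_index


paths, n_internal = tree_to_paths(((1, 2), 3))
```

For this tree, word 3 sits directly under the root, so its path has a single step, while words 1 and 2 each need two steps. The number of internal nodes is one less than the number of leaves.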

See: Hierarchical Probabilistic Neural Network Language Model [Morin+, AISTATS2005].

Methods

__call__(x, t)[source]¶

Computes the loss value for given input and ground truth labels.

Parameters:

x (Variable) – Input to the classifier at each node.
t (Variable) – Batch of ground truth labels.
Returns:	Loss value.
Return type:	Variable
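This page does not spell out how the loss is defined. A natural choice for a hierarchical softmax, assumed in the sketch below, is the negative log-likelihood \(-\log P(t)\) summed over the batch, which reduces to a sum of softplus terms along each path because \(-\log\sigma(z) = \log(1 + e^{-z})\). All names and values here are hypothetical, and the code only illustrates the arithmetic; it does not call Chainer.

```python
import math


def softplus(u):
    """Numerically stable log(1 + exp(u))."""
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))


def path_loss(path, weights, x):
    """-log P(word): sum of softplus(-b * <x, w_e>) along the path."""
    loss = 0.0
    for node, direction in path:
        score = sum(xi * wi for xi, wi in zip(x, weights[node]))
        loss += softplus(-direction * score)
    return loss


# Hypothetical toy data: two internal nodes, 2-dimensional inputs.
weights = [[0.5, -0.2], [0.1, 0.3]]
paths = {0: [(0, -1), (1, -1)], 1: [(0, -1), (1, 1)], 2: [(0, 1)]}
batch_x = [[1.0, 2.0], [-0.5, 0.25]]  # one input vector per example
batch_t = [1, 2]                      # ground truth word for each example

# Assumed reduction: sum the per-example losses over the batch.
batch_loss = sum(path_loss(paths[t], weights, x)
                 for x, t in zip(batch_x, batch_t))
```

Working in log space this way avoids multiplying many small probabilities together, which matters when paths are long.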

add_param(name, shape=None, dtype=<class 'numpy.float32'>, initializer=None)[source]¶

Registers a parameter to the link.

Deprecated since version v2.0.0: Assign a Parameter object directly to an attribute within init_scope() instead. For example, the following code

link.add_param('W', shape=(5, 3))

can be replaced by the following assignment.

with link.init_scope():
    link.W = chainer.Parameter(None, (5, 3))

The latter is easier for IDEs to keep track of the attribute’s type.

Parameters:

name (str) – Name of the parameter. This name is also used as the attribute name.
shape (int or tuple of ints) – Shape of the parameter array. If it is omitted, the parameter variable is left uninitialized.
dtype – Data type of the parameter array.
initializer – If it is not None, the data is initialized with the given initializer. If it is an array, the data is directly initialized by it. If it is callable, it is used as a weight initializer. Note that in these cases, the dtype argument is ignored.

add_persistent(name, value)[source]¶

Registers a persistent value to the link.

The registered value is saved and loaded on serialization and deserialization. The value is set to an attribute of the link.

Parameters:

name (str) – Name of the persistent value. This name is also used for the attribute name.
value – Value to be registered.

addgrads(link)[source]¶

Accumulates gradient values from the given link.

This method adds each gradient array of the given link to the corresponding gradient array of this link. The accumulation works even across the host and different devices.

Parameters:	link (Link) – Source link object.

children()[source]¶

Returns a generator of all child links.

Returns:	A generator object that generates all child links.

cleargrads()[source]¶

Clears all gradient arrays.

This method should be called before the backward computation at every iteration of the optimization.

copy()[source]¶

Copies the link hierarchy to a new one.

The whole hierarchy rooted at this link is copied. The copy is basically shallow, except that the parameter variables are also shallowly copied. This means that the parameter variables of the copy are distinct objects from those of the original link, while they share the same data and gradient arrays.

The name of the link is reset on the copy, since the copied instance does not belong to the original parent chain (if there is one).

Returns:	Copied link object.
Return type:	Link

copyparams(link)[source]¶

Copies all parameters from the given link.

This method copies the data arrays of all parameters in the hierarchy. The copy works even across the host and devices. Note that this method does not copy the gradient arrays.

Parameters:	link (Link) – Source link object.

static create_huffman_tree(word_counts)[source]¶

Makes a Huffman tree from a dictionary containing word counts.

This method creates a binary Huffman tree, which is required for BinaryHierarchicalSoftmax. For example, {0: 8, 1: 5, 2: 6, 3: 4} is converted to ((3, 1), (2, 0)).

Parameters:	word_counts (dict of int key and int or float values) – Dictionary representing counts of words.
Returns:	Binary Huffman tree with tuples and keys of `word_counts`.
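The conversion in the example above can be reproduced with a standard Huffman construction: repeatedly merge the two least frequent subtrees into a tuple. The sketch below uses only the standard library; the name `huffman_tree_sketch` and the tie-breaking rule for equal counts are assumptions, and it is not necessarily how Chainer implements `create_huffman_tree`.

```python
import heapq


def huffman_tree_sketch(word_counts):
    """Build a nested-tuple Huffman tree from a {word_id: count} dict."""
    if not word_counts:
        raise ValueError('empty vocabulary')
    # Each heap entry is (count, tie_breaker, subtree). The unique
    # tie-breaker keeps the heap from ever comparing two subtrees directly.
    heap = [(count, uid, word)
            for uid, (word, count) in enumerate(word_counts.items())]
    heapq.heapify(heap)
    while len(heap) >= 2:
        count1, uid1, node1 = heapq.heappop(heap)
        count2, uid2, node2 = heapq.heappop(heap)
        # Merge the two least frequent subtrees under a new internal node.
        merged = (count1 + count2, min(uid1, uid2), (node1, node2))
        heapq.heappush(heap, merged)
    return heap[0][2]


tree = huffman_tree_sketch({0: 8, 1: 5, 2: 6, 3: 4})
```

Words 3 and 1 (counts 4 and 5) are merged first, then words 2 and 0 (counts 6 and 8), and finally the two resulting subtrees, giving ((3, 1), (2, 0)). Frequent words end up closer to the root, which shortens their paths.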

disable_update()[source]¶

Disables update rules of all parameters under the link hierarchy.

This method sets the enabled flag of the update rule of each parameter variable to False.

enable_update()[source]¶

Enables update rules of all parameters under the link hierarchy.

This method sets the enabled flag of the update rule of each parameter variable to True.

init_scope()[source]¶

Creates an initialization scope.

This method returns a context manager object that enables registration of parameters (and links for Chain) by an assignment. A Parameter object can be automatically registered by assigning it to an attribute under this context manager.

Example

In most cases, the parameter registration is done in the initializer method. Using the init_scope method, we can simply assign a Parameter object to register it to the link.

class MyLink(chainer.Link):
    def __init__(self):
        super().__init__()
        with self.init_scope():
            self.W = chainer.Parameter(0, (10, 5))
            self.b = chainer.Parameter(0, (5,))

links(skipself=False)[source]¶

Returns a generator of all links under the hierarchy.

Parameters:	skipself (bool) – If `True`, then the generator skips this link and starts with the first child link.
Returns:	A generator object that generates all links.

namedlinks(skipself=False)[source]¶

Returns a generator of all (path, link) pairs under the hierarchy.

Parameters:	skipself (bool) – If `True`, then the generator skips this link and starts with the first child link.
Returns:	A generator object that generates all (path, link) pairs.

namedparams(include_uninit=True)[source]¶

Returns a generator of all (path, param) pairs under the hierarchy.

Parameters:	include_uninit (bool) – If `True`, it also generates uninitialized parameters.
Returns:	A generator object that generates all (path, parameter) pairs. The paths are relative to this link.

params(include_uninit=True)[source]¶

Returns a generator of all parameters under the link hierarchy.

Parameters:	include_uninit (bool) – If `True`, it also generates uninitialized parameters.
Returns:	A generator object that generates all parameters.

register_persistent(name)[source]¶

Registers an attribute of a given name as a persistent value.

This is a convenience method to register an existing attribute as a persistent value. If name has already been registered as a parameter, this method removes it from the list of parameter names and re-registers it as a persistent value.

Parameters:	name (str) – Name of the attribute to be registered.

serialize(serializer)[source]¶

Serializes the link object.

Parameters:	serializer (AbstractSerializer) – Serializer object.

to_cpu()[source]¶

to_gpu(device=None)[source]¶

zerograds()[source]¶

Initializes all gradient arrays with zeros.

This method can be used for the same purpose as cleargrads, but it is less efficient. It is kept for backward compatibility.

Deprecated since version v1.15: Use cleargrads() instead.