Character n-gram Embeddings to Improve RNN Language Models
This paper proposes a novel Recurrent Neural Network (RNN) language model
that takes advantage of character information. We focus on character n-grams
based on research in the field of word embedding construction (Wieting et al.,
2016). Our proposed method constructs word embeddings from character n-gram
embeddings and combines them with ordinary word embeddings. We demonstrate that
the proposed method achieves the best perplexities on the language modeling
datasets: Penn Treebank, WikiText-2, and WikiText-103. Moreover, we conduct
experiments on application tasks: machine translation and headline generation.
The experimental results indicate that our proposed method also positively
affects these tasks.
Comment: AAAI 2019 paper
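As a rough illustration of the idea (not the paper's exact architecture), the sketch below builds a word representation by summing the embeddings of the word's character n-grams and adding the result to an ordinary word embedding. The n-gram range, the additive combination, and all names here are assumptions for illustration only.

```python
import numpy as np

EMB_DIM = 300  # assumed embedding dimensionality

def char_ngrams(word, n_min=2, n_max=3):
    """Enumerate character n-grams of a word, with ^/$ as boundary markers."""
    padded = f"^{word}$"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def compose_embedding(word, word_emb, ngram_emb):
    """Combine an ordinary word embedding with the sum of its character
    n-gram embeddings (simple addition here; the paper's combination
    may differ, e.g. concatenation or gating)."""
    vec = word_emb.get(word, np.zeros(EMB_DIM)).copy()
    for g in char_ngrams(word):
        vec += ngram_emb.get(g, np.zeros(EMB_DIM))
    return vec
```

For example, `compose_embedding("cat", word_emb, ngram_emb)` sums the embeddings of "^c", "ca", "at", "t$", "^ca", "cat", and "at$" with the word vector for "cat"; a word unseen at training time still receives a representation from its n-grams, which is the main benefit of character-level information.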
The Mechanism of Additive Composition
Additive composition (Foltz et al., 1998; Landauer and Dumais, 1997; Mitchell
and Lapata, 2010) is a widely used method for computing meanings of phrases,
which takes the average of vector representations of the constituent words. In
this article, we prove an upper bound for the bias of additive composition,
which is the first theoretical analysis on compositional frameworks from a
machine learning point of view. The bound is written in terms of collocation
strength; we prove that the more exclusively two successive words tend to occur
together, the more accurately their additive composition can be guaranteed to
approximate the natural phrase vector. Our proof relies on properties of
natural language data that are empirically verified, and can be theoretically
derived from an assumption that the data is generated from a Hierarchical
Pitman-Yor Process. The theory endorses additive composition as a reasonable
operation for calculating meanings of phrases, and suggests ways to improve
additive compositionality, including: transforming entries of distributional
word vectors by a function that meets a specific condition, constructing a
novel type of vector representations to make additive composition sensitive to
word order, and utilizing singular value decomposition to train word vectors.Comment: More explanations on theory and additional experiments added.
Accepted by Machine Learning Journa
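For reference, the operation the paper analyzes is itself simple to state in code. The sketch below (all names assumed for illustration) computes a phrase vector as the average of its constituent word vectors:

```python
import numpy as np

def additive_composition(phrase, word_vecs):
    """Approximate a phrase vector by averaging its words' vectors.
    `word_vecs` maps words to same-dimensional numpy arrays."""
    vecs = [word_vecs[w] for w in phrase.split() if w in word_vecs]
    if not vecs:
        raise KeyError(f"no known words in phrase: {phrase!r}")
    return np.mean(vecs, axis=0)
```

Per the bound, this average approximates the phrase's own distributional vector more closely the more exclusively the constituent words co-occur, so a strong collocation such as "ice cream" is composed more faithfully than a loose pairing.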