The Mechanism of Additive Composition
Additive composition (Foltz et al., 1998; Landauer and Dumais, 1997; Mitchell
and Lapata, 2010) is a widely used method for computing meanings of phrases,
which takes the average of vector representations of the constituent words. In
this article, we prove an upper bound for the bias of additive composition,
which is the first theoretical analysis of compositional frameworks from a
machine learning point of view. The bound is written in terms of collocation
strength; we prove that the more exclusively two successive words tend to occur
together, the more accurately their additive composition is guaranteed to
approximate the natural phrase vector. Our proof relies on properties of
natural language data that are empirically verified, and can be theoretically
derived from an assumption that the data is generated from a Hierarchical
Pitman-Yor Process. The theory endorses additive composition as a reasonable
operation for calculating meanings of phrases, and suggests ways to improve
additive compositionality, including: transforming entries of distributional
word vectors by a function that meets a specific condition, constructing a
novel type of vector representations to make additive composition sensitive to
word order, and utilizing singular value decomposition to train word vectors.
Comment: More explanations on theory and additional experiments added. Accepted by Machine Learning Journal.
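To make the operation concrete, here is a minimal sketch of additive composition: a phrase vector is approximated by averaging the distributional vectors of the constituent words. The vectors and vocabulary below are toy values invented for illustration, not data from the paper.

```python
import numpy as np

# Toy distributional word vectors (invented values, for illustration only).
word_vectors = {
    "machine":  np.array([0.8, 0.1, 0.3]),
    "learning": np.array([0.7, 0.2, 0.4]),
}

def additive_composition(words, vectors):
    """Approximate a phrase vector as the average of its word vectors."""
    return np.mean([vectors[w] for w in words], axis=0)

phrase_vec = additive_composition(["machine", "learning"], word_vectors)
print(phrase_vec)  # -> [0.75 0.15 0.35]
```

Per the bound proved in the paper, this average approximates the natural phrase vector more closely the more exclusively the two words collocate.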
A Deep Architecture for Semantic Parsing
Many successful approaches to semantic parsing build on top of the syntactic
analysis of text, and make use of distributional representations or statistical
models to match parses to ontology-specific queries. This paper presents a
novel deep learning architecture which provides a semantic parsing system
through the union of two neural models of language semantics. It allows for the
generation of ontology-specific queries from natural language statements and
questions without the need for parsing, which makes it especially suitable for
grammatically malformed or syntactically atypical text, such as tweets, as well
as permitting the development of semantic parsers for resource-poor languages.
Comment: In Proceedings of the Semantic Parsing Workshop at ACL 2014 (forthcoming).
Improving Semantic Composition with Offset Inference
Count-based distributional semantic models suffer from sparsity due to
unobserved but plausible co-occurrences in any text collection. This problem is
amplified for models such as Anchored Packed Trees (APTs), which take the
grammatical type of a co-occurrence into account. We therefore introduce a
novel form of distributional inference that exploits the rich type structure in
APTs and infers missing data by the same mechanism that is used for semantic
composition.
Comment: To appear at ACL 2017 (short papers).
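The abstract leaves the mechanism implicit, but the general idea of distributional inference can be sketched as follows: unobserved but plausible typed co-occurrences for a word are filled in from its distributional neighbours. The data structures and the pooling rule below are illustrative assumptions, not the APT-specific offset procedure of the paper.

```python
from collections import defaultdict

# A typed vector maps (dependency-path, context-word) features to weights.
# All entries here are invented for illustration.
typed_vectors = {
    "law":    {("amod", "criminal"): 2.0, ("dobj", "obey"): 1.0},
    "lawyer": {("amod", "criminal"): 1.5, ("nsubj", "argue"): 1.0},
}

def infer_missing(word, neighbours, vectors, weight=0.5):
    """Add down-weighted features from neighbours for unobserved entries."""
    inferred = defaultdict(float, vectors[word])
    for n in neighbours:
        for feature, value in vectors[n].items():
            if feature not in vectors[word]:
                inferred[feature] += weight * value
    return dict(inferred)

# "law" gains a plausible ("nsubj", "argue") feature from "lawyer".
print(infer_missing("law", ["lawyer"], typed_vectors))
```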
Composition in distributional models of semantics
Distributional models of semantics have proven invaluable both in cognitive
modelling of semantic phenomena and in practical applications. For example,
they have been used to model judgments of semantic similarity (McDonald,
2000) and association (Denhière and Lemaire, 2004; Griffiths et al., 2007) and have
been shown to achieve human-level performance on synonymy tests (Landauer and
Dumais, 1997; Griffiths et al., 2007) such as those included in the Test of English as a
Foreign Language (TOEFL). This ability has been put to practical use in automatic thesaurus
extraction (Grefenstette, 1994). However, while there has been a considerable
amount of research directed at the most effective ways of constructing representations
for individual words, the representation of larger constructions, e.g., phrases and sentences,
has received relatively little attention. In this thesis we examine this issue of
how to compose meanings within distributional models of semantics to form representations
of multi-word structures.
Natural language data typically consists of such complex structures, rather than
just individual isolated words. Thus, a model of composition, in which individual
word meanings are combined into phrases and phrases combine to form sentences,
is of central importance in modelling this data. Commonly, however, distributional
representations are combined by simple addition (Landauer and Dumais, 1997; Foltz
et al., 1998), without any empirical evaluation of alternative choices. Constructing
effective distributional representations of phrases and sentences requires that we have
both a theoretical foundation to direct the development of models of composition
and a means of empirically evaluating those models.
The approach we take is first to consider the general properties of semantic composition
and on that basis define a comprehensive framework in which to study
the composition of distributional representations. The framework subsumes existing
proposals, such as addition and tensor products, but also allows us to define novel
composition functions. We then evaluate the effectiveness of these models on three empirical tasks.
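To make the framework concrete, the sketch below instantiates three composition functions it subsumes: vector addition, element-wise multiplication, and the tensor (outer) product. The input vectors are toy placeholders.

```python
import numpy as np

u = np.array([0.2, 0.5, 0.1])  # e.g. an adjective vector (toy values)
v = np.array([0.4, 0.3, 0.6])  # e.g. a noun vector (toy values)

additive       = u + v           # stays in the original space
multiplicative = u * v           # element-wise; emphasises shared features
tensor_product = np.outer(u, v)  # d x d matrix; dimensionality grows

print(additive)              # [0.6 0.8 0.7]
print(multiplicative)        # [0.08 0.15 0.06]
print(tensor_product.shape)  # (3, 3)
```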
The first of these tasks involves modelling similarity judgements for short phrases
gathered in human experiments. Distributional representations of individual words are
commonly evaluated on tasks based on their ability to model semantic similarity relations,
e.g., synonymy or priming. Thus, it seems appropriate to evaluate phrase representations
in a similar manner. We then apply compositional models to language modelling,
demonstrating that the issue of composition has practical consequences, and
also providing an evaluation based on large amounts of natural data. In our third task,
we use these language models in an analysis of reading times from an eye-movement
study. This allows us to investigate the relationship between the composition of distributional
representations and the processes involved in comprehending phrases and
sentences.
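As an illustration of the first of these tasks, the snippet below scores a composition function by correlating the cosine similarities of composed phrase vectors with human similarity ratings. All vectors and ratings are invented for the sketch; only the evaluation recipe (compose, compare, correlate) reflects the setup described above.

```python
import numpy as np
from scipy.stats import spearmanr

# Toy word vectors (invented values).
w = {
    "vast":     np.array([0.5, 0.1, 0.1]),
    "amount":   np.array([0.5, 0.3, 0.1]),
    "large":    np.array([0.6, 0.2, 0.1]),
    "quantity": np.array([0.6, 0.4, 0.1]),
    "hot":      np.array([0.1, 0.6, 0.3]),
    "topic":    np.array([0.1, 0.8, 0.5]),
}

def compose(phrase, f=np.add):
    """Compose a two-word phrase with composition function f."""
    a, b = phrase.split()
    return f(w[a], w[b])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# (phrase pair, invented human similarity rating on a 1-7 scale)
pairs = [(("vast amount", "large quantity"), 6.2),
         (("vast amount", "hot topic"), 1.8),
         (("large quantity", "hot topic"), 2.0)]

model = [cosine(compose(p1), compose(p2)) for (p1, p2), _ in pairs]
human = [rating for _, rating in pairs]
rho, _ = spearmanr(model, human)
print(f"Spearman rho: {rho:.2f}")  # rank agreement between model and humans
```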
We find that these tasks do indeed allow us to evaluate and differentiate the proposed
composition functions and that the results show a reasonable consistency across
tasks. In particular, a simple multiplicative model is best for a semantic space based
on word co-occurrence, whereas an additive model is better for the topic-based model
we consider. More generally, employing compositional models to construct representations
of multi-word structures typically yields improvements in performance over
non-compositional models, which only represent individual words.