51 research outputs found

    The Mechanism of Additive Composition

    Get PDF
    Additive composition (Foltz et al, 1998; Landauer and Dumais, 1997; Mitchell and Lapata, 2010) is a widely used method for computing meanings of phrases, which takes the average of vector representations of the constituent words. In this article, we prove an upper bound for the bias of additive composition, which is the first theoretical analysis on compositional frameworks from a machine learning point of view. The bound is written in terms of collocation strength; we prove that the more exclusively two successive words tend to occur together, the more accurate one can guarantee their additive composition as an approximation to the natural phrase vector. Our proof relies on properties of natural language data that are empirically verified, and can be theoretically derived from an assumption that the data is generated from a Hierarchical Pitman-Yor Process. The theory endorses additive composition as a reasonable operation for calculating meanings of phrases, and suggests ways to improve additive compositionality, including: transforming entries of distributional word vectors by a function that meets a specific condition, constructing a novel type of vector representations to make additive composition sensitive to word order, and utilizing singular value decomposition to train word vectors.Comment: More explanations on theory and additional experiments added. Accepted by Machine Learning Journa

    A Deep Architecture for Semantic Parsing

    Full text link
    Many successful approaches to semantic parsing build on top of the syntactic analysis of text, and make use of distributional representations or statistical models to match parses to ontology-specific queries. This paper presents a novel deep learning architecture which provides a semantic parsing system through the union of two neural models of language semantics. It allows for the generation of ontology-specific queries from natural language statements and questions without the need for parsing, which makes it especially suitable to grammatically malformed or syntactically atypical text, such as tweets, as well as permitting the development of semantic parsers for resource-poor languages.Comment: In Proceedings of the Semantic Parsing Workshop at ACL 2014 (forthcoming

    Improving Semantic Composition with Offset Inference

    Get PDF
    Count-based distributional semantic models suffer from sparsity due to unobserved but plausible co-occurrences in any text collection. This problem is amplified for models like Anchored Packed Trees (APTs), that take the grammatical type of a co-occurrence into account. We therefore introduce a novel form of distributional inference that exploits the rich type structure in APTs and infers missing data by the same mechanism that is used for semantic composition.Comment: to appear at ACL 2017 (short papers

    Composition in distributional models of semantics

    Get PDF
    Distributional models of semantics have proven themselves invaluable both in cognitive modelling of semantic phenomena and also in practical applications. For example, they have been used to model judgments of semantic similarity (McDonald, 2000) and association (Denhire and Lemaire, 2004; Griffiths et al., 2007) and have been shown to achieve human level performance on synonymy tests (Landuaer and Dumais, 1997; Griffiths et al., 2007) such as those included in the Test of English as Foreign Language (TOEFL). This ability has been put to practical use in automatic thesaurus extraction (Grefenstette, 1994). However, while there has been a considerable amount of research directed at the most effective ways of constructing representations for individual words, the representation of larger constructions, e.g., phrases and sentences, has received relatively little attention. In this thesis we examine this issue of how to compose meanings within distributional models of semantics to form representations of multi-word structures. Natural language data typically consists of such complex structures, rather than just individual isolated words. Thus, a model of composition, in which individual word meanings are combined into phrases and phrases combine to form sentences, is of central importance in modelling this data. Commonly, however, distributional representations are combined in terms of addition (Landuaer and Dumais, 1997; Foltz et al., 1998), without any empirical evaluation of alternative choices. Constructing effective distributional representations of phrases and sentences requires that we have both a theoretical foundation to direct the development of models of composition and also a means of empirically evaluating those models. The approach we take is to first consider the general properties of semantic composition and from that basis define a comprehensive framework in which to consider the composition of distributional representations. The framework subsumes existing proposals, such as addition and tensor products, but also allows us to define novel composition functions. We then show that the effectiveness of these models can be evaluated on three empirical tasks. The first of these tasks involves modelling similarity judgements for short phrases gathered in human experiments. Distributional representations of individual words are commonly evaluated on tasks based on their ability to model semantic similarity relations, e.g., synonymy or priming. Thus, it seems appropriate to evaluate phrase representations in a similar manner. We then apply compositional models to language modelling, demonstrating that the issue of composition has practical consequences, and also providing an evaluation based on large amounts of natural data. In our third task, we use these language models in an analysis of reading times from an eye-movement study. This allows us to investigate the relationship between the composition of distributional representations and the processes involved in comprehending phrases and sentences. We find that these tasks do indeed allow us to evaluate and differentiate the proposed composition functions and that the results show a reasonable consistency across tasks. In particular, a simple multiplicative model is best for a semantic space based on word co-occurrence, whereas an additive model is better for the topic based model we consider. More generally, employing compositional models to construct representations of multi-word structures typically yields improvements in performance over non-compositonal models, which only represent individual words
    • …
    corecore