910 research outputs found

    Automatic thesaurus construction

    Get PDF
    Sydney, NS

    Lambek vs. Lambek: Functorial Vector Space Semantics and String Diagrams for Lambek Calculus

    Full text link
    The Distributional Compositional Categorical (DisCoCat) model is a mathematical framework that provides compositional semantics for meanings of natural language sentences. It consists of a computational procedure for constructing meanings of sentences, given their grammatical structure in terms of compositional type-logic, and given the empirically derived meanings of their words. For the particular case that the meaning of words is modelled within a distributional vector space model, its experimental predictions, derived from real large scale data, have outperformed other empirically validated methods that could build vectors for a full sentence. This success can be attributed to a conceptually motivated mathematical underpinning, by integrating qualitative compositional type-logic and quantitative modelling of meaning within a category-theoretic mathematical framework. The type-logic used in the DisCoCat model is Lambek's pregroup grammar. Pregroup types form a posetal compact closed category, which can be passed, in a functorial manner, on to the compact closed structure of vector spaces, linear maps and tensor product. The diagrammatic versions of the equational reasoning in compact closed categories can be interpreted as the flow of word meanings within sentences. Pregroups simplify Lambek's previous type-logic, the Lambek calculus, which has been extensively used to formalise and reason about various linguistic phenomena. The apparent reliance of the DisCoCat on pregroups has been seen as a shortcoming. This paper addresses this concern, by pointing out that one may as well realise a functorial passage from the original type-logic of Lambek, a monoidal bi-closed category, to vector spaces, or to any other model of meaning organised within a monoidal bi-closed category. The corresponding string diagram calculus, due to Baez and Stay, now depicts the flow of word meanings.Comment: 29 pages, pending publication in Annals of Pure and Applied Logi

    Incremental Composition in Distributional Semantics

    Get PDF
    Despite the incremental nature of Dynamic Syntax (DS), the semantic grounding of it remains that of predicate logic, itself grounded in set theory, so is poorly suited to expressing the rampantly context-relative nature of word meaning, and related phenomena such as incremental judgements of similarity needed for the modelling of disambiguation. Here, we show how DS can be assigned a compositional distributional semantics which enables such judgements and makes it possible to incrementally disambiguate language constructs using vector space semantics. Building on a proposal in our previous work, we implement and evaluate our model on real data, showing that it outperforms a commonly used additive baseline. In conclusion, we argue that these results set the ground for an account of the non-determinism of lexical content, in which the nature of word meaning is its dependence on surrounding context for its construal

    Incremental Composition in Distributional Semantics

    Get PDF
    Despite the incremental nature of Dynamic Syntax (DS), the semantic grounding of it remains that of predicate logic, itself grounded in set theory, so is poorly suited to expressing the rampantly context-relative nature of word meaning, and related phenomena such as incremental judgements of similarity needed for the modelling of disambiguation. Here, we show how DS can be assigned a compositional distributional semantics which enables such judgements and makes it possible to incrementally disambiguate language constructs using vector space semantics. Building on a proposal in our previous work, we implement and evaluate our model on real data, showing that it outperforms a commonly used additive baseline. In conclusion, we argue that these results set the ground for an account of the non-determinism of lexical content, in which the nature of word meaning is its dependence on surrounding context for its construal

    Compositional Distributional Semantics with Compact Closed Categories and Frobenius Algebras

    Full text link
    This thesis contributes to ongoing research related to the categorical compositional model for natural language of Coecke, Sadrzadeh and Clark in three ways: Firstly, I propose a concrete instantiation of the abstract framework based on Frobenius algebras (joint work with Sadrzadeh). The theory improves shortcomings of previous proposals, extends the coverage of the language, and is supported by experimental work that improves existing results. The proposed framework describes a new class of compositional models that find intuitive interpretations for a number of linguistic phenomena. Secondly, I propose and evaluate in practice a new compositional methodology which explicitly deals with the different levels of lexical ambiguity (joint work with Pulman). A concrete algorithm is presented, based on the separation of vector disambiguation from composition in an explicit prior step. Extensive experimental work shows that the proposed methodology indeed results in more accurate composite representations for the framework of Coecke et al. in particular and every other class of compositional models in general. As a last contribution, I formalize the explicit treatment of lexical ambiguity in the context of the categorical framework by resorting to categorical quantum mechanics (joint work with Coecke). In the proposed extension, the concept of a distributional vector is replaced with that of a density matrix, which compactly represents a probability distribution over the potential different meanings of the specific word. Composition takes the form of quantum measurements, leading to interesting analogies between quantum physics and linguistics.Comment: Ph.D. Dissertation, University of Oxfor

    The role of syntactic dependencies in compositional distributional semantics

    Get PDF
    This article provides a preliminary semantic framework for Dependency Grammar in which lexical words are semantically defined as contextual distributions (sets of contexts) while syntactic dependencies are compositional operations on word distributions. More precisely, any syntactic dependency uses the contextual distribution of the dependent word to restrict the distribution of the head, and makes use of the contextual distribution of the head to restrict that of the dependent word. The interpretation of composite expressions and sentences, which are analyzed as a tree of binary dependencies, is performed by restricting the contexts of words dependency by dependency in a left-to-right incremental way. Consequently, the meaning of the whole composite expression or sentence is not a single representation, but a list of contextualized senses, namely the restricted distributions of its constituent (lexical) words. We report the results of two large-scale corpus-based experiments on two different natural language processing applications: paraphrasing and compositional translationThis work is funded by Project TELPARES, Ministry of Economy and Competitiveness (FFI2014-51978-C2-1-R), and the program “Ayuda Fundación BBVA a Investigadores y Creadores Culturales 2016”S

    Categorical Vector Space Semantics for Lambek Calculus with a Relevant Modality (Extended Abstract)

    Get PDF
    We develop a categorical compositional distributional semantics for Lambek Calculus with a Relevant Modality, which has a limited version of the contraction and permutation rules. The categorical part of the semantics is a monoidal biclosed category with a coalgebra modality as defined on Differential Categories. We instantiate this category to finite dimensional vector spaces and linear maps via quantisation functors and work with three concrete interpretations of the coalgebra modality. We apply the model to construct categorical and concrete semantic interpretations for the motivating example of this extended calculus: the derivation of a phrase with a parasitic gap. The effectiveness of the concrete interpretations are evaluated via a disambiguation task, on an extension of a sentence disambiguation dataset to parasitic gap phrases, using BERT, Word2Vec, and FastText vectors and Relational tensorsComment: In Proceedings ACT 2020, arXiv:2101.07888. arXiv admin note: substantial text overlap with arXiv:2005.0307
    corecore