9 research outputs found

    A Proof-Theoretic Approach to Scope Ambiguity in Compositional Vector Space Models

    We investigate the extent to which compositional vector space models can be used to account for scope ambiguity in quantified sentences (of the form "Every man loves some woman"). Sentences of this kind, which contain two quantifiers, have two readings: a direct scope reading and an inverse scope reading. This ambiguity has been treated in a vector space model using bialgebras by Hedges and Sadrzadeh (2016) and Sadrzadeh (2016), though without an explanation of the mechanism by which the ambiguity arises. We combine a polarised focussed sequent calculus for the non-associative Lambek calculus NL, as described by Moortgat and Moot (2011), with the vector-based approach to quantifier scope ambiguity. In particular, we establish a procedure for obtaining a vector space model for quantifier scope ambiguity in a derivational way. Comment: This is a preprint of a paper to appear in: Journal of Language Modelling, 201
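
    The two readings mentioned in this abstract can be made concrete with a small sketch. The code below is not the vector space or bialgebra model of the paper; it is a plain set-theoretic check, and the individuals and the `loves` relation are hypothetical data chosen so that the two readings come apart.

```python
# Toy model (hypothetical data): two men, two women, and who loves whom.
men = {"al", "bo"}
women = {"cy", "di"}
loves = {("al", "cy"), ("bo", "di")}


def direct_scope(men, women, loves):
    """Every man > some woman: each man loves at least one, possibly different, woman."""
    return all(any((m, w) in loves for w in women) for m in men)


def inverse_scope(men, women, loves):
    """Some woman > every man: a single woman is loved by every man."""
    return any(all((m, w) in loves for m in men) for w in women)


print(direct_scope(men, women, loves))   # True
print(inverse_scope(men, women, loves))  # False
```

    In this model each man loves a different woman, so the direct scope reading holds while the inverse scope reading fails.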

    Comparing and evaluating extended Lambek calculi

    Lambek's Syntactic Calculus, commonly referred to as the Lambek calculus, was innovative in many ways, notably as a precursor of linear logic. But it also showed that we could treat our grammatical framework as a logic (as opposed to a logical theory). However, though it was successful in giving at least a basic treatment of many linguistic phenomena, it was also clear that a slightly more expressive logical calculus was needed for many other cases. Therefore, many extensions and variants of the Lambek calculus have been proposed, from the eighties up to the present day. As a result, there is now a large class of calculi, each with its own empirical successes and theoretical results, but also each with its own logical primitives. This raises the question: how do we compare and evaluate these different logical formalisms? To answer this question, I present two unifying frameworks for these extended Lambek calculi. Both are proof net calculi with graph contraction criteria. The first calculus is a very general system: you specify the structure of your sequents and it gives you the connectives and contractions which correspond to it. The calculus can be extended with structural rules, which translate directly into graph rewrite rules. The second calculus is first-order (multiplicative intuitionistic) linear logic, which turns out to have several other, independently proposed extensions of the Lambek calculus as fragments. I will illustrate the use of each calculus in building bridges between analyses proposed in different frameworks, in highlighting differences and in helping to identify problems. Comment: Empirical advances in categorial grammars, Aug 2015, Barcelona, Spain. 201

    Lambek vs. Lambek: Functorial Vector Space Semantics and String Diagrams for Lambek Calculus

    The Distributional Compositional Categorical (DisCoCat) model is a mathematical framework that provides compositional semantics for meanings of natural language sentences. It consists of a computational procedure for constructing meanings of sentences, given their grammatical structure in terms of compositional type-logic, and given the empirically derived meanings of their words. For the particular case in which the meaning of words is modelled within a distributional vector space model, its experimental predictions, derived from real large-scale data, have outperformed other empirically validated methods that could build vectors for a full sentence. This success can be attributed to a conceptually motivated mathematical underpinning, which integrates qualitative compositional type-logic and quantitative modelling of meaning within a category-theoretic mathematical framework. The type-logic used in the DisCoCat model is Lambek's pregroup grammar. Pregroup types form a posetal compact closed category, which can be passed, in a functorial manner, on to the compact closed structure of vector spaces, linear maps and tensor product. The diagrammatic versions of the equational reasoning in compact closed categories can be interpreted as the flow of word meanings within sentences. Pregroups simplify Lambek's previous type-logic, the Lambek calculus, which has been extensively used to formalise and reason about various linguistic phenomena. The apparent reliance of DisCoCat on pregroups has been seen as a shortcoming. This paper addresses this concern by pointing out that one may as well realise a functorial passage from the original type-logic of Lambek, a monoidal bi-closed category, to vector spaces, or to any other model of meaning organised within a monoidal bi-closed category. The corresponding string diagram calculus, due to Baez and Stay, now depicts the flow of word meanings. Comment: 29 pages, pending publication in Annals of Pure and Applied Logi
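
    As a rough illustration of the pregroup case described in this abstract, the sketch below computes a sentence meaning by tensor contraction. It assumes a transitive verb of pregroup type n^r s n^l, represented as a tensor in N ⊗ S ⊗ N, and contracts its two noun indices with the subject and object vectors. The function name, the dimensions and all numbers are hypothetical, and the sketch does not cover the monoidal bi-closed generalisation that the paper itself proposes.

```python
def sentence_meaning(subj, verb, obj):
    """Contract subj and obj (vectors in N) with verb (tensor in N x S x N).

    verb[i][k][j] is indexed by subject axis i, sentence axis k, object axis j.
    Returns a vector in the sentence space S.
    """
    n = len(subj)
    s = len(verb[0])
    return [
        sum(subj[i] * verb[i][k][j] * obj[j] for i in range(n) for j in range(n))
        for k in range(s)
    ]


# Hypothetical toy data: a 2-dimensional noun space, a 1-dimensional sentence space.
dogs = [1.0, 0.0]
cats = [0.0, 1.0]
chase = [
    [[0.0, 0.9]],  # subject axis 0
    [[0.2, 0.0]],  # subject axis 1
]

print(sentence_meaning(dogs, chase, cats))  # [0.9]
print(sentence_meaning(cats, chase, dogs))  # [0.2]
```

    The two calls use the same three words in a different grammatical arrangement and give different sentence vectors, which is the sense in which the grammatical structure drives the composition.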

    Continuation semantics for the Lambek-Grishin calculus

    Categorial grammars in the tradition of Lambek [18, 19] are asymmetric: sequent statements are of the form Γ ⇒ A, where the succedent is a single formula A, the antecedent a structured configuration of formulas A1, ..., An. The absence of structural context in the succedent makes the analysis of a number of phenomena in natural language semantics problematic. A case in point is scope construal: the different possibilities to build an interpretation for sentences containing generalized quantifiers and related expressions. In this paper, we explore a symmetric version of categorial grammar, based on work by Grishin [15]. In addition to the Lambek product, left and right division, we consider a dual family of type-forming operations: coproduct, left and right difference. Communication between the two families is established by means of structure-preserving distributivity principles. We call the resulting system LG. We present a Curry-Howard interpretation for LG(/, \, ⦸, ⊘) derivations, based on Curien and Herbelin's λμμ̃ calculus [10]. We discuss continuation-passing-style (CPS) translations mapping LG derivations to proofs/terms of Intuitionistic Multiplicative Linear Logic, the categorial system LP which serves as the logic for natural language meaning assembly. We show how LG, thus interpreted, associates sentence

    Apprentissage de grammaires catégorielles (transducteurs d'arbres et clustering pour induction de grammaires catégorielles)

    Nowadays, we have become familiar with software that interacts with us in natural language (for example question-answering systems for after-sales service, human-computer interfaces, or simple discussion bots). These tools either react to extracted keywords or, more ambitiously, try to understand the sentence in its context. The simplest of these programs only have a set of pre-programmed sentences with which to react to recognized keywords (these systems include Eliza but also more modern systems like Siri), whereas more sophisticated systems make an effort to understand the structure and the meaning of sentences (these include systems like Watson), allowing them to generate consistent answers, both with respect to the meaning of the sentence (semantics) and with respect to its form (syntax). In this thesis, we focus on syntax and on how to model syntax using categorial grammars. Our goal is to generate syntactically correct sentence skeletons (without the semantic aspect) and to verify that a given sentence belongs to a language, here French. We note that AB grammars, with the exception of some phenomena like quantification or extraction, are also a good basis for semantics, through the extraction of λ-terms. We cover both grammar extraction from treebanks and parsing using the extracted grammars. To this end, we present two extraction methods and test the resulting grammars using standard parsing algorithms. The first method consists of creating a generalized tree transducer, which transforms syntactic trees into derivation trees corresponding to an AB grammar. Applied to the various French treebanks available to us, the transducer's output gives us a wide-coverage lexicon and a grammar suitable for parsing. The transducer, even if it differs only slightly from the usual definition of a top-down transducer, offers a new, compact way to express transduction rules. We currently transduce 92.5% of all sentences in the treebanks into derivation trees. For our second method, we use a unification algorithm, guiding it with a preliminary clustering step, which groups words according to their context in the sentence. The comparison between the transduced trees and the output of this method gives the promising result of 91.3% similarity. Finally, we have tested our grammars on sentence analysis with a probabilistic CYK algorithm and a formula assignment step done with a supertagger. The obtained coverage lies between 84.6% and 92.6%, depending on the input corpus. The probabilities, estimated both for the types of words (when a word has several) and for the rules, enable us to select only the best derivation tree. All our software is available for download under the GNU GPL licence.
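
    The probabilistic CYK step over an AB grammar can be sketched as follows. This is a minimal illustration, not the thesis software: the type encoding, the function names and the three-word lexicon with its probabilities are all hypothetical. It also simplifies the setup described above by using only lexical type probabilities and treating both application rules as having probability 1.

```python
# Types: an atom is a string; ("/", A, B) encodes A/B; ("\\", B, A) encodes B\A.
NP, S = "np", "s"
IV = ("\\", NP, S)   # np\s
TV = ("/", IV, NP)   # (np\s)/np


def combine(left, right):
    """Return the result types of the two AB application rules for adjacent types."""
    results = []
    # Forward application: A/B, B => A
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        results.append(left[1])
    # Backward application: B, B\A => A
    if isinstance(right, tuple) and right[0] == "\\" and right[1] == left:
        results.append(right[2])
    return results


def best_parse(words, lexicon):
    """Viterbi-style CYK: keep the most probable derivation per type and span."""
    n = len(words)
    # chart[i][j] maps a type to (probability, tree) for the span words[i:j].
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for typ, prob in lexicon[word].items():
            chart[i][i + 1][typ] = (prob, word)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for lt, (lp, ltree) in chart[i][k].items():
                    for rt, (rp, rtree) in chart[k][j].items():
                        for res in combine(lt, rt):
                            prob = lp * rp
                            if prob > chart[i][j].get(res, (0.0, None))[0]:
                                chart[i][j][res] = (prob, (ltree, rtree))
    return chart[0][n]


# Hypothetical lexicon: "aime" is ambiguous between a transitive and an intransitive type.
lexicon = {
    "Marie": {NP: 1.0},
    "aime": {TV: 0.8, IV: 0.2},
    "Paul": {NP: 1.0},
}
parses = best_parse(["Marie", "aime", "Paul"], lexicon)
print(parses[S])  # (0.8, ('Marie', ('aime', 'Paul')))
```

    Only the best derivation for each type is kept in each chart cell, which mirrors the idea of selecting a single best derivation tree.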