Mathematical Foundations for a Compositional Distributional Model of Meaning
We propose a mathematical framework for a unification of the distributional
theory of meaning in terms of vector space models, and a compositional theory
for grammatical types, for which we rely on the algebra of Pregroups,
introduced by Lambek. This mathematical framework enables us to compute the
meaning of a well-typed sentence from the meanings of its constituents.
Concretely, the type reductions of Pregroups are 'lifted' to morphisms in a
category, a procedure that transforms meanings of constituents into a meaning
of the (well-typed) whole. Importantly, meanings of whole sentences live in a
single space, independent of the grammatical structure of the sentence. Hence
the inner-product can be used to compare meanings of arbitrary sentences, as it
is for comparing the meanings of words in the distributional model. The
mathematical structure we employ admits a purely diagrammatic calculus which
exposes how the information flows between the words in a sentence in order to
make up the meaning of the whole sentence. A variation of our 'categorical
model' which involves constraining the scalars of the vector spaces to the
semiring of Booleans results in a Montague-style Boolean-valued semantics.
Comment: to appea
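The composition step described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the noun space and sentence space are both two-dimensional, a transitive verb is stored as a tensor in N ⊗ S ⊗ N, and the type reduction n · (n^r s n^l) · n → s is realised by contracting the verb tensor with the subject and object vectors. The names `compose_transitive` and `inner`, and all numeric values, are hypothetical.

```python
# Toy sketch: dimensions, vectors and the verb tensor are invented for illustration.

def compose_transitive(subj, verb, obj):
    """Contract verb[i][s][j] with subj[i] and obj[j]; the free index s
    ranges over the sentence space."""
    n_dim = len(subj)
    s_dim = len(verb[0])
    return [
        sum(subj[i] * verb[i][s][j] * obj[j]
            for i in range(n_dim)
            for j in range(n_dim))
        for s in range(s_dim)
    ]


def inner(u, v):
    """Inner product on the shared sentence space."""
    return sum(a * b for a, b in zip(u, v))


# Noun space N with two basis directions (assumed, not from the paper).
dogs = [1.0, 0.0]
cats = [0.0, 1.0]

# Verb tensor in N (x) S (x) N, indexed as chase[subject][sentence][object].
chase = [
    [[0.0, 1.0], [0.0, 0.0]],  # subject index 0
    [[0.0, 0.0], [1.0, 0.0]],  # subject index 1
]

dogs_chase_cats = compose_transitive(dogs, chase, cats)
cats_chase_dogs = compose_transitive(cats, chase, dogs)

# Both results live in the same sentence space, so they can be compared directly.
similarity = inner(dogs_chase_cats, cats_chase_dogs)
```

Plain nested lists are used here to keep the sketch dependency-free; an array library would express the same contraction more compactly.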
Semantic Unification: A sheaf theoretic approach to natural language
Language is contextual and sheaf theory provides a high level mathematical
framework to model contextuality. We show how sheaf theory can model the
contextual nature of natural language and how gluing can be used to provide a
global semantics for a discourse by putting together the local logical
semantics of each sentence within the discourse. We introduce a presheaf
structure corresponding to a basic form of Discourse Representation Structures.
Within this setting, we formulate a notion of semantic unification --- gluing
meanings of parts of a discourse into a coherent whole --- as a form of
sheaf-theoretic gluing. We illustrate this idea with a number of examples where
it can be used to represent resolutions of anaphoric references. We also discuss
multivalued gluing, described using a distributions functor, which can be used
to represent situations where multiple gluings are possible, and where we may
need to rank them using quantitative measures.
Dedicated to Jim Lambek on the occasion of his 90th birthday.
Comment: 12 page
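The gluing idea in this abstract can be illustrated with a heavily simplified sketch. It assumes that a local section is just a partial assignment of features to discourse referents, and that a candidate anaphora resolution is a renaming of one sentence's referents into another's. The paper's presheaf over Discourse Representation Structures is richer than this; the names `rename`, `compatible` and `glue`, and the feature labels, are hypothetical.

```python
def rename(section, mapping):
    """Transport a local section along a renaming of discourse referents."""
    return {mapping.get(ref, ref): value for ref, value in section.items()}


def compatible(s1, s2):
    """Two sections are compatible when they agree wherever both are defined."""
    return all(s1[ref] == s2[ref] for ref in s1.keys() & s2.keys())


def glue(sections):
    """Return the union of the sections, or None if they disagree on an overlap."""
    glued = {}
    for section in sections:
        if not compatible(glued, section):
            return None
        glued.update(section)
    return glued


# "John owns a donkey." introduces referents x and y (toy features).
first = {"x": "person", "y": "animal"}
# "He beats it." introduces referents u ("he") and v ("it").
second = {"u": "person", "v": "animal"}

# Two candidate resolutions of the pronouns.
he_is_john = {"u": "x", "v": "y"}
he_is_donkey = {"u": "y", "v": "x"}

good = glue([first, rename(second, he_is_john)])
bad = glue([first, rename(second, he_is_donkey)])
```

Only the first candidate glues into a global section; the second fails because the renamed section disagrees with the first on both referents.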
A Proof-Theoretic Approach to Scope Ambiguity in Compositional Vector Space Models
We investigate the extent to which compositional vector space models can be
used to account for scope ambiguity in quantified sentences (of the form "Every
man loves some woman"). Such sentences containing two quantifiers introduce two
readings, a direct scope reading and an inverse scope reading. This ambiguity
has been treated in a vector space model using bialgebras by (Hedges and
Sadrzadeh, 2016) and (Sadrzadeh, 2016), though without an explanation of the
mechanism by which the ambiguity arises. We combine a polarised focussed
sequent calculus for the non-associative Lambek calculus NL, as described in
(Moortgat and Moot, 2011), with the vector based approach to quantifier scope
ambiguity. In particular, we establish a procedure for obtaining a vector space
model for quantifier scope ambiguity in a derivational way.
Comment: This is a preprint of a paper to appear in: Journal of Language
Modelling, 201
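The two readings mentioned in this abstract can be made concrete in a small truth-theoretic sketch. This is not the vector space or sequent calculus construction of the paper; it only shows, over an invented toy model, that the direct and inverse scope readings of "Every man loves some woman" can differ in truth value. All names and data below are hypothetical.

```python
# Toy model: the individuals and the "loves" relation are invented.
men = {"al", "bo"}
women = {"cy", "di"}
loves = {("al", "cy"), ("bo", "di")}


def direct_scope(men, women, loves):
    """every > some: each man loves at least one (possibly different) woman."""
    return all(any((m, w) in loves for w in women) for m in men)


def inverse_scope(men, women, loves):
    """some > every: there is a single woman whom every man loves."""
    return any(all((m, w) in loves for m in men) for w in women)


direct = direct_scope(men, women, loves)
inverse = inverse_scope(men, women, loves)
```

In this model the direct reading holds and the inverse reading fails, since no single woman is loved by both men.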
A Generalised Quantifier Theory of Natural Language in Categorical Compositional Distributional Semantics with Bialgebras
Categorical compositional distributional semantics is a model of natural
language; it combines the statistical vector space models of words with the
compositional models of grammar. We formalise in this model the generalised
quantifier theory of natural language, due to Barwise and Cooper. The
underlying setting is a compact closed category with bialgebras. We start from
a generative grammar formalisation and develop an abstract categorical
compositional semantics for it, then instantiate the abstract setting to sets
and relations and to finite dimensional vector spaces and linear maps. We prove
the equivalence of the relational instantiation to the truth theoretic
semantics of generalised quantifiers. The vector space instantiation formalises
the statistical usages of words and enables us to, for the first time, reason
about quantified phrases and sentences compositionally in distributional
semantics.
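As a rough illustration of the truth-theoretic side of this abstract, the sketch below treats a generalised quantifier in the style of Barwise and Cooper as a map from a noun set to a family of subsets of a finite universe, so that "Q noun VP" is true when the VP set belongs to that family. It covers only "every" and "some" over an invented universe and does not reproduce the bialgebra or vector space constructions; the function names are hypothetical.

```python
from itertools import chain, combinations


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [
        frozenset(c)
        for c in chain.from_iterable(
            combinations(items, r) for r in range(len(items) + 1)
        )
    ]


def every(noun):
    """'every noun' accepts exactly the sets that contain the noun set."""
    return lambda vp: noun <= vp


def some(noun):
    """'some noun' accepts exactly the sets that meet the noun set."""
    return lambda vp: bool(noun & vp)


def family(quantifier, noun, universe):
    """The family of subsets of the universe picked out by 'quantifier noun'."""
    test = quantifier(noun)
    return {b for b in subsets(universe) if test(b)}


def holds(quantifier, noun, vp, universe):
    """Truth of 'quantifier noun VP' as membership of VP in the family."""
    return frozenset(vp) in family(quantifier, noun, universe)


# Invented toy universe and predicates.
universe = {"a", "b", "c"}
cats = {"a", "b"}
sleepers = {"a", "b", "c"}
purrers = {"a"}
```

For example, `holds(every, cats, sleepers, universe)` is true while `holds(every, cats, purrers, universe)` is false, and `holds(some, cats, purrers, universe)` is true.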
Lambek vs. Lambek: Functorial Vector Space Semantics and String Diagrams for Lambek Calculus
The Distributional Compositional Categorical (DisCoCat) model is a
mathematical framework that provides compositional semantics for meanings of
natural language sentences. It consists of a computational procedure for
constructing meanings of sentences, given their grammatical structure in terms
of compositional type-logic, and given the empirically derived meanings of
their words. For the particular case that the meaning of words is modelled
within a distributional vector space model, its experimental predictions,
derived from real large scale data, have outperformed other empirically
validated methods that could build vectors for a full sentence. This success
can be attributed to a conceptually motivated mathematical underpinning, by
integrating qualitative compositional type-logic and quantitative modelling of
meaning within a category-theoretic mathematical framework.
The type-logic used in the DisCoCat model is Lambek's pregroup grammar.
Pregroup types form a posetal compact closed category, which can be passed, in
a functorial manner, on to the compact closed structure of vector spaces,
linear maps and tensor product. The diagrammatic versions of the equational
reasoning in compact closed categories can be interpreted as the flow of word
meanings within sentences. Pregroups simplify Lambek's previous type-logic, the
Lambek calculus, which has been extensively used to formalise and reason about
various linguistic phenomena. The apparent reliance of the DisCoCat on
pregroups has been seen as a shortcoming. This paper addresses this concern, by
pointing out that one may as well realise a functorial passage from the
original type-logic of Lambek, a monoidal bi-closed category, to vector spaces,
or to any other model of meaning organised within a monoidal bi-closed
category. The corresponding string diagram calculus, due to Baez and Stay, now
depicts the flow of word meanings.
Comment: 29 pages, pending publication in Annals of Pure and Applied Logi
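The pregroup type reductions that this abstract refers to can be sketched with a small checker. It assumes a simple type is a pair of a base type and an integer adjoint order (0 for x, 1 for the right adjoint x^r, -1 for the left adjoint x^l), so that both contractions x^l · x → 1 and x · x^r → 1 cancel an adjacent pair whose orders are k and k+1. The greedy stack-based pass below handles this example but is not a complete decision procedure for pregroup grammars in general; `reduce_types` and the type assignments are hypothetical.

```python
def reduce_types(types):
    """Greedily cancel adjacent pairs (x, k)(x, k + 1) from left to right."""
    stack = []
    for base, order in types:
        if stack and stack[-1] == (base, order - 1):
            stack.pop()  # contraction: the pair reduces to the unit
        else:
            stack.append((base, order))
    return stack


N, S = "n", "s"
noun = [(N, 0)]
# A transitive verb has type n^r s n^l.
transitive_verb = [(N, 1), (S, 0), (N, -1)]

# "dogs chase cats": n . (n^r s n^l) . n
grammatical = reduce_types(noun + transitive_verb + noun)
# "chase dogs cats": (n^r s n^l) . n . n
ungrammatical = reduce_types(transitive_verb + noun + noun)
```

The first string reduces to the sentence type alone, while the second leaves unmatched types on the stack.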