Mathematical Foundations for a Compositional Distributional Model of Meaning
We propose a mathematical framework for a unification of the distributional
theory of meaning in terms of vector space models, and a compositional theory
for grammatical types, for which we rely on the algebra of Pregroups,
introduced by Lambek. This mathematical framework enables us to compute the
meaning of a well-typed sentence from the meanings of its constituents.
Concretely, the type reductions of Pregroups are 'lifted' to morphisms in a
category, a procedure that transforms meanings of constituents into a meaning
of the (well-typed) whole. Importantly, meanings of whole sentences live in a
single space, independent of the grammatical structure of the sentence. Hence
the inner product can be used to compare meanings of arbitrary sentences, as it
is for comparing the meanings of words in the distributional model. The
mathematical structure we employ admits a purely diagrammatic calculus which
exposes how the information flows between the words in a sentence in order to
make up the meaning of the whole sentence. A variation of our 'categorical
model' which involves constraining the scalars of the vector spaces to the
semiring of Booleans results in a Montague-style Boolean-valued semantics.
Comment: to appea
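As a rough illustration of the idea in the abstract above, the following sketch composes word meanings by tensor contraction and compares two sentences of different grammatical structure with an inner product. It assumes the usual pregroup typing (nouns of type n, intransitive verbs of type n^r s, transitive verbs of type n^r s n^l); the function names, the dimensions, and all numeric values are hypothetical toy choices, not taken from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]         # indexed [noun_i][sentence_j]
Tensor3 = List[List[List[float]]]  # indexed [noun_i][sentence_j][noun_k]


def intransitive_sentence(subj: Vector, verb: Matrix) -> Vector:
    """Contract the subject against the verb's noun index, leaving a vector in S."""
    s_dim = len(verb[0])
    return [sum(subj[i] * verb[i][j] for i in range(len(subj)))
            for j in range(s_dim)]


def transitive_sentence(subj: Vector, verb: Tensor3, obj: Vector) -> Vector:
    """Contract subject and object against the verb tensor, leaving a vector in S."""
    s_dim = len(verb[0])
    return [sum(subj[i] * verb[i][j][k] * obj[k]
                for i in range(len(subj))
                for k in range(len(obj)))
            for j in range(s_dim)]


def inner(u: Vector, v: Vector) -> float:
    """Inner product used to compare two meanings in the same space."""
    return sum(a * b for a, b in zip(u, v))


# Toy data (assumed): 2-dimensional noun space N and sentence space S.
dogs = [1.0, 0.0]
cats = [0.0, 1.0]
sleep = [[0.5, 0.5],
         [0.0, 1.0]]
chase = [[[0.0, 1.0], [0.0, 0.0]],
         [[0.0, 0.0], [1.0, 0.0]]]

dogs_sleep = intransitive_sentence(dogs, sleep)
dogs_chase_cats = transitive_sentence(dogs, chase, cats)

# Both results are vectors in S, so they can be compared directly.
similarity = inner(dogs_sleep, dogs_chase_cats)
```

Both sentence vectors have the dimension of S regardless of how many words were contracted, which is what makes the comparison in the last line well defined.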
"Not not bad" is not "bad": A distributional account of negation
With the increasing empirical success of distributional models of
compositional semantics, it is timely to consider the types of textual logic
that such models are capable of capturing. In this paper, we address
shortcomings in the ability of current models to capture logical operations
such as negation. As a solution we propose a tripartite formulation for a
continuous vector space representation of semantics and subsequently use this
representation to develop a formal compositional notion of negation within such
models.
Comment: 9 pages, to appear in Proceedings of the 2013 Workshop on Continuous
Vector Space Models and their Compositionalit
Lambek vs. Lambek: Functorial Vector Space Semantics and String Diagrams for Lambek Calculus
The Distributional Compositional Categorical (DisCoCat) model is a
mathematical framework that provides compositional semantics for meanings of
natural language sentences. It consists of a computational procedure for
constructing meanings of sentences, given their grammatical structure in terms
of compositional type-logic, and given the empirically derived meanings of
their words. For the particular case that the meaning of words is modelled
within a distributional vector space model, its experimental predictions,
derived from real large scale data, have outperformed other empirically
validated methods that could build vectors for a full sentence. This success
can be attributed to a conceptually motivated mathematical underpinning, by
integrating qualitative compositional type-logic and quantitative modelling of
meaning within a category-theoretic mathematical framework.
The type-logic used in the DisCoCat model is Lambek's pregroup grammar.
Pregroup types form a posetal compact closed category, which can be passed, in
a functorial manner, on to the compact closed structure of vector spaces,
linear maps and tensor product. The diagrammatic versions of the equational
reasoning in compact closed categories can be interpreted as the flow of word
meanings within sentences. Pregroups simplify Lambek's previous type-logic, the
Lambek calculus, which has been extensively used to formalise and reason about
various linguistic phenomena. The apparent reliance of the DisCoCat on
pregroups has been seen as a shortcoming. This paper addresses this concern, by
pointing out that one may as well realise a functorial passage from the
original type-logic of Lambek, a monoidal bi-closed category, to vector spaces,
or to any other model of meaning organised within a monoidal bi-closed
category. The corresponding string diagram calculus, due to Baez and Stay, now
depicts the flow of word meanings.
Comment: 29 pages, pending publication in Annals of Pure and Applied Logi
Translating and Evolving: Towards a Model of Language Change in DisCoCat
The categorical compositional distributional (DisCoCat) model of meaning
developed by Coecke et al. (2010) has been successful in modeling various
aspects of meaning. However, it fails to model the fact that language can
change. We give an approach to DisCoCat that allows us to represent language
models and translations between them, enabling us to describe translations from
one language to another, or changes within the same language. We unify the
product space representation given in (Coecke et al., 2010) and the functorial
description in (Kartsaklis et al., 2013), in a way that allows us to view a
language as a catalogue of meanings. We formalize the notion of a lexicon in
DisCoCat, and define a dictionary of meanings between two lexicons. All this is
done within the framework of monoidal categories. We give examples of how to
apply our methods, and give a concrete suggestion for compositional translation
in corpora.
Comment: In Proceedings CAPNS 2018, arXiv:1811.0270
Experimenting with Transitive Verbs in a DisCoCat
Formal and distributional semantic models offer complementary benefits in
modeling meaning. The categorical compositional distributional (DisCoCat) model
of meaning of Coecke et al. (arXiv:1003.4394v1 [cs.CL]) combines aspects of
both to provide a general framework in which meanings of words, obtained
distributionally, are composed using methods from the logical setting to form
sentence meaning. Concrete consequences of this general abstract setting and
applications to empirical data are under active study (Grefenstette et al.,
arXiv:1101.0309; Grefenstette and Sadrzadeh, arXiv:1106.4058v1 [cs.CL]). In
this paper, we extend this study by examining transitive verbs, represented as
matrices in a DisCoCat. We discuss three ways of constructing such matrices,
and evaluate each method in a disambiguation task developed by Grefenstette and
Sadrzadeh (arXiv:1106.4058v1 [cs.CL]).
Comment: 5 pages, to be presented at GEMS 2011, as part of EMNLP'11 workshop
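The abstract above does not say which three constructions it evaluates, so the following is only a sketch of one plausible way to represent a transitive verb as a matrix: summing the outer products of the subject and object vectors it is observed with, then combining the matrix pointwise with the outer product of a new subject and object. The helper names and the toy vectors are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def verb_matrix(pairs: List[Tuple[Vector, Vector]]) -> Matrix:
    """Sum the outer products subj (x) obj over observed (subject, object) pairs."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    total = [[0.0] * m for _ in range(n)]
    for subj, obj in pairs:
        for i in range(n):
            for j in range(m):
                total[i][j] += subj[i] * obj[j]
    return total


def compose(subj: Vector, verb: Matrix, obj: Vector) -> Matrix:
    """Pointwise product of the verb matrix with the outer product subj (x) obj."""
    return [[verb[i][j] * subj[i] * obj[j] for j in range(len(obj))]
            for i in range(len(subj))]


# Toy (assumed) observations of a verb with two subject/object pairs.
observed = [([1.0, 0.0], [0.0, 1.0]),
            ([1.0, 1.0], [0.0, 1.0])]
verb = verb_matrix(observed)

# Meaning of a new "subject verb object" triple under this sketch.
sentence = compose([1.0, 0.0], verb, [0.0, 1.0])
```

In this sketch the sentence meaning is itself a matrix, so two sentences would be compared by flattening both matrices and taking an inner product or cosine.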
A Proof-Theoretic Approach to Scope Ambiguity in Compositional Vector Space Models
We investigate the extent to which compositional vector space models can be
used to account for scope ambiguity in quantified sentences (of the form "Every
man loves some woman"). Such sentences containing two quantifiers introduce two
readings, a direct scope reading and an inverse scope reading. This ambiguity
has been treated in a vector space model using bialgebras by (Hedges and
Sadrzadeh, 2016) and (Sadrzadeh, 2016), though without an explanation of the
mechanism by which the ambiguity arises. We combine a polarised focussed
sequent calculus for the non-associative Lambek calculus NL, as described in
(Moortgat and Moot, 2011), with the vector based approach to quantifier scope
ambiguity. In particular, we establish a procedure for obtaining a vector space
model for quantifier scope ambiguity in a derivational way.
Comment: This is a preprint of a paper to appear in: Journal of Language
Modelling, 201