A Corpus-based Toy Model for DisCoCat
The categorical compositional distributional (DisCoCat) model of meaning
rigorously connects distributional semantics and pregroup grammars, and has
found a variety of applications in computational linguistics. From a more
abstract standpoint, the DisCoCat paradigm prescribes the construction of a
mapping from syntax to categorical semantics. In this work we present a
concrete construction of one such mapping, from a toy model of syntax for
corpora annotated with constituent structure trees, to categorical semantics
taking place in a category of free R-semimodules over an involutive commutative
semiring R.
Comment: In Proceedings SLPCS 2016, arXiv:1608.0101
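To make the flavour of such a semantics concrete, here is a minimal sketch (ours, not the paper's construction): take R to be the commutative semiring of natural numbers with the identity involution, so free R-semimodules are spaces of count vectors, and a pregroup reduction becomes a tensor contraction. All dimensions and entries below are hypothetical toy data.

```python
# Minimal sketch: composition in free semimodules over the semiring
# of natural numbers (co-occurrence counts), identity involution.
import numpy as np

# Hypothetical noun meanings: count vectors over 3 context features.
cat  = np.array([3, 1, 0], dtype=np.uint64)
fish = np.array([0, 2, 4], dtype=np.uint64)

# A transitive verb lives in N (x) S (x) N; with a 2-dimensional
# sentence space S this is a 3 x 2 x 3 array of counts.
eats = np.zeros((3, 2, 3), dtype=np.uint64)
eats[0, 0, 1] = 2   # toy entries standing in for corpus counts
eats[1, 1, 2] = 5

# The pregroup reduction n . (n^r s n^l) . n contracts both noun wires:
# sentence_s = sum_{i,k} cat_i * eats[i, s, k] * fish_k
sentence = np.einsum('i,isk,k->s', cat, eats, fish)
print(sentence)   # meaning of "cat eats fish" in the sentence space
```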
Translating and Evolving: Towards a Model of Language Change in DisCoCat
The categorical compositional distributional (DisCoCat) model of meaning
developed by Coecke et al. (2010) has been successful in modeling various
aspects of meaning. However, it fails to model the fact that language can
change. We give an approach to DisCoCat that allows us to represent language
models and translations between them, enabling us to describe translations from
one language to another, or changes within the same language. We unify the
product space representation given in (Coecke et al., 2010) and the functorial
description in (Kartsaklis et al., 2013), in a way that allows us to view a
language as a catalogue of meanings. We formalize the notion of a lexicon in
DisCoCat, and define a dictionary of meanings between two lexicons. All this is
done within the framework of monoidal categories. We give examples of how to
apply our methods, and give a concrete suggestion for compositional translation
in corpora.
Comment: In Proceedings CAPNS 2018, arXiv:1811.0270
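As a rough illustration of a dictionary of meanings between two lexicons, the sketch below (our own, not the paper's construction; the map T and both lexicons are hypothetical stand-ins) treats a translation as a linear map applied word-wise, so that building a target-language lexicon is just pushing each source meaning through T.

```python
# Minimal sketch: a "dictionary" between two vector-space language
# models as a linear map T applied to each lexicon entry.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lexicons: 4-dim English noun space, 3-dim French noun space.
lexicon_en = {'dog': rng.random(4), 'house': rng.random(4)}
T = rng.random((3, 4))          # stand-in for a learned translation map

def translate(vec: np.ndarray) -> np.ndarray:
    """Map an English meaning vector into the French model."""
    return T @ vec

lexicon_fr = {w: translate(v) for w, v in lexicon_en.items()}
print(lexicon_fr['dog'])
```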
A Study of Entanglement in a Categorical Framework of Natural Language
In both quantum mechanics and corpus linguistics based on vector spaces, the
notion of entanglement provides a means for the various subsystems to
communicate with each other. In this paper we examine a number of
implementations of the categorical framework of Coecke, Sadrzadeh and Clark
(2010) for natural language, from an entanglement perspective. Specifically,
our goal is to better understand in what way the level of entanglement of the
relational tensors (or the lack of it) affects the compositional structures in
practical situations. Our findings reveal that a number of proposals for verb
construction lead to almost separable tensors, a fact that considerably
simplifies the interactions between the words. We examine the ramifications of
this fact, and we show that the use of Frobenius algebras mitigates the
potential problems to a great extent. Finally, we briefly examine a machine
learning method that creates verb tensors exhibiting a sufficient level of
entanglement.
Comment: In Proceedings QPL 2014, arXiv:1412.810
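The degree of entanglement of a verb tensor can be probed numerically. Below is a minimal sketch (not code from the paper; the verb tensor is random stand-in data): matricize the tensor along the subject wire and check how much of the singular spectrum the top singular value carries; a share near 1.0 signals the nearly separable behaviour the abstract describes.

```python
# Minimal sketch: separability check for a transitive-verb tensor in
# N (x) S (x) N via the singular values of one matricization.
import numpy as np

rng = np.random.default_rng(1)
verb = rng.random((10, 4, 10))        # stand-in for a learned verb tensor

# Flatten the subject wire against the (sentence, object) wires.
mat = verb.reshape(10, -1)
s = np.linalg.svd(mat, compute_uv=False)

# Share of spectrum captured by the top singular value: a value close
# to 1.0 means the matricization is nearly rank one, i.e. the tensor
# is nearly separable across this cut.
print(s[0] / s.sum())
```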
A Generalised Quantifier Theory of Natural Language in Categorical Compositional Distributional Semantics with Bialgebras
Categorical compositional distributional semantics is a model of natural
language; it combines the statistical vector space models of words with the
compositional models of grammar. We formalise in this model the generalised
quantifier theory of natural language, due to Barwise and Cooper. The
underlying setting is a compact closed category with bialgebras. We start from
a generative grammar formalisation and develop an abstract categorical
compositional semantics for it, then instantiate the abstract setting to sets
and relations and to finite dimensional vector spaces and linear maps. We prove
the equivalence of the relational instantiation to the truth theoretic
semantics of generalised quantifiers. The vector space instantiation formalises
the statistical usages of words and enables us to, for the first time, reason
about quantified phrases and sentences compositionally in distributional
semantics.
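For the relational instantiation, the truth-theoretic content is the classical Barwise-Cooper one: a quantifier relates a noun denotation A to a predicate denotation B. A minimal sketch of those standard truth conditions (not the paper's categorical construction):

```python
# Minimal sketch: Barwise-Cooper truth conditions for generalised
# quantifiers as relations between finite sets.
def every(A: set, B: set) -> bool:
    return A <= B                     # "every A is B": A subset of B

def some(A: set, B: set) -> bool:
    return bool(A & B)                # "some A is B": A meets B

def most(A: set, B: set) -> bool:
    return len(A & B) > len(A - B)    # "most A are B"

students = {'ann', 'bo', 'cy'}
sleepers = {'ann', 'bo'}
print(every(students, sleepers),      # False
      some(students, sleepers),       # True
      most(students, sleepers))       # True
```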
Learning Language from a Large (Unannotated) Corpus
A novel approach to the fully automated, unsupervised extraction of
dependency grammars and associated syntax-to-semantic-relationship mappings
from large text corpora is described. The suggested approach builds on the
authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well
as on a number of prior papers and approaches from the statistical language
learning literature. If successful, this approach would enable the mining of
all the information needed to power a natural language comprehension and
generation system, directly from a large, unannotated corpus.
Comment: 29 pages, 5 figures, research proposal
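One statistical ingredient such unsupervised pipelines typically start from is pointwise mutual information over word pairs. The sketch below (our own illustration on a toy corpus, not code from the proposal) computes PMI for ordered within-sentence pairs; high-scoring pairs are candidate dependency links.

```python
# Minimal sketch: pointwise mutual information over within-sentence
# word pairs, a standard seed statistic for unsupervised grammar learning.
import math
from collections import Counter
from itertools import combinations

corpus = [
    "the cat chased the mouse".split(),
    "the dog chased the cat".split(),
]

word_count = Counter(w for sent in corpus for w in sent)
pair_count = Counter(
    pair for sent in corpus for pair in combinations(sent, 2)
)
n_words = sum(word_count.values())
n_pairs = sum(pair_count.values())

def pmi(w1: str, w2: str) -> float:
    """PMI of an ordered within-sentence pair (w1 before w2)."""
    p_pair = pair_count[(w1, w2)] / n_pairs
    p1, p2 = word_count[w1] / n_words, word_count[w2] / n_words
    return math.log2(p_pair / (p1 * p2))

print(pmi('the', 'cat'))   # high-PMI pairs suggest dependency links
```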