A Context-theoretic Framework for Compositionality in Distributional Semantics
Techniques in which words are represented as vectors have proved useful in
many applications in computational linguistics; however, there is currently no
general semantic formalism for representing meaning in terms of vectors. We
present a framework for natural language semantics in which words, phrases and
sentences are all represented as vectors, based on a theoretical analysis which
assumes that meaning is determined by context.
In the theoretical analysis, we define a corpus model as a mathematical
abstraction of a text corpus. The meaning of a string of words is assumed to be
a vector representing the contexts in which it occurs in the corpus model.
Based on this assumption, we can show that the vector representations of words
can be considered as elements of an algebra over a field. We note that in
applications of vector spaces to representing meanings of words there is an
underlying lattice structure; we interpret the partial ordering of the lattice
as describing entailment between meanings. We also define the context-theoretic
probability of a string, and, based on this and the lattice structure, a degree
of entailment between strings.
We relate the framework to existing methods of composing vector-based
representations of meaning, and show that our approach generalises many of
these, including vector addition, component-wise multiplication, and the tensor
product.
Comment: Submitted to Computational Linguistics on 20th January 2010 for
review
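The three composition operations named in the abstract above can be illustrated on toy vectors. This is a minimal sketch, not the paper's framework: the vectors `red` and `car` and their values are hypothetical context counts, and the function names are chosen here for illustration.

```python
from itertools import product


def add(u, v):
    """Vector addition: component-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def multiply(u, v):
    """Component-wise (Hadamard) multiplication."""
    return [a * b for a, b in zip(u, v)]


def tensor(u, v):
    """Tensor product, flattened to a vector of length len(u) * len(v)."""
    return [a * b for a, b in product(u, v)]


# Hypothetical context counts over three context dimensions.
red = [2.0, 0.0, 1.0]
car = [1.0, 3.0, 1.0]

print(add(red, car))       # [3.0, 3.0, 2.0]
print(multiply(red, car))  # [2.0, 0.0, 1.0]
print(tensor(red, car))    # nine components, one per pair of dimensions
```

Addition and component-wise multiplication keep the result in the original space, while the tensor product places it in a space whose dimension is the product of the input dimensions.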
Distributional Sentence Entailment Using Density Matrices
The categorical compositional distributional model of Coecke et al. (2010)
suggests a way to combine grammatical composition of the formal, type logical
models with the corpus based, empirical word representations of distributional
semantics. This paper contributes to the project by expanding the model to also
capture entailment relations. This is achieved by extending the representations
of words from points in meaning space to density operators, which are
probability distributions on the subspaces of the space. A symmetric measure of
similarity and an asymmetric measure of entailment are defined, where lexical
entailment is measured using von Neumann entropy, the quantum variant of
Kullback-Leibler divergence. Lexical entailment, combined with the composition
map on word representations, provides a method to obtain entailment relations
on the level of sentences. Truth theoretic and corpus-based examples are
provided.
Comment: 11 pages
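For reference, the standard textbook definitions behind the entropy-based measure mentioned above are as follows. They are given here as background and are not taken from the paper, whose exact entailment measure may differ. Here \rho and \sigma denote density operators, and p_i, q_i are their eigenvalues in the special case where the two operators commute.

```latex
% von Neumann entropy of a density operator \rho
S(\rho) = -\operatorname{Tr}(\rho \log \rho)

% quantum relative entropy, the usual quantum analogue of
% Kullback-Leibler divergence
S(\rho \,\|\, \sigma) = \operatorname{Tr}\bigl(\rho\,(\log \rho - \log \sigma)\bigr)

% when \rho and \sigma commute, with eigenvalues p_i and q_i,
% this reduces to the classical Kullback-Leibler divergence
S(\rho \,\|\, \sigma) = \sum_i p_i \log \frac{p_i}{q_i}
```

The relative entropy is asymmetric in its two arguments and is finite only when the support of \rho is contained in the support of \sigma, which is what makes it a natural candidate for a directional relation such as entailment.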
A Generalised Quantifier Theory of Natural Language in Categorical Compositional Distributional Semantics with Bialgebras
Categorical compositional distributional semantics is a model of natural
language; it combines the statistical vector space models of words with the
compositional models of grammar. We formalise in this model the generalised
quantifier theory of natural language, due to Barwise and Cooper. The
underlying setting is a compact closed category with bialgebras. We start from
a generative grammar formalisation and develop an abstract categorical
compositional semantics for it, then instantiate the abstract setting to sets
and relations and to finite dimensional vector spaces and linear maps. We prove
the equivalence of the relational instantiation to the truth theoretic
semantics of generalised quantifiers. The vector space instantiation formalises
the statistical usages of words and enables us to, for the first time, reason
about quantified phrases and sentences compositionally in distributional
semantics.
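The truth-theoretic side of generalised quantifier theory can be sketched with ordinary sets: in the Barwise and Cooper view, a quantified noun phrase denotes a family of subsets of the universe, represented here as a membership test. This is a minimal sketch of that classical semantics only, not of the paper's categorical construction; the sets `dogs`, `barkers` and `sleepers` are hypothetical.

```python
def every(a):
    """'every A' holds of B exactly when A is a subset of B."""
    return lambda b: a <= b


def some(a):
    """'some A' holds of B exactly when A and B share an element."""
    return lambda b: bool(a & b)


def no(a):
    """'no A' holds of B exactly when A and B are disjoint."""
    return lambda b: not (a & b)


# Hypothetical toy model.
dogs = {"rex", "fido"}
barkers = {"rex", "fido", "seal"}
sleepers = {"fido"}

print(every(dogs)(barkers))   # True: every dog barks
print(every(dogs)(sleepers))  # False: rex does not sleep
print(some(dogs)(sleepers))   # True: fido sleeps
print(no(dogs)(sleepers))     # False
```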
Measuring Thematic Fit with Distributional Feature Overlap
In this paper, we introduce a new distributional method for modeling
predicate-argument thematic fit judgments. We use a syntax-based DSM to build a
prototypical representation of verb-specific roles: for every verb, we extract
the most salient second order contexts for each of its roles (i.e. the most
salient dimensions of typical role fillers), and then we compute thematic fit
as a weighted overlap between the top features of candidate fillers and role
prototypes. Our experiments show that our method consistently outperforms a
baseline re-implementing a state-of-the-art system, and achieves better or
comparable results to those reported in the literature for the other
unsupervised systems. Moreover, it provides an explicit representation of the
features characterizing verb-specific semantic roles.
Comment: 9 pages, 2 figures, 5 tables, EMNLP, 2017, thematic fit, selectional
preference, semantic role, DSMs, Distributional Semantic Models, Vector Space
Models, VSMs, cosine, APSyn, similarity, prototyp
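The prototype-and-overlap idea described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the feature vectors are hypothetical, the prototype is built by simply summing filler vectors, and the rank-based weighting (shared features score the inverse of their average rank) is one plausible choice in the spirit of the APSyn measure listed in the keywords.

```python
def top_features(vector, k):
    """Return the k highest-weighted feature names, best first (ties by name)."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


def role_prototype(fillers, k):
    """Sum the vectors of typical role fillers and keep the top-k features."""
    total = {}
    for vec in fillers:
        for feature, weight in vec.items():
            total[feature] = total.get(feature, 0.0) + weight
    return top_features(total, k)


def rank_overlap(features_a, features_b):
    """Weighted overlap: each shared feature adds 1 / (average of its two ranks)."""
    rank_b = {f: i for i, f in enumerate(features_b, start=1)}
    score = 0.0
    for rank_a, f in enumerate(features_a, start=1):
        if f in rank_b:
            score += 1.0 / ((rank_a + rank_b[f]) / 2.0)
    return score


# Hypothetical typical objects of the verb "eat".
pizza = {"tasty": 3.0, "bake": 2.0, "slice": 1.0}
apple = {"tasty": 2.0, "ripe": 2.0, "slice": 1.0}
proto = role_prototype([pizza, apple], k=3)

# Hypothetical candidate fillers.
bread = {"bake": 3.0, "tasty": 2.0, "crust": 1.0}
stone = {"heavy": 3.0, "throw": 2.0, "grey": 1.0}

score_bread = rank_overlap(top_features(bread, 3), proto)
score_stone = rank_overlap(top_features(stone, 3), proto)
print(proto, score_bread, score_stone)
```

A candidate that shares highly ranked features with the role prototype (here `bread`) scores higher than one that shares none (`stone`), and the shared features themselves give an explicit account of why.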
Greedy approximation of high-dimensional Ornstein-Uhlenbeck operators with unbounded drift
We investigate the convergence of a nonlinear approximation method introduced by Ammar et al. (cf. J. Non-Newtonian Fluid Mech. 139:153-176, 2006) for the numerical solution of high-dimensional Fokker-Planck equations featuring in Navier-Stokes-Fokker-Planck systems that arise in kinetic models of dilute polymers. In the case of Poisson's equation on a rectangular domain in , subject to a homogeneous Dirichlet boundary condition, the mathematical analysis of the algorithm was carried out recently by Le Bris, Lelièvre and Maday (Const. Approx. 30:621-651, 2009), by exploiting its connection to greedy algorithms from nonlinear approximation theory explored, for example, by DeVore and Temlyakov (Adv. Comput. Math. 5:173-187, 1996); hence, the variational version of the algorithm, based on the minimization of a sequence of Dirichlet energies, was shown to converge. In this paper, we extend the convergence analysis of the pure greedy and orthogonal greedy algorithms considered by Le Bris, Lelièvre and Maday to the technically more complicated case where the Laplace operator is replaced by a high-dimensional Ornstein-Uhlenbeck operator with unbounded drift, of the kind that appears in Fokker-Planck equations that arise in bead-spring chain type kinetic polymer models with finitely extensible nonlinear elastic potentials, posed on a high-dimensional Cartesian product configuration space D = D_1 x ... x D_N contained in , where each set D_i, i=1,...,N, is a bounded open ball in , d = 2, 3.
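The pure greedy algorithm referred to above can be illustrated in a finite-dimensional setting. This is a minimal sketch of the generic scheme from nonlinear approximation theory (pick the dictionary element best correlated with the residual, subtract its projection, repeat), not the paper's method, which works with tensor-product functions and energy minimisation in infinite-dimensional spaces. The dictionary and target below are hypothetical, and dictionary elements are assumed to have unit norm.

```python
def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def pure_greedy(target, dictionary, steps):
    """Run `steps` iterations of the pure greedy algorithm.

    Assumes every dictionary element has unit Euclidean norm.
    Returns the approximation and the final residual.
    """
    residual = list(target)
    approx = [0.0] * len(target)
    for _ in range(steps):
        # Select the element with the largest absolute correlation.
        best = max(dictionary, key=lambda g: abs(dot(residual, g)))
        coeff = dot(residual, best)
        # Add the projection to the approximation, remove it from the residual.
        approx = [a + coeff * g for a, g in zip(approx, best)]
        residual = [r - coeff * g for r, g in zip(residual, best)]
    return approx, residual


# Hypothetical redundant dictionary of unit vectors in the plane.
s = 2 ** -0.5
dictionary = [[1.0, 0.0], [0.0, 1.0], [s, s]]
target = [3.0, 1.0]

approx, residual = pure_greedy(target, dictionary, steps=2)
print(approx, residual)
```

The orthogonal greedy variant differs in the update step: after each selection it re-projects the target onto the span of all elements chosen so far, rather than subtracting a single projection.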
A Frobenius Algebraic Analysis for Parasitic Gaps
The interpretation of parasitic gaps is an ostensible case of non-linearity
in natural language composition. Existing categorial analyses, both in the
typelogical and in the combinatory traditions, rely on explicit forms of
syntactic copying. We identify two types of parasitic gapping where the
duplication of semantic content can be confined to the lexicon. Parasitic gaps
in adjuncts are analysed as forms of generalized coordination with a
polymorphic type schema for the head of the adjunct phrase. For parasitic gaps
affecting arguments of the same predicate, the polymorphism is associated with
the lexical item that introduces the primary gap. Our analysis is formulated in
terms of Lambek calculus extended with structural control modalities. A
compositional translation relates syntactic types and derivations to the
interpreting compact closed category of finite dimensional vector spaces and
linear maps with Frobenius algebras over it. When interpreted over the
necessary semantic spaces, the Frobenius algebras provide the tools to model
the proposed instances of lexical polymorphism.
Comment: SemSpace 2019, to appear in Journal of Applied Logic
Sentence entailment in compositional distributional semantics
Distributional semantic models provide vector representations for words by
gathering co-occurrence frequencies from corpora of text. Compositional
distributional models extend these from words to phrases and sentences. In
categorical compositional distributional semantics, phrase and sentence
representations are functions of their grammatical structure and
representations of the words therein. In this setting, grammatical structures
are formalised by morphisms of a compact closed category and meanings of words
are formalised by objects of the same category. These can be instantiated in
the form of vectors or density matrices. This paper concerns the applications
of this model to phrase and sentence level entailment. We argue that
entropy-based distances of vectors and density matrices provide a good
candidate to measure word-level entailment, show the advantage of density
matrices over vectors for word level entailments, and prove that these
distances extend compositionally from words to phrases and sentences. We
exemplify our theoretical constructions on real data and a toy entailment
dataset and provide preliminary experimental evidence.
Comment: 8 pages, 1 figure, 2 tables, short version presented in the
International Symposium on Artificial Intelligence and Mathematics (ISAIM),
201
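To see why an entropy-based distance is a natural fit for word-level entailment on vectors, the following sketch computes Kullback-Leibler divergence between two normalised toy vectors. It is an illustration only: the vectors `dog` and `animal` are hypothetical, and the paper's actual measure, including its density-matrix version, may be defined differently.

```python
import math


def normalise(v):
    """Scale a non-negative vector so that its entries sum to 1."""
    total = sum(v)
    return [x / total for x in v]


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i).

    Returns infinity when q is zero somewhere that p is not.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


# Hypothetical co-occurrence counts: "dog" occurs in a subset of the
# contexts in which "animal" occurs.
dog = normalise([4.0, 4.0, 0.0])
animal = normalise([3.0, 3.0, 2.0])

print(kl_divergence(dog, animal))  # finite: dog's contexts are covered by animal's
print(kl_divergence(animal, dog))  # inf: animal occurs in a context where dog never does
```

The divergence is small and finite in the direction from the narrower word to the broader one and infinite in the reverse direction, so the asymmetry tracks the direction of entailment in this toy case.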