3 research outputs found
A Compositional Vector Space Model of Ellipsis and Anaphora.
PhD Thesis
This thesis discusses research in compositional distributional semantics: if words
are defined by their use in language and represented as high-dimensional vectors
reflecting their co-occurrence behaviour in textual corpora, how should words be
composed to produce a similar numerical representation for sentences, paragraphs
and documents? Neural methods learn a task-dependent composition by generalising
over large datasets, whereas type-driven approaches stipulate that composition
is given by a functional view on words, leaving open the question of what those
functions should do, concretely.
We take on the type-driven approach to compositional distributional semantics
and focus on the categorical framework of Coecke, Grefenstette, and Sadrzadeh
[CGS13], which models composition as an interpretation of syntactic structures as
linear maps on vector spaces using the language of category theory, as well as the
two-step approach of Muskens and Sadrzadeh [MS16], where syntactic structures
map to lambda logical forms that are instantiated by a concrete composition model.
We develop the theory behind these approaches to cover phenomena not dealt with
in previous work, evaluate the models in sentence-level tasks, and implement a tensor
learning method that generalises to arbitrary sentences.
This thesis reports three main contributions. The first, theoretical in nature, discusses
the ability of categorical and lambda-based models of compositional distributional
semantics to model ellipsis, anaphora, and parasitic gaps: phenomena that
challenge the linearity of previous compositional models. Secondly, we perform an
evaluation study on verb phrase ellipsis where we introduce three novel sentence
evaluation datasets and compare algebraic, neural, and tensor-based composition
models, showing that models which resolve ellipsis correlate more strongly with human judgements.
Finally, we generalise the skipgram model [Mik+13] to a tensor-based setting
and implement it for transitive verbs, showing that neural methods to learn tensor
representations for words can outperform previous tensor-based methods on compositional
tasks.
Chart Parsing Multimodal Grammars
This short note describes the chart parser for multimodal type-logical grammars which has been developed in conjunction with the type-logical treebank for French. The chart parser provides an incomplete but fast implementation of proof search for multimodal type-logical grammars using the "deductive parsing" framework. Proofs found can be transformed into natural deduction proofs.