The Parallel Meaning Bank: Towards a Multilingual Corpus of Translations Annotated with Compositional Meaning Representations
The Parallel Meaning Bank is a corpus of translations annotated with shared,
formal meaning representations comprising over 11 million words divided over
four languages (English, German, Italian, and Dutch). Our approach is based on
cross-lingual projection: automatically produced (and manually corrected)
semantic annotations for English sentences are mapped onto their word-aligned
translations, assuming that the translations are meaning-preserving. The
semantic annotation consists of five main steps: (i) segmentation of the text
into sentences and lexical items; (ii) syntactic parsing with Combinatory
Categorial Grammar; (iii) universal semantic tagging; (iv) symbolization; and
(v) compositional semantic analysis based on Discourse Representation Theory.
These steps are performed using statistical models trained in a semi-supervised
manner. The annotation models employed are all language-neutral. Our first
results are promising.
Comment: To appear at EACL 2017
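To make the pipeline concrete, here is a toy, runnable Python sketch of the five annotation steps; every function below is a simplified placeholder standing in for one of the statistical models described above, not the PMB's actual toolchain:

```python
# Toy sketch of the PMB's five-step annotation pipeline.
# All functions are illustrative placeholders, not the real models.

def segment(text):
    """(i) Segment text into sentences and lexical items (toy version)."""
    return [s.replace(".", "").split() for s in text.split(". ") if s.strip()]

def ccg_parse(tokens):
    """(ii) CCG parsing; a real parser returns a derivation tree."""
    return ("derivation", tuple(tokens))

def semantic_tag(tokens):
    """(iii) Universal semantic tagging via a toy lexicon lookup.
    PER (person name) and ENS (present-simple event) are example tags."""
    lexicon = {"Alice": "PER", "sleeps": "ENS"}
    return [lexicon.get(t, "UNK") for t in tokens]

def symbolize(tokens, tags):
    """(iv) Symbolization: map tokens to non-logical symbols (lemma-like)."""
    return [t.lower() for t in tokens]

def compose_drs(derivation, tags, symbols):
    """(v) Compositional analysis into a toy Discourse Representation
    Structure: discourse referents plus conditions."""
    return {"referents": ["x", "e"], "conditions": list(zip(symbols, tags))}

for sentence in segment("Alice sleeps."):
    tags = semantic_tag(sentence)
    print(compose_drs(ccg_parse(sentence), tags, symbolize(sentence, tags)))
```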
Neural Semantic Parsing by Character-based Translation: Experiments with Abstract Meaning Representations
We evaluate the character-level translation method for neural semantic
parsing on a large corpus of sentences annotated with Abstract Meaning
Representations (AMRs). Using a sequence-to-sequence model, and some trivial
preprocessing and postprocessing of AMRs, we obtain a baseline accuracy of 53.1
(F-score on AMR-triples). We examine five different approaches to improve this
baseline result: (i) reordering AMR branches to match the word order of the
input sentence increases performance to 58.3; (ii) adding part-of-speech tags
(automatically produced) to the input also helps (57.2); (iii) introducing
super characters (conflating frequent character sequences into a single
character) reaches 57.4; (iv) optimizing the training
process by using pre-training and averaging a set of models increases
performance to 58.7; (v) adding silver-standard training data obtained by an
off-the-shelf parser yields the biggest improvement, resulting in an F-score of
64.0. Combining all five techniques leads to an F-score of 71.0 on holdout
data, which is state-of-the-art in AMR parsing. This is remarkable because of
the relative simplicity of the approach.
Comment: Camera-ready for CLIN 2017 journal
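As an illustration of technique (iii), here is a toy Python sketch of super-character preprocessing. The abstract does not spell out the exact conflation procedure, so the simple frequency counting and private-use codepoints below are assumptions:

```python
from collections import Counter

def find_super_characters(corpus, num_merges=3, min_len=2, max_len=4):
    """Collect the most frequent character n-grams as candidate
    'super characters' to conflate (toy version of technique iii)."""
    counts = Counter()
    for line in corpus:
        for n in range(min_len, max_len + 1):
            for i in range(len(line) - n + 1):
                counts[line[i:i + n]] += 1
    return [gram for gram, _ in counts.most_common(num_merges)]

def apply_super_characters(line, super_chars, offset=0xE000):
    """Replace each frequent sequence with one private-use codepoint,
    so the character-level seq2seq model sees it as a single symbol."""
    for k, gram in enumerate(super_chars):
        line = line.replace(gram, chr(offset + k))
    return line

corpus = ["the cat sat on the mat", "the dog ate the bone"]
merges = find_super_characters(corpus)
print(merges)
print([apply_super_characters(line, merges) for line in corpus])
```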
Higher-Order DisCoCat (Peirce-Lambek-Montague semantics)
We propose a new definition of higher-order DisCoCat (categorical
compositional distributional) models where the meaning of a word is not a
diagram, but a diagram-valued higher-order function. Our models can be seen as
a variant of Montague semantics based on a lambda calculus where the primitives
act on string diagrams rather than logical formulae. As a special case, we show
how to translate from the Lambek calculus into Peirce's system beta for
first-order logic. This allows us to give a purely diagrammatic treatment of
higher-order and non-linear processes in natural language semantics: adverbs,
prepositions, negation and quantifiers. The theoretical definition presented in
this article comes with a proof-of-concept implementation in DisCoPy, the
Python library for string diagrams.
Comment: 19 pages, 11 figures
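For orientation, here is the standard plain (first-order) DisCoCat sentence diagram in DisCoPy, along the lines of the library's documentation; the paper's higher-order, diagram-valued word meanings are not shown, and the import path assumes a recent DisCoPy release:

```python
# First-order DisCoCat in DisCoPy: word meanings are diagrams, and a
# sentence is built by contracting pregroup types with cups.
# Import path assumes a recent DisCoPy release; older versions exposed
# these names at the package top level.
from discopy.grammar.pregroup import Ty, Word, Cup, Id

s, n = Ty('s'), Ty('n')                        # sentence and noun types
Alice, Bob = Word('Alice', n), Word('Bob', n)
loves = Word('loves', n.r @ s @ n.l)           # transitive verb type

sentence = Alice @ loves @ Bob >> Cup(n, n.r) @ Id(s) @ Cup(n.l, n)
sentence.draw()                                # renders the string diagram
```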
Combining Axiom Injection and Knowledge Base Completion for Efficient Natural Language Inference
In logic-based approaches to reasoning tasks such as Recognizing Textual
Entailment (RTE), it is important for a system to have a large amount of
knowledge data. However, there is a tradeoff between adding more knowledge data
for improved RTE performance and maintaining an efficient RTE system, as such a
large database is problematic in terms of memory usage and computational
complexity. In this work, we show that the processing time of a state-of-the-art
logic-based RTE system can be significantly reduced by replacing its
search-based axiom injection (abduction) mechanism with one based on Knowledge
Base Completion (KBC). We integrate this mechanism into a Coq plugin that
provides a proof automation tactic for natural language inference.
Additionally, we show empirically that adding new knowledge data contributes to
better RTE performance while not harming the processing speed in this
framework.
Comment: 9 pages, accepted to AAAI 2019
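A minimal sketch of the core idea: rather than searching a large axiom database, candidate axioms are generated on demand and scored by a learned KBC model. The DistMult-style scoring function and random embeddings below are illustrative stand-ins, not the paper's trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Toy embeddings; a real system trains these on a knowledge base,
# so the scores here are meaningless placeholders.
entities = {w: rng.normal(size=DIM) for w in ["dog", "animal", "cat"]}
relations = {r: rng.normal(size=DIM) for r in ["hypernym"]}

def score(head, relation, tail):
    """DistMult-style trilinear score <h, r, t>: higher means the
    triple (head, relation, tail) is judged more plausible."""
    return float(np.sum(entities[head] * relations[relation] * entities[tail]))

def inject_axioms(predicates, threshold=0.0):
    """Generate candidate axioms for predicate pairs in the current proof
    goal and keep only those the KBC model scores above the threshold,
    replacing the search over a large axiom database."""
    return [f"forall x. {h}(x) -> {t}(x)"
            for h in predicates for t in predicates
            if h != t and score(h, "hypernym", t) > threshold]

print(inject_axioms(["dog", "animal"]))
```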