An Efficient Implementation of the Head-Corner Parser
This paper describes an efficient and robust implementation of a
bi-directional, head-driven parser for constraint-based grammars. This parser
is developed for the OVIS system: a Dutch spoken dialogue system in which
information about public transport can be obtained by telephone.
After a review of the motivation for head-driven parsing strategies, and
head-corner parsing in particular, a non-deterministic version of the
head-corner parser is presented. A memoization technique is applied to obtain a
fast parser. A goal-weakening technique is introduced which greatly improves
average case efficiency, both in terms of speed and space requirements.
I argue in favor of such a memoization strategy with goal-weakening over
ordinary chart parsers: because the strategy can be applied selectively, it
enormously reduces the space requirements of the parser, while no practical
loss in time efficiency is observed. On the
contrary, experiments are described in which head-corner and left-corner
parsers implemented with selective memoization and goal weakening outperform
`standard' chart parsers. The experiments include the grammar of the OVIS
system and the Alvey NL Tools grammar.
Head-corner parsing is a mix of bottom-up and top-down processing. Certain
approaches towards robust parsing require purely bottom-up processing.
Therefore, it seems that head-corner parsing is unsuitable for such robust
parsing techniques. However, it is shown how underspecification (which arises
very naturally in a logic programming environment) can be used in the
head-corner parser to allow such robust parsing techniques. A particular robust
parsing model is described which is implemented in OVIS.
Comment: 31 pages, uses cl.sty
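As an informal illustration of selective memoization with goal weakening, here
is a minimal Python sketch (all names, such as weaken and solve, are
hypothetical, not from the OVIS implementation): each goal is first
generalized by discarding fine-grained constraints, so one table entry serves
many specific goals, and memoization is applied only at chosen call sites.
```python
# Minimal sketch of selective memoization with goal weakening.
# Hypothetical names; not the paper's actual implementation.

TABLE = {}  # maps a weakened goal to the solutions found for it

def weaken(goal):
    """Generalize a goal: keep only coarse features (here, category and
    string position), dropping fine-grained constraints so one table
    entry can serve many fully specified goals."""
    category, position, _fine_grained = goal
    return (category, position)

def compatible(solution, goal):
    """Placeholder: check a stored solution against the original goal's
    fine-grained constraints (unification, in the real
    logic-programming setting)."""
    return True

def solve(goal, prove):
    """Memoized solver; `prove` enumerates solutions for a (weakened)
    goal. Memoization is selective: only goals routed through `solve`
    are tabled, so the table stays small."""
    key = weaken(goal)
    if key not in TABLE:
        TABLE[key] = list(prove(key))  # solve the weakened goal once
    # Filter the more general stored solutions against the original goal.
    return [s for s in TABLE[key] if compatible(s, goal)]
```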
Neural Semantic Parsing by Character-based Translation: Experiments with Abstract Meaning Representations
We evaluate the character-level translation method for neural semantic
parsing on a large corpus of sentences annotated with Abstract Meaning
Representations (AMRs). Using a sequence-to-sequence model, and some trivial
preprocessing and postprocessing of AMRs, we obtain a baseline accuracy of 53.1
(F-score on AMR-triples). We examine five different approaches to improve this
baseline result: (i) reordering AMR branches to match the word order of the
input sentence increases performance to 58.3; (ii) adding automatically
produced part-of-speech tags to the input also shows improvement (57.2); (iii)
introducing super characters (conflating frequent sequences of characters into
a single character) reaches 57.4; (iv) optimizing the training
process by using pre-training and averaging a set of models increases
performance to 58.7; (v) adding silver-standard training data obtained by an
off-the-shelf parser yields the biggest improvement, resulting in an F-score of
64.0. Combining all five techniques leads to an F-score of 71.0 on holdout
data, which is state-of-the-art in AMR parsing. This is remarkable because of
the relative simplicity of the approach.
Comment: Camera ready for CLIN 2017 journal
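To make step (iii) concrete, here is a small hypothetical sketch of super
characters: frequent character n-grams are conflated into single reserved
symbols before the text is fed to a character-level model. Function names and
parameters are illustrative, not the authors' code.
```python
from collections import Counter

def find_super_chars(corpus, n=2, k=10):
    """Return the k most frequent character n-grams in the corpus;
    each will be conflated into a single reserved symbol."""
    counts = Counter()
    for line in corpus:
        for i in range(len(line) - n + 1):
            counts[line[i:i + n]] += 1
    return [gram for gram, _ in counts.most_common(k)]

def apply_super_chars(text, super_chars, offset=0xE000):
    """Replace each frequent n-gram with a single private-use-area
    character, shortening the sequences a char-level model must read."""
    for idx, gram in enumerate(super_chars):
        text = text.replace(gram, chr(offset + idx))
    return text

corpus = ["the cat sat on the mat", "the dog ate the food"]
table = find_super_chars(corpus)
print(apply_super_chars(corpus[0], table))
```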
Learning scale-variant and scale-invariant features for deep image classification
Convolutional Neural Networks (CNNs) require large image corpora to be
trained on classification tasks. Variation in image resolution, in the size of
depicted objects and patterns, and in image scale hampers CNN training and
performance, because task-relevant information varies over spatial scales.
Previous work attempting to deal with such scale variations focused on
encouraging scale-invariant CNN representations. However, scale-invariant
representations are incomplete representations of images, because images
contain scale-variant information as well. This paper addresses the combined
development of scale-invariant and scale-variant representations. We propose a
multi-scale CNN method to encourage the recognition of both types of features
and evaluate it on a challenging image classification task involving
task-relevant characteristics at multiple scales. The results show that our
multi-scale CNN outperforms a single-scale CNN. This leads to the conclusion
that encouraging the combined development of scale-invariant and scale-variant
representations in CNNs is beneficial to image recognition performance.
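As a rough illustration of the multi-scale idea, the PyTorch sketch below runs
scale-specific convolutional branches over resized copies of the input and
concatenates their features; the layer sizes and scales are assumptions for
illustration, not the paper's exact architecture.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleCNN(nn.Module):
    """Toy multi-scale classifier: each branch sees the image at a
    different resolution, so branches can specialize in scale-variant
    features. Layer sizes are illustrative only."""

    def __init__(self, num_classes, scales=(1.0, 0.5, 0.25)):
        super().__init__()
        self.scales = scales
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),  # global pooling per branch
            )
            for _ in scales
        )
        self.classifier = nn.Linear(16 * len(scales), num_classes)

    def forward(self, x):
        feats = []
        for scale, branch in zip(self.scales, self.branches):
            xs = x if scale == 1.0 else F.interpolate(
                x, scale_factor=scale, mode="bilinear", align_corners=False)
            feats.append(branch(xs).flatten(1))
        return self.classifier(torch.cat(feats, dim=1))

model = MultiScaleCNN(num_classes=10)
out = model(torch.randn(2, 3, 64, 64))  # -> shape (2, 10)
```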
Constraint-Based Categorial Grammar
We propose a generalization of Categorial Grammar in which lexical categories
are defined by means of recursive constraints. In particular, the introduction
of relational constraints allows one to capture the effects of (recursive)
lexical rules in a computationally attractive manner. We illustrate the
linguistic merits of the new approach by showing how it accounts for the syntax
of Dutch cross-serial dependencies and the position and scope of adjuncts in
such constructions. Delayed evaluation is used to process grammars containing
recursive constraints.
Comment: 8 pages, LaTeX
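One way to picture the role of delayed evaluation here is with lazily
evaluated recursion; the toy Python sketch below (illustrative only, far from
the constraint-logic setting of the paper) enumerates the effect of a
recursive lexical rule on demand, so the unbounded recursion never diverges.
```python
# Toy sketch of delayed evaluation of a recursive constraint.
# Category encodings and names are hypothetical.

def add_adjunct(cat):
    """Recursive 'lexical rule' as a relational constraint: a category
    may take any number of adjunct arguments."""
    yield cat                          # base case: no adjunct
    for c in add_adjunct(cat):         # recursive case, one more adjunct
        yield c + ("adjunct",)

def lazy(constraint, cat):
    """Delay evaluation: return a generator that enumerates solutions
    only when the parser actually demands them."""
    return iter(constraint(cat))

verb = ("np", "v")                  # a transitive verb category
solutions = lazy(add_adjunct, verb)
print(next(solutions))              # ('np', 'v')
print(next(solutions))              # ('np', 'v', 'adjunct')
```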
MoNoise: Modeling Noise Using a Modular Normalization System
We propose MoNoise: a normalization model focused on generalizability and
efficiency that aims to be easily reusable and adaptable. Normalization is
the task of translating texts from a non-canonical domain to a more canonical
domain; in our case, from social media data to standard language. Our proposed
model is based on a modular candidate generation in which each module is
responsible for a different type of normalization action. The most important
generation modules are a spelling correction system and a word embeddings
module. Depending on the definition of the normalization task, a static lookup
list can be crucial for performance. We train a random forest classifier to
rank the candidates, which generalizes well to all different types of
normalization actions. Most features for the ranking originate from the
generation modules; besides these features, N-gram features prove to be an
important source of information. We show that MoNoise beats the
state-of-the-art on different normalization benchmarks for English and Dutch,
which all define the task of normalization slightly differently.
Comment: Source code: https://bitbucket.org/robvanderg/monoise
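The generate-then-rank design can be sketched as follows (hypothetical names
and toy features, not the MoNoise code): each module proposes candidates, a
feature vector records which modules proposed each candidate, and a random
forest picks the best one.
```python
from sklearn.ensemble import RandomForestClassifier

LOOKUP = {"u": "you", "2morrow": "tomorrow"}  # toy static lookup list

def gen_lookup(word):
    return [LOOKUP[word]] if word in LOOKUP else []

def gen_spelling(word):
    # Placeholder for a spelling-correction module.
    return ["your"] if word == "youre" else []

MODULES = [gen_lookup, gen_spelling]

def candidates(word):
    """Each module proposes candidates; the original word is always a
    candidate so the ranker can choose to leave it unchanged."""
    cands = {word}
    for module in MODULES:
        cands.update(module(word))
    return sorted(cands)

def features(word, cand):
    # Toy feature vector: which modules proposed it, plus a length proxy.
    return [int(cand in gen_lookup(word)),
            int(cand in gen_spelling(word)),
            abs(len(cand) - len(word))]

# Train a ranker on (word, candidate, is_correct) triples, then pick
# the candidate with the highest predicted probability of correctness.
X = [features("u", "you"), features("u", "u")]
y = [1, 0]
ranker = RandomForestClassifier(n_estimators=10).fit(X, y)

word = "u"
best = max(candidates(word),
           key=lambda c: ranker.predict_proba([features(word, c)])[0][1])
print(best)  # expected: "you"
```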
Transducers from Rewrite Rules with Backreferences
Context sensitive rewrite rules have been widely used in several areas of
natural language processing, including syntax, morphology, phonology and speech
processing. Kaplan and Kay, Karttunen, and Mohri & Sproat have given various
algorithms to compile such rewrite rules into finite-state transducers. The
present paper extends this work by allowing a limited form of backreferencing
in such rules. The explicit use of backreferencing leads to more elegant and
general solutions.
Comment: 8 pages, EACL 1999 Bergen
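For readers unfamiliar with backreferencing, the snippet below illustrates the
idea with Python's re module rather than a compiled transducer: the
replacement refers back to material matched by the rule, which plain
compilation of rewrite rules into finite-state transducers cannot express
directly.
```python
import re

# A context-sensitive rewrite rule with backreferences: double a
# consonant that occurs between two identical vowels. \1 and \2 in the
# replacement refer back to the matched subexpressions.
rule = re.compile(r"([aeiou])([b-df-hj-np-tv-z])\1")
print(rule.sub(r"\1\2\2\1", "tata"))  # -> "tatta"
```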
The Meaning Factory at SemEval-2017 Task 9: Producing AMRs with Neural Semantic Parsing
We evaluate a semantic parser based on a character-based sequence-to-sequence
model in the context of the SemEval-2017 shared task on semantic parsing for
AMRs. With data augmentation, super characters, and POS-tagging we gain major
improvements in performance compared to a baseline character-level model.
Although we improve on previous character-based neural semantic parsing models,
the overall accuracy is still lower than a state-of-the-art AMR parser. An
ensemble combining our neural semantic parser with an existing, traditional
parser yields a small gain in performance.
Comment: To appear in Proceedings of SemEval, 2017 (camera-ready)
UG18 at SemEval-2018 Task 1: Generating Additional Training Data for Predicting Emotion Intensity in Spanish
The present study describes our submission to SemEval 2018 Task 1: Affect in
Tweets. Our Spanish-only approach aimed to demonstrate that it is beneficial to
automatically generate additional training data by (i) translating training
data from other languages and (ii) applying a semi-supervised learning method.
We find strong support for both approaches, with those models outperforming our
regular models in all subtasks. However, creating a stepwise ensemble of
different models as opposed to simply averaging did not result in an increase
in performance. We placed second (EI-Reg), second (EI-Oc), fourth (V-Reg) and
fifth (V-Oc) in the four Spanish subtasks we participated in.
Comment: Accepted at SemEval 2018
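As a rough sketch of the semi-supervised idea in (ii), the code below
implements a generic self-training loop (all names, the Ridge model, and the
confidence proxy are assumptions for illustration, not the authors' setup):
the model labels unlabeled data, the most confident predictions are added to
the training set, and the model is retrained.
```python
import numpy as np
from sklearn.linear_model import Ridge

def self_train(X_lab, y_lab, X_unlab, rounds=3, batch=100):
    """Iteratively label the unlabeled examples the current model is
    least uncertain about and add them to the training set."""
    model = Ridge().fit(X_lab, y_lab)
    for _ in range(rounds):
        if len(X_unlab) == 0:
            break
        preds = model.predict(X_unlab)
        # Crude confidence proxy for a regressor: prefer predictions
        # close to the mean label seen so far (least extrapolation); a
        # real system would use a model-specific confidence score.
        order = np.argsort(np.abs(preds - y_lab.mean()))[:batch]
        X_lab = np.vstack([X_lab, X_unlab[order]])
        y_lab = np.concatenate([y_lab, preds[order]])
        X_unlab = np.delete(X_unlab, order, axis=0)
        model = Ridge().fit(X_lab, y_lab)
    return model
```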