Automatic Accuracy Prediction for AMR Parsing
Abstract Meaning Representation (AMR) represents sentences as directed,
acyclic and rooted graphs, aiming to capture their meaning in a
machine-readable format. AMR parsing converts natural language sentences into
such graphs. However, evaluating a parser on new data by comparing its output
with manually created AMR graphs is very costly. We would also like to be able
to detect parses of questionable quality, or to prefer the results of
alternative systems by selecting those whose quality we can assess as good. We
propose AMR accuracy prediction as the task of predicting several metrics of
correctness for an automatically generated AMR parse, in the absence of the
corresponding gold parse. We develop a neural end-to-end multi-output
regression model and perform three case studies: firstly, we evaluate the
model's capacity to predict AMR parse accuracies and test whether it can
reliably assign high scores to gold parses. Secondly, we perform parse
selection based on predicted parse accuracies of candidate parses from
alternative systems, with the aim of improving overall results. Finally, we
predict system ranks for submissions from two AMR shared tasks on the basis of
their predicted parse accuracy averages. All experiments are carried out across
two different domains and show that our method is effective.
Comment: accepted at *SEM 201
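The parse-selection and system-ranking case studies reduce to simple operations once per-parse accuracy predictions exist. The sketch below assumes one predicted score per candidate parse (the paper predicts several metrics, so collapsing them to a single number is a simplification), uses hypothetical system names, and uses toy scores that are not results from the paper.

```python
from statistics import mean

# Assumption: predictions[i] maps a (hypothetical) system name to ONE predicted
# accuracy score for that system's parse of sentence i.


def select_parses(predictions):
    """For each sentence, return the system whose parse has the highest predicted score."""
    return [max(candidates, key=candidates.get) for candidates in predictions]


def rank_systems(predictions):
    """Rank systems by their average predicted parse accuracy, best first."""
    systems = list(predictions[0])
    averages = {s: mean(p[s] for p in predictions) for s in systems}
    return sorted(systems, key=averages.get, reverse=True)


# Toy scores, invented for illustration only.
preds = [
    {"system_a": 0.71, "system_b": 0.65},
    {"system_a": 0.58, "system_b": 0.74},
    {"system_a": 0.80, "system_b": 0.77},
]
print(select_parses(preds))  # ['system_a', 'system_b', 'system_a']
print(rank_systems(preds))   # ['system_b', 'system_a']
```

Selection works per sentence, so the combined output can beat every individual system even when one system has the best average.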
Story Cloze Ending Selection Baselines and Data Examination
This paper describes two supervised baseline systems for the Story Cloze Test
Shared Task (Mostafazadeh et al., 2016a). We first build a classifier using
features based on word embeddings and semantic similarity computation. We
further implement a neural LSTM system with different encoding strategies that
try to model the relation between the story and the provided endings. Our
experiments show that a model using representation features based on averaged
word embedding vectors over the story words and the candidate ending words,
combined with similarity features between the story and candidate ending
representations, performed better than the neural models. Our best model
achieves an accuracy of 72.42, ranking 3rd in the official evaluation.
Comment: Submission for the LSDSem 2017 - Linking Models of Lexical,
Sentential and Discourse-level Semantics - Shared Task
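The averaged-embedding and similarity features can be sketched with plain lists. In the described system these features feed a trained classifier. The sketch below simplifies this by picking the ending whose averaged vector has the highest cosine similarity to the story's. The function names and the 2-dimensional toy vectors are hypothetical.

```python
import math


def average_vector(words, embeddings, dim):
    """Average the embeddings of in-vocabulary words (zeros if none are known)."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def pick_ending(story_words, endings, embeddings, dim):
    """Return the index of the ending most similar to the story (similarity feature only)."""
    story_vec = average_vector(story_words, embeddings, dim)
    sims = [cosine(story_vec, average_vector(e, embeddings, dim)) for e in endings]
    return max(range(len(endings)), key=sims.__getitem__)


# Toy vectors, invented for illustration.
emb = {"rain": [1.0, 0.0], "umbrella": [0.9, 0.1], "sun": [0.0, 1.0], "beach": [0.1, 0.9]}
print(pick_ending(["rain", "rain"], [["beach", "sun"], ["umbrella"]], emb, dim=2))  # 1
```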
Ranking and Selecting Multi-Hop Knowledge Paths to Better Predict Human Needs
To make machines better understand sentiments, research needs to move from
polarity identification to understanding the reasons that underlie the
expression of sentiment. Categorizing the goals or needs of humans is one way
to explain the expression of sentiment in text. Humans are good at
understanding situations described in natural language and can easily connect
them to the character's psychological needs using commonsense knowledge. We
present a novel method to extract, rank, filter and select multi-hop relation
paths from a commonsense knowledge resource to interpret the expression of
sentiment in terms of their underlying human needs. We efficiently integrate
the acquired knowledge paths into a neural model that interfaces context
representations with knowledge using a gated attention mechanism. We assess the
model's performance on a recently published dataset for categorizing human
needs. Selectively integrating knowledge paths boosts performance and
establishes a new state of the art. Our model offers interpretability through
the learned attention map over commonsense knowledge paths. Human evaluation
highlights the relevance of the encoded knowledge.
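The abstract does not give the equations of the gated attention mechanism. The sketch below shows one common formulation, offered only as an assumption. Knowledge-path vectors are weighted by softmax attention against the context. A scalar sigmoid gate, computed from the concatenated context and knowledge vectors, then mixes the two. The parameter names and the scalar gate are hypothetical; the paper's model may differ.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gated_knowledge_attention(context, paths, gate_weights, gate_bias=0.0):
    """Fuse a context vector with attention-weighted knowledge-path vectors.

    context:      vector for the text, length d
    paths:        one vector per selected knowledge path, each length d
    gate_weights: hypothetical learned parameters, length 2 * d
    Returns (fused vector, attention weight per path).
    """
    # Attention: score each path by its dot product with the context.
    weights = softmax([dot(context, p) for p in paths])
    knowledge = [sum(w * p[i] for w, p in zip(weights, paths)) for i in range(len(context))]
    # Scalar gate from the concatenation [context; knowledge] (list + list).
    g = 1.0 / (1.0 + math.exp(-(dot(gate_weights, context + knowledge) + gate_bias)))
    fused = [g * c + (1.0 - g) * k for c, k in zip(context, knowledge)]
    return fused, weights


fused, weights = gated_knowledge_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0] * 4)
```

The attention weights returned alongside the fused vector are what an attention map over knowledge paths would visualize. Here the path aligned with the context receives the larger weight.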
SRL4ORL: Improving Opinion Role Labeling using Multi-task Learning with Semantic Role Labeling
For over a decade, machine learning has been used to extract
opinion-holder-target structures from text to answer the question "Who
expressed what kind of sentiment towards what?". Recent neural approaches do
not outperform the state-of-the-art feature-based models for Opinion Role
Labeling (ORL). We suspect this is due to the scarcity of labeled training data
and address this issue using different multi-task learning (MTL) techniques
with a related task that has substantially more data, i.e., Semantic Role
Labeling (SRL). We show that two MTL models improve significantly over the
single-task model for labeling both holders and targets, on the development
and the test sets. We found that the vanilla MTL model, which makes predictions
using only shared ORL and SRL features, performs best. With deeper analysis,
we determine what works and what might be done to make further improvements for
ORL.
Comment: Published in NAACL 201
Neural Skill Transfer from Supervised Language Tasks to Reading Comprehension
Reading comprehension is a challenging task in natural language processing
and requires a set of skills to be solved. While current approaches focus on
solving the task as a whole, in this paper we propose a neural network
'skill' transfer approach. We transfer knowledge from several lower-level
language tasks (skills) including textual entailment, named entity recognition,
paraphrase detection and question type classification into the reading
comprehension model.
We conduct an empirical evaluation and show that transferring language skill
knowledge leads to significant improvements for the task with far fewer steps
compared to the baseline model. We also show that the skill transfer approach
is effective even with small amounts of training data. Another finding of this
work is that using token-wise deep label supervision for text classification
improves the performance of transfer learning.
A Mention-Ranking Model for Abstract Anaphora Resolution
Resolving abstract anaphora is an important but difficult task for text
understanding. Yet, with recent advances in representation learning this task
becomes a more tangible aim. A central property of abstract anaphora is that it
establishes a relation between the anaphor embedded in the anaphoric sentence
and its (typically non-nominal) antecedent. We propose a mention-ranking model
that learns how abstract anaphors relate to their antecedents with an
LSTM-Siamese Net. We overcome the lack of training data by generating
artificial anaphoric sentence-antecedent pairs. Our model outperforms
state-of-the-art results on shell noun resolution. We also report first
benchmark results on an abstract anaphora subset of the ARRAU corpus. This
corpus presents a greater challenge due to a mixture of nominal and pronominal
anaphors and a greater range of confounders. We found model variants that
outperform the baselines for nominal anaphors, without training on individual
anaphor data, but still lag behind for pronominal anaphors. Our model selects
syntactically plausible candidates and, if syntax is disregarded,
discriminates candidates using deeper features.
Comment: In Proceedings of the 2017 Conference on Empirical Methods in Natural
Language Processing (EMNLP). Copenhagen, Denmark
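In a mention-ranking setup, every candidate antecedent receives a score and the top-scoring candidate is selected. The sketch below assumes the scores come from some scorer, such as the LSTM-Siamese network described above, which is not reproduced here. It also pairs ranking with a max-margin hinge loss, a common training objective for mention-ranking models. Whether the paper uses exactly this loss and margin is an assumption, and the function names are hypothetical.

```python
def rank_candidates(scores):
    """Return candidate indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def max_margin_loss(antecedent_score, negative_scores, margin=1.0):
    """Hinge loss that is zero once the true antecedent outscores every
    negative candidate by at least `margin` (assumed objective)."""
    if not negative_scores:
        return 0.0
    return max(0.0, margin - antecedent_score + max(negative_scores))


print(rank_candidates([0.2, 1.5, 0.7]))    # [1, 2, 0]
print(max_margin_loss(1.0, [0.5, 0.8]))    # 0.8
```

Only the highest-scoring negative enters the loss, so training pushes the antecedent above the hardest confounder rather than above all candidates on average.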
Dynamic MOdularized Reasoning for Compositional Structured Explanation Generation
Despite the success of neural models in solving reasoning tasks, their
compositional generalization capabilities remain unclear. In this work, we
propose a new setting of the structured explanation generation task to
facilitate compositional reasoning research. Previous work found that symbolic
methods achieve superior compositionality by using pre-defined inference rules
for iterative reasoning. However, these approaches rely on brittle symbolic
transfers and are restricted to well-defined tasks. Hence, we propose a dynamic
modularized reasoning model, MORSE, to improve the compositional generalization
of neural models. MORSE factorizes the inference process into a combination of
modules, where each module represents a functional unit. Specifically, we adopt
modularized self-attention to dynamically select and route inputs to dedicated
heads, which specializes them to specific functions. We conduct experiments on
two benchmarks with reasoning trees of increasing length and varying shapes to
test MORSE's compositional generalization abilities, and find that it outperforms
competitive baselines. Model ablation and deeper analyses show the
effectiveness of dynamic reasoning modules and their generalization abilities
- …