Automatic Accuracy Prediction for AMR Parsing
Abstract Meaning Representation (AMR) represents sentences as directed, acyclic, rooted graphs, aiming to capture their meaning in a machine-readable format. AMR parsing converts natural language sentences into such graphs. However, evaluating a parser on new data by comparison to manually created AMR graphs is very costly. We would also like to be able to detect parses of questionable quality, or to prefer the results of alternative systems by selecting those for which we can predict good quality. We propose AMR accuracy prediction as the task of predicting several metrics of correctness for an automatically generated AMR parse, in the absence of the corresponding gold parse. We develop a neural end-to-end multi-output regression model and perform three case studies: first, we evaluate the model's capacity to predict AMR parse accuracies and test whether it can reliably assign high scores to gold parses. Second, we perform parse selection based on predicted parse accuracies of candidate parses from alternative systems, with the aim of improving overall results. Finally, we predict system ranks for submissions from two AMR shared tasks on the basis of their predicted parse accuracy averages. All experiments are carried out across two different domains and show that our method is effective.

Comment: accepted at *SEM 2019
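As a rough illustration of the task setup (not the authors' code), the sketch below shows a multi-output regressor that maps some fixed-size encoding of a (sentence, parse) pair to several accuracy metrics at once; the encoder, dimensions, number of metrics, and toy data are all illustrative assumptions.

```python
# A minimal sketch of multi-output accuracy prediction, assuming parses
# have already been encoded as fixed-size feature vectors.
import torch
import torch.nn as nn

class AccuracyPredictor(nn.Module):
    """Predicts k correctness metrics for a parse in one forward pass."""
    def __init__(self, input_dim: int, hidden_dim: int = 128, n_metrics: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_metrics),
            nn.Sigmoid(),  # metrics assumed to live in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Toy usage: 'features' stands in for an encoding of (sentence, parse).
model = AccuracyPredictor(input_dim=300)
features = torch.randn(8, 300)      # batch of 8 hypothetical parse encodings
predicted = model(features)         # shape (8, 3): one row of metrics per parse
loss = nn.MSELoss()(predicted, torch.rand(8, 3))  # regression against gold metrics
```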
Gzip versus bag-of-words for text classification with KNN
The effectiveness of compression distance in KNN-based text classification ('gzip') has recently garnered much attention. In this note we show that simpler means can also be effective, and that compression may not be needed. Indeed, 'bag-of-words' matching can achieve similar or better results, and is more efficient.

Comment: improved writing
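The two approaches being compared fit in a few lines. The snippet below (illustrative toy code, not the note's experimental setup) pits 1-NN with gzip-based normalized compression distance against 1-NN with bag-of-words cosine distance:

```python
# Minimal sketch: 1-NN text classification with (i) gzip-based normalized
# compression distance and (ii) plain bag-of-words cosine distance.
import gzip
from collections import Counter

def ncd(a: str, b: str) -> float:
    """Normalized compression distance via gzip."""
    ca = len(gzip.compress(a.encode()))
    cb = len(gzip.compress(b.encode()))
    cab = len(gzip.compress((a + " " + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)

def bow_distance(a: str, b: str) -> float:
    """1 - cosine similarity over word-count vectors."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (sum(c * c for c in va.values()) ** 0.5) * (sum(c * c for c in vb.values()) ** 0.5)
    return 1.0 - (dot / norm if norm else 0.0)

train = [("the match ended in a draw", "sports"), ("shares fell sharply today", "finance")]
query = "the team won the match"
for dist in (ncd, bow_distance):
    label = min(train, key=lambda ex: dist(query, ex[0]))[1]  # nearest neighbor's label
    print(dist.__name__, "->", label)
```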
Metrics of Graph-Based Meaning Representations with Applications from Parsing Evaluation to Explainable NLG Evaluation and Semantic Search
"Who does what to whom?" The goal of a graph-based meaning representation (in short: MR) is to represent the meaning of a text in a structured format. With an MR, we can explicate the meaning of a text, describe occurring events and entities, and their semantic relations. Thus, a metric of MRs would measure a distance (or similarity) between MRs. We believe that such a meaning-focused similarity measurement can be useful for several important AI tasks, for instance, testing the capability of systems to produce meaningful output (system evaluation), or when searching for similar texts (information retrieval). Moreover, due to the natural explicitness of MRs, we hypothesize that MR metrics could provide us with valuable explainability of their similarity measurement. Indeed, if texts reside in a space where their meaning has been isolated and structured, we might directly see in which aspects two texts are actually similar (or dissimilar).
However, we find that there is not much previous work on MR metrics, and thus we lack fundamental knowledge about them and their potential applications. Therefore, we take first steps to explore MR metrics and MR spaces, focusing on two key goals: 1. developing novel and generally applicable methods for conducting similarity measurements in the space of MRs; 2. exploring potential applications that can profit from similarity assessments in MR spaces, including, but by no means limited to, their "classic" purpose of evaluating the quality of a text-to-MR system against a reference (aka parsing evaluation).
We start by analyzing contributions from previous works that have proposed MR metrics for parsing evaluation. Then, we move beyond this restricted setup and develop novel and more general MR metrics based on i) insights from our analysis of previous parsing evaluation metrics and ii) our motivation to extend MR metrics to similarity assessment of natural language texts. To empirically evaluate our generalized MR metrics, and to open the door for future improvements, we propose the first benchmark of MR metrics. With our benchmark, we can study MR metrics through the lens of multiple metric objectives, such as sentence similarity and robustness.
Then, we investigate novel applications of MR metrics. First, we explore new ways of applying MR metrics to evaluate systems that produce i) text from MRs (MR-to-text evaluation) and ii) MRs from text (MR parsing). We call our new setting MR projection-based, since we presume that at least one MR is unobserved and needs to be approximated. An advantage of such projection-based MR metric methods is that we can dispense with a costly human reference. Notably, when visiting the MR-to-text scenario, we touch on a much broader application scenario for MR metrics: explainable MR-grounded evaluation of text generation systems.
Moving steadily towards the application of MR metrics to general text similarity, we study MR metrics for measuring the meaning similarity of natural language arguments, an important task in argument mining, a new and rapidly growing area of natural language processing (NLP). In particular, we show that MRs and MR metrics can support an explainable and unsupervised argument similarity analysis and can inform us about the quality of argumentative conclusions.
Ultimately, we seek even more generality and are also interested in practical aspects such as efficiency. To this end, we distill our insights from our explorations of MR metric spaces into an explainable state-of-the-art machine learning model for semantic search, a task for which we would like to achieve both high accuracy and great efficiency. We develop a controllable metric distillation approach that can explain how the similarity decisions in the neural text embedding space are modulated through interpretable features, while maintaining (and sometimes improving) the efficiency and accuracy of a high-performance neural semantic search method. This is an important contribution, since it shows i) that we can alleviate the efficiency bottleneck of computationally costly MR graph metrics and, vice versa, ii) that MR metrics can help mitigate a crucial limitation of large "black box" neural methods by eliciting explanations for their decisions.
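One loose way to picture the distillation idea (a sketch, not the thesis's actual model): fit an interpretable linear model over hand-chosen features to reproduce a neural similarity score, so that every decision decomposes into per-feature contributions. The features and teacher scores below are synthetic stand-ins.

```python
# Hedged sketch: distill a "teacher" similarity into interpretable weights.
import numpy as np

rng = np.random.default_rng(0)
n_pairs, n_features = 200, 4
# Hypothetical interpretable features per text pair, e.g. MR-metric scores,
# lexical overlap, negation match, named-entity agreement.
X = rng.random((n_pairs, n_features))
# Stand-in for neural similarity scores produced by a black-box embedder.
teacher_sim = X @ np.array([0.5, 0.3, 0.1, 0.1]) + 0.01 * rng.standard_normal(n_pairs)

w, *_ = np.linalg.lstsq(X, teacher_sim, rcond=None)  # distilled feature weights
explanation = X[0] * w  # per-feature contribution to the first pair's score
print("weights:", w.round(2), "contributions:", explanation.round(2))
```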
A Mention-Ranking Model for Abstract Anaphora Resolution
Resolving abstract anaphora is an important but difficult task for text understanding. Yet, with recent advances in representation learning, this task becomes a more tangible aim. A central property of abstract anaphora is that it establishes a relation between the anaphor embedded in the anaphoric sentence and its (typically non-nominal) antecedent. We propose a mention-ranking model that learns how abstract anaphors relate to their antecedents with an LSTM-Siamese Net. We overcome the lack of training data by generating artificial anaphoric sentence–antecedent pairs. Our model outperforms state-of-the-art results on shell noun resolution. We also report first benchmark results on an abstract anaphora subset of the ARRAU corpus. This corpus presents a greater challenge due to a mixture of nominal and pronominal anaphors and a greater range of confounders. We find model variants that outperform the baselines for nominal anaphors, without training on individual anaphor data, but that still lag behind for pronominal anaphors. Our model selects syntactically plausible candidates and, when syntax is disregarded, discriminates candidates using deeper features.

Comment: In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP). Copenhagen, Denmark
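The general mention-ranking recipe can be sketched as follows (an illustration, not the paper's exact architecture): a shared LSTM encodes the anaphoric sentence and a candidate antecedent, a bilinear layer scores the pair, and a max-margin loss ranks the true antecedent above a confounder. All dimensions and the toy inputs are assumptions.

```python
# Minimal sketch of a Siamese-LSTM mention-ranking scorer.
import torch
import torch.nn as nn

class SiameseRanker(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 50, hid: int = 64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid, batch_first=True)  # shared encoder
        self.bilinear = nn.Bilinear(hid, hid, 1)              # pair scorer

    def encode(self, ids: torch.Tensor) -> torch.Tensor:
        _, (h, _) = self.lstm(self.emb(ids))
        return h[-1]  # final hidden state per sequence

    def forward(self, anaphor_ids, candidate_ids) -> torch.Tensor:
        return self.bilinear(self.encode(anaphor_ids), self.encode(candidate_ids)).squeeze(-1)

model = SiameseRanker(vocab_size=1000)
anaphor = torch.randint(0, 1000, (1, 12))                       # toy anaphoric sentence
good, bad = (torch.randint(0, 1000, (1, 8)) for _ in range(2))  # candidate antecedents
# Max-margin ranking loss: score the true antecedent above the confounder.
loss = torch.clamp(1.0 - model(anaphor, good) + model(anaphor, bad), min=0).mean()
```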
Dissecting Content and Context in Argumentative Relation Analysis
When assessing relations between argumentative units (e.g., support or attack), computational systems often exploit disclosing indicators or markers that are not part of the elementary argumentative units (EAUs) themselves, but are gained from their context (position in paragraph, preceding tokens, etc.). We show that this dependency is much stronger than previously assumed. In fact, by completely masking the EAU text spans and feeding only information from their context, a competitive system may function even better. We argue that an argument analysis system that relies more on discourse context than on the argument's content is unsafe, since it can easily be tricked. To alleviate this issue, we separate argumentative units from their context so that the system is forced to model and rely on an EAU's content. We show that the resulting classification system is more robust, and argue that such models are better suited for predicting argumentative relations across documents.

Comment: accepted at the 6th Workshop on Argument Mining
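The masking probe can be pictured with a short sketch: EAU spans (given here as character offsets, an assumption for illustration) are replaced by a placeholder token so that a downstream classifier sees only contextual cues. The example sentence and spans are invented.

```python
# Minimal sketch: hide EAU content, keep only the surrounding context.
def mask_eaus(text: str, eau_spans: list[tuple[int, int]]) -> str:
    """Replace each (start, end) EAU character span with a mask token."""
    out, prev = [], 0
    for start, end in sorted(eau_spans):
        out.append(text[prev:start])
        out.append("[EAU]")
        prev = end
    out.append(text[prev:])
    return "".join(out)

doc = "Smoking should be banned because it harms bystanders."
spans = [(0, 24), (33, 52)]  # the two elementary argumentative units
print(mask_eaus(doc, spans))
# -> "[EAU] because [EAU]."
```

A classifier trained on such masked inputs can only rely on contextual signals like the discourse connective "because", which is exactly the dependency the paper measures.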
An Argument-Marker Model for Syntax-Agnostic Proto-Role Labeling
Semantic proto-role labeling (SPRL) is an alternative to semantic role labeling (SRL) that moves beyond a categorical definition of roles, following Dowty's feature-based view of proto-roles. This theory determines agenthood vs. patienthood based on a participant's instantiation of more or less typical agent vs. patient properties, such as volition in an event. To perform SPRL, we develop an ensemble of hierarchical models with self-attention and concurrently learned predicate-argument markers. Our method is competitive with the state of the art, overall outperforming previous work in two formulations of the task (multi-label and multi-variate Likert scale prediction). In contrast to previous work, our results do not depend on gold argument heads derived from supplementary gold treebanks.

Comment: accepted at *SEM 2019
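The two task formulations can be sketched as two output heads over a shared argument representation. This is an illustrative stand-in, not the paper's ensemble; the property names are a small subset of Dowty-style proto-role properties, and the encoder producing the representation is assumed.

```python
# Minimal sketch of the two SPRL output formulations.
import torch
import torch.nn as nn

PROPERTIES = ["volition", "awareness", "change_of_state"]  # illustrative subset

class SPRLHeads(nn.Module):
    def __init__(self, rep_dim: int = 128, n_props: int = len(PROPERTIES)):
        super().__init__()
        self.multilabel = nn.Linear(rep_dim, n_props)  # property applies: yes/no
        self.likert = nn.Linear(rep_dim, n_props)      # graded 1-5 judgment

    def forward(self, arg_rep: torch.Tensor):
        probs = torch.sigmoid(self.multilabel(arg_rep))       # per-property probability
        scores = 1 + 4 * torch.sigmoid(self.likert(arg_rep))  # squashed into [1, 5]
        return probs, scores

heads = SPRLHeads()
arg_rep = torch.randn(2, 128)  # stand-in for encoded predicate-argument pairs
probs, scores = heads(arg_rep)
```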
Final Design Report for Human Powered Vehicle Drivetrain Project
The Cal Poly Human Powered Vehicle Club is building a bike to surpass 61.3 mph in 2019. The club and their mentor, George Leone, have proposed a senior project to design, build, and test the drivetrain for this year's human-powered vehicle. Research into human-powered vehicles and their drivetrains has shown that the power a rider can output, and the efficiency at which the rider can pedal, depend extensively on the design of the drivetrain. Despite the existence of standard bicycle drivetrain designs, the senior project team found that the best design to meet the club's requirements is a completely custom drivetrain based on the rider's dimensions and preferences. The team defined a list of technical specifications that they used to validate the completed final prototype. The final confirmation prototype functioned as intended, and all of the specifications were met with the exception of total cost. Details of the team's design, manufacturing, and testing processes are outlined in this document.