Recognizing cited facts and principles in legal judgements
In common law jurisdictions, legal professionals cite facts and legal principles from precedent cases to support their arguments before the court for their intended outcome in a current case. This practice stems from the doctrine of stare decisis, where cases that have similar facts should receive similar decisions with respect to the principles. It is essential for legal professionals to identify such facts and principles in precedent cases, though this is a highly time-intensive task. In this paper, we present studies that demonstrate that human annotators can achieve reasonable agreement on which sentences in legal judgements contain cited facts and principles (respectively, κ=0.65 and κ=0.95 for inter- and intra-annotator agreement). We further demonstrate that it is feasible to automatically annotate sentences containing such legal facts and principles in a supervised machine learning framework based on linguistic features, reporting per-category precision and recall figures of between 0.79 and 0.89 for classifying sentences in legal judgements as cited facts, principles, or neither using a Bayesian classifier, with an overall κ of 0.72 with the human-annotated gold standard.
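The abstract reports results for a Bayesian classifier over linguistic features but does not spell out the pipeline. The sketch below is only an illustration of that general setup, assuming scikit-learn's MultinomialNB over bag-of-words features and invented toy sentences; it is not the authors' feature set, data, or code.

```python
# Hypothetical sketch only: a bag-of-words Naive Bayes classifier that
# labels each sentence of a judgement as cited "fact", "principle", or
# "neither", evaluated against gold labels with Cohen's kappa.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report, cohen_kappa_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy data standing in for annotated judgement sentences.
train_sents = [
    "The claimant slipped on the wet floor of the defendant's shop.",
    "The contract was signed on 3 March without legal advice.",
    "A duty of care arises where harm is reasonably foreseeable.",
    "Damages must place the claimant in the position he would have been in.",
    "Counsel for the respondent made no further submissions.",
    "The hearing was adjourned until the following week.",
]
train_labels = ["fact", "fact", "principle", "principle", "neither", "neither"]

test_sents = [
    "The defendant left the machinery unattended overnight.",
    "Liability in negligence requires a breach of the duty of care.",
]
gold_labels = ["fact", "principle"]

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
clf.fit(train_sents, train_labels)

pred = clf.predict(test_sents)
print(classification_report(gold_labels, pred, zero_division=0))  # per-category P/R
print("kappa vs. gold:", cohen_kappa_score(gold_labels, pred))    # agreement with gold
```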
Modeling Empathy and Distress in Reaction to News Stories
Computational detection and understanding of empathy is an important factor
in advancing human-computer interaction. Yet to date, text-based empathy
prediction has the following major limitations: It underestimates the
psychological complexity of the phenomenon, adheres to a weak notion of ground
truth where empathic states are ascribed by third parties, and lacks a shared
corpus. In contrast, this contribution presents the first publicly available
gold standard for empathy prediction. It is constructed using a novel
annotation methodology which reliably captures empathy assessments by the
writer of a statement using multi-item scales. This is also the first
computational work distinguishing between multiple forms of empathy, empathic
concern, and personal distress, as recognized throughout psychology. Finally,
we present experimental results for three different predictive models, of which
a CNN performs the best.
Comment: To appear at EMNLP 2018
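The abstract names a CNN as the best of the three predictive models but gives no architecture details. The following is a minimal, hypothetical PyTorch sketch of a text CNN that regresses two scores (empathic concern and personal distress) from token ids; every hyperparameter and the toy batch are invented for illustration and are not the authors' model.

```python
# Hypothetical sketch: a small text CNN that maps a token sequence to two
# real-valued outputs (empathic concern, personal distress).
import torch
import torch.nn as nn

class EmpathyCNN(nn.Module):
    def __init__(self, vocab_size=10_000, emb_dim=100, n_filters=64,
                 kernel_sizes=(3, 4, 5), n_outputs=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, k) for k in kernel_sizes])
        self.out = nn.Linear(n_filters * len(kernel_sizes), n_outputs)

    def forward(self, token_ids):                     # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)     # (batch, emb, seq)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.out(torch.cat(pooled, dim=1))     # (batch, 2) scores

model = EmpathyCNN()
fake_batch = torch.randint(1, 10_000, (8, 120))       # 8 texts, 120 tokens each
print(model(fake_batch).shape)                        # torch.Size([8, 2])
```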
"i have a feeling trump will win..................": Forecasting Winners and Losers from User Predictions on Twitter
Social media users often make explicit predictions about upcoming events.
Such statements vary in the degree of certainty the author expresses toward the
outcome:"Leonardo DiCaprio will win Best Actor" vs. "Leonardo DiCaprio may win"
or "No way Leonardo wins!". Can popular beliefs on social media predict who
will win? To answer this question, we build a corpus of tweets annotated for
veridicality on which we train a log-linear classifier that detects positive
veridicality with high precision. We then forecast uncertain outcomes using the
wisdom of crowds, by aggregating users' explicit predictions. Our method for
forecasting winners is fully automated, relying only on a set of contenders as
input. It requires no training data of past outcomes and outperforms sentiment
and tweet volume baselines on a broad range of contest prediction tasks. We
further demonstrate how our approach can be used to measure the reliability of
individual accounts' predictions and retrospectively identify surprise
outcomes.
Comment: Accepted at EMNLP 2017 (long paper)
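The aggregation step the abstract describes (the "wisdom of crowds" over users' explicit predictions) can be illustrated with a toy vote count. The tweets, labels, and contenders below are invented, and the classifier that assigns veridicality labels is assumed to have run beforehand.

```python
# Hypothetical sketch: forecast a winner by counting tweets that express
# positive veridicality toward each contender and taking the argmax.
from collections import Counter

contenders = ["Leonardo DiCaprio", "Matt Damon", "Bryan Cranston"]

# (tweet text, contender it refers to, predicted veridicality label)
labelled_tweets = [
    ("Leonardo DiCaprio will win Best Actor", "Leonardo DiCaprio", "positive"),
    ("Leonardo DiCaprio may win", "Leonardo DiCaprio", "uncertain"),
    ("No way Leonardo wins!", "Leonardo DiCaprio", "negative"),
    ("Matt Damon is taking this one home", "Matt Damon", "positive"),
    ("Leo has this locked up", "Leonardo DiCaprio", "positive"),
]

votes = Counter(c for _, c, label in labelled_tweets if label == "positive")
forecast = max(contenders, key=lambda c: votes.get(c, 0))
print(votes, "->", forecast)   # Leonardo DiCaprio wins the toy forecast
```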
Best-Worst Scaling More Reliable than Rating Scales: A Case Study on Sentiment Intensity Annotation
Rating scales are a widely used method for data annotation; however, they
present several challenges, such as difficulty in maintaining inter- and
intra-annotator consistency. Best-worst scaling (BWS) is an alternative method
of annotation that is claimed to produce high-quality annotations while keeping
the required number of annotations similar to that of rating scales. However,
the veracity of this claim has never been systematically established. Here for
the first time, we set up an experiment that directly compares the rating scale
method with BWS. We show that with the same total number of annotations, BWS
produces significantly more reliable results than the rating scale.
Comment: In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), Vancouver, Canada, 2017
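The abstract does not restate how best-worst scaling annotations are turned into scores, but the usual counting procedure in this line of work assigns each item the fraction of times it was chosen best minus the fraction of times it was chosen worst. A small sketch with invented annotations:

```python
# Hypothetical sketch of the standard BWS counting procedure:
# score(item) = (times chosen best - times chosen worst) / times shown.
from collections import Counter

# Each annotation: (items shown in the tuple, most intense, least intense)
annotations = [
    (("great", "okay", "awful", "nice"), "great", "awful"),
    (("okay", "awful", "nice", "great"), "great", "awful"),
    (("nice", "okay", "great", "awful"), "nice", "awful"),
]

best, worst, seen = Counter(), Counter(), Counter()
for items, b, w in annotations:
    seen.update(items)
    best[b] += 1
    worst[w] += 1

scores = {item: (best[item] - worst[item]) / seen[item] for item in seen}
for item, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{item:6s} {score:+.2f}")   # real-valued intensity ranking
```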
Language (Technology) is Power: A Critical Survey of "Bias" in NLP
We survey 146 papers analyzing "bias" in NLP systems, finding that their
motivations are often vague, inconsistent, and lacking in normative reasoning,
despite the fact that analyzing "bias" is an inherently normative process. We
further find that these papers' proposed quantitative techniques for measuring
or mitigating "bias" are poorly matched to their motivations and do not engage
with the relevant literature outside of NLP. Based on these findings, we
describe the beginnings of a path forward by proposing three recommendations
that should guide work analyzing "bias" in NLP systems. These recommendations
rest on a greater recognition of the relationships between language and social
hierarchies, encouraging researchers and practitioners to articulate their
conceptualizations of "bias"---i.e., what kinds of system behaviors are
harmful, in what ways, to whom, and why, as well as the normative reasoning
underlying these statements---and to center work around the lived experiences
of members of communities affected by NLP systems, while interrogating and
reimagining the power relations between technologists and such communities.