Domain Agnostic Real-Valued Specificity Prediction
Sentence specificity quantifies the level of detail in a sentence,
characterizing the organization of information in discourse. While this
information is useful for many downstream applications, specificity prediction
systems predict very coarse labels (binary or ternary) and are trained on and
tailored toward specific domains (e.g., news). The goal of this work is to
generalize specificity prediction to domains where no labeled data is available
and output more nuanced real-valued specificity ratings.
We present an unsupervised domain adaptation system for sentence specificity
prediction, specifically designed to output real-valued estimates from binary
training labels. To calibrate the values of these predictions appropriately, we
regularize the posterior distribution of the labels towards a reference
distribution. We show that our framework generalizes well to three different
domains, reducing mean absolute error by 50-68% relative to the current
state-of-the-art system trained for news sentence specificity. We also
demonstrate the potential of our work in improving the quality and
informativeness of dialogue generation systems.
Comment: AAAI 2019 camera ready
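The calibration idea above lends itself to a compact implementation. Below is a minimal PyTorch sketch of regularizing a batch posterior toward a reference distribution via soft-binned scores; the soft-histogram construction, bin count, and loss weighting are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def posterior_regularizer(scores, reference, n_bins=10, temperature=50.0):
    """KL term pulling the batch's score distribution toward `reference`.

    scores:    (batch,) predicted specificity values in [0, 1]
    reference: (n_bins,) target probability distribution over score bins
    """
    # Soft histogram: weight each score across bins by proximity to the
    # bin centers, keeping the operation differentiable.
    centers = torch.linspace(0.0, 1.0, n_bins, device=scores.device)
    assign = F.softmax(-temperature * (scores.unsqueeze(1) - centers) ** 2, dim=1)
    batch_dist = assign.mean(dim=0).clamp_min(1e-8)
    # F.kl_div expects log-probabilities as input and computes
    # KL(target || input) summed over bins.
    return F.kl_div(batch_dist.log(), reference, reduction="sum")

def loss_fn(logits, binary_labels, reference, lam=0.1):
    # Binary supervision plus posterior regularization, so real-valued
    # sigmoid outputs stay calibrated against the reference distribution.
    scores = torch.sigmoid(logits)
    bce = F.binary_cross_entropy(scores, binary_labels.float())
    return bce + lam * posterior_regularizer(scores, reference)
```

In practice, the reference distribution could be estimated from a small sample of the target domain or set to a smoothed prior.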
Sarcasm Detection in a Disaster Context
During natural disasters, people often use social media platforms such as
Twitter to ask for help, to provide information about the disaster situation,
or to express contempt about the unfolding event or public policies and
guidelines. This contempt is in some cases expressed as sarcasm or irony.
Understanding this form of speech in a disaster-centric context is essential to
improving natural language understanding of disaster-related tweets. In this
paper, we introduce HurricaneSARC, a dataset of 15,000 tweets annotated for
intended sarcasm, and provide a comprehensive investigation of sarcasm
detection using pre-trained language models. Our best model obtains up to
0.70 F1 on our dataset. We also demonstrate that the performance on
HurricaneSARC can be improved by leveraging intermediate task transfer
learning. We release our data and code at
https://github.com/tsosea2/HurricaneSarc
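As a concrete starting point, a baseline of the kind investigated here can be fine-tuned in a few lines. The sketch below uses Hugging Face Transformers; the file name, column names, and hyperparameters are assumptions rather than the released HurricaneSARC setup. Intermediate task transfer would repeat this step first on a related dataset (e.g., generic sarcasm tweets) before training on HurricaneSARC.

```python
# Minimal fine-tuning sketch for binary sarcasm detection with Hugging Face
# Transformers. "hurricanesarc_train.csv" with "text"/"label" columns is a
# hypothetical local export of the dataset, not the official release format.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # sarcastic vs. not sarcastic

data = load_dataset("csv", data_files={"train": "hurricanesarc_train.csv"})
data = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="hurricanesarc-out",
                           num_train_epochs=3),
    train_dataset=data["train"],
)
trainer.train()
```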
SNaC: Coherence Error Detection for Narrative Summarization
Progress in summarizing long texts is inhibited by the lack of appropriate
evaluation frameworks. When a long summary must cover the many facets of a
source text, it needs to present a coherent narrative to be understandable
by a reader, yet current automatic and human evaluation methods fail to
identify gaps in coherence. In this work, we
introduce SNaC, a narrative coherence evaluation framework rooted in
fine-grained annotations for long summaries. We develop a taxonomy of coherence
errors in generated narrative summaries and collect span-level annotations for
6.6k sentences across 150 book and movie screenplay summaries. Our work
provides the first characterization of coherence errors generated by
state-of-the-art summarization models and a protocol for eliciting coherence
judgments from crowd annotators. Furthermore, we show that the collected
annotations allow us to train a strong classifier that automatically localizes
coherence errors in generated summaries, and to benchmark past work in
coherence modeling. Finally, our SNaC framework can support future work in long
document summarization and coherence evaluation, including improved
summarization modeling and post-hoc summary correction.
Comment: EMNLP 2022
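For intuition, localizing coherence errors from span-level annotations can be framed as BIO token classification. Below is a hypothetical sketch of that framing; the tag set, base model, and example are illustrative assumptions, not SNaC's released classifier or taxonomy.

```python
# Hypothetical span-level coherence-error localization as BIO token
# classification. The tag set is illustrative, and the classification
# head here is untrained, so its predictions are random.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

LABELS = ["O", "B-COHERR", "I-COHERR"]  # assumed coarse tag set

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "roberta-base", num_labels=len(LABELS))

summary = "Arthur flees the city. He had never met Arthur before."
inputs = tokenizer(summary, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)

# Tokens tagged B-/I-COHERR mark spans flagged as coherence errors.
tags = [LABELS[i] for i in logits.argmax(dim=-1)[0].tolist()]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
print(list(zip(tokens, tags)))
```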
Evaluating Subjective Cognitive Appraisals of Emotions from Large Language Models
The emotions we experience involve complex processes; besides physiological
aspects, research in psychology has studied cognitive appraisals where people
assess their situations subjectively, according to their own values (Scherer,
2005). Thus, the same situation can often result in different emotional
experiences. While the detection of emotion is a well-established task, there
is very limited work so far on the automatic prediction of cognitive
appraisals. This work fills the gap by presenting CovidET-Appraisals, the most
comprehensive dataset to date that assesses 24 appraisal dimensions, each with
a natural language rationale, across 241 Reddit posts. CovidET-Appraisals
presents an ideal testbed to evaluate the ability of large language models --
excelling at a wide range of NLP tasks -- to automatically assess and explain
cognitive appraisals. We found that while the best models are performant,
open-source LLMs fall short at this task, presenting a new challenge for the
future development of emotionally intelligent models. We release our dataset at
https://github.com/honglizhan/CovidET-Appraisals-Public.
Comment: EMNLP 2023 (Findings) Camera-Ready Version
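Evaluating an LLM on this task reduces to prompting it for a rating and rationale per appraisal dimension. A minimal sketch follows, assuming the OpenAI chat API; the prompt wording, 1-9 scale, and model name are illustrative, not the paper's exact protocol.

```python
# Minimal prompting sketch: rate one appraisal dimension with a rationale.
# The prompt wording, rating scale, and model name are assumptions, not the
# CovidET-Appraisals evaluation protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rate_appraisal(post: str, dimension: str) -> str:
    prompt = (
        f"Reddit post:\n{post}\n\n"
        f"On a scale of 1 (not at all) to 9 (very much), to what extent does "
        f"the narrator appraise the situation as involving '{dimension}'? "
        f"Give a rating and a one-sentence rationale."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(rate_appraisal("I feel like nothing I do changes how this pandemic goes.",
                     "control over the situation"))
```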
