SaFeRDialogues: Taking Feedback Gracefully after Conversational Safety Failures
Current open-domain conversational models can easily be made to talk in
inadequate ways. Online learning from conversational feedback given by the
conversation partner is a promising avenue for a model to improve and adapt, so
as to generate fewer of these safety failures. However, current
state-of-the-art models tend to react to feedback with defensive or oblivious
responses. This makes for an unpleasant experience and may discourage
conversation partners from giving feedback in the future. This work proposes
SaFeRDialogues, a task and dataset of graceful responses to conversational
feedback about safety failures. We collect a dataset of 10k dialogues
demonstrating safety failures, feedback signaling them, and a response
acknowledging the feedback. We show how fine-tuning on this dataset results in
conversations that human raters deem considerably more likely to lead to a
civil conversation, without sacrificing engagingness or general conversational
ability.
Comment: Accepted at ACL 2022
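The fine-tuning step described in the abstract lends itself to a short sketch. Below is a minimal, illustrative version assuming the 10k dialogues are flattened into (context, graceful response) pairs in a JSONL file; the file name and field names are hypothetical, and a small HuggingFace BlenderBot checkpoint stands in for the paper's actual model and training stack.

```python
import json

from torch.utils.data import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

MODEL = "facebook/blenderbot_small-90M"  # stand-in for the paper's model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

class FeedbackDataset(Dataset):
    """Maps a dialogue context ending in safety feedback to a graceful response."""
    def __init__(self, path):
        with open(path) as f:
            self.rows = [json.loads(line) for line in f]

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        row = self.rows[i]
        enc = tokenizer(row["context"], truncation=True, max_length=256)
        enc["labels"] = tokenizer(row["graceful_response"],
                                  truncation=True, max_length=64)["input_ids"]
        return enc

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="saferdialogues-ft",
                                  per_device_train_batch_size=8,
                                  num_train_epochs=3),
    train_dataset=FeedbackDataset("saferdialogues_train.jsonl"),  # hypothetical file
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```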
Detecting Inspiring Content on Social Media
Inspiration moves a person to see new possibilities and transforms the way
they perceive their own potential. Inspiration has received little attention in
psychology, and has not previously been studied in the NLP community. To the
best of our knowledge, this work is the first to study inspiration through
machine learning methods. We aim to automatically detect inspiring content from
social media data. To this end, we analyze social media posts to tease out what
makes a post inspiring and what topics are inspiring. We release a dataset of
5,800 inspiring and 5,800 non-inspiring English-language public posts (as unique
post ids), collected from a third-party dump of public Reddit posts, and we use
linguistic heuristics to automatically detect which English-language social
media posts are inspiring.
Comment: Accepted at ACII 2021
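As a concrete illustration of the binary detection task, here is a minimal baseline sketch: a TF-IDF bag-of-n-grams classifier over the rehydrated posts. The file name and field names are hypothetical, and the paper's own linguistic heuristics and features are not reproduced here.

```python
import json

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical file: one JSON object per rehydrated post, with a binary label.
with open("inspiring_posts.jsonl") as f:
    rows = [json.loads(line) for line in f]
texts = [r["text"] for r in rows]
labels = [r["inspiring"] for r in rows]  # 1 = inspiring, 0 = non-inspiring

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=0)

# Word uni/bigram TF-IDF features feeding a linear classifier.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2, sublinear_tf=True)
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)

print(classification_report(y_test, clf.predict(vectorizer.transform(X_test))))
```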
Linguistic calibration through metacognition: aligning dialogue agent responses with expected correctness
Open-domain dialogue agents have vastly improved, but still confidently
hallucinate knowledge or express doubt when asked straightforward questions. In
this work, we analyze whether state-of-the-art chit-chat models can express
metacognition capabilities through their responses: does a verbalized
expression of doubt (or confidence) match the likelihood that the model's
answer is incorrect (or correct)? We find that these models are poorly
calibrated in this sense, yet we show that the representations within the
models can be used to accurately predict likelihood of correctness. By
incorporating these correctness predictions into the training of a controllable
generation model, we obtain a dialogue agent with greatly improved linguistic
calibration.
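The correctness-prediction idea admits a compact sketch: fit a probe from cached answer representations to binary correctness labels, then read off a confidence estimate. The array files below are hypothetical stand-ins; the paper's actual models, features, and controllable-generation step are more involved.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical cached features: one hidden-state vector per (question, answer)
# pair, plus a 0/1 label for whether the model's answer was actually correct.
reps = np.load("answer_representations.npy")  # shape (N, d)
correct = np.load("answer_correctness.npy")   # shape (N,)

X_tr, X_te, y_tr, y_te = train_test_split(
    reps, correct, test_size=0.2, random_state=0)

# A linear probe from representations to correctness.
probe = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
p_correct = probe.predict_proba(X_te)[:, 1]
print("probe AUC:", roc_auc_score(y_te, p_correct))

# p_correct can then steer generation, e.g. by bucketing it into discrete
# confidence levels used as control tokens during fine-tuning, so the model
# verbalizes doubt when predicted correctness is low.
```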