141 research outputs found
Ask an Expert: Leveraging Language Models to Improve Strategic Reasoning in Goal-Oriented Dialogue Models
Existing dialogue models may encounter scenarios which are not
well-represented in the training data, and as a result generate responses that
are unnatural, inappropriate, or unhelpful. We propose the "Ask an Expert"
framework in which the model is trained with access to an "expert" which it can
consult at each turn. Advice is solicited via a structured dialogue with the
expert, and the model is optimized to selectively utilize (or ignore) it given
the context and dialogue history. In this work the expert takes the form of an
LLM. We evaluate this framework in a mental health support domain, where the
structure of the expert conversation is outlined by pre-specified prompts which
reflect a reasoning strategy taught to practitioners in the field. Blenderbot
models utilizing "Ask an Expert" show quality improvements across all expert
sizes, including those with fewer parameters than the dialogue model itself.
Our best model provides an improvement over baselines, approaching
human-level scores on "engagingness" and "helpfulness" metrics.
Comment: Accepted in Findings of the Association for Computational Linguistics: ACL 2023
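The abstract describes a consult-then-respond loop: solicit advice from the expert via structured prompts, then let the dialogue model use or ignore it. Below is a minimal Python sketch of that idea; the helper names (query_expert, respond) and the stub model are hypothetical stand-ins, not the paper's actual interface.

```python
# Minimal sketch of the "Ask an Expert" turn loop. All names here are
# hypothetical stand-ins, not the paper's actual interface.

def query_expert(history, prompts):
    """Structured consultation: walk the expert LLM through the
    pre-specified reasoning prompts and collect its advice."""
    advice = []
    for prompt in prompts:
        # A real system would call the expert LLM conditioned on the
        # dialogue history and this prompt; stubbed for illustration.
        advice.append(f"[expert reply to: {prompt!r}]")
    return " ".join(advice)

def respond(history, expert_prompts, dialogue_model):
    advice = query_expert(history, expert_prompts)
    # The dialogue model is trained to selectively use or ignore the
    # advice given the context and dialogue history.
    return dialogue_model(history, advice)

# Usage with a trivial stand-in dialogue model:
model = lambda history, advice: f"reply given {len(history)} turn(s) and advice: {advice}"
print(respond(["Hi, I've been feeling anxious lately."],
              ["What is the speaker's emotional state?",
               "Which coping strategy is appropriate?"],
              model))
```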
Mind the Gap Between Conversations for Improved Long-Term Dialogue Generation
Knowing how to end and resume conversations over time is a natural part of
communication, allowing for discussions to span weeks, months, or years. The
duration of gaps between conversations dictates which topics are relevant and
which questions to ask, and dialogue systems which do not explicitly model time
may generate responses that are unnatural. In this work we explore the idea of
making dialogue models aware of time, and present GapChat, a multi-session
dialogue dataset in which the time between each session varies. While the
dataset is constructed in real-time, progress on events in speakers' lives is
simulated in order to create realistic dialogues occurring across a long
timespan. We expose time information to the model and compare different
representations of time and event progress. In human evaluation we show that
time-aware models perform better in metrics that judge the relevance of the
chosen topics and the information gained from the conversation.
Comment: Accepted in the Findings of EMNLP 2023
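One natural way to expose gap information, in line with the abstract's comparison of time representations, is to serialize the elapsed time into the dialogue context. The bucketing scheme below is an illustrative assumption, not GapChat's actual representation.

```python
# Sketch of exposing session gaps to a dialogue model by prepending a
# coarse time marker to the context. Bucket boundaries are assumptions.

def bucket_gap(hours):
    """Map a raw gap in hours to a human-readable time expression."""
    if hours < 24:
        return "a few hours"
    if hours < 24 * 7:
        return f"{hours // 24} days"
    if hours < 24 * 30:
        return f"{hours // (24 * 7)} weeks"
    return f"{hours // (24 * 30)} months"

def build_context(previous_session, new_utterance, gap_hours):
    # Prepend the elapsed time so the model can condition its choice of
    # topics and follow-up questions on how much time has passed.
    marker = f"[{bucket_gap(gap_hours)} later]"
    return previous_session + [marker, new_utterance]

print(build_context(["How is the marathon training going?"],
                    "Did you finish the race?", 24 * 21))
```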
Evaluating contributions of natural language parsers to protein–protein interaction extraction
Motivation: While text mining technologies for biomedical research have gained popularity as a way to take advantage of the explosive growth of information in text form in biomedical papers, selecting appropriate natural language processing (NLP) tools is still difficult for researchers who are not familiar with recent advances in NLP. This article provides a comparative evaluation of several state-of-the-art natural language parsers, focusing on the task of extracting protein–protein interaction (PPI) from biomedical papers. We measure how each parser, and its output representation, contributes to accuracy improvement when the parser is used as a component in a PPI system.
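A rough sketch of this component-swap evaluation: hold the downstream PPI classifier fixed, plug in different parsers, and compare end-task accuracy. All components below are placeholders rather than the paper's actual parsers, features, or classifier.

```python
# Illustrative comparison loop: evaluate a parser by its contribution
# to end-task PPI accuracy. Parser and classifier are stubs.

def extract_features(sentence, parse):
    # e.g., the dependency path between the two protein mentions;
    # a trivial placeholder feature set here.
    return {"path": parse.get("path", "unk"), "len": len(sentence.split())}

def evaluate_parser(parse_fn, labeled_pairs, classify):
    correct = 0
    for sentence, gold in labeled_pairs:
        features = extract_features(sentence, parse_fn(sentence))
        if classify(features) == gold:
            correct += 1
    return correct / len(labeled_pairs)

# Stub parser and classifier to show how the comparison would run:
dummy_parser = lambda s: {"path": "nsubj-dobj"}
dummy_clf = lambda f: f["len"] > 5  # trivially length-based stand-in
data = [("ProtA binds ProtB in vitro .", True),
        ("ProtA was measured .", False)]
print(evaluate_parser(dummy_parser, data, dummy_clf))
```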
Does My Rebuttal Matter? Insights from a Major NLP Conference
Peer review is a core element of the scientific process, particularly in
conference-centered fields such as ML and NLP. However, only a few studies have
evaluated its properties empirically. Aiming to fill this gap, we present a
corpus that contains over 4k reviews and 1.2k author responses from ACL-2018.
We quantitatively and qualitatively assess the corpus. This includes a pilot
study on paper weaknesses given by reviewers and on quality of author
responses. We then focus on the role of the rebuttal phase, and propose a novel
task to predict after-rebuttal (i.e., final) scores from initial reviews and
author responses. Although author responses do have a marginal (and
statistically significant) influence on the final scores, especially for
borderline papers, our results suggest that a reviewer's final score is largely
determined by her initial score and the distance to the other reviewers'
initial scores. In this context, we discuss the conformity bias inherent to
peer reviewing, a bias that has largely been overlooked in previous research.
We hope our analyses will help better assess the usefulness of the rebuttal
phase in NLP conferences.
Comment: Accepted to NAACL-HLT 2019. Main paper plus supplementary material
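The proposed task can be pictured as a small regression problem: predict the after-rebuttal score from the initial review and the author response. The features and coefficients below are illustrative assumptions chosen to mirror the reported finding (initial score and peer distance dominate; the response adds little), not the paper's fitted model.

```python
# Toy sketch of the after-rebuttal score prediction task. Feature set
# and coefficients are assumptions, not the paper's system.

from dataclasses import dataclass

@dataclass
class Review:
    initial_score: float
    peer_mean: float        # mean initial score of the other reviewers
    response_length: int    # crude proxy for the author response

def predict_final(r: Review) -> float:
    # The final score is dominated by the initial score plus a pull
    # toward the other reviewers (conformity); the author response
    # contributes only a marginal correction.
    conformity = 0.3 * (r.peer_mean - r.initial_score)
    rebuttal_effect = 0.0005 * r.response_length
    return r.initial_score + conformity + rebuttal_effect

print(predict_final(Review(initial_score=3.0, peer_mean=4.0,
                           response_length=600)))
```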
- …