Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents
Human-like chatbots necessitate the use of commonsense reasoning in order to
effectively comprehend and respond to implicit information present within
conversations. Achieving such coherence and informativeness in responses,
however, is a non-trivial task. Even for large language models (LLMs),
identifying and aggregating the key evidence remains a substantial challenge:
such evidence is scattered across multiple turns of a conversation, so it
cannot be gathered in a single hop and must instead be integrated over
multiple hops. Our focus is hence to facilitate such multi-hop reasoning over
a dialogue context, namely dialogue chain-of-thought (CoT) reasoning. To this
end, we propose a knowledge distillation framework
that leverages LLMs as unreliable teachers and selectively distills consistent
and helpful rationales via alignment filters. We further present DOCTOR, a
DialOgue Chain-of-ThOught Reasoner that provides reliable CoT rationales for
response generation. We conduct extensive experiments to show that enhancing
dialogue agents with high-quality rationales from DOCTOR significantly improves
the quality of their responses.
Comment: 25 pages, 8 figures, Accepted to EMNLP 2023
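The selective distillation loop described above is easy to picture in code. The sketch below is a minimal illustration, not the paper's implementation: the two alignment filters (consistency with the dialogue and helpfulness toward the gold response) follow the abstract, but the concrete scoring heuristics, thresholds, and all names are assumptions.

    # Hypothetical sketch of DOCTOR-style selective rationale distillation.
    # The real alignment filters are model-based; these keyword heuristics
    # are simple stand-ins to show the control flow.
    from dataclasses import dataclass

    @dataclass
    class Sample:
        dialogue: list[str]   # conversation turns so far
        gold_response: str    # reference next response
        rationale: str        # CoT rationale sampled from the teacher LLM

    def is_consistent(s: Sample) -> bool:
        """Stand-in consistency filter: most of the rationale should be
        grounded in the dialogue context rather than hallucinated."""
        context = " ".join(s.dialogue).lower()
        tokens = s.rationale.lower().split()
        grounded = sum(t in context for t in tokens)
        return grounded / max(len(tokens), 1) > 0.5

    def is_helpful(s: Sample) -> bool:
        """Stand-in helpfulness filter: the rationale should share some
        content with the gold response it is meant to support."""
        overlap = set(s.gold_response.lower().split()) & set(s.rationale.lower().split())
        return len(overlap) >= 3

    def distill(teacher_samples: list[Sample]) -> list[Sample]:
        """Treat the teacher LLM as unreliable: keep only rationales that
        pass both filters; the survivors train the student reasoner."""
        return [s for s in teacher_samples if is_consistent(s) and is_helpful(s)]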
CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos
Visual information is central to conversation: body gestures and physical
behaviour, for example, contribute to meaning that transcends words alone. To
date, however, most neural conversational models are limited to just text. We
introduce CHAMPAGNE, a generative model of conversations that can account for
visual contexts. To train CHAMPAGNE, we collect and release YTD-18M, a
large-scale corpus of 18M video-based dialogues. YTD-18M is constructed from
web videos: crucial to our data collection pipeline is a pretrained language
model that converts error-prone automatic transcripts to a cleaner dialogue
format while maintaining meaning. Human evaluation reveals that YTD-18M is more
sensible and specific than prior resources (MMDialog, 1M dialogues), while
maintaining visual-groundedness. Experiments demonstrate that 1) CHAMPAGNE
learns to conduct conversation from YTD-18M; and 2) when fine-tuned, it
achieves state-of-the-art results on four vision-language tasks focused on
real-world conversations. We release data, models, and code.
Comment: ICCV 2023, Project page: https://seungjuhan.me/champagn
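The transcript-to-dialogue step of the YTD-18M pipeline can be sketched briefly. This is a hedged illustration under assumptions: the abstract only says that a pretrained language model converts error-prone automatic transcripts into a cleaner dialogue format, so the model choice (google/flan-t5-base), the prompt wording, and the function name below are stand-ins, not the paper's actual setup.

    # Illustrative sketch: use a generic seq2seq model to rewrite a noisy
    # ASR transcript as speaker-tagged dialogue while preserving meaning.
    from transformers import pipeline

    converter = pipeline("text2text-generation", model="google/flan-t5-base")

    def transcript_to_dialogue(asr_transcript: str) -> str:
        """Rewrite an error-prone transcript as clean dialogue turns."""
        prompt = (
            "Rewrite the following noisy video transcript as a clean dialogue, "
            "one speaker turn per line, without changing its meaning:\n"
            + asr_transcript
        )
        return converter(prompt, max_new_tokens=256)[0]["generated_text"]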
Syntactic manipulation for generating more diverse and interesting texts
Natural Language Generation plays an important role in dialogue systems, as it determines how users perceive the system. Recently, deep-learning-based systems have been proposed to tackle this task, since they generalize better and require less manual effort to adapt to new domains. However, deep learning systems usually adopt a homogeneous-sounding writing style that expresses little variation. In this work, we present a system for Natural Language Generation in which we control various aspects of the surface realization in order to increase the lexical variability of the utterances, so that they sound more diverse and interesting. For this, we use a Semantically Controlled Long Short-term Memory Network (SCLSTM) and apply its specialized cell to control various syntactic features of the generated texts. We present an in-depth human evaluation showing the effects of these surface manipulations on the perception of potential users.
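For readers unfamiliar with the SCLSTM, the sketch below shows its specialized cell in PyTorch (after Wen et al., 2015): a standard LSTM cell extended with a reading gate that gradually consumes a feature vector d, which in this system would encode the controlled syntactic features rather than dialogue acts. The dimensions, names, and wiring details are illustrative assumptions, not the authors' exact implementation.

    import torch
    import torch.nn as nn

    class SCLSTMCell(nn.Module):
        def __init__(self, input_size: int, hidden_size: int, feat_size: int):
            super().__init__()
            # Standard LSTM gates (input, forget, output, candidate) in one matmul.
            self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)
            # Reading gate: decides how much of the feature vector to consume.
            self.read = nn.Linear(input_size + hidden_size, feat_size)
            # Projects the remaining feature vector into the cell state.
            self.feat_proj = nn.Linear(feat_size, hidden_size, bias=False)

        def forward(self, x, h, c, d):
            z = torch.cat([x, h], dim=-1)
            i, f, o, g = self.gates(z).chunk(4, dim=-1)
            r = torch.sigmoid(self.read(z))          # reading gate
            d = r * d                                # consume part of d
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g) \
                + torch.tanh(self.feat_proj(d))      # inject remaining features
            h = torch.sigmoid(o) * torch.tanh(c)
            return h, c, d

In the original SCLSTM formulation, any mass left in d at the end of generation is penalized during training, which pushes the decoder to eventually realize every requested feature in the output.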
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Reinforcement learning from human feedback (RLHF) is effective at aligning
large language models (LLMs) to human preferences, but gathering high quality
human preference labels is a key bottleneck. We conduct a head-to-head
comparison of RLHF and RL from AI Feedback (RLAIF), a technique in which
preferences are labeled by an off-the-shelf LLM in lieu of humans, and we find
that the two result in similar improvements. On the task of summarization, human
evaluators prefer generations from both RLAIF and RLHF over a baseline
supervised fine-tuned model in ~70% of cases. Furthermore, when asked to rate
RLAIF vs. RLHF summaries, humans prefer both at equal rates. These results
suggest that RLAIF can yield human-level performance, offering a potential
solution to the scalability limitations of RLHF.
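The AI-labeling step at the heart of RLAIF fits in a few lines. The sketch below is a minimal illustration under stated assumptions: the judging prompt, the llm callable, and the fallback when the model answers ambiguously are hypothetical stand-ins, not the paper's actual prompting setup.

    # Illustrative RLAIF preference labeling: an off-the-shelf LLM, rather
    # than a human rater, picks the preferred candidate for each input.
    from typing import Callable

    TEMPLATE = (
        "You are judging two summaries of the same text.\n\n"
        "Text: {text}\n\n"
        "Summary A: {a}\n"
        "Summary B: {b}\n\n"
        'Which summary is better? Answer with exactly "A" or "B".'
    )

    def label_preference(llm: Callable[[str], str],
                         text: str, a: str, b: str) -> int:
        """Return 0 if the AI labeler prefers summary `a`, else 1. The
        resulting (chosen, rejected) pairs then train a reward model
        exactly as human preference labels would in RLHF."""
        answer = llm(TEMPLATE.format(text=text, a=a, b=b)).strip().upper()
        return 0 if answer.startswith("A") else 1

In this framing, swapping the human rater for such a function is the only change to the preference-collection stage; reward-model training and RL fine-tuning then proceed as in standard RLHF.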