Goal-Embedded Dual Hierarchical Model for Task-Oriented Dialogue Generation
Hierarchical neural networks are often used to model inherent structures
within dialogues. For goal-oriented dialogues, these models lack a mechanism
for adhering to the goals and neglect the distinct conversational patterns of
the two interlocutors. In this work, we propose the Goal-Embedded Dual
Hierarchical Attentional Encoder-Decoder (G-DuHA), which centers on goals and
captures interlocutor-level disparity while modeling goal-oriented dialogues.
Experiments on dialogue generation, response generation, and human evaluations
demonstrate that the proposed model successfully generates higher-quality, more
diverse and goal-centric dialogues. Moreover, we apply data augmentation via
goal-oriented dialogue generation to task-oriented dialog systems and achieve
better performance.
Comment: Accepted by CoNLL-201
User Simulation with Large Language Models for Evaluating Task-Oriented Dialogue
One of the major impediments to the development of new task-oriented dialogue
(TOD) systems is the need for human evaluation at multiple stages and
iterations of the development process. In an effort to move toward automated
evaluation of TOD, we propose a novel user simulator built using recently
developed large pretrained language models (LLMs). To increase the
linguistic diversity of our system relative to related previous work, we do
not fine-tune the LLMs used by our system on existing TOD datasets; rather, we
use in-context learning to prompt the LLMs to generate robust and
linguistically diverse output with the goal of simulating the behavior of human
interlocutors. Unlike previous work, which sought to maximize goal success rate
(GSR) as the primary metric of simulator performance, our goal is a system
which achieves a GSR similar to that observed in human interactions with TOD
systems. Using this approach, our current simulator is effectively able to
interact with several TOD systems, especially on single-intent conversational
goals, while generating lexically and syntactically diverse output relative to
previous simulators that rely upon fine-tuned models. Finally, we collect a
Human2Bot dataset of humans interacting with the same TOD systems with which we
experimented in order to better quantify these achievements.
Comment: 13 page
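
The two ideas in the abstract above, prompting an LLM through in-context examples rather than fine-tuning, and judging a simulator by how close its GSR is to the human GSR, can be illustrated with a minimal sketch. Everything named here (the `Dialogue` record, `build_user_prompt`, `goal_success_rate`, `gsr_gap`, and the prompt layout) is a hypothetical illustration and is not taken from the paper, which does not specify its prompt format or scoring code in this abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Dialogue:
    """Hypothetical record of one conversation with a TOD system."""
    goal: str                      # natural-language description of the user goal
    turns: List[Tuple[str, str]]   # (speaker, utterance) pairs in order
    goal_achieved: bool = False    # whether the user goal was met


def build_user_prompt(examples, goal, history):
    """Assemble a few-shot prompt asking an LLM for the next user turn.

    The example dialogues act as in-context demonstrations; no model
    weights are updated. The layout is an assumption for illustration.
    """
    parts = []
    for ex in examples:
        parts.append(f"Goal: {ex.goal}")
        parts.extend(f"{speaker}: {text}" for speaker, text in ex.turns)
        parts.append("")  # blank line separates demonstrations
    parts.append(f"Goal: {goal}")
    parts.extend(f"{speaker}: {text}" for speaker, text in history)
    parts.append("User:")  # the LLM is expected to continue from here
    return "\n".join(parts)


def goal_success_rate(dialogues):
    """Fraction of dialogues in which the user goal was achieved."""
    if not dialogues:
        raise ValueError("need at least one dialogue")
    return sum(d.goal_achieved for d in dialogues) / len(dialogues)


def gsr_gap(simulated, human):
    """Absolute difference between simulator GSR and human GSR.

    A smaller gap means the simulator behaves more like human users,
    which is the stated target, rather than maximizing GSR outright.
    """
    return abs(goal_success_rate(simulated) - goal_success_rate(human))
```

Under these assumptions, the string returned by `build_user_prompt` would be sent to whichever LLM is in use, and `gsr_gap` would be computed over simulated dialogues and a human-collected set such as the Human2Bot data.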