DiactTOD: Learning Generalizable Latent Dialogue Acts for Controllable Task-Oriented Dialogue Systems
Dialogue act annotations are important for improving response generation quality
in task-oriented dialogue systems. However, it can be challenging to use
dialogue acts to control response generation in a generalizable way because
different datasets and tasks may have incompatible annotations. While
alternative methods that utilize latent action spaces or reinforcement learning
do not require explicit annotations, they may lack interpretability or face
difficulties defining task-specific rewards. In this work, we present a novel
end-to-end latent dialogue act model (DiactTOD) that represents dialogue acts
in a latent space. When pre-trained on a large corpus, DiactTOD can predict
and control dialogue acts through these latent representations, generating
controllable responses in a zero-shot fashion. Our approach demonstrates
state-of-the-art performance across a wide range of experimental settings on
the MultiWOZ dataset, including zero-shot, few-shot, and full data fine-tuning
with both end-to-end and policy optimization configurations.
Comment: SIGDial 202
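The core idea of a latent dialogue act space can be sketched as follows. This is a toy illustration, not the authors' model: act labels are placed at hypothetical prototype points in a latent space, and a predicted latent vector is interpreted as its nearest prototype, which would then condition the response decoder.

```python
# Toy sketch (assumed setup, not DiactTOD's actual architecture):
# interpretable act labels mapped to prototype points in a 2-D latent space.
ACT_PROTOTYPES = {
    "inform":  (1.0, 0.0),
    "request": (0.0, 1.0),
    "confirm": (0.7, 0.7),
}

def nearest_act(latent):
    """Interpret a predicted latent vector as its closest known act label."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(ACT_PROTOTYPES, key=lambda act: dist2(ACT_PROTOTYPES[act], latent))

# A latent vector predicted near the "request" prototype decodes to that
# label, which can then steer the response generator.
print(nearest_act((0.1, 0.9)))  # request
```

Because the latent points remain anchored to named acts, this kind of scheme keeps the interpretability that purely learned latent action spaces lack.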
User Simulation with Large Language Models for Evaluating Task-Oriented Dialogue
One of the major impediments to the development of new task-oriented dialogue
(TOD) systems is the need for human evaluation at multiple stages and
iterations of the development process. In an effort to move toward automated
evaluation of TOD, we propose a novel user simulator built using recently
developed large pretrained language models (LLMs). In order to increase the
linguistic diversity of our system relative to the related previous work, we do
not fine-tune the LLMs used by our system on existing TOD datasets; rather we
use in-context learning to prompt the LLMs to generate robust and
linguistically diverse output with the goal of simulating the behavior of human
interlocutors. Unlike previous work, which sought to maximize goal success rate
(GSR) as the primary metric of simulator performance, our goal is a system
which achieves a GSR similar to that observed in human interactions with TOD
systems. Using this approach, our current simulator is effectively able to
interact with several TOD systems, especially on single-intent conversational
goals, while generating lexically and syntactically diverse output relative to
previous simulators that rely upon fine-tuned models. Finally, we collect a
Human2Bot dataset of humans interacting with the same TOD systems with which we
experimented in order to better quantify these achievements.
Comment: 13 page
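The in-context-learning setup described above can be sketched as prompt assembly: the LLM is cast as the user via a role instruction, a goal, and a few demonstration turns, with no fine-tuning. The template wording here is an assumption, not the paper's actual prompt, and any chat-completion API would stand in for the final LLM call.

```python
# Hypothetical prompt template for an LLM-based user simulator
# (illustrative only; the paper does not publish this exact format).
def build_simulator_prompt(goal, history, examples):
    """Assemble a prompt that casts the LLM as the user, without fine-tuning."""
    lines = ["You are a user talking to a task-oriented dialogue system.",
             f"Your goal: {goal}", ""]
    for ex in examples:  # few-shot demonstrations of user behavior
        lines.append(f"System: {ex['system']}")
        lines.append(f"User: {ex['user']}")
    lines.append("")
    for turn in history:  # the live conversation so far
        lines.append(f"System: {turn}")
    lines.append("User:")  # the LLM completes the next user turn
    return "\n".join(lines)

prompt = build_simulator_prompt(
    goal="book a cheap hotel in the centre for 2 nights",
    history=["Hello, how can I help you?"],
    examples=[{"system": "What area?", "user": "Somewhere central, please."}],
)
```

Varying the demonstrations rather than fine-tuning on TOD data is what preserves the lexical and syntactic diversity the abstract emphasizes.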
Conversation Style Transfer using Few-Shot Learning
Conventional text style transfer approaches for natural language focus on
sentence-level style transfer without considering contextual information, and
the style is described with attributes (e.g., formality). When applying style
transfer to conversations such as task-oriented dialogues, existing approaches
suffer from these limitations as context can play an important role and the
style attributes are often difficult to define in conversations. In this paper,
we introduce conversation style transfer as a few-shot learning problem, where
the model learns to perform style transfer by observing only the target-style
dialogue examples. We propose a novel in-context learning approach to solve the
task with style-free dialogues as a pivot. Human evaluation shows that by
incorporating multi-turn context, the model is able to match the target style
while having better appropriateness and semantic correctness compared to
utterance-level style transfer. Additionally, we show that conversation style
transfer can also benefit downstream tasks. Results on multi-domain intent
classification tasks show improvement in F1 scores after transferring the style
of training data to match the style of test data.
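The pivot idea above can be sketched as a two-step in-context pipeline: first rewrite the dialogue into style-free wording, then re-render that pivot in the target style shown only through examples. The prompt wording is illustrative, not the paper's template, and `llm` is a placeholder for any text-completion callable.

```python
# Sketch of style-free-pivot conversation style transfer (assumed prompts).
def style_transfer(dialogue, target_examples, llm):
    # Step 1: strip style -> a "style-free" pivot version of the dialogue.
    pivot = llm(
        "Rewrite the following dialogue turns in plain, neutral wording:\n"
        + "\n".join(dialogue)
    )
    # Step 2: re-render the pivot in the target style, demonstrated only
    # via example dialogues (few-shot, no style attribute labels needed).
    demo = "\n\n".join(target_examples)
    return llm(
        f"Example dialogues in the target style:\n{demo}\n\n"
        f"Rewrite this dialogue in the same style:\n{pivot}"
    )

# A stand-in "LLM" that just echoes the last prompt line shows how the text
# threads through both steps; a real model would paraphrase at each stage.
out = style_transfer(["hey, u got rooms??"],
                     ["Good evening. How may I help?"],
                     llm=lambda p: p.splitlines()[-1])
```

The pivot step is what lets the model avoid naming style attributes explicitly: style is defined extensionally by the example dialogues alone.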
Pre-training Intent-Aware Encoders for Zero- and Few-Shot Intent Classification
Intent classification (IC) plays an important role in task-oriented dialogue
systems as it identifies user intents from given utterances. However, models
trained on limited annotations for IC often suffer from a lack of
generalization to unseen intent classes. We propose a novel pre-training method
for text encoders that uses contrastive learning with intent pseudo-labels to
produce embeddings well suited to IC tasks. Applying this pre-training
strategy, we introduce the pre-trained intent-aware encoder
(PIE). Specifically, we first train a tagger to identify key phrases within
utterances that are crucial for interpreting intents. We then use these
extracted phrases to create examples for pre-training a text encoder in a
contrastive manner. As a result, our PIE model achieves up to 5.4% and 4.0%
higher accuracy than the previous state-of-the-art pre-trained sentence encoder
for the N-way zero- and one-shot settings on four IC datasets.
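The data-construction step described above can be sketched as follows. The trained key-phrase tagger is replaced here by a simple keyword lookup, and the phrase inventory is made up for illustration; the point is how each utterance is paired with its own extracted phrase as a positive and with other utterances' phrases as negatives for contrastive pre-training.

```python
# Simplified sketch of PIE-style contrastive example construction.
# KEY_PHRASES and the lookup tagger are stand-ins for the trained tagger.
KEY_PHRASES = {"book a flight", "cancel my order", "reset password"}

def tag_phrases(utterance):
    """Stand-in tagger: return key phrases found in the utterance."""
    return [p for p in KEY_PHRASES if p in utterance.lower()]

def contrastive_pairs(utterances):
    """Pair each utterance with its own phrase (positive) vs. others (negatives)."""
    tagged = [(u, tag_phrases(u)) for u in utterances]
    pairs = []
    for i, (u, phrases) in enumerate(tagged):
        for p in phrases:
            negatives = [q for j, (_, ps) in enumerate(tagged)
                         if j != i for q in ps]
            pairs.append({"anchor": u, "positive": p, "negatives": negatives})
    return pairs

pairs = contrastive_pairs(["I need to book a flight to Paris",
                           "Please cancel my order from yesterday"])
```

A contrastive objective (e.g. an InfoNCE-style loss) over such pairs would then pull utterance embeddings toward their intent-bearing phrases, which is what makes the resulting encoder transfer to unseen intent classes.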