Retrieval-based Goal-Oriented Dialogue Generation
Most research on dialogue has focused either on dialogue generation for
open-ended chit-chat or on state tracking for goal-directed dialogue. In this
work, we explore a hybrid approach to goal-oriented dialogue generation that
combines retrieval from past history with a hierarchical, neural
encoder-decoder architecture. We evaluate this approach in the customer support
domain using the MultiWOZ dataset (Budzianowski et al., 2018). We show that
adding this retrieval step to a hierarchical, neural encoder-decoder
architecture leads to significant improvements, including responses that are
rated more appropriate and fluent by human evaluators. Finally, we compare our
retrieval-based model to various semantically conditioned models that
explicitly use past dialog act information, and find that our proposed model is
competitive with the current state of the art (Chen et al., 2019), while not
requiring explicit labels about past machine acts.
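The retrieval step described above can be illustrated with a minimal sketch: score the current dialogue context against past contexts and return the most similar past exchange, which would then condition the encoder-decoder. The bag-of-words cosine scoring and the toy history below are illustrative assumptions, not the paper's actual retriever.

```python
from collections import Counter
import math

def bow(text):
    """Bag-of-words count vector for a whitespace-tokenized utterance."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    num = sum(a[t] * b[t] for t in a if t in b)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(context, history):
    """Return the past (context, response) pair most similar to `context`."""
    ctx = bow(context)
    return max(history, key=lambda pair: cosine(ctx, bow(pair[0])))

# Toy history of (context, system response) pairs:
history = [
    ("i need a cheap hotel in the north", "what day will you arrive ?"),
    ("book a table for two at 7 pm", "which restaurant would you like ?"),
]
best = retrieve("find me a cheap hotel near the north side", history)
```

The retrieved exchange (or its response) would then be fed to the hierarchical encoder-decoder as an additional input.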
Task-Oriented Dialog Systems that Consider Multiple Appropriate Responses under the Same Context
Conversations have an intrinsic one-to-many property, which means that
multiple responses can be appropriate for the same dialog context. In
task-oriented dialogs, this property leads to different valid dialog policies
towards task completion. However, none of the existing task-oriented dialog
generation approaches takes this property into account. We propose a
Multi-Action Data Augmentation (MADA) framework to utilize the one-to-many
property to generate diverse appropriate dialog responses. Specifically, we
first use dialog states to summarize the dialog history, and then discover all
possible mappings from every dialog state to its different valid system
actions. During dialog system training, we enable the current dialog state to
map to all valid system actions discovered in the previous process to create
additional state-action pairs. By incorporating these additional pairs, the
dialog policy learns a balanced action distribution, which further guides the
dialog model to generate diverse responses. Experimental results show that the
proposed framework consistently improves dialog policy diversity, and results
in improved response diversity and appropriateness. Our model obtains
state-of-the-art results on MultiWOZ.
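The state-to-action augmentation at the core of MADA can be sketched as follows. The serialized state strings and dialog-act labels are hypothetical placeholders; the real framework operates on annotated MultiWOZ dialog states.

```python
from collections import defaultdict

def build_state_action_map(turns):
    """Collect every system action observed anywhere for each dialog state."""
    mapping = defaultdict(set)
    for state, action in turns:
        mapping[state].add(action)
    return mapping

def augment(turns):
    """Pair each state with all valid actions discovered for it, yielding
    additional state-action training pairs beyond those in the raw data."""
    mapping = build_state_action_map(turns)
    states = sorted({s for s, _ in turns})
    return [(s, a) for s in states for a in sorted(mapping[s])]

# Hypothetical serialized dialog states and system acts:
turns = [
    ("hotel:area=north,price=cheap", "Request(day)"),
    ("hotel:area=north,price=cheap", "Inform(name)"),  # a different valid policy
    ("restaurant:food=italian", "Request(time)"),
]
pairs = augment(turns)
```

Training on the augmented pairs exposes the policy to every valid action per state, which is what balances the action distribution.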
MALA: Cross-Domain Dialogue Generation with Action Learning
Response generation for task-oriented dialogues involves two basic
components: dialogue planning and surface realization. These two components,
however, have a discrepancy in their objectives, i.e., task completion and
language quality. To deal with such discrepancy, conditioned response
generation has been introduced where the generation process is factorized into
action decision and language generation via explicit action representations. To
obtain action representations, recent studies learn latent actions in an
unsupervised manner based on utterance lexical similarity. Such an action
learning approach is sensitive to the diversity of language surfaces, which may
impair task completion and language quality. To address this issue, we propose
multi-stage adaptive latent action learning (MALA) that learns semantic latent
actions by distinguishing the effects of utterances on dialogue progress. We
model the utterance effect using the transition of dialogue states caused by
the utterance and develop a semantic similarity measurement that estimates
whether utterances have similar effects. For learning semantic actions on
domains without dialogue states, MALA extends the semantic similarity
measurement across domains progressively, i.e., from aligning shared actions to
learning domain-specific actions. Experiments using multi-domain datasets, SMD
and MultiWOZ, show that our proposed model achieves consistent improvements
over the baseline models in terms of both task completion and language
quality.
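The transition-based similarity measurement can be sketched as: two utterances are assigned the same latent action when they cause the same change to the dialogue state, however differently they are phrased. The slot dictionaries below are toy stand-ins for real belief states, not MALA's actual representation.

```python
def state_delta(before, after):
    """Slots added or changed by an utterance: its 'effect' on the dialogue state."""
    return {k: v for k, v in after.items() if before.get(k) != v}

def same_effect(transition_a, transition_b):
    """Two utterances share a semantic latent action when they induce the
    same state change, regardless of their surface form."""
    return state_delta(*transition_a) == state_delta(*transition_b)

# e.g. "somewhere in the north please" vs "the north part of town":
t1 = ({"area": None}, {"area": "north"})
t2 = ({"area": None, "price": "cheap"}, {"area": "north", "price": "cheap"})
```

Here `same_effect(t1, t2)` holds even though the two hypothetical utterances share no words, which is exactly the robustness to surface diversity the abstract argues for.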
Causal-aware Safe Policy Improvement for Task-oriented dialogue
The recent success of reinforcement learning (RL) in solving complex tasks
is most often attributed to its capacity to explore and exploit an environment
where it has been trained. Sample efficiency is usually not an issue since
cheap simulators are available to sample data on-policy. On the other hand,
task-oriented dialogues are usually learned from offline data collected using
human demonstrations. Collecting diverse demonstrations and annotating them is
expensive. Unfortunately, RL methods trained on off-policy data are prone to
issues of bias and generalization, which are further exacerbated by
stochasticity in human responses and the non-Markovian belief state of a dialogue
management system. To this end, we propose a batch RL framework for
task-oriented dialogue policy learning: causal-aware safe policy improvement
(CASPI). This method gives guarantees on the dialogue policy's performance and
also learns to shape rewards according to the intentions behind human
responses, rather than just mimicking demonstration data; this, coupled with
batch RL, improves the overall sample efficiency of the framework. We
demonstrate the effectiveness of
this framework on the dialogue-context-to-text generation and end-to-end
dialogue tasks of the MultiWOZ 2.0 dataset. The proposed method outperforms the
current state of the art in both cases. In the end-to-end case, our method,
trained on only 10% of the data, outperformed the current state of the art in
three out of four evaluation metrics.
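As a loose illustration of the reward-shaping idea (not CASPI's actual algorithm, which additionally provides safe-improvement guarantees), an offline policy can be fit by weighting each demonstrated state-action pair with a learned reward rather than imitating all demonstrations equally. The tabular policy and the toy reward function below are assumptions for illustration only.

```python
from collections import defaultdict

def reward_weighted_policy(batch, reward_fn):
    """Fit a tabular policy from offline (state, action) demonstrations,
    weighting each pair by a learned reward instead of pure imitation."""
    scores = defaultdict(float)
    for state, action in batch:
        scores[(state, action)] += reward_fn(state, action)
    best, policy = {}, {}
    for (state, action), w in scores.items():
        if w > best.get(state, float("-inf")):
            best[state] = w
            policy[state] = action
    return policy

def toy_reward(state, action):
    """Hypothetical learned reward: prefers the task-completing action."""
    return 2.0 if action == "Book(hotel)" else 0.5

# Offline batch where most demonstrations take the less useful action:
batch = [("s0", "Request(day)"), ("s0", "Request(day)"), ("s0", "Book(hotel)")]
policy = reward_weighted_policy(batch, toy_reward)
```

Under the learned reward the policy picks the minority but higher-reward action instead of the one most frequently demonstrated, which is the sense in which reward shaping goes beyond mimicking the data.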
- …