Stochastic Language Generation in Dialogue using Recurrent Neural Networks with Convolutional Sentence Reranking
The natural language generation (NLG) component of a spoken dialogue system
(SDS) usually needs a substantial amount of handcrafting or a well-labeled
dataset to be trained on. These limitations add significantly to development
costs and make cross-domain, multi-lingual dialogue systems intractable.
Moreover, human languages are context-aware. The most natural response should
be directly learned from data rather than depending on predefined syntaxes or
rules. This paper presents a statistical language generator based on a joint
recurrent and convolutional neural network structure which can be trained on
dialogue act-utterance pairs without any semantic alignments or predefined
grammar trees. Objective metrics suggest that this new model outperforms
previous methods under the same experimental conditions. Results of an
evaluation by human judges indicate that it produces not only high quality but
linguistically varied utterances which are preferred compared to n-gram and
rule-based systems.Comment: To be appear in SigDial 201
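The abstract above describes training on dialogue act-utterance pairs without semantic alignments. A minimal sketch of what such a pair might look like, assuming the common `act_type(slot=value, ...)` dialogue-act notation (the exact format is an assumption, not stated in the abstract):

```python
# One illustrative training example: a structured dialogue act paired with a
# natural-language realisation. No word-level alignment between slots and
# surface tokens is provided -- the generator must learn the mapping itself.
pair = {
    "dialogue_act": "inform(name=seven_days, food=chinese)",
    "utterance": "seven days serves chinese food",
}

# The training set would be a list of many such pairs, one per system turn.
training_data = [pair]
```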
Reranking Overgenerated Responses for End-to-End Task-Oriented Dialogue Systems
End-to-end (E2E) task-oriented dialogue (ToD) systems are prone to fall into
the so-called 'likelihood trap', resulting in generated responses which are
dull, repetitive, and often inconsistent with dialogue history. Comparing
ranked lists of multiple generated responses against the 'gold response' (from
training data) reveals a wide diversity in response quality, with many good
responses placed lower in the ranked list. The main challenge, addressed in
this work, is then how to reach beyond greedily generated system responses,
that is, how to obtain and select such high-quality responses from the list of
overgenerated responses at inference without availability of the gold response.
To this end, we propose a simple yet effective reranking method which aims to
select high-quality items from the lists of responses initially overgenerated
by the system. The idea is to use any sequence-level (similarity) scoring
function to divide the semantic space of responses into high-scoring versus
low-scoring partitions. At training, the high-scoring partition comprises all
generated responses whose similarity to the gold response is higher than the
similarity of the greedy response to the gold response. At inference, the aim
is to estimate the probability that each overgenerated response belongs to the
high-scoring partition, given only previous dialogue history. We validate the
robustness and versatility of our proposed method on the standard MultiWOZ
dataset: our methods improve a state-of-the-art E2E ToD system by 2.4 BLEU, 3.2
ROUGE, and 2.8 METEOR scores, achieving new peak results. Additional
experiments on the BiTOD dataset and human evaluation further ascertain the
generalisability and effectiveness of the proposed framework.
Comment: 22 pages, 10 figures
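The partitioning rule described above can be sketched concretely. This is a minimal illustration, not the paper's implementation: the token-overlap F1 used here stands in for "any sequence-level (similarity) scoring function", and all response strings are invented examples. A candidate is labelled high-scoring when its similarity to the gold response exceeds the greedy response's similarity to the gold response:

```python
def token_f1(a: str, b: str) -> float:
    """Token-overlap F1 between two responses (illustrative stand-in for
    any sequence-level similarity function)."""
    ta, tb = a.lower().split(), b.lower().split()
    common = len(set(ta) & set(tb))
    if common == 0:
        return 0.0
    precision, recall = common / len(ta), common / len(tb)
    return 2 * precision * recall / (precision + recall)

def label_partitions(gold, greedy, candidates, sim=token_f1):
    """Assign each overgenerated candidate to the high- or low-scoring
    partition: 'high' iff it is closer to the gold response than the
    greedy decode is. This labelling is the training signal; at inference
    the gold response is unavailable and partition membership must be
    estimated from dialogue history alone."""
    threshold = sim(greedy, gold)
    return [(c, sim(c, gold) > threshold) for c in candidates]

# Invented toy responses for illustration.
gold = "the hotel is in the north and has free parking"
greedy = "the hotel is in the north"
candidates = [
    "the hotel is in the north and offers free parking",  # beats greedy
    "there are many hotels",                              # dull, off-target
]
labels = label_partitions(gold, greedy, candidates)
```

At inference time a learned scorer would replace the gold-based comparison, estimating for each candidate the probability of belonging to the high-scoring partition.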