42,548 research outputs found
The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems
This paper introduces the Ubuntu Dialogue Corpus, a dataset containing almost
1 million multi-turn dialogues, with a total of over 7 million utterances and
100 million words. This provides a unique resource for research into building
dialogue managers based on neural language models that can make use of large
amounts of unlabeled data. The dataset has both the multi-turn property of
conversations in the Dialog State Tracking Challenge datasets, and the
unstructured nature of interactions from microblog services such as Twitter. We
also describe two neural learning architectures suitable for analyzing this
dataset, and provide benchmark performance on the task of selecting the best
next response.Comment: SIGDIAL 2015. 10 pages, 5 figures. Update includes link to new
version of the dataset, with some added features and bug fixes. See:
https://github.com/rkadlec/ubuntu-ranking-dataset-creato
A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues
Sequential data often possesses a hierarchical structure with complex
dependencies between subsequences, such as found between the utterances in a
dialogue. In an effort to model this kind of generative process, we propose a
neural network-based generative architecture, with latent stochastic variables
that span a variable number of time steps. We apply the proposed model to the
task of dialogue response generation and compare it with recent neural network
architectures. We evaluate the model performance through automatic evaluation
metrics and by carrying out a human evaluation. The experiments demonstrate
that our model improves upon recently proposed models and that the latent
variables facilitate the generation of long outputs and maintain the context.Comment: 15 pages, 5 tables, 4 figure
- …