Search CORE

18 research outputs found

The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems

Author: Lowe Ryan
Pineau Joelle
Pow Nissan
Serban Iulian
Publication venue
Publication date: 01/01/2015
Field of study

This paper introduces the Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This provides a unique resource for research into building dialogue managers based on neural language models that can make use of large amounts of unlabeled data. The dataset has both the multi-turn property of conversations in the Dialog State Tracking Challenge datasets, and the unstructured nature of interactions from microblog services such as Twitter. We also describe two neural learning architectures suitable for analyzing this dataset, and provide benchmark performance on the task of selecting the best next response.Comment: SIGDIAL 2015. 10 pages, 5 figures. Update includes link to new version of the dataset, with some added features and bug fixes. See: https://github.com/rkadlec/ubuntu-ranking-dataset-creato

arXiv.org e-Print Archive

Crossref

Training End-to-End Dialogue Systems with the Ubuntu Dialogue Corpus

Author: Charlin Laurent
Liu Chia-Wei
Lowe Ryan
Pineau Joelle
Pow Nissan
Serban Iulian Vlad
Publication venue: University of Illinois at Chicago Library
Publication date: 20/01/2017
Field of study

In this paper, we construct and train end-to-end neural network-based dialogue systems usingan updated version of the recent Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This dataset is interesting because of its size, long context lengths, and technical nature; thus, it can be used to train large models directly from data with minimal feature engineering, which can be both time consuming and expensive. We provide baselines  in two different environments: one where models are trained to maximize the log-likelihood of a generated utterance  conditioned on the context of the conversation, and one where models are trained to select the correct next response from a list of candidate responses. These are both evaluated on a recall task that we call Next Utterance Classification (NUC), as well as other generation-specific metrics. Finally, we provide a qualitative error analysis to help determine the most promising directions for future research on the Ubuntu  Dialogue Corpus, and for end-to-end dialogue systems in general

University of Illinois at Chicago: Journals@UIC

Dialogue & Discourse (E-Journal - Universität Bielefeld)

Structured Attention for Unsupervised Dialogue Structure Induction

Author: Liang Yuan
Qiu Liang
Shi Feng
Shi Weiyan
Yu Zhou
Yuan Tao
Zhao Yizhou
Zhu Song-Chun
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020
Field of study

Inducing a meaningful structural representation from one or a set of dialogues is a crucial but challenging task in computational linguistics. Advancement made in this area is critical for dialogue system design and discourse analysis. It can also be extended to solve grammatical inference. In this work, we propose to incorporate structured attention layers into a Variational Recurrent Neural Network (VRNN) model with discrete latent states to learn dialogue structure in an unsupervised fashion. Compared to a vanilla VRNN, structured attention enables a model to focus on different parts of the source sentence embeddings while enforcing a structural inductive bias. Experiments show that on two-party dialogue datasets, VRNN with structured attention learns semantic structures that are similar to templates used to generate this dialogue corpus. While on multi-party dialogue datasets, our model learns an interactive structure demonstrating its capability of distinguishing speakers or addresses, automatically disentangling dialogues without explicit human annotation.Comment: Long paper accepted by EMNLP 202

arXiv.org e-Print Archive

Crossref

Hello & Goodbye: Conversation Boundary Identification Using Text Classification

Author: Dunne Jonathan
Malone David
Publication venue
Publication date: 01/01/2018
Field of study

One of the main challenges in discourse analysis is the process of segmenting text into meaningful topic segments. While this problem has been studied over the past thirty years, previous topic segmentation studies ignore crucial elements of a conversation: an opening and closing remark. Our motivation to revisit this problem space is the rise of instant message usage. We consider the problem of topic segmentation as a machine learning classification one. Using both enterprise and open source datasets, we address the question as to whether a machine learning algorithm can be trained to identify salutations and valedictions within multi-party real-time chat conversations. Our results show that both Naive Bayes (NB) and Support Vector Machine (SVM) algorithms provide a reasonable degree of precision(mean F1 score: 0.58)

MURAL - Maynooth University Research Archive Library