The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service
Human conversations are complicated, and building a human-like dialogue agent
is an extremely challenging task. With the rapid development of deep learning
techniques, data-driven models, which need a huge amount of real conversation
data, have become increasingly prevalent. In this paper, we construct a large-scale
real scenario Chinese E-commerce conversation corpus, JDDC, with more than 1
million multi-turn dialogues, 20 million utterances, and 150 million words. The
dataset reflects several characteristics of human-human conversations, e.g.,
goal-driven behavior and long-term dependencies within the context. It also covers
various dialogue types, including task-oriented, chitchat, and question-answering. Extra
intent information and three well-annotated challenge sets are also provided.
We then evaluate several retrieval-based and generative models to provide
basic benchmark performance on the JDDC corpus. We hope JDDC can serve as
an effective testbed and benefit the development of fundamental research on
dialogue tasks.
Comment: This paper is accepted by LREC 2020 (International Conference on
Language Resources and Evaluation).