A Survey of Document Grounded Dialogue Systems (DGDS)
Dialogue systems (DS) attract great attention from industry and academia
because of their wide application prospects. Researchers usually classify DS
according to function. However, many conversations require the DS to switch
between different functions. For example, a movie discussion can shift from
chit-chat to QA, and a conversational recommendation can shift from chit-chat
to recommendation. Therefore, classification according to function may
not be enough to help us appreciate the current development trend. We classify
the DS based on background knowledge. Specifically, we study the latest DS based
on unstructured document(s). We define a Document Grounded Dialogue System
(DGDS) as a DS whose dialogues center on the given document(s). A
DGDS can be used in scenarios such as discussing merchandise with reference to
a product manual, commenting on news reports, etc. We believe that extracting
information from unstructured document(s) is the future trend of the DS because a
great amount of human knowledge lies in these document(s). Research on
DGDS not only has broad application prospects but also helps AI to
better understand human knowledge and natural language. We analyze the
classification, architecture, datasets, models, and future development trends
of the DGDS, hoping to help researchers in this field.
Comment: 30 pages, 4 figures, 13 tables
"Wait, I'm Still Talking!" Predicting the Dialogue Interaction Behavior Using Imagine-Then-Arbitrate Model
Producing natural and accurate responses like human beings is the ultimate
goal of intelligent dialogue agents. So far, most past work has concentrated
on selecting or generating one pertinent and fluent response according to
the current query and its context. These models work in a one-to-one setting,
making one response to one utterance each round. However, in real human-human
conversations, people often send several short messages in sequence for
readability instead of one long message in a single turn. Thus messages do not end
with an explicit ending signal, which is crucial for agents to decide when to
reply. So the first step for an intelligent dialogue agent is not replying but
deciding if it should reply at the moment. To address this issue, in this
paper, we propose a novel Imagine-then-Arbitrate (ITA) neural dialogue model to
help the agent decide whether to wait or to make a response directly. Our
method has two imaginator modules and an arbitrator module. The two imaginators
learn the agent's and the user's speaking styles respectively and generate
possible utterances that, combined with the dialogue history, serve as the
input of the arbitrator. The arbitrator then decides whether to wait or to respond to
the user directly. To verify the performance and effectiveness of our method,
we prepared two dialogue datasets and compared our approach with several
popular models. Experimental results show that our model performs well on
the ending prediction problem and outperforms baseline models.
Few-Shot Generalization Across Dialogue Tasks
Machine-learning based dialogue managers are able to learn complex behaviors
in order to complete a task, but it is not straightforward to extend their
capabilities to new domains. We investigate different policies' ability to
handle uncooperative user behavior, and how well expertise in completing one
task (such as restaurant reservations) can be reapplied when learning a new one
(e.g. booking a hotel). We introduce the Recurrent Embedding Dialogue Policy
(REDP), which embeds system actions and dialogue states in the same vector
space. REDP contains a memory component and attention mechanism based on a
modified Neural Turing Machine, and significantly outperforms a baseline LSTM
classifier on this task. We also show that both our architecture and baseline
solve the bAbI dialogue task, achieving 100% test accuracy.
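The idea of placing system actions and dialogue states in one vector space can be illustrated with a minimal sketch. It only shows the final step, choosing the action whose embedding is most similar to the current state embedding. It is not the REDP architecture: the memory component and attention mechanism are omitted, cosine similarity is an assumed choice of similarity measure, and the action names and vectors are hypothetical toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_action(state_embedding, action_embeddings):
    """Return the name of the action whose embedding is closest to the state embedding."""
    return max(
        action_embeddings,
        key=lambda name: cosine(state_embedding, action_embeddings[name]),
    )


# Hypothetical toy embeddings; a trained policy would learn these vectors.
ACTIONS = {
    "ask_cuisine": [1.0, 0.0, 0.0],
    "ask_location": [0.0, 1.0, 0.0],
    "confirm_booking": [0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    state = [0.1, 0.9, 0.2]  # hypothetical embedding of the current dialogue state
    print(select_action(state, ACTIONS))
```

Because states and actions share one space, a new domain can reuse actions that already have embeddings, which is one plausible reading of why such a design might transfer across tasks.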
Augmenting End-to-End Dialog Systems with Commonsense Knowledge
Building dialog agents that can converse naturally with humans is a
challenging yet intriguing problem of artificial intelligence. In open-domain
human-computer conversation, where the conversational agent is expected to
respond to human responses in an interesting and engaging way, commonsense
knowledge has to be integrated into the model effectively. In this paper, we
investigate the impact of providing commonsense knowledge about the concepts
covered in the dialog. Our model represents the first attempt to integrate a
large commonsense knowledge base into end-to-end conversational models. In the
retrieval-based scenario, we propose the Tri-LSTM model to jointly take into
account message and commonsense for selecting an appropriate response. Our
experiments suggest that the knowledge-augmented models are superior to their
knowledge-free counterparts in automatic evaluation.
SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching
We present a new method, SOLOIST, that uses transfer learning and machine
teaching to build task bots at scale. We parameterize classical modular
task-oriented dialog systems using a Transformer-based auto-regressive language
model, which subsumes different dialog modules into a single neural model. We
pre-train, on heterogeneous dialog corpora, a task-grounded response generation
model, which can generate dialog responses grounded in user goals and
real-world knowledge for task completion. The pre-trained model can be
efficiently adapted to accomplish new tasks with a handful of task-specific
dialogs via machine teaching, where training samples are generated by human
teachers interacting with the system. Experiments show that (i) SOLOIST sets a
new state of the art on well-studied task-oriented dialog benchmarks, including
CamRest676 and MultiWOZ; (ii) in the few-shot fine-tuning settings, SOLOIST
significantly outperforms existing methods, and (iii) the use of machine
teaching substantially reduces the labeling cost of fine-tuning. The
pre-trained models and code are available at https://aka.ms/soloist.
Comment: 18 pages; To appear at TACL; Project Website: https://aka.ms/solois
Generative Encoder-Decoder Models for Task-Oriented Spoken Dialog Systems with Chatting Capability
Generative encoder-decoder models offer great promise in developing
domain-general dialog systems. However, they have mainly been applied to
open-domain conversations. This paper presents a practical and novel framework
for building task-oriented dialog systems based on encoder-decoder models. This
framework enables encoder-decoder models to accomplish slot-value independent
decision-making and interact with external databases. Moreover, this paper
shows the flexibility of the proposed method by interleaving chatting
capability with a slot-filling system for better out-of-domain recovery. The
models were trained on both real-user data from a bus information system and
human-human chat data. Results show that the proposed framework achieves good
performance in both offline evaluation metrics and in task success rate with
human users.
Comment: Accepted as a long paper in SIGDIAL 201
DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset
We develop a high-quality multi-turn dialog dataset, DailyDialog, which is
intriguing in several aspects. The language is human-written and less noisy.
The dialogues in the dataset reflect the way we communicate in daily life and cover
various topics about our daily life. We also manually label the developed
dataset with communication intention and emotion information. Then, we evaluate
existing approaches on the DailyDialog dataset and hope it benefits the research
field of dialog systems.
Comment: accepted by IJCNLP 201
A Repository of Conversational Datasets
Progress in Machine Learning is often driven by the availability of large
datasets, and consistent evaluation metrics for comparing modeling approaches.
To this end, we present a repository of conversational datasets consisting of
hundreds of millions of examples, and a standardised evaluation procedure for
conversational response selection models using '1-of-100 accuracy'. The
repository contains scripts that allow researchers to reproduce the standard
datasets, or to adapt the pre-processing and data filtering steps to their
needs. We introduce and evaluate several competitive baselines for
conversational response selection, whose implementations are shared in the
repository, as well as a neural encoder model that is trained on the entire
training set.
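The '1-of-100 accuracy' metric can be sketched as follows, under the assumption that each context is scored against the responses of all examples in its batch and counts as correct when its own response ranks first. The function name, the score-matrix layout, and the tie-breaking rule are illustrative choices, not taken from the repository. The demo uses a batch of 3 for brevity; with batches of 100 the same function gives 1-of-100 accuracy.

```python
def one_of_n_accuracy(scores):
    """Fraction of contexts whose true response receives the highest score.

    scores[i][j] is the model score for context i paired with response j.
    The true response for context i is assumed to sit at index i, so the
    other n - 1 responses in the batch act as negative candidates.
    Ties are resolved in favour of the lowest index (an arbitrary choice).
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must contain at least one row")
    correct = 0
    for i, row in enumerate(scores):
        if len(row) != n:
            raise ValueError("scores must be a square matrix")
        best = max(range(n), key=lambda j: row[j])
        if best == i:
            correct += 1
    return correct / n


if __name__ == "__main__":
    # Hypothetical scores for a batch of 3 (context, response) pairs.
    batch_scores = [
        [0.9, 0.1, 0.0],  # context 0: true response ranked first
        [0.2, 0.3, 0.8],  # context 1: a negative outranks the true response
        [0.1, 0.2, 0.7],  # context 2: true response ranked first
    ]
    print(one_of_n_accuracy(batch_scores))
```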
Robust Conversational AI with Grounded Text Generation
This article presents a hybrid approach based on a Grounded Text Generation
(GTG) model to building robust task bots at scale. GTG is a hybrid model which
uses a large-scale Transformer neural network as its backbone, combined with
symbol-manipulation modules for knowledge base inference and prior knowledge
encoding, to generate responses grounded in dialog belief state and real-world
knowledge for task completion. GTG is pre-trained on large amounts of raw text
and human conversational data, and can be fine-tuned to complete a wide range
of tasks.
The hybrid approach and its variants are being developed simultaneously by
multiple research teams. The primary results reported on task-oriented dialog
benchmarks are very promising, demonstrating the great potential of this
approach. This article provides an overview of this progress and discusses
related methods and technologies that can be incorporated for building robust
conversational AI systems.
A Study on Emotional Conversation Analysis Based on Deep Learning (深層学習に基づく感情会話分析に関する研究)
The ability of a chatbot to express specific emotions during a conversation is a key part of artificial intelligence, with an intuitive and quantifiable impact on the chatbot's usability and user satisfaction. Enabling machines to recognize emotion in conversation is challenging, mainly because human dialogue conveys emotion through long-term experience, abundant knowledge, context, and intricate patterns among affective states. Recently, many studies on neural emotional conversational models have been conducted. However, enabling a chatbot to control which emotion it responds with, according to its own character, is still underexplored. People are no longer satisfied with using a dialogue system only to solve specific tasks and are increasingly eager for deeper emotional communication. If, during a chat, the robot can perceive the user's emotions and process them accurately, it can greatly enrich the content of the dialogue and elicit empathy from the user.
In emotional dialogue, our ultimate goal is to make the machine understand human emotions and give matching responses. Based on these two points, this thesis explores in depth the task of emotion recognition in conversation and the task of emotional dialogue generation. Although considerable progress has been made in research on emotion in dialogue over the past few years, difficulties and challenges remain because of the complex nature of human emotions. The key contributions of this thesis are summarized below:
(1) Researchers have recently paid more attention to enhancing natural language models with knowledge graphs, since knowledge graphs contain a large amount of systematic knowledge. Many studies have shown that introducing external commonsense knowledge helps enrich feature information. We address the task of emotion recognition in conversations by using external knowledge to enhance semantics. In this work, we use the external knowledge graph ATOMIC as the knowledge source. We propose the KES model, a new framework that incorporates different elements of external knowledge and conversational semantic role labeling and builds upon them to learn interactions between the interlocutors in a conversation. A conversation is a sequence of coherent and ordered utterances, and capturing long-range context information is a weakness of neural networks. We therefore adopt the Transformer, a structure composed of self-attention and feed-forward neural networks, instead of the traditional RNN model, to capture long-range context information. We design a self-attention layer specialized for text features semantically enhanced with external commonsense knowledge. Then, two different LSTM-based networks track the individual's internal state and the external context state, respectively. The proposed model was evaluated on three datasets for emotion detection in conversation. The experimental results show that our model outperforms state-of-the-art approaches on most of the tested datasets.
(2) We propose an emotional dialogue model based on Seq2Seq, improved in three aspects (model input, encoder structure, and decoder structure) so that the model can generate responses that are emotionally rich, diverse, and context-aware. For the model input, emotional information and position information are added to the word vectors. In the encoder, the proposed model first encodes the current input and sentence sentiment to generate a semantic vector, and additionally encodes the context and sentence sentiment to generate a context vector, adding contextual information while preserving the independence of the current input. In the decoder, attention is used to compute weights over the two semantic vectors separately before decoding, so as to fully integrate the local and global emotional semantic information. We used seven objective evaluation metrics to evaluate the model's generated results in terms of context semantic similarity, response diversity, and emotional response. Experimental results show that the model can generate diverse responses with rich sentiment and contextual associations.
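The decoder-side fusion of the two encoded vectors might look like the minimal sketch below. It rests on several assumptions not stated in the abstract: the decoder state scores each vector with a dot product, the two scores are normalized with a softmax, and the decoder consumes the weighted sum. The function names and the toy vectors are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fuse(decoder_state, semantic_vec, context_vec):
    """Weight the local (current input) and global (context) vectors by attention.

    Returns the two attention weights and the fused vector that a decoder
    step would consume. Dot-product scoring is an assumed choice.
    """
    weights = softmax([dot(decoder_state, semantic_vec), dot(decoder_state, context_vec)])
    fused = [weights[0] * s + weights[1] * c for s, c in zip(semantic_vec, context_vec)]
    return weights, fused


if __name__ == "__main__":
    state = [1.0, 0.0]      # hypothetical decoder hidden state
    semantic = [2.0, 1.0]   # hypothetical encoding of current input plus sentiment
    context = [0.0, 1.0]    # hypothetical encoding of dialogue context plus sentiment
    weights, fused = fuse(state, semantic, context)
    print(weights, fused)
```

Keeping the two vectors separate until this step matches the abstract's stated aim of adding contextual information while preserving the independence of the current input.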