Improved Dynamic Memory Network for Dialogue Act Classification with Adversarial Training
Dialogue Act (DA) classification is a challenging problem in dialogue
interpretation, which aims to attach semantic labels to utterances and
characterize the speaker's intention. Existing approaches formulate DA
classification as anything from multi-class classification to structured
prediction, and they suffer from two limitations: (a) they either rely on
handcrafted features or have limited memory; (b) models trained with
traditional methods cannot correctly classify adversarial examples. To
address these issues, we first cast the problem as a question-answering
problem and propose an improved dynamic memory network with a hierarchical
pyramidal utterance encoder. Moreover, we apply adversarial training to train
our proposed model. We evaluate our model on two public datasets, the
Switchboard dialogue act corpus and the MapTask corpus. Extensive experiments
show that our proposed model is not only robust but also achieves better
performance than some state-of-the-art baselines.
Abstractive Dialogue Summarization with Sentence-Gated Modeling Optimized by Dialogue Acts
Neural abstractive summarization has been increasingly studied, but prior
work has mainly focused on summarizing single-speaker documents (news,
scientific publications, etc.). In dialogues, there are different interactions
between speakers, which are usually defined as dialogue acts. The interactive
signals may provide informative cues for better summarizing dialogues. This
paper proposes to explicitly leverage dialogue acts in a neural summarization
model, where a sentence-gated mechanism is designed for modeling the
relationship between dialogue acts and the summary. The experiments show that
our proposed model significantly improves the abstractive summarization
performance compared to the state-of-the-art baselines on the AMI meeting
corpus, demonstrating the usefulness of the interactive signal provided by
dialogue acts.
Comment: 8 pages, accepted by SLT 201
Generative Encoder-Decoder Models for Task-Oriented Spoken Dialog Systems with Chatting Capability
Generative encoder-decoder models offer great promise in developing
domain-general dialog systems. However, they have mainly been applied to
open-domain conversations. This paper presents a practical and novel framework
for building task-oriented dialog systems based on encoder-decoder models. This
framework enables encoder-decoder models to accomplish slot-value independent
decision-making and interact with external databases. Moreover, this paper
shows the flexibility of the proposed method by interleaving chatting
capability with a slot-filling system for better out-of-domain recovery. The
models were trained on both real-user data from a bus information system and
human-human chat data. Results show that the proposed framework achieves good
performance in both offline evaluation metrics and in task success rate with
human users.
Comment: Accepted as a long paper in SIGDIAL 201
Attention, please! A survey of Neural Attention Models in Deep Learning
In humans, attention is a core property of all perceptual and cognitive
operations. Given our limited ability to process competing sources, attention
mechanisms select, modulate, and focus on the information most relevant to
behavior. For decades, concepts and functions of attention have been studied in
philosophy, psychology, neuroscience, and computing. For the last six years,
this property has been widely explored in deep neural networks. Currently, the
state-of-the-art in Deep Learning is represented by neural attention models in
several application domains. This survey provides a comprehensive overview and
analysis of developments in neural attention models. We systematically reviewed
hundreds of architectures in the area, identifying and discussing those in
which attention has shown a significant impact. We also developed and made
public an automated methodology to facilitate the development of reviews in the
area. By critically analyzing 650 works, we describe the primary uses of
attention in convolutional networks, recurrent networks, and generative models,
identifying common subgroups of uses and applications. Furthermore, we describe
the impact of attention in different application domains and its impact on
neural networks' interpretability. Finally, we list possible trends and
opportunities for further research, hoping that this review will provide a
succinct overview of the main attentional models in the area and guide
researchers in developing future approaches that will drive further
improvements.
Comment: 66 pages, 24 figures
Self-Attentional Models Application in Task-Oriented Dialogue Generation Systems
Self-attentional models are a new paradigm for sequence modelling tasks that
differs from common sequence modelling methods, such as recurrence-based and
convolution-based sequence learning, in that their architecture is based
solely on the attention mechanism. Self-attentional models have been used to
create state-of-the-art models in many NLP tasks such as neural machine
translation, but their use has not yet been explored for training end-to-end
task-oriented dialogue generation systems. In this study, we apply these
models to three different datasets for training task-oriented chatbots. Our
findings show that self-attentional models can be exploited to create
end-to-end task-oriented chatbots which not only achieve higher evaluation
scores than recurrence-based models, but also do so more efficiently.
Comment: Appeared in proceedings of Recent Advances in Natural Language
Processing (RANLP) Conference, 201
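As a rough illustration of what "based solely on the attention mechanism" means, the sketch below computes scaled dot-product self-attention over a short sequence of vectors. It is a minimal sketch under assumptions, not the models evaluated in the paper: the function names are hypothetical, and the learned query/key/value projections, multiple heads, and positional encodings of a full self-attentional model are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with identity projections.

    x is a list of equal-length vectors (one per position). Queries, keys,
    and values are all taken to be x itself, which is a simplifying
    assumption made for this illustration.
    """
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Each output vector is a weighted average of all value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out
```

Every output position mixes information from all input positions in a single step, with no recurrence or convolution involved.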
Towards Coherent and Engaging Spoken Dialog Response Generation Using Automatic Conversation Evaluators
Encoder-decoder based neural architectures serve as the basis of
state-of-the-art approaches in end-to-end open domain dialog systems. Since
most such systems are trained with a maximum likelihood (MLE) objective, they
suffer from issues such as lack of generalizability and the generic response
problem, i.e., a system response that can be an answer to a large number of
user utterances, e.g., "Maybe, I don't know." Having explicit feedback on the
relevance and interestingness of a system response at each turn can be a useful
signal for mitigating such issues and improving system quality by selecting
responses from different approaches. Towards this goal, we present a system
that evaluates chatbot responses at each dialog turn for coherence and
engagement. Our system provides explicit turn-level dialog quality feedback,
which we show to be highly correlated with human evaluation. To show that
incorporating this feedback in the neural response generation models improves
dialog quality, we present two different and complementary mechanisms to
incorporate explicit feedback into a neural response generation model:
reranking and direct modification of the loss function during training. Our
studies show that a response generation model that incorporates these combined
feedback mechanisms produces more engaging and coherent responses in an
open-domain spoken dialog setting, significantly improving the response quality
according to both automatic and human evaluation.
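The reranking mechanism mentioned in the abstract can be pictured with the sketch below, which reorders candidate responses by mixing each candidate's generator score with an evaluator score. This is an illustration under assumptions rather than the authors' system: `rerank`, `toy_evaluator`, the linear mixing rule, and the `weight` parameter are all hypothetical, and the toy evaluator merely stands in for a trained coherence and engagement model.

```python
def rerank(candidates, evaluator, weight=0.5):
    """Sort (response, generator_log_prob) pairs by a combined score.

    The combined score is a linear mix of the generator's log-probability
    and the evaluator's score; the mixing rule is an assumption made for
    this sketch.
    """
    def combined(item):
        response, log_prob = item
        return (1.0 - weight) * log_prob + weight * evaluator(response)

    return sorted(candidates, key=combined, reverse=True)


def toy_evaluator(response):
    """Hypothetical stand-in evaluator that penalizes one generic reply."""
    return -5.0 if response == "Maybe, I don't know." else 0.0
```

With `weight=0.0` the ordering falls back to the generator's own preference, which tends to favor generic replies; raising `weight` lets the evaluator push them down the list.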
Self Paced Adversarial Training for Multimodal Few-shot Learning
State-of-the-art deep learning algorithms yield remarkable results in many
visual recognition tasks. However, they still fail to provide satisfactory
results in scarce data regimes. To a certain extent, this lack of data can be
compensated by multimodal information. Missing information in one modality of a
single data point (e.g. an image) can be made up for in another modality (e.g.
a textual description). Therefore, we design a few-shot learning task that is
multimodal during training (i.e. image and text) and single-modal during test
time (i.e. image). In this regard, we propose a self-paced class-discriminative
generative adversarial network incorporating multimodality in the context of
few-shot learning. The proposed approach builds upon the idea of cross-modal
data generation in order to alleviate the data sparsity problem. We improve
few-shot learning accuracies on the fine-grained CUB and Oxford-102 datasets.
Comment: To appear at WACV 201
A Survey of Document Grounded Dialogue Systems (DGDS)
Dialogue systems (DS) attract great attention from industry and academia
because of their wide application prospects. Researchers usually classify DS
by function. However, many conversations require the DS to switch between
different functions. For example, a movie discussion can change from chit-chat
to QA, and a conversational recommendation can move from chit-chat to
recommendation. Therefore, classification by function may not be enough to
help us appreciate the current development trend. We instead classify DS
based on background knowledge. Specifically, we study the latest DS based on
unstructured document(s). We define a Document Grounded Dialogue System (DGDS)
as a DS whose dialogues center on the given document(s). The DGDS can be used
in scenarios such as discussing merchandise against a product manual,
commenting on news reports, etc. We believe that extracting information from
unstructured document(s) is the future trend of the DS because a great amount
of human knowledge lies in these document(s). Research on the DGDS not only
has broad application prospects but also helps AI better understand human
knowledge and natural language. We analyze the classification, architecture,
datasets, models, and future development trends of the DGDS, hoping to help
researchers in this field.
Comment: 30 pages, 4 figures, 13 tables
Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback
Conversational interfaces for the detail-oriented retail fashion domain are
more natural, expressive, and user-friendly than classical keyword-based search
interfaces. In this paper, we introduce the Fashion IQ dataset to support and
advance research on interactive fashion image retrieval. Fashion IQ is the
first fashion dataset to provide human-generated captions that distinguish
similar pairs of garment images together with side-information consisting of
real-world product descriptions and derived visual attribute labels for these
images. We provide a detailed analysis of the characteristics of the Fashion IQ
data, and present a transformer-based user simulator and interactive image
retriever that can seamlessly integrate visual attributes with image features,
user feedback, and dialog history, leading to improved performance over the
state of the art in dialog-based image retrieval. We believe that our dataset
will encourage further work on developing more natural and real-world
applicable conversational shopping assistants.
Content Word-based Sentence Decoding and Evaluating for Open-domain Neural Response Generation
Various encoder-decoder models have been applied to response generation in
open-domain dialogs, but a majority of conventional models directly learn a
mapping from lexical input to lexical output without explicitly modeling
intermediate representations. Utilizing language hierarchy and modeling
intermediate information have been shown to benefit many language understanding
and generation tasks. Motivated by Broca's aphasia, we propose to use a content
word sequence as an intermediate representation for open-domain response
generation. Experimental results show that the proposed method improves content
relatedness of produced responses, and our models can often choose correct
grammar for generated content words. Meanwhile, instead of evaluating complete
sentences, we propose to compute conventional metrics on content word
sequences, which is a better indicator of content relevance.
Comment: 13 pages, 2 figures, 8 tables (rejected by ACL 2019)
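To make the idea of scoring content word sequences instead of full sentences concrete, the sketch below strips function words and then computes a clipped unigram precision on what remains. It is a minimal sketch under assumptions: the abstract does not say how content words are identified or which metrics are used, so the small `FUNCTION_WORDS` list, whitespace tokenization, and the choice of unigram precision are all hypothetical.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word list for illustration only.
FUNCTION_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "i", "it", "and"}


def content_words(sentence):
    """Lowercase, split on whitespace, and drop function words."""
    return [w for w in sentence.lower().split() if w not in FUNCTION_WORDS]


def content_unigram_precision(hypothesis, reference):
    """Clipped unigram precision computed on content words only."""
    hyp = content_words(hypothesis)
    if not hyp:
        return 0.0
    ref_counts = Counter(content_words(reference))
    hyp_counts = Counter(hyp)
    # Each hypothesis word is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_counts[word]) for word, count in hyp_counts.items())
    return matched / len(hyp)
```

Because function words are removed before matching, a response cannot score well merely by sharing words like "the" or "is" with the reference.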