1,160 research outputs found

    Improved Dynamic Memory Network for Dialogue Act Classification with Adversarial Training

    Full text link
    Dialogue Act (DA) classification is a challenging problem in dialogue interpretation, which aims to attach semantic labels to utterances and characterize the speaker's intention. Currently, many existing approaches formulate the DA classification problem ranging from multi-classification to structured prediction, which suffer from two limitations: a) these methods are either handcrafted feature-based or have limited memories. b) adversarial examples can't be correctly classified by traditional training methods. To address these issues, in this paper we first cast the problem into a question and answering problem and proposed an improved dynamic memory networks with hierarchical pyramidal utterance encoder. Moreover, we apply adversarial training to train our proposed model. We evaluate our model on two public datasets, i.e., Switchboard dialogue act corpus and the MapTask corpus. Extensive experiments show that our proposed model is not only robust, but also achieves better performance when compared with some state-of-the-art baselines

    Abstractive Dialogue Summarization with Sentence-Gated Modeling Optimized by Dialogue Acts

    Full text link
    Neural abstractive summarization has been increasingly studied, where the prior work mainly focused on summarizing single-speaker documents (news, scientific publications, etc). In dialogues, there are different interactions between speakers, which are usually defined as dialogue acts. The interactive signals may provide informative cues for better summarizing dialogues. This paper proposes to explicitly leverage dialogue acts in a neural summarization model, where a sentence-gated mechanism is designed for modeling the relationship between dialogue acts and the summary. The experiments show that our proposed model significantly improves the abstractive summarization performance compared to the state-of-the-art baselines on AMI meeting corpus, demonstrating the usefulness of the interactive signal provided by dialogue acts.Comment: 8 pages, accepted by SLT 201

    Generative Encoder-Decoder Models for Task-Oriented Spoken Dialog Systems with Chatting Capability

    Full text link
    Generative encoder-decoder models offer great promise in developing domain-general dialog systems. However, they have mainly been applied to open-domain conversations. This paper presents a practical and novel framework for building task-oriented dialog systems based on encoder-decoder models. This framework enables encoder-decoder models to accomplish slot-value independent decision-making and interact with external databases. Moreover, this paper shows the flexibility of the proposed method by interleaving chatting capability with a slot-filling system for better out-of-domain recovery. The models were trained on both real-user data from a bus information system and human-human chat data. Results show that the proposed framework achieves good performance in both offline evaluation metrics and in task success rate with human users.Comment: Accepted as a long paper in SIGIDIAL 201

    Attention, please! A survey of Neural Attention Models in Deep Learning

    Full text link
    In humans, Attention is a core property of all perceptual and cognitive operations. Given our limited ability to process competing sources, attention mechanisms select, modulate, and focus on the information most relevant to behavior. For decades, concepts and functions of attention have been studied in philosophy, psychology, neuroscience, and computing. For the last six years, this property has been widely explored in deep neural networks. Currently, the state-of-the-art in Deep Learning is represented by neural attention models in several application domains. This survey provides a comprehensive overview and analysis of developments in neural attention models. We systematically reviewed hundreds of architectures in the area, identifying and discussing those in which attention has shown a significant impact. We also developed and made public an automated methodology to facilitate the development of reviews in the area. By critically analyzing 650 works, we describe the primary uses of attention in convolutional, recurrent networks and generative models, identifying common subgroups of uses and applications. Furthermore, we describe the impact of attention in different application domains and their impact on neural networks' interpretability. Finally, we list possible trends and opportunities for further research, hoping that this review will provide a succinct overview of the main attentional models in the area and guide researchers in developing future approaches that will drive further improvements.Comment: 66 pages, 24 figure

    Self-Attentional Models Application in Task-Oriented Dialogue Generation Systems

    Full text link
    Self-attentional models are a new paradigm for sequence modelling tasks which differ from common sequence modelling methods, such as recurrence-based and convolution-based sequence learning, in the way that their architecture is only based on the attention mechanism. Self-attentional models have been used in the creation of the state-of-the-art models in many NLP tasks such as neural machine translation, but their usage has not been explored for the task of training end-to-end task-oriented dialogue generation systems yet. In this study, we apply these models on the three different datasets for training task-oriented chatbots. Our finding shows that self-attentional models can be exploited to create end-to-end task-oriented chatbots which not only achieve higher evaluation scores compared to recurrence-based models, but also do so more efficiently.Comment: Appeared in proceedings of Recent Advances in Natural Language Processing (RANLP) Conference, 201

    Towards Coherent and Engaging Spoken Dialog Response Generation Using Automatic Conversation Evaluators

    Full text link
    Encoder-decoder based neural architectures serve as the basis of state-of-the-art approaches in end-to-end open domain dialog systems. Since most of such systems are trained with a maximum likelihood~(MLE) objective they suffer from issues such as lack of generalizability and the generic response problem, i.e., a system response that can be an answer to a large number of user utterances, e.g., "Maybe, I don't know." Having explicit feedback on the relevance and interestingness of a system response at each turn can be a useful signal for mitigating such issues and improving system quality by selecting responses from different approaches. Towards this goal, we present a system that evaluates chatbot responses at each dialog turn for coherence and engagement. Our system provides explicit turn-level dialog quality feedback, which we show to be highly correlated with human evaluation. To show that incorporating this feedback in the neural response generation models improves dialog quality, we present two different and complementary mechanisms to incorporate explicit feedback into a neural response generation model: reranking and direct modification of the loss function during training. Our studies show that a response generation model that incorporates these combined feedback mechanisms produce more engaging and coherent responses in an open-domain spoken dialog setting, significantly improving the response quality using both automatic and human evaluation

    Self Paced Adversarial Training for Multimodal Few-shot Learning

    Full text link
    State-of-the-art deep learning algorithms yield remarkable results in many visual recognition tasks. However, they still fail to provide satisfactory results in scarce data regimes. To a certain extent this lack of data can be compensated by multimodal information. Missing information in one modality of a single data point (e.g. an image) can be made up for in another modality (e.g. a textual description). Therefore, we design a few-shot learning task that is multimodal during training (i.e. image and text) and single-modal during test time (i.e. image). In this regard, we propose a self-paced class-discriminative generative adversarial network incorporating multimodality in the context of few-shot learning. The proposed approach builds upon the idea of cross-modal data generation in order to alleviate the data sparsity problem. We improve few-shot learning accuracies on the finegrained CUB and Oxford-102 datasets.Comment: To appear at WACV 201

    A Survey of Document Grounded Dialogue Systems (DGDS)

    Full text link
    Dialogue system (DS) attracts great attention from industry and academia because of its wide application prospects. Researchers usually divide the DS according to the function. However, many conversations require the DS to switch between different functions. For example, movie discussion can change from chit-chat to QA, the conversational recommendation can transform from chit-chat to recommendation, etc. Therefore, classification according to functions may not be enough to help us appreciate the current development trend. We classify the DS based on background knowledge. Specifically, study the latest DS based on the unstructured document(s). We define Document Grounded Dialogue System (DGDS) as the DS that the dialogues are centering on the given document(s). The DGDS can be used in scenarios such as talking over merchandise against product Manual, commenting on news reports, etc. We believe that extracting unstructured document(s) information is the future trend of the DS because a great amount of human knowledge lies in these document(s). The research of the DGDS not only possesses a broad application prospect but also facilitates AI to better understand human knowledge and natural language. We analyze the classification, architecture, datasets, models, and future development trends of the DGDS, hoping to help researchers in this field.Comment: 30 pages, 4 figures, 13 table

    Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback

    Full text link
    Conversational interfaces for the detail-oriented retail fashion domain are more natural, expressive, and user friendly than classical keyword-based search interfaces. In this paper, we introduce the Fashion IQ dataset to support and advance research on interactive fashion image retrieval. Fashion IQ is the first fashion dataset to provide human-generated captions that distinguish similar pairs of garment images together with side-information consisting of real-world product descriptions and derived visual attribute labels for these images. We provide a detailed analysis of the characteristics of the Fashion IQ data, and present a transformer-based user simulator and interactive image retriever that can seamlessly integrate visual attributes with image features, user feedback, and dialog history, leading to improved performance over the state of the art in dialog-based image retrieval. We believe that our dataset will encourage further work on developing more natural and real-world applicable conversational shopping assistants

    Content Word-based Sentence Decoding and Evaluating for Open-domain Neural Response Generation

    Full text link
    Various encoder-decoder models have been applied to response generation in open-domain dialogs, but a majority of conventional models directly learn a mapping from lexical input to lexical output without explicitly modeling intermediate representations. Utilizing language hierarchy and modeling intermediate information have been shown to benefit many language understanding and generation tasks. Motivated by Broca's aphasia, we propose to use a content word sequence as an intermediate representation for open-domain response generation. Experimental results show that the proposed method improves content relatedness of produced responses, and our models can often choose correct grammar for generated content words. Meanwhile, instead of evaluating complete sentences, we propose to compute conventional metrics on content word sequences, which is a better indicator of content relevance.Comment: 13 pages, 2 figures, 8 tables (rejected by ACL 2019
    corecore