9 research outputs found

    A Unified Framework for Slot based Response Generation in a Multimodal Dialogue System

    Full text link
    Natural Language Understanding (NLU) and Natural Language Generation (NLG) are the two critical components of every conversational system, handling the tasks of understanding the user by capturing the necessary information in the form of slots, and of generating an appropriate response in accordance with the extracted information. Recently, dialogue systems integrated with complementary information such as images, audio, or video have gained immense popularity. In this work, we propose an end-to-end framework that extracts the necessary slot values from an utterance and generates a coherent response, thereby assisting the user in achieving their desired goals in a multimodal dialogue system with both textual and visual information. The task of extracting the necessary information depends not only on the text but also on the visual cues present in the dialogue. Similarly, for generation, the previous dialogue context comprising multimodal information is significant for providing coherent and informative responses. We employ a multimodal hierarchical encoder using pre-trained DialoGPT and also exploit the knowledge base (KB) to provide stronger context for both tasks. Finally, we design a slot attention mechanism to focus on the necessary information in a given utterance, and a decoder generates the corresponding response for the given dialogue context and the extracted slot values. Experimental results on the Multimodal Dialogue Dataset (MMD) show that the proposed framework outperforms the baseline approaches on both tasks. The code is available at https://github.com/avinashsai/slot-gpt. Comment: Published in the journal Multimedia Tools and Applications
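    The abstract does not detail the slot attention mechanism; a minimal sketch of one plausible form, attending over encoder token states with a learned per-slot query (all names and the scaled dot-product scoring are assumptions, not the paper's method), could look like:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention(hidden, slot_query):
    """Attend over token representations with a slot-specific query.

    hidden:     (seq_len, d) token representations from the encoder
    slot_query: (d,) hypothetical learned query vector for one slot type
    Returns a (d,) context vector focused on slot-relevant tokens.
    """
    scores = hidden @ slot_query / np.sqrt(hidden.shape[-1])  # (seq_len,)
    weights = softmax(scores)                                 # attention over tokens
    return weights @ hidden                                   # weighted sum, shape (d,)

# toy usage: 5 tokens, 8-dimensional representations
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))
q = rng.normal(size=(8,))
ctx = slot_attention(h, q)  # ctx has shape (8,)
```

    The resulting context vector would then feed the slot-value prediction head alongside the dialogue representation.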

    A Unified Framework for Emotion Identification and Generation in Dialogues

    Full text link
    Social chatbots have gained immense popularity, and their appeal lies not just in their capacity to respond to diverse user requests, but also in their ability to develop an emotional connection with users. To further develop and promote social chatbots, we need to concentrate on increasing user interaction and take into account both the intellectual and the emotional quotient of conversational agents. In this paper, we propose a multi-task framework that jointly identifies the emotion of a given dialogue and generates a response in accordance with the identified emotion. We employ a BERT-based network for creating an empathetic system and use a mixed objective function that trains the end-to-end network with both the classification and the generation loss. Experimental results show that our proposed framework outperforms the current state-of-the-art model.
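    A mixed objective of this kind is typically a weighted sum of the two losses; a minimal sketch (the mixing weight `alpha` and its value are assumptions, as the abstract does not specify the weighting) might look like:

```python
def mixed_objective(cls_loss, gen_loss, alpha=0.5):
    """Combine emotion-classification and response-generation losses.

    cls_loss: scalar cross-entropy loss of the emotion classifier
    gen_loss: scalar language-modelling loss of the response generator
    alpha:    hypothetical mixing weight in [0, 1]
    """
    return alpha * cls_loss + (1.0 - alpha) * gen_loss

# e.g. equal weighting of a classification loss of 2.0 and a generation loss of 4.0
total = mixed_objective(2.0, 4.0, alpha=0.5)  # 3.0
```

    Backpropagating this single scalar trains both heads of the shared network jointly.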

    Multitask learning for multilingual intent detection and slot filling in dialogue systems

    No full text
    Dialogue systems are becoming a ubiquitous presence in our everyday lives, with a huge impact on business and society. Spoken language understanding (SLU) is the critical component of every goal-oriented dialogue system or any conversational system, as understanding the user utterance is crucial for assisting the user in achieving their desired objectives. Future-generation systems need to be able to handle multilinguality, which makes the development of conversational agents challenging: they need to understand different languages along with the semantic meaning of a given utterance. In this work, we propose a multilingual multitask approach that fuses the two primary SLU tasks, namely intent detection and slot filling, for three different languages. While intent detection deals with identifying the user's goal or purpose, slot filling captures the appropriate user utterance information in the form of slots. As the two tasks are highly correlated, we propose a multitask strategy to tackle them concurrently. We employ a transformer as a shared sentence encoder for the three languages, i.e., English, Hindi, and Bengali. Experimental results show that the proposed model achieves an improvement for all the languages on both SLU tasks. The multilingual multitask (MLMT) framework shows an improvement of more than 2% in intent accuracy and 3% in slot F1 score in comparison to the single-task models. There is also an increase of more than 1 point in intent accuracy and 2 points in slot F1 score for the MLMT model as opposed to the language-specific frameworks. This research is supported by the Imprint 2C sponsored project titled ‘‘Sevak-An Intelligent Indian Language Chatbot’’ and by the Agency for Science, Technology and Research (A*STAR) under its AME Programmatic Funding Scheme (Project #A18A2b0046).
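    The shared-encoder multitask setup can be sketched as a sentence-level head for intent and a token-level head for slot tags on top of the same encoder output; the pooling choice and all parameter names below are assumptions for illustration, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_intents, n_slot_tags, seq_len = 16, 4, 6, 5

# hypothetical parameters of the two task heads on top of a shared encoder
W_intent = rng.normal(size=(d, n_intents))
W_slot = rng.normal(size=(d, n_slot_tags))

def multitask_heads(token_states):
    """token_states: (seq_len, d) output of the shared transformer encoder.

    Intent is predicted from a pooled sentence vector (mean pooling here,
    as an assumption); slot tags are predicted per token.
    """
    sent = token_states.mean(axis=0)     # (d,) sentence representation
    intent_logits = sent @ W_intent      # (n_intents,)
    slot_logits = token_states @ W_slot  # (seq_len, n_slot_tags)
    return intent_logits, slot_logits

h = rng.normal(size=(seq_len, d))
intent_logits, slot_logits = multitask_heads(h)
```

    Because both heads read the same encoder states, gradients from either task update the shared representation, which is where the cross-task and cross-lingual transfer comes from.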

    GenPADS: Reinforcing politeness in an end-to-end dialogue system.

    No full text
    In a task-oriented dialogue setting, a user's mood and demands can change during an ongoing dialogue, which may lead to a non-informative conversation or result in conversation drop-off. To rectify such scenarios, a conversational agent should be able to learn the user's behaviour online and form informative, empathetic and interactive responses. To incorporate these three aspects, we propose a novel end-to-end dialogue system, GenPADS. First, we build and train two models, viz. a politeness classifier to extract polite information present in the user's and the agent's utterances, and a generation model (G) to generate varying but semantically correct responses. We then incorporate both models in a reinforcement learning (RL) setting, using two different politeness-oriented reward algorithms to adapt and generate polite responses. To train our politeness classifier, we annotate the recently released Taskmaster dataset into four fine-grained classes depicting politeness and impoliteness. Further, to train our generator model, we prepare a GenDD dataset from the same Taskmaster dataset. Lastly, we train GenPADS and perform automatic and human evaluation by building seven different user simulators. Detailed analysis reveals that GenPADS performs better than the two considered baselines, viz. a transformer-based seq2seq generator model for the user's and the agent's utterances, and a retrieval-based politeness adaptive dialogue system (PADS).
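    The two reward algorithms are not spelled out in the abstract; one generic way a politeness-shaped RL reward could be composed from the classifier's output is sketched below, with the additive form and the weight `lam` being assumptions, not GenPADS's actual reward:

```python
def shaped_reward(task_reward, politeness_prob, lam=0.3):
    """Combine the task-level reward with a politeness bonus.

    task_reward:     scalar reward from the dialogue task (e.g. success = 1.0)
    politeness_prob: politeness classifier's probability that the generated
                     response is polite
    lam:             hypothetical weight on the politeness term
    """
    return task_reward + lam * politeness_prob

# e.g. a successful turn (reward 1.0) whose response is judged 80% polite
r = shaped_reward(1.0, 0.8)  # 1.24
```

    The generator is then fine-tuned to maximise this shaped reward, trading off task completion against politeness.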

    More to diverse: Generating diversified responses in a task oriented multimodal dialog system.

    No full text
    Multimodal dialogue systems, owing to their manifold applications, have gained much attention from researchers and developers in recent times. With the release of a large-scale multimodal dialogue dataset (Saha et al., 2018) in the fashion domain, it has become possible to investigate dialogue systems having both textual and visual modalities. Response generation is an essential aspect of every dialogue system, and making the responses diverse is an important problem. For any goal-oriented conversational agent, the system's responses must be informative, diverse and polite, which may lead to better user experiences. In this paper, we propose an end-to-end neural framework for generating varied responses in a multimodal dialogue setup, capturing information from both the text and the image. A multimodal encoder with co-attention between the text and the image is used to focus on the different modalities and obtain better contextual information. For effective information sharing across the modalities, we combine the information of text and images using the BLOCK fusion technique, which helps in learning an improved multimodal representation. We employ stochastic beam search with the Gumbel Top-k trick to achieve diversified responses while preserving the content and politeness of the responses. Experimental results show that our proposed approach performs significantly better than the existing and baseline methods in terms of distinct metrics, thereby generating more diverse responses that are informative, interesting and polite without any loss of information. Empirical evaluation also reveals that images, when used along with the text, improve the model's ability to generate diversified responses.
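    The core of the Gumbel Top-k trick is sampling k items without replacement by perturbing log-probabilities with Gumbel(0,1) noise and taking the top k; a minimal standalone sketch (the full stochastic beam search around it is not shown) is:

```python
import numpy as np

def gumbel_top_k(log_probs, k, rng):
    """Sample k distinct indices via the Gumbel Top-k trick.

    Adding independent Gumbel(0,1) noise to each log-probability and
    taking the k largest perturbed values yields a sample of k items
    without replacement from the underlying distribution.
    """
    gumbel = -np.log(-np.log(rng.uniform(size=log_probs.shape)))
    return np.argsort(log_probs + gumbel)[::-1][:k]

rng = np.random.default_rng(0)
logits = np.log(np.array([0.5, 0.3, 0.1, 0.05, 0.05]))
picks = gumbel_top_k(logits, k=3, rng=rng)  # 3 distinct candidate indices
```

    In a stochastic beam search, this perturb-and-top-k step replaces the deterministic top-k expansion at each decoding step, which is what injects diversity into the generated responses.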

    More the Merrier: Towards Multi-Emotion and Intensity Controllable Response Generation

    No full text
    The focus of conversational systems has recently shifted towards creating engaging agents by inculcating emotions into them. Human emotions are highly complex, as humans can express multiple emotions with varying intensity in a single utterance, whereas conversational agents convey only one emotion in their responses. To infuse human-like behaviour into the agents, we introduce the task of multi-emotion controllable response generation, with the ability to express different emotions at varying levels of intensity in an open-domain dialogue system. We introduce a Multiple Emotion Intensity aware Multi-party Dialogue (MEIMD) dataset of 34k conversations taken from eight different TV series. We finally propose a Multiple Emotion with Intensity-based Dialogue Generation (MEI-DG) framework. The system employs two novel mechanisms, viz. (i) determining the trade-off between emotion and generic words while focusing on the intensity of the desired emotions; and (ii) computing the amount of emotion left to be expressed, thereby regulating the generation accordingly. A detailed evaluation shows that our proposed approach attains superior performance compared to the baseline models.
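    Mechanism (ii), tracking the amount of emotion left to be expressed, can be pictured as a per-emotion intensity budget that shrinks as generation proceeds; the dictionary representation and clipping below are illustrative assumptions, not the MEI-DG formulation:

```python
def remaining_intensity(target, expressed):
    """Intensity of each target emotion still left to express.

    target / expressed: dicts mapping emotion label -> intensity in [0, 1].
    Emotions not yet expressed default to 0; the remainder is clipped at 0
    so an over-expressed emotion contributes no further budget.
    """
    return {e: max(0.0, t - expressed.get(e, 0.0)) for e, t in target.items()}

# target: strong joy and mild surprise; so far only some joy has been expressed
left = remaining_intensity({"joy": 0.8, "surprise": 0.4}, {"joy": 0.5})
```

    A generator conditioned on this remainder can bias decoding towards emotion words while the budget is positive and back towards generic words once it is spent.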

    Statistics of user simulator for each of the seven domains.

    No full text

    GenPADS polite classifier (PC), generation module (G) evaluation results.

    No full text

    EmoInHindi: A Multi-label Emotion and Intensity Annotated Dataset in Hindi for Emotion Recognition in Dialogues

    Full text link
    The long-standing goal of Artificial Intelligence (AI) has been to create human-like conversational systems. Such systems should have the ability to develop an emotional connection with users, hence emotion recognition in dialogues is an important task. Emotion detection in dialogues is challenging because humans usually convey multiple emotions with varying degrees of intensity in a single utterance. Moreover, the emotion in an utterance of a dialogue may depend on previous utterances, making the task more complex. Emotion recognition has always been in great demand; however, most of the existing datasets for multi-label emotion and intensity detection in conversations are in English. To this end, we create a large conversational dataset in Hindi, named EmoInHindi, for multi-label emotion and intensity recognition in conversations, containing 1,814 dialogues with a total of 44,247 utterances. We prepare our dataset in a Wizard-of-Oz manner for mental health and legal counselling of crime victims. Each utterance of a dialogue is annotated with one or more emotion categories from the 16 emotion classes, including neutral, and their corresponding intensity values. We further propose strong contextual baselines that can detect the emotion(s) and the corresponding intensity of an utterance given the conversational context. Comment: This paper is accepted at LREC 202