44,378 research outputs found

    The Rapidly Changing Landscape of Conversational Agents

    Conversational agents have become ubiquitous, ranging from goal-oriented systems for helping with reservations to chit-chat models found in modern virtual assistants. In this survey paper, we explore this fascinating field. We look at some of the pioneering work that defined the field and gradually move to the current state-of-the-art models. We examine statistical, neural, generative-adversarial-network-based, and reinforcement-learning-based approaches and how they evolved. Along the way we discuss various challenges the field faces: lack of context in utterances, the absence of a good quantitative metric for comparing models, and lack of trust in agents because they do not maintain a consistent persona. We structure this paper in a way that answers these pertinent questions and discusses competing approaches to solving them.
    Comment: 14 pages, 7 figures. arXiv admin note: text overlap with arXiv:1704.07130, arXiv:1507.04808, arXiv:1603.06155, arXiv:1611.06997, arXiv:1704.08966 by other authors.

    Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems

    Language understanding is a key component of a spoken dialogue system. In this paper, we investigate how the language understanding module influences dialogue system performance by conducting a series of systematic experiments on a task-oriented neural dialogue system in a reinforcement learning based setting. The empirical study shows that, among different types of language understanding errors, slot-level errors have a greater impact on overall dialogue system performance than intent-level errors. In addition, our experiments demonstrate that the reinforcement learning based dialogue system is able to learn when and what to confirm in order to achieve better performance and greater robustness.
    Comment: 5 pages, 5 figures.
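
    As a hedged illustration of the kind of controlled experiment the abstract describes, the sketch below injects slot-level and intent-level errors into a simulated semantic frame at configurable rates. The frame layout, intent labels, and error model are assumptions for illustration, not the paper's actual code.

        import random

        # Toy inventories for corrupting language-understanding output.
        INTENTS = ["request_movie", "inform_date", "confirm"]
        SLOT_VALUES = {"date": ["tonight", "tomorrow"], "city": ["Seattle", "Boston"]}

        def corrupt_frame(frame, intent_error_rate=0.0, slot_error_rate=0.0):
            """Return a noisy copy of a frame {'intent': ..., 'slots': {...}}."""
            noisy = {"intent": frame["intent"], "slots": dict(frame["slots"])}
            if random.random() < intent_error_rate:
                # Intent-level error: swap in a different intent label.
                noisy["intent"] = random.choice(
                    [i for i in INTENTS if i != frame["intent"]])
            for slot, value in frame["slots"].items():
                if random.random() < slot_error_rate:
                    # Slot-level error: replace the value with a wrong one.
                    wrong = [v for v in SLOT_VALUES.get(slot, [value]) if v != value]
                    if wrong:
                        noisy["slots"][slot] = random.choice(wrong)
            return noisy

        frame = {"intent": "inform_date", "slots": {"date": "tonight", "city": "Seattle"}}
        print(corrupt_frame(frame, intent_error_rate=0.1, slot_error_rate=0.3))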

    Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient

    Learning goal-oriented dialogues by means of deep reinforcement learning has recently become a popular research topic. However, commonly used policy-based dialogue agents often end up focusing on simple utterances and suboptimal policies. To mitigate this problem, we propose a class of novel temperature-based extensions for policy gradient methods, referred to as Tempered Policy Gradients (TPGs). On a recent AI testbed, the GuessWhat?! game, we achieve significant improvements with two innovations. The first is an extension of the state-of-the-art solutions with Seq2Seq and Memory Network structures that leads to an improvement of 7%. The second is the application of our newly developed TPG methods, which further improve performance by around 5% and, more importantly, help produce more convincing utterances.
    Comment: Published in IEEE Spoken Language Technology (SLT 2018), Athens, Greece.
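
    A short, hedged illustration of the temperature idea underlying TPG-style methods: sampling actions from a temperature-scaled softmax lets one trade off exploration against exploitation. The fixed temperatures and function names below are illustrative assumptions; the authors' actual schedules and update rules differ.

        import numpy as np

        # Sample an action from softmax(logits / T). Low T exploits the
        # highest-scoring action; high T flattens the distribution and
        # encourages exploration. A TPG-style learner varies T for sampling
        # while keeping a REINFORCE-style update (omitted here).
        def tempered_sample(logits, temperature):
            z = logits / temperature
            z = z - z.max()                        # numerical stability
            probs = np.exp(z) / np.exp(z).sum()
            action = np.random.choice(len(logits), p=probs)
            return action, probs

        logits = np.array([2.0, 1.0, 0.1])
        for t in (0.5, 1.0, 2.0):
            action, probs = tempered_sample(logits, t)
            print(f"T={t}: action={action}, probs={np.round(probs, 3)}")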

    Neural Approaches to Conversational AI

    The present paper surveys neural approaches to conversational AI that have been developed in the last few years. We group conversational systems into three categories: (1) question answering agents, (2) task-oriented dialogue agents, and (3) chatbots. For each category, we present a review of state-of-the-art neural approaches, draw the connection between them and traditional approaches, and discuss the progress that has been made and challenges still being faced, using specific systems and models as case studies.
    Comment: Foundations and Trends in Information Retrieval (95 pages).

    Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning

    This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GANs), we train a discriminator to differentiate responses/actions generated by dialogue agents from responses/actions by experts. We then incorporate the discriminator as another critic into the advantage actor-critic (A2C) framework, encouraging the dialogue agent to explore state-action regions where its actions are similar to those of the experts. Experimental results in a movie-ticket booking domain show that the proposed Adversarial A2C accelerates policy exploration efficiently.
    Comment: 5 pages, 3 figures, ICASSP 2018.
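
    As a hedged sketch of the central idea (not the paper's implementation): the discriminator's score of a (state, action) pair can be folded into the reward the A2C learner sees, so expert-like actions are reinforced. The network shape, the log-based shaping term, and all names below are assumptions for illustration.

        import torch
        import torch.nn as nn

        STATE_DIM, NUM_ACTIONS = 16, 8

        # Scores how expert-like a (state, action) pair looks: P(expert).
        discriminator = nn.Sequential(
            nn.Linear(STATE_DIM + NUM_ACTIONS, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

        def shaped_reward(state, action_onehot, env_reward, weight=1.0):
            """Combine the environment reward with the discriminator score."""
            with torch.no_grad():
                d = discriminator(torch.cat([state, action_onehot], dim=-1))
            # log D(s, a) rewards actions the discriminator attributes to experts.
            return env_reward + weight * torch.log(d + 1e-8)

        state = torch.randn(STATE_DIM)
        action = torch.zeros(NUM_ACTIONS)
        action[3] = 1.0
        print(shaped_reward(state, action, env_reward=torch.tensor(-1.0)))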

    Decoupling Strategy and Generation in Negotiation Dialogues

    We consider negotiation settings in which two agents use natural language to bargain over goods. Agents need to decide on both a high-level strategy (e.g., proposing $50) and the execution of that strategy (e.g., generating "The bike is brand new. Selling for just $50."). Recent work on negotiation trains neural models, but their end-to-end nature makes it hard to control their strategy, and reinforcement learning tends to lead to degenerate solutions. In this paper, we propose a modular approach based on coarse dialogue acts (e.g., propose(price=50)) that decouples strategy and generation. We show that we can flexibly set the strategy using supervised learning, reinforcement learning, or domain-specific knowledge without degeneracy, while our retrieval-based generation maintains context-awareness and produces diverse utterances. We test our approach on the recently proposed DEALORNODEAL game, and we also collect a richer dataset based on real items on Craigslist. Human evaluation shows that our systems achieve a higher task success rate and more human-like negotiation behavior than previous approaches.
    Comment: EMNLP 2018.
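
    A minimal sketch of the coarse-dialogue-act interface such a modular design relies on: the strategy module emits a symbolic act like propose(price=50), and a separate generator realizes it as an utterance. The act names and the trivial template generator below are illustrative assumptions; the paper's generation is retrieval-based.

        from dataclasses import dataclass, field

        @dataclass
        class DialogueAct:
            name: str                      # e.g. "propose", "accept", "reject"
            args: dict = field(default_factory=dict)

        # Template stand-in for the retrieval-based generator.
        TEMPLATES = {
            "propose": "How about ${price}?",
            "accept": "Deal!",
            "reject": "Sorry, that won't work for me.",
        }

        def generate(act: DialogueAct) -> str:
            """Realize a coarse dialogue act as a natural-language utterance."""
            utterance = TEMPLATES[act.name]
            for key, value in act.args.items():
                utterance = utterance.replace("{" + key + "}", str(value))
            return utterance

        print(generate(DialogueAct("propose", {"price": 50})))  # How about $50?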

    Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning

    This paper presents a Discriminative Deep Dyna-Q (D3Q) approach to improving the effectiveness and robustness of Deep Dyna-Q (DDQ), a recently proposed framework that extends the Dyna-Q algorithm to integrate planning for task-completion dialogue policy learning. To obviate DDQ's heavy dependence on the quality of simulated experiences, we incorporate an RNN-based discriminator in D3Q to differentiate simulated experience from real user experience, thereby controlling the quality of training data. Experiments show that D3Q significantly outperforms DDQ by controlling the quality of simulated experience used for planning. The effectiveness and robustness of D3Q are further demonstrated in a domain-extension setting, where the agent's ability to adapt to a changing environment is tested.
    Comment: 11 pages, 10 figures, EMNLP 2018 long paper.
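
    A hedged sketch of the quality-control step D3Q adds to Deep Dyna-Q: simulated transitions are kept for planning only if a discriminator judges them realistic enough. The discriminator below is a random stub; in the paper it is a trained RNN. The threshold and names are illustrative assumptions.

        import random

        def discriminator_score(dialogue_episode):
            """Stand-in for the trained RNN discriminator: P(episode is real)."""
            return random.random()

        def filtered_planning_buffer(simulated_episodes, threshold=0.5):
            """Keep only simulated experience the discriminator finds realistic."""
            return [ep for ep in simulated_episodes
                    if discriminator_score(ep) > threshold]

        fake_episodes = [f"episode_{i}" for i in range(10)]
        print(filtered_planning_buffer(fake_episodes))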

    Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning

    Building a dialogue agent to fulfill complex tasks, such as travel planning, is challenging because the agent has to learn to collectively complete multiple subtasks. For example, the agent needs to reserve a hotel and book a flight such that enough time is left to commute between the flight's arrival and the hotel check-in. This paper addresses this challenge by formulating the task in the mathematical framework of options over Markov Decision Processes (MDPs) and proposing a hierarchical deep reinforcement learning approach to learning a dialogue manager that operates at different temporal scales. The dialogue manager consists of: (1) a top-level dialogue policy that selects among subtasks (options), (2) a low-level dialogue policy that selects primitive actions to complete the subtask given by the top-level policy, and (3) a global state tracker that helps ensure all cross-subtask constraints are satisfied. Experiments on a travel planning task with simulated and real users show that our approach leads to significant improvements over three baselines: two based on handcrafted rules and one based on flat deep reinforcement learning.
    Comment: 12 pages, 8 figures.
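
    A minimal sketch of the two-level control flow described above: a top-level policy picks a subtask (option), a low-level policy issues primitive actions until the subtask terminates, and a global tracker records values shared across subtasks. Both policies are random or scripted stubs here, and all names are illustrative assumptions.

        import random

        SUBTASKS = {"book_flight": ["ask_date", "ask_city", "confirm_flight"],
                    "reserve_hotel": ["ask_checkin", "ask_nights", "confirm_hotel"]}

        global_state = {}                  # cross-subtask constraint tracker

        def top_level_policy(done):
            """Pick the next option; a learned policy in the actual approach."""
            remaining = [s for s in SUBTASKS if s not in done]
            return random.choice(remaining) if remaining else None

        def low_level_policy(subtask):
            """Emit primitive actions for one option, updating the tracker."""
            for action in SUBTASKS[subtask]:
                global_state[action] = "filled"
                yield action

        done = set()
        while (subtask := top_level_policy(done)) is not None:
            print("option:", subtask, "->", list(low_level_policy(subtask)))
            done.add(subtask)
        print("tracker:", global_state)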

    End-to-End Task-Completion Neural Dialogue Systems

    One of the major drawbacks of modularized task-completion dialogue systems is that each module is trained individually, which presents several challenges. For example, downstream modules are affected by earlier ones, and the performance of the entire system is not robust to accumulated errors. This paper presents a novel end-to-end learning framework for task-completion dialogue systems that tackles these issues. Our neural dialogue system can directly interact with a structured database to assist users in accessing information and accomplishing tasks. The reinforcement learning based dialogue manager offers robust capabilities for handling noise caused by other components of the dialogue system. Our experiments in a movie-ticket booking domain show that our end-to-end system not only outperforms modularized dialogue system baselines in both objective and subjective evaluation, but is also robust to noise, as demonstrated by systematic experiments with different error granularities and rates specific to the language understanding module.
    Comment: 11 pages, IJCNLP 2017. arXiv admin note: substantial text overlap with arXiv:1703.0705
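
    A hedged sketch of the database interaction at the heart of such a system: the dialogue manager turns the current belief over slots into a query against a structured table and uses the result count to decide its next action. The toy table and hand-written decision rule below are illustrative assumptions, not the paper's learned policy.

        MOVIE_DB = [
            {"title": "Inception", "city": "Seattle", "date": "tonight"},
            {"title": "Arrival",   "city": "Seattle", "date": "tomorrow"},
            {"title": "Arrival",   "city": "Boston",  "date": "tonight"},
        ]

        def query(belief):
            """Return DB rows matching every slot the user has constrained."""
            return [row for row in MOVIE_DB
                    if all(row.get(s) == v for s, v in belief.items())]

        def next_action(belief):
            """Rule-based stand-in for the RL dialogue manager's decision."""
            matches = query(belief)
            if len(matches) == 1:
                return f"book({matches[0]['title']})"
            if not matches:
                return "inform_no_match"
            return "request(date)" if "date" not in belief else "request(city)"

        print(next_action({"city": "Seattle"}))                      # request(date)
        print(next_action({"city": "Seattle", "date": "tonight"}))   # book(Inception)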

    Chat More If You Like: Dynamic Cue Words Planning to Flow Longer Conversations

    Building an open-domain multi-turn conversation system is one of the most interesting and challenging tasks in Artificial Intelligence. Many research efforts have been dedicated to building such dialogue systems, yet few shed light on modeling the conversation flow in an ongoing dialogue. Moreover, it is common for people to talk about highly relevant aspects during a conversation, with topics remaining coherent and drifting naturally, which demonstrates the necessity of dialogue flow modeling. To this end, we present a multi-turn cue-words driven conversation system with a reinforcement learning method (RLCw), which strives to select an adaptive cue word with the greatest future credit and thereby improve the quality of generated responses. We introduce a new reward to measure the quality of cue words in terms of effectiveness and relevance. To further optimize the model for long-term conversations, a reinforcement approach is adopted in this paper. Experiments on a real-life dataset demonstrate that our model consistently outperforms a set of competitive baselines in terms of simulated turns, diversity, and human evaluation.
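
    A hedged sketch of the kind of cue-word reward the abstract names: a candidate cue word is scored by mixing an effectiveness term (does it keep the conversation going?) with a relevance term (does it fit the context?). The scoring functions and mixing weight below are illustrative assumptions, not the paper's reward.

        def effectiveness(cue_word, simulated_turns_after):
            """More simulated future turns -> higher effectiveness score."""
            return min(simulated_turns_after / 10.0, 1.0)

        def relevance(cue_word, context_words):
            """Crude lexical-overlap stand-in for a learned relevance model."""
            return 1.0 if cue_word in context_words else 0.2

        def cue_word_reward(cue_word, context_words, simulated_turns_after,
                            alpha=0.5):
            """Blend effectiveness and relevance into one scalar reward."""
            return (alpha * effectiveness(cue_word, simulated_turns_after)
                    + (1 - alpha) * relevance(cue_word, context_words))

        context = ["travel", "beach", "summer"]
        for cue in ("beach", "finance"):
            print(cue, cue_word_reward(cue, context, simulated_turns_after=6))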