
    Observing, Coaching and Reflecting: A Multi-modal Natural Language-based Dialogue System in a Learning Context

    The Metalogue project aims to develop a multi-modal, multi-party dialogue system with metacognitive abilities that will advance our understanding of natural conversational human-machine interaction and dialogue interfaces. This paper introduces the vision for the system and discusses its application in the context of debate skills training, where it has the potential to provide learners with a rich, immersive experience. In particular, it considers a performance reflection dashboard as a potentially powerful learning analytics tool.

    Learning how to learn: an adaptive dialogue agent for incrementally learning visually grounded word meanings

    We present an optimised multi-modal dialogue agent for interactive learning of visually grounded word meanings from a human tutor, trained on real human-human tutoring data. Within a life-long interactive learning period, the agent, trained using Reinforcement Learning (RL), must be able to handle natural conversations with human users and achieve good learning performance (accuracy) while minimising human effort in the learning process. We train and evaluate this system in interaction with a simulated human tutor, which is built on the BURCHAK corpus, a human-human dialogue dataset for the visual learning task. The results show that: 1) the learned policy can coherently interact with the simulated user to achieve the goal of the task (i.e. learning visual attributes of objects, e.g. colour and shape); and 2) it finds a better trade-off between classifier accuracy and tutoring costs than hand-crafted rule-based policies, including ones with dynamic policies. Comment: 10 pages, RoboNLP Workshop at the ACL Conference.
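    The accuracy-versus-effort trade-off described in this abstract can be sketched as a small reward function plus an epsilon-greedy action choice. The actions, costs and reward shape below are illustrative assumptions only, not the agent, reward or BURCHAK interface used in the paper.

    ```python
    # Minimal sketch of an RL tutoring turn: reward classifier improvement,
    # penalise tutor effort. All names and numbers here are hypothetical.
    import random

    ACTIONS = ["ask_label", "confirm_guess", "skip"]            # assumed dialogue actions
    ACTION_COST = {"ask_label": 1.0, "confirm_guess": 0.3, "skip": 0.0}

    def reward(accuracy_gain: float, action: str, cost_weight: float = 0.5) -> float:
        """Trade off classifier improvement against tutoring cost."""
        return accuracy_gain - cost_weight * ACTION_COST[action]

    def epsilon_greedy(q_values: dict, epsilon: float = 0.1) -> str:
        """Pick a dialogue action, exploring with probability epsilon."""
        if random.random() < epsilon:
            return random.choice(ACTIONS)
        return max(q_values, key=q_values.get)

    # One simulated turn: the agent queries the tutor and the classifier improves a little.
    q = {"ask_label": 0.4, "confirm_guess": 0.2, "skip": 0.0}
    action = epsilon_greedy(q)
    print(action, reward(accuracy_gain=0.05, action=action))
    ```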

    Visual Reasoning with Multi-hop Feature Modulation

    Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question answering and visual dialogue. For such tasks, one successful approach is to condition image-based convolutional network computation on language via Feature-wise Linear Modulation (FiLM) layers, i.e., per-channel scaling and shifting. We propose to generate the parameters of FiLM layers going up the hierarchy of a convolutional network in a multi-hop fashion rather than all at once, as in prior work. By alternating between attending to the language input and generating FiLM layer parameters, this approach scales better to settings with longer input sequences such as dialogue. We demonstrate that multi-hop FiLM generation achieves state-of-the-art results on the short-input-sequence task ReferIt (on par with single-hop FiLM generation), while significantly outperforming the prior state of the art and single-hop FiLM generation on the GuessWhat?! visual dialogue task. Comment: In Proc. of ECCV 201
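    A FiLM layer reduces to a per-channel affine transform, out_c = gamma_c * x_c + beta_c, with gamma and beta predicted from the language input. The sketch below shows that transform and a toy two-hop parameter generator; the layer sizes, random projections and language-state update are assumptions for illustration, not the authors' architecture.

    ```python
    # Toy FiLM modulation (per-channel scale and shift) with two "hops";
    # shapes and the language-state update are illustrative assumptions.
    import numpy as np

    def film(feature_map: np.ndarray, gamma: np.ndarray, beta: np.ndarray) -> np.ndarray:
        """Scale and shift each channel of a (C, H, W) feature map."""
        return gamma[:, None, None] * feature_map + beta[:, None, None]

    rng = np.random.default_rng(0)
    channels = 8
    x = rng.standard_normal((channels, 14, 14))        # conv features for one image
    language_state = rng.standard_normal(16)           # e.g. an RNN summary of the question

    # Multi-hop idea: produce (gamma, beta) for each conv block one hop at a time
    # from an evolving language state, instead of predicting all of them at once.
    for hop in range(2):
        W = 0.1 * rng.standard_normal((2 * channels, language_state.size))
        gamma_beta = W @ language_state                 # predict this hop's FiLM parameters
        gamma, beta = gamma_beta[:channels], gamma_beta[channels:]
        x = film(x, 1.0 + gamma, beta)                  # modulate this block's features
        language_state = np.tanh(language_state)        # stand-in for attention-based update

    print(x.shape)                                      # (8, 14, 14)
    ```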

    ConfNet2Seq: Full Length Answer Generation from Spoken Questions

    Conversational and task-oriented dialogue systems aim to interact with the user through natural responses delivered over multi-modal interfaces such as text or speech. These desired responses take the form of full-length natural answers generated over facts retrieved from a knowledge source. While the task of generating natural answers to questions from an answer span has been widely studied, there has been little research on natural sentence generation over spoken content. We propose a novel system to generate full-length natural language answers from spoken questions and factoid answers. The spoken sequence is compactly represented as a confusion network extracted from a pre-trained automatic speech recognizer. To the best of our knowledge, this is the first attempt to generate full-length natural answers from a graph input (a confusion network). We release a large-scale dataset of 259,788 samples of spoken questions, their factoid answers and the corresponding full-length textual answers. Following our proposed approach, we achieve performance comparable to using the best ASR hypothesis. Comment: Accepted at Text, Speech and Dialogue, 202
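    A confusion network is essentially a sequence of time slots, each holding alternative words with posterior probabilities; the toy structure below and its 1-best read-out illustrate this input representation. The words and scores are invented for the example and are not taken from the released dataset.

    ```python
    # Toy confusion network for a spoken question; words and posteriors are invented.
    confnet = [                                   # one inner list per time slot
        [("who", 0.7), ("hue", 0.3)],
        [("wrote", 0.6), ("road", 0.4)],
        [("hamlet", 0.9), ("omelet", 0.1)],
    ]

    def best_path(net) -> str:
        """1-best hypothesis: take the highest-posterior word in every slot."""
        return " ".join(max(slot, key=lambda wp: wp[1])[0] for slot in net)

    print(best_path(confnet))                     # -> "who wrote hamlet"
    ```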

    Specification Techniques for Multi-Modal Dialogues in the U-Wish Project

    In this paper we describe the development of a specification technique for specifying interactive web-based services. We wanted to design a language that can serve as a means of communication between designers and developers of interactive services, that makes it easier to develop web-based services fitted to their users, and that shortens the path from design to implementation. The language, still under development, is based on process algebra and can be connected to the results of task analysis. We have been working on the automatic generation of executable prototypes from the specifications; in this way the specification language can establish a connection between users, design and implementation. A first version of this language is available, as well as prototype tools for executing the specifications. Ideas are given as to how the connection between specifications and task analysis can be made.
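    To make the idea of an executable, process-algebra-based specification concrete, the sketch below defines sequential composition and choice over atomic interaction acts, with a trivial interpreter standing in for prototype generation. The operators, node names and example dialogue are illustrative assumptions; the actual U-Wish language is not shown in the abstract.

    ```python
    # Toy process-algebra-flavoured dialogue specification with a trivial interpreter.
    # Everything here is hypothetical and only loosely inspired by the abstract.
    from dataclasses import dataclass

    @dataclass
    class Act:                      # an atomic interaction step
        name: str

    @dataclass
    class Seq:                      # P ; Q  (do P, then Q)
        left: object
        right: object

    @dataclass
    class Choice:                   # P + Q  (one branch is taken)
        options: list

    def run(spec, choose=lambda opts: opts[0]):
        """'Execute' a specification and return the trace of performed acts."""
        if isinstance(spec, Act):
            return [spec.name]
        if isinstance(spec, Seq):
            return run(spec.left, choose) + run(spec.right, choose)
        if isinstance(spec, Choice):
            return run(choose(spec.options), choose)
        raise TypeError(f"unknown specification node: {spec!r}")

    # "Show a login form, then either register or sign in, then show the service."
    dialogue = Seq(Act("show_login"),
                   Seq(Choice([Act("register"), Act("sign_in")]),
                       Act("show_service")))
    print(run(dialogue))            # ['show_login', 'register', 'show_service']
    ```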