Observing, Coaching and Reflecting: A Multi-modal Natural Language-based Dialogue System in a Learning Context
The Metalogue project aims to develop a multi-modal, multi-party dialogue system with metacognitive abilities that will advance our understanding of natural conversational human-machine interaction and dialogue interfaces. This paper introduces the vision for the system and discusses its application in the context of debate-skills training, where it has the potential to provide learners with a rich, immersive experience. In particular, it considers a potentially powerful learning analytics tool in the form of a performance reflection dashboard.
Learning how to learn: an adaptive dialogue agent for incrementally learning visually grounded word meanings
We present an optimised multi-modal dialogue agent for interactive learning
of visually grounded word meanings from a human tutor, trained on real
human-human tutoring data. Within a life-long interactive learning period, the
agent, trained using Reinforcement Learning (RL), must be able to handle
natural conversations with human users and achieve good learning performance
(accuracy) while minimising human effort in the learning process. We train and
evaluate this system in interaction with a simulated human tutor, which is
built on the BURCHAK corpus, a human-human dialogue dataset for the visual
learning task. The results show that: 1) the learned policy can coherently
interact with the simulated user to achieve the goal of the task (i.e. learning
visual attributes of objects, e.g. colour and shape); and 2) it finds a better
trade-off between classifier accuracy and tutoring costs than hand-crafted
rule-based policies, including ones with dynamic policies.
Comment: 10 pages, RoboNLP Workshop at the ACL Conference
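The trade-off this abstract centres on, rewarding learning progress while penalising human tutoring effort, is easy to sketch. The Python toy below is our illustration, not the paper's implementation: the action names, costs, cost_weight, and the tabular Q-update are all assumptions.

# Toy sketch of an RL reward trading classifier accuracy gains
# against tutoring costs; names and numbers are illustrative only.
import random

TUTOR_COSTS = {"ask_colour": 1.0, "ask_shape": 1.0, "guess": 0.2}

def reward(accuracy_gain, action, cost_weight=0.5):
    # Learning progress minus weighted human effort for this action.
    return accuracy_gain - cost_weight * TUTOR_COSTS[action]

Q = {}  # tabular Q-values over (accuracy bucket, action)

def update(state, action, r, next_state, alpha=0.1, gamma=0.9):
    best_next = max(Q.get((next_state, a), 0.0) for a in TUTOR_COSTS)
    Q[(state, action)] = ((1 - alpha) * Q.get((state, action), 0.0)
                          + alpha * (r + gamma * best_next))

accuracy = 0.1
for _ in range(1000):
    state = round(accuracy, 1)
    action = random.choice(list(TUTOR_COSTS))  # pure exploration
    gain = random.uniform(0.0, 0.05) if action.startswith("ask") else 0.0
    update(state, action, reward(gain, action), round(accuracy + gain, 1))
    accuracy = min(1.0, accuracy + gain)

A learned policy then picks the action maximising Q[(state, a)], asking the tutor only while the expected accuracy gain outweighs the cost.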
Visual Reasoning with Multi-hop Feature Modulation
Recent breakthroughs in computer vision and natural language processing have
spurred interest in challenging multi-modal tasks such as visual
question-answering and visual dialogue. For such tasks, one successful approach
is to condition image-based convolutional network computation on language via
Feature-wise Linear Modulation (FiLM) layers, i.e., per-channel scaling and
shifting. We propose to generate the parameters of FiLM layers going up the
hierarchy of a convolutional network in a multi-hop fashion rather than all at
once, as in prior work. By alternating between attending to the language input
and generating FiLM layer parameters, this approach is better able to scale to
settings with longer input sequences such as dialogue. We demonstrate that
multi-hop FiLM generation achieves state-of-the-art results on the short
input sequence task ReferIt (on par with single-hop FiLM generation), while
also significantly outperforming both the prior state of the art and single-hop
FiLM generation on the GuessWhat?! visual dialogue task.
Comment: In Proc. of ECCV 2018
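Since FiLM itself is just per-channel scaling and shifting, the abstract's idea can be sketched compactly. The PyTorch code below is a minimal illustration under our own assumptions (module sizes, a learned query vector, standard multi-head attention), not the authors' architecture: FiLM parameters are produced one convolutional block at a time, re-attending to the language tokens between hops.

import torch
import torch.nn as nn

class FiLM(nn.Module):
    def forward(self, feats, gamma, beta):
        # feats: (B, C, H, W); gamma, beta: (B, C) per-channel params.
        return gamma[:, :, None, None] * feats + beta[:, :, None, None]

class MultiHopFiLMGenerator(nn.Module):
    def __init__(self, lang_dim, channels, num_blocks):
        super().__init__()
        self.attn = nn.ModuleList(
            [nn.MultiheadAttention(lang_dim, num_heads=4, batch_first=True)
             for _ in range(num_blocks)])
        self.to_film = nn.ModuleList(
            [nn.Linear(lang_dim, 2 * channels) for _ in range(num_blocks)])
        self.query = nn.Parameter(torch.randn(1, 1, lang_dim))

    def forward(self, lang_tokens):
        # lang_tokens: (B, T, lang_dim). One (gamma, beta) pair per hop,
        # with the query updated by attention before each hop.
        q = self.query.expand(lang_tokens.size(0), -1, -1)
        params = []
        for attn, proj in zip(self.attn, self.to_film):
            q, _ = attn(q, lang_tokens, lang_tokens)
            gamma, beta = proj(q.squeeze(1)).chunk(2, dim=-1)
            params.append((gamma, beta))
        return params

In use, block k of the convolutional network would apply FiLM()(feats_k, *params[k]); a single-hop baseline would instead produce all pairs from one pass over the language input.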
ConfNet2Seq: Full Length Answer Generation from Spoken Questions
Conversational and task-oriented dialogue systems aim to interact with the
user using natural responses through multi-modal interfaces, such as text or
speech. These desired responses are in the form of full-length natural answers
generated over facts retrieved from a knowledge source. While the task of
generating natural answers to questions from an answer span has been widely
studied, there has been little research on natural sentence generation over
spoken content. We propose a novel system to generate full length natural
language answers from spoken questions and factoid answers. The spoken sequence
is compactly represented as a confusion network extracted from a pre-trained
Automatic Speech Recognizer. To the best of our knowledge, this is the first
attempt at generating full-length natural answers from a graph input (a
confusion network). We release a large-scale dataset of 259,788 samples of
spoken questions, their factoid answers and corresponding full-length textual
answers. With our proposed approach, we achieve performance comparable to using
the best ASR hypothesis.
Comment: Accepted at Text, Speech and Dialogue, 2020
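A confusion network is a sausage-shaped lattice: a sequence of slots, each holding competing ASR hypotheses with posterior probabilities. The sketch below is our illustration of that input structure, not the paper's featurization; it shows one plausible way to serialize such a graph for a sequence encoder, keeping a slot index so rival words share a position.

from dataclasses import dataclass

@dataclass
class Arc:
    word: str
    posterior: float

ConfusionNetwork = list[list[Arc]]  # one list of alternatives per slot (3.9+)

def flatten(cn):
    tokens, probs, slots = [], [], []
    for slot_id, arcs in enumerate(cn):
        for arc in sorted(arcs, key=lambda a: -a.posterior):
            tokens.append(arc.word)
            probs.append(arc.posterior)
            slots.append(slot_id)  # shared position for competing words
    return tokens, probs, slots

cn = [[Arc("what", 0.9), Arc("watt", 0.1)],
      [Arc("is", 1.0)],
      [Arc("the", 0.7), Arc("a", 0.3)]]
print(flatten(cn))
# (['what', 'watt', 'is', 'the', 'a'], [0.9, 0.1, 1.0, 0.7, 0.3], [0, 0, 1, 2, 2])

The point of feeding the full network rather than the 1-best string is that the generator can recover from recognition errors by attending to lower-ranked alternatives.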
Specification Techniques for Multi-Modal Dialogues in the U-Wish Project
In this paper we describe the development of a specification technique for specifying interactive web-based services. We wanted to design a language that can serve as a means of communication between the designers and developers of interactive services, that makes it easier to develop web-based services fitted to their users, and that shortens the path from design to implementation. The language, still under development, is based on process algebra and can be connected to the results of task analysis. We have been working on the automatic generation of executable prototypes from the specifications; in this way the specification language can establish a connection between users, design and implementation. A first version of this language is available, as well as prototype tools for executing the specifications. We also give ideas on how to make the connection between specifications and task analysis.
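To make the process-algebra idea concrete, here is a toy interpreter, entirely our illustration and not the U-Wish language itself, in which a service specification is built from atomic actions with sequential composition and choice, and executing the specification yields a runnable prototype of the dialogue.

class Proc:
    def run(self): ...

class Act(Proc):                      # an atomic dialogue action
    def __init__(self, name): self.name = name
    def run(self): print(f"action: {self.name}")

class Seq(Proc):                      # sequential composition P ; Q
    def __init__(self, p, q): self.p, self.q = p, q
    def run(self):
        self.p.run()
        self.q.run()

class Choice(Proc):                   # choice P + Q, resolved by the user
    def __init__(self, p, q): self.p, self.q = p, q
    def run(self):
        pick = input("choose [1/2]: ").strip()
        (self.p if pick == "1" else self.q).run()

# spec: greet ; (show_form + show_help) ; confirm
spec = Seq(Act("greet"),
           Seq(Choice(Act("show_form"), Act("show_help")),
               Act("confirm")))

if __name__ == "__main__":
    spec.run()

A real specification language would add parallel composition, data, and links to task models, but even this skeleton shows how a prototype can be executed directly from the algebraic term.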