Problem spotting in human-machine interaction
In human-human communication, dialogue participants are continuously sending and receiving signals on the status of the information being exchanged. We claim that if spoken dialogue systems were able to detect such cues and change their strategy accordingly, the interaction between user and system would improve. Therefore, the goals of the present study are as follows: (i) to find out which positive and negative cues people actually use in human-machine interaction in response to explicit and implicit verification questions and (ii) to see which (combinations of) cues have the best predictive potential for spotting the presence or absence of problems. It was found that subjects systematically use negative/marked cues (more words, marked word order, more repetitions and corrections, less new information, etc.) when there are communication problems. Using precision and recall matrices, it was found that various combinations of cues are accurate problem spotters. This kind of information may turn out to be highly relevant for spoken dialogue systems, e.g., by providing quantitative criteria for changing the dialogue strategy or speech recognition engine.
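As a rough illustration of how a cue, or a combination of cues, could be scored as a problem spotter with precision and recall, consider the minimal sketch below. The turn data, the word-count threshold, and the names `precision_recall`, `long_turn`, and `long_and_repeated` are all hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def precision_recall(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Score a binary problem detector against gold problem labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical user turns (invented for illustration only): word count,
# whether the turn repeats earlier material, and whether a communication
# problem was actually present.
turns = [
    {"words": 9, "repetition": True, "problem": True},
    {"words": 2, "repetition": True, "problem": False},
    {"words": 7, "repetition": False, "problem": True},
    {"words": 6, "repetition": False, "problem": False},
    {"words": 1, "repetition": False, "problem": False},
    {"words": 8, "repetition": True, "problem": True},
]

actual = [t["problem"] for t in turns]

# Single cue: a long turn (the threshold of 5 words is an assumption).
long_turn = [t["words"] > 5 for t in turns]

# Combined cue: a long turn that also contains a repetition.
long_and_repeated = [t["words"] > 5 and t["repetition"] for t in turns]

for name, cue in [("long turn", long_turn), ("long + repetition", long_and_repeated)]:
    p, r = precision_recall(cue, actual)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

On this toy data the combined cue trades some recall for higher precision, which is the kind of trade-off a system designer would weigh before switching dialogue strategy.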
Predicting Causes of Reformulation in Intelligent Assistants
Intelligent assistants (IAs) such as Siri and Cortana conversationally
interact with users and execute a wide range of actions (e.g., searching the
Web, setting alarms, and chatting). IAs can support these actions through the
combination of various components such as automatic speech recognition, natural
language understanding, and language generation. However, the complexity of
these components hinders developers from determining which component causes an
error. To remove this hindrance, we focus on reformulation, which is a useful
signal of user dissatisfaction, and propose a method to predict the
reformulation causes. We evaluate the method using the user logs of a
commercial IA. The experimental results have demonstrated that features
designed to detect the error of a specific component improve the performance of
reformulation cause detection. Comment: 11 pages, 2 figures, accepted as a long paper for SIGDIAL 201
The BURCHAK corpus: a Challenge Data Set for Interactive Learning of Visually Grounded Word Meanings
We motivate and describe a new freely available human-human dialogue dataset
for interactive learning of visually grounded word meanings through ostensive
definition by a tutor to a learner. The data has been collected using a novel,
character-by-character variant of the DiET chat tool (Healey et al., 2003;
Mills and Healey, submitted) with a novel task, where a Learner needs to learn
invented visual attribute words (such as "burchak" for square) from a tutor.
As such, the text-based interactions closely resemble face-to-face conversation
and thus contain many of the linguistic phenomena encountered in natural,
spontaneous dialogue. These include self- and other-correction, mid-sentence
continuations, interruptions, overlaps, fillers, and hedges. We also present a
generic n-gram framework for building user (i.e. tutor) simulations from this
type of incremental data, which is freely available to researchers. We show
that the simulations produce outputs that are similar to the original data
(e.g. 78% turn match similarity). Finally, we train and evaluate a
Reinforcement Learning dialogue control agent for learning visually grounded
word meanings, trained from the BURCHAK corpus. The learned policy shows
comparable performance to a rule-based system built previously. Comment: 10 pages, THE 6TH WORKSHOP ON VISION AND LANGUAGE (VL'17
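To make the idea of an n-gram user simulation concrete, the sketch below trains a word-level bigram model on a few tutor turns and samples a new turn from it. It is only an assumption-laden illustration: the abstract does not specify the n-gram order or the unit of modelling, and the example turns and the names `train_bigrams` and `generate_turn` are hypothetical, not part of the BURCHAK framework.

```python
import random
from collections import defaultdict
from typing import Dict, List

START, END = "<s>", "</s>"


def train_bigrams(turns: List[str]) -> Dict[str, List[str]]:
    """Collect, for every token, the list of tokens observed right after it."""
    model: Dict[str, List[str]] = defaultdict(list)
    for turn in turns:
        tokens = [START] + turn.split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev].append(nxt)
    return model


def generate_turn(model: Dict[str, List[str]], rng: random.Random, max_len: int = 20) -> str:
    """Sample a turn by walking the bigram table from START until END."""
    token, out = START, []
    for _ in range(max_len):
        # Successors are sampled in proportion to their training frequency.
        token = rng.choice(model[token])
        if token == END:
            break
        out.append(token)
    return " ".join(out)


# Hypothetical tutor turns, invented for illustration only.
tutor_turns = [
    "this is a burchak",
    "no this is a burchak",
    "yes that is right",
]

model = train_bigrams(tutor_turns)
print(generate_turn(model, random.Random(0)))
```

A simulation of this kind can then be compared against held-out tutor turns, for example by counting how often a generated turn matches the original one.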
Teaching robots parametrized executable plans through spoken interaction
While operating in domestic environments, robots will necessarily
face difficulties not envisioned by their developers at programming
time. Moreover, the tasks to be performed by a robot will often
have to be specialized and/or adapted to the needs of specific users
and specific environments. Hence, learning how to operate by interacting
with the user seems a key enabling feature to support the
introduction of robots in everyday environments.
In this paper we contribute a novel approach for learning, through
the interaction with the user, task descriptions that are defined as a
combination of primitive actions. The proposed approach makes
a significant step forward by making task descriptions parametric
with respect to domain specific semantic categories. Moreover, by
mapping the task representation into a task representation language,
we are able to express complex execution paradigms and to revise
the learned tasks in a high-level fashion. The approach is evaluated
in multiple practical applications with a service robot.
Challenging Neural Dialogue Models with Natural Data: Memory Networks Fail on Incremental Phenomena
Natural, spontaneous dialogue proceeds incrementally on a word-by-word basis;
and it contains many sorts of disfluency such as mid-utterance/sentence
hesitations, interruptions, and self-corrections. But training data for machine
learning approaches to dialogue processing is often either cleaned-up or wholly
synthetic in order to avoid such phenomena. The question then arises of how
well systems trained on such clean data generalise to real spontaneous
dialogue, or indeed whether they are trainable at all on naturally occurring
dialogue data. To answer this question, we created a new corpus called bAbI+ by
systematically adding natural spontaneous incremental dialogue phenomena such
as restarts and self-corrections to Facebook AI Research's bAbI dialogues
dataset. We then explore the performance of a state-of-the-art retrieval model,
MemN2N, on this more natural dataset. Results show that the semantic accuracy
of the MemN2N model drops drastically; and that although it is in principle
able to learn to process the constructions in bAbI+, it needs an impractical
amount of training data to do so. Finally, we go on to show that an
incremental, semantic parser -- DyLan -- shows 100% semantic accuracy on both
bAbI and bAbI+, highlighting the generalisation properties of linguistically
informed dialogue models. Comment: 9 pages, 3 figures, 2 tables. Accepted as a full paper for SemDial
201