Search CORE

27,566 research outputs found

A Comparison of Visualisation Methods for Disambiguating Verbal Requests in Human-Robot Interaction

Author: Gustafson Joakim
Karaoguz Hakan
Kontogiorgos Dimosthenis
Kragic Danica
Leite Iolanda
Nykvist Olov
Sibirtseva Elena
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/01/2018

Picking up objects requested by a human user is a common task in human-robot interaction. When multiple objects match the user's verbal description, the robot needs to clarify which object the user is referring to before executing the action. Previous research has focused on perceiving the user's multimodal behaviour to complement verbal commands or minimising the number of follow-up questions to reduce task time. In this paper, we propose a system for reference disambiguation based on visualisation and compare three methods to disambiguate natural language instructions. In a controlled experiment with a YuMi robot, we investigated real-time augmentations of the workspace in three conditions -- mixed reality, augmented reality, and a monitor as the baseline -- using objective measures such as time and accuracy, and subjective measures like engagement, immersion, and display interference. Significant differences were found in accuracy and engagement between the conditions, but no differences were found in task time. Despite the higher error rates in the mixed reality condition, participants found that modality more engaging than the other two, but overall showed preference for the augmented reality condition over the monitor and mixed reality conditions.

arXiv.org e-Print Archive

Crossref

Embodied & Situated Language Processing

Author: Coventry Kenny
Engelhardt Paul
Taylor Lawrence
Publication date: 01/08/2012

Northumbria Research Link

Conjunctive Visual and Auditory Development via Real-Time Dialogue

Author: Weng Juyang
Zhang Yilu
Publication venue: Lund University Cognitive Studies
Publication date: 01/01/2003

Human developmental learning is capable of dealing with the dynamic visual world, speech-based dialogue, and their complex real-time association. However, the architecture that realizes this for robotic cognitive development has not been reported in the past. This paper takes up this challenge. The proposed architecture does not require a strict coupling between visual and auditory stimuli. Two major operations contribute to the “abstraction” process: multiscale temporal priming and high-dimensional numeric abstraction through internal responses with reduced variance. As a basic principle of developmental learning, the programmer does not know the nature of the world events at the time of programming and, thus, hand-designed task-specific representation is not possible. We successfully tested the architecture on the SAIL robot under an unprecedentedly challenging multimodal interaction mode: using real-time speech dialogue as a teaching source for simultaneous and incremental visual learning and language acquisition, while the robot is viewing a dynamic world that contains a rotating object to which the dialogue is referring.

CogPrints Cognitive Sciences Eprint Archive

Adapting the use of attributes to the task environment in joint action: results and a model

Author: Bard Ellen
Guhe Markus
Publication date: 01/06/2008

Edinburgh Research Explorer

Do (and say) as I say: Linguistic adaptation in human-computer dialogs

Author: Bargh J. A.
Bell L.
Bohus D.
Branigan H. P.
Brennan S. E.
Gabsdil M.
Gergle D.
Gravetter F. J.
Healey P. G.
Lazar J.
Levin D. T.
Levinson S. C.
Porzel R.
Reitter D.
Robert D. Macredie
Sauro J.
Stanislao Lauria
Theodora Koulouri
Publication venue: 'Informa UK Limited'
Publication date: 18/06/2014

© Theodora Koulouri, Stanislao Lauria, and Robert D. Macredie. This article has been made available through the Brunel Open Access Publishing Fund. There is strong research evidence showing that people naturally align to each other’s vocabulary, sentence structure, and acoustic features in dialog, yet little is known about how the alignment mechanism operates in the interaction between users and computer systems, let alone how it may be exploited to improve the efficiency of the interaction. This article provides an account of lexical alignment in human–computer dialogs, based on empirical data collected in a simulated human–computer interaction scenario. The results indicate that alignment is present, resulting in the gradual reduction and stabilization of the vocabulary-in-use, and that it is also reciprocal. Further, the results suggest that when system and user errors occur, the development of alignment is temporarily disrupted and users tend to introduce novel words to the dialog. The results also indicate that alignment in human–computer interaction may have a strong strategic component and is used as a resource to compensate for less optimal (visually impoverished) interaction conditions. Moreover, lower alignment is associated with less successful interaction, as measured by user perceptions. The article distills the results of the study into design recommendations for human–computer dialog systems and uses them to outline a model of dialog management that supports and exploits alignment through mechanisms for in-use adaptation of the system’s grammar and lexicon.

Crossref

Brunel University Research Archive

Multi-Modal Human-Machine Communication for Instructing Robot Grasping Tasks

Author: Fink G. A.
Fritsch J.
McGuire P. C.
Ritter H.
Roethling F.
Sagerer G.
Steil J. J.
Wachsmuth S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002

A major challenge for the realization of intelligent robots is to supply them with cognitive abilities in order to allow ordinary users to program them easily and intuitively. One way of such programming is teaching work tasks by interactive demonstration. To make this effective and convenient for the user, the machine must be capable of establishing a common focus of attention and able to use and integrate spoken instructions, visual perceptions, and non-verbal clues like gestural commands. We report progress in building a hybrid architecture that combines statistical methods, neural networks, and finite state machines into an integrated system for instructing grasping tasks by man-machine interaction. The system combines the GRAVIS-robot for visual attention and gestural instruction with an intelligent interface for speech recognition and linguistic interpretation, and a modality fusion module to allow multi-modal task-oriented man-machine communication with respect to dextrous robot manipulation of objects. Comment: 7 pages, 8 figures

arXiv.org e-Print Archive

Crossref