A Comparison of Visualisation Methods for Disambiguating Verbal Requests in Human-Robot Interaction
Picking up objects requested by a human user is a common task in human-robot
interaction. When multiple objects match the user's verbal description, the
robot needs to clarify which object the user is referring to before executing
the action. Previous research has focused on perceiving the user's multimodal
behaviour to complement verbal commands, or on minimising the number of
follow-up questions to reduce task time. In this paper, we propose a system for reference
disambiguation based on visualisation and compare three methods to disambiguate
natural language instructions. In a controlled experiment with a YuMi robot, we
investigated real-time augmentations of the workspace in three conditions --
mixed reality, augmented reality, and a monitor as the baseline -- using
objective measures such as time and accuracy, and subjective measures like
engagement, immersion, and display interference. Significant differences were
found in accuracy and engagement between the conditions, but no differences
were found in task time. Despite the higher error rates in the mixed reality
condition, participants found that modality more engaging than the other two,
but overall showed preference for the augmented reality condition over the
monitor and mixed reality conditions.
Symbol Emergence in Robotics: A Survey
Humans can learn the use of language through physical interaction with their
environment and semiotic communication with other people. It is very important
to obtain a computational understanding of how humans can form a symbol system
and obtain semiotic skills through their autonomous mental development.
Recently, many studies have been conducted on the construction of robotic
systems and machine-learning methods that can learn the use of language through
embodied multimodal interaction with their environment and other systems.
Understanding human social interactions and developing a robot that can
smoothly communicate with human users over the long term require an
understanding of the dynamics of symbol systems, which is crucially important. The
embodied cognition and social interaction of participants gradually change a
symbol system in a constructive manner. In this paper, we introduce a field of
research called symbol emergence in robotics (SER). SER is a constructive
approach towards an emergent symbol system. The emergent symbol system is
socially self-organized through both semiotic communications and physical
interactions with autonomous cognitive developmental agents, i.e., humans and
developmental robots. Specifically, we describe some state-of-the-art research
topics concerning SER, e.g., multimodal categorization, word discovery, and
double articulation analysis, that enable a robot to obtain words and their
embodied meanings from raw sensory-motor information, including visual
information, haptic information, auditory information, and acoustic speech
signals, in a fully unsupervised manner. Finally, we suggest future
directions of research in SER.
Comment: submitted to Advanced Robotics
An incremental learning framework to enhance teaching by demonstration based on multimodal sensor fusion
Though a robot can reproduce a demonstration trajectory from a human demonstrator via teleoperation, there is a certain error between the reproduced trajectory and the desired trajectory. To minimize this error, we propose a multimodal incremental learning framework based on a teleoperation strategy that enables the robot to reproduce the demonstrated task accurately. The multimodal demonstration data are collected from two different kinds of sensors in the demonstration phase. Then, the Kalman filter (KF) and dynamic time warping (DTW) algorithms are used to preprocess the multiple sensor signals: the KF algorithm fuses sensor data of different modalities, and the DTW algorithm aligns the data on the same timeline. The preprocessed demonstration data are then learned by the incremental learning network and sent to a Baxter robot to reproduce the task demonstrated by the human. Comparative experiments have been performed to verify the effectiveness of the proposed framework.
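The DTW alignment step described in this abstract can be sketched with a generic textbook implementation for 1-D signals; this is not the authors' code, and the function name `dtw_align` is our own. It computes the classic cumulative-cost matrix and backtracks to recover the warping path that puts two sensor streams on the same timeline.

```python
import numpy as np

def dtw_align(a, b):
    """Dynamic time warping between 1-D sequences a and b.
    Returns the optimal warping path as a list of (i, j) index pairs."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    # Fill the cumulative-cost matrix.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from (n, m) to recover the alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path
```

Identical sequences map along the diagonal, while sequences of different lengths produce a path in which one index repeats, i.e. one signal is stretched to match the other.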
MirBot: A collaborative object recognition system for smartphones using convolutional neural networks
MirBot is a collaborative application for smartphones that allows users to
perform object recognition. This app can be used to take a photograph of an
object, select the region of interest and obtain the most likely class (dog,
chair, etc.) by means of similarity search using features extracted from a
convolutional neural network (CNN). The answers provided by the system can be
validated by the user so as to improve the results for future queries. All the
images are stored together with a series of metadata, thus enabling a
multimodal incremental dataset labeled with synset identifiers from the WordNet
ontology. This dataset grows continuously thanks to the users' feedback, and is
publicly available for research. This work details the MirBot object
recognition system, analyzes the statistics gathered after more than four years
of usage, describes the image classification methodology, and performs an
exhaustive evaluation using handcrafted features, convolutional neural codes
and different transfer learning techniques. After comparing various models and
transformation methods, the results show that the CNN features keep the
accuracy of MirBot constant over time, despite the increasing number of new
classes. The app is freely available at the Apple and Google Play stores.
Comment: Accepted in Neurocomputing, 201
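The similarity search over CNN features that the abstract describes can be sketched as a minimal k-nearest-neighbour vote over L2-normalized feature vectors (cosine similarity). This is an illustrative reduction, not MirBot's actual implementation, and the function name `classify_by_similarity` is our own.

```python
import numpy as np

def classify_by_similarity(query_feat, db_feats, db_labels, k=5):
    """Return the majority label among the k database images whose
    CNN features are most cosine-similar to the query's features."""
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = db @ q                    # cosine similarity after normalization
    top = np.argsort(-sims)[:k]      # indices of the k nearest neighbours
    votes = [db_labels[i] for i in top]
    return max(set(votes), key=votes.count)
```

In the real system the labels would be WordNet synset identifiers, and the database grows as users validate the system's answers.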
Incremental Unit Networks for Distributed, Symbolic Multimodal Processing and Representation
Incremental dialogue processing has been an important topic in spoken dialogue systems research, but the broader research community that makes use of language interaction (e.g., chatbots, conversational AI, spoken interaction with robots) has not adopted incremental processing, despite research showing that humans perceive incremental dialogue as more natural. In this paper, we extend prior work that identifies the requirements for making spoken interaction with a system natural, with the goal that our framework will generalize to many domains where speech is the primary method of communication. The Incremental Unit framework offers a model of incremental processing that has been extended to be multimodal and temporally aligned, enables real-time information updates, and creates a complex network of information as a fine-grained information state. One challenge is that multimodal dialogue systems often have computationally expensive modules, requiring computation to be distributed. Most importantly, when speech is the means of communication, it brings the added expectation that systems understand what humans say and respond without delay. In this paper, we build on top of the Incremental Unit framework and make it amenable to a distributed architecture made up of a robot and spoken dialogue system modules. To enable fast communication between the modules and to maintain module state histories, we compared two different implementations of a distributed Incremental Unit architecture. We compare both implementations systematically and then with real human users, and show that the implementation that uses an external attribute-value database is preferred, although there is some flexibility in which variant to use depending on the circumstances.
This work offers the Incremental Unit framework as an architecture for building powerful, complete, and natural dialogue systems, and is specifically applicable to researchers working on robots and multimodal systems.
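The core Incremental Unit idea, i.e. units of information linked into a network and updated through add/revoke edits, can be illustrated with a toy data structure. This is a rough sketch under our own naming (`IncrementalUnit`, `IUModule`), not the paper's distributed implementation, which additionally handles inter-process messaging and an external attribute-value database.

```python
from dataclasses import dataclass, field

@dataclass
class IncrementalUnit:
    """Minimal incremental unit (IU): a payload plus links to the units
    it is grounded in, forming a network-shaped information state."""
    iu_id: int
    payload: str
    grounded_in: list = field(default_factory=list)
    revoked: bool = False

class IUModule:
    """Toy IU module: emits ADD edits and can later REVOKE a unit,
    mimicking the update semantics of the IU framework."""
    def __init__(self):
        self.units = {}
        self.edits = []  # (op, iu_id) log consumed by downstream modules

    def add(self, iu):
        self.units[iu.iu_id] = iu
        self.edits.append(("ADD", iu.iu_id))

    def revoke(self, iu_id):
        # A revoke does not delete the unit; it marks it so downstream
        # modules can backtrack any processing grounded in it.
        self.units[iu_id].revoked = True
        self.edits.append(("REVOKE", iu_id))
```

Downstream modules consume the edit log rather than whole utterances, which is what lets the system react before the user has finished speaking.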
Developmental Robots - A New Paradigm
It has proven extremely challenging for humans to program a robot to such a sufficient degree that it acts properly in a typical unknown human environment. This is especially true for a humanoid robot, due to the very large number of redundant degrees of freedom and the large number of sensors required for a humanoid to work safely and effectively in the human environment. How can we address this fundamental problem? Motivated by human mental development from infancy to adulthood, we present a theory, an architecture, and some experimental results showing how to enable a robot to develop its mind automatically through online, real-time interactions with its environment. Humans mentally “raise” the robot through “robot sitting” and “robot schools” instead of task-specific robot programming.
Temporal Alignment Using the Incremental Unit Framework
We propose a method for temporal alignment (a precondition of meaningful fusion) in multimodal systems, using the incremental unit dialogue system framework, which gives the system flexibility in how it handles alignment: either by delaying a modality for a specified amount of time, or by revoking (i.e., backtracking) processed information so that multiple information sources can be processed jointly. We evaluate our approach in an offline experiment with multimodal data and find that the incremental framework is flexible and shows promise as a solution to the problem of temporal alignment in multimodal systems.
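The delay-based alignment strategy can be illustrated with a small buffer that holds incoming units from each modality and releases them in timestamp order only up to the point the slowest modality has reached. This is our own minimal sketch (the class name `AlignmentBuffer` is an assumption), and it omits the revoke-based variant the abstract also mentions.

```python
import heapq

class AlignmentBuffer:
    """Buffer incremental units from multiple modalities and release them
    in timestamp order, but only once every modality has caught up."""
    def __init__(self, modalities):
        self.heap = []                              # (timestamp, modality, payload)
        self.latest = {m: -1.0 for m in modalities}  # newest timestamp seen per modality

    def push(self, modality, timestamp, payload):
        heapq.heappush(self.heap, (timestamp, modality, payload))
        self.latest[modality] = max(self.latest[modality], timestamp)

    def pop_ready(self):
        """Release all units no later than the slowest modality's
        latest timestamp, so fused output never runs ahead of a stream."""
        horizon = min(self.latest.values())
        out = []
        while self.heap and self.heap[0][0] <= horizon:
            out.append(heapq.heappop(self.heap))
        return out
```

A fusion module draining `pop_ready()` always sees gaze and speech units interleaved in true temporal order, at the cost of a delay bounded by the slowest modality.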