Grounding Symbols in Multi-Modal Instructions
As robots begin to cohabit with humans in semi-structured environments, the
need arises to understand instructions involving rich variability---for
instance, learning to ground symbols in the physical world. Realistically, this
task must cope with small datasets consisting of a particular user's contextual
assignment of meaning to terms. We present a method for processing a raw stream
of cross-modal input---i.e., linguistic instructions, visual perception of a
scene and a concurrent trace of 3D eye tracking fixations---to produce the
segmentation of objects with a correspondent association to high-level
concepts. To test our framework we present experiments in a table-top object
manipulation scenario. Our results show our model learns the user's notion of
colour and shape from a small number of physical demonstrations, generalising
to identifying physical referents for novel combinations of the words.
Comment: 9 pages, 8 figures. To appear in the Proceedings of the ACL workshop on Language Grounding for Robotics, Vancouver, Canada.
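To make the cross-modal grounding idea concrete, the following is a minimal sketch, not the paper's actual implementation: it assumes we already have segmented objects as bounding boxes, a timestamped trace of gaze fixations, and a timestamped word stream, and it simply associates each word with the object most fixated while the word was spoken. All function and variable names are illustrative.

```python
# Minimal sketch of word-object association from gaze fixations (illustrative,
# not the paper's method). Inputs are assumed to be pre-segmented boxes,
# timestamped fixations, and a timestamped word stream.
from collections import Counter, defaultdict

def contains(box, point):
    """Check whether a 2D fixation point falls inside a bounding box."""
    (x0, y0, x1, y1), (x, y) = box, point
    return x0 <= x <= x1 and y0 <= y <= y1

def associate_words_with_objects(fixations, words, segments):
    """fixations: [(t, x, y)]; words: [(t_start, t_end, word)];
    segments: {object_id: bounding_box}. Returns, for each word, the object
    most fixated while the word was spoken (a crude stand-in for grounding)."""
    votes = defaultdict(Counter)
    for t, x, y in fixations:
        for t0, t1, word in words:
            if t0 <= t <= t1:                     # fixation overlaps the utterance
                for obj_id, box in segments.items():
                    if contains(box, (x, y)):
                        votes[word][obj_id] += 1  # count gaze evidence per word
    return {w: c.most_common(1)[0][0] for w, c in votes.items() if c}

grounding = associate_words_with_objects(
    fixations=[(0.2, 55, 40), (0.3, 57, 42), (1.1, 210, 120)],
    words=[(0.0, 0.5, "red"), (1.0, 1.4, "cube")],
    segments={"obj_1": (40, 30, 80, 60), "obj_2": (200, 100, 240, 140)},
)
print(grounding)  # e.g. {'red': 'obj_1', 'cube': 'obj_2'}
```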
The Mechanics of Embodiment: A Dialogue on Embodiment and Computational Modeling
Embodied theories are increasingly challenging traditional views of cognition by arguing that conceptual representations that constitute our knowledge are grounded in sensory and motor experiences, and processed at this sensorimotor level, rather than being represented and processed abstractly in an amodal conceptual system. Given the established empirical foundation, and the relatively underspecified theories to date, many researchers are extremely interested in embodied cognition but are clamouring for more mechanistic implementations. What is needed at this stage is a push toward explicit computational models that implement sensory-motor grounding as intrinsic to cognitive processes. In this article, six authors from varying backgrounds and approaches address issues concerning the construction of embodied computational models, and illustrate what they view as the critical current and next steps toward mechanistic theories of embodiment. The first part has the form of a dialogue between two fictional characters: Ernest, the 'experimenter', and Mary, the 'computational modeller'. The dialogue consists of an interactive sequence of questions, requests for clarification, challenges, and (tentative) answers, and touches on the most important aspects of grounded theories that should inform computational modeling and, conversely, the impact that computational modeling could have on embodied theories. The second part of the article discusses the most important open challenges for embodied computational modelling.
A Review of Verbal and Non-Verbal Human-Robot Interactive Communication
In this paper, an overview of human-robot interactive communication is
presented, covering verbal as well as non-verbal aspects of human-robot
interaction. Following a historical introduction, and motivation towards fluid
human-robot communication, ten desiderata are proposed, which provide an
organizational axis both of recent as well as of future research on human-robot
communication. Then, the ten desiderata are examined in detail, culminating in
a unifying discussion and a forward-looking conclusion.
Learning structured task related abstractions
As robots and autonomous agents are expected to assist people with more tasks
across various domains, they need the ability to quickly gain contextual
awareness in unseen environments and learn new tasks. Current state-of-the-art
methods rely predominantly on statistical learning techniques, which tend to
overfit to sensory signals and often fail to extract structured task related
abstractions. The resulting environment and task models are typically
represented as black-box objects that cannot easily be updated or inspected,
and that provide limited generalisation capabilities.
We address the aforementioned shortcomings of current methods by explicitly
studying the problem of learning structured task related abstractions. In particular, we
are interested in extracting symbolic representations of the environment from sensory
signals and encoding the task to be executed as a computer program. We consider the
standard problem of learning to solve a task by mapping sensory signals to actions
and propose the decomposition of such a mapping into two stages: i) perceiving
symbols from sensory data and ii) using a program to manipulate those symbols in
order to make decisions. This thesis studies the bidirectional interactions between the
agent’s capabilities to perceive symbols and the programs it can execute in order to
solve a task.
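As an illustration of this two-stage decomposition, here is a minimal sketch under assumed toy inputs: a perception stage that maps raw observations to discrete symbols, and a hand-written task-encoding program that manipulates those symbols to choose an action. The observation format, the colour symbols, and the pick/search actions are assumptions made for the example, not the thesis's actual models.

```python
# Minimal sketch of the two-stage decomposition: perceive symbols, then run a
# program over those symbols (illustrative toy example).

def perceive(observation):
    """Stage i): map a raw observation (here, mean RGB per detected region)
    to a set of discrete symbols such as colour labels."""
    symbols = []
    for region, mean_rgb in observation.items():
        r, g, b = mean_rgb
        colour = "red" if r > max(g, b) else "green" if g > max(r, b) else "blue"
        symbols.append((region, colour))
    return symbols

def task_program(symbols, target_colour="red"):
    """Stage ii): a task-encoding program operating purely on symbols,
    returning a high-level action."""
    for region, colour in symbols:
        if colour == target_colour:
            return ("pick", region)
    return ("search", None)

observation = {"region_0": (200, 30, 20), "region_1": (10, 180, 40)}
action = task_program(perceive(observation))
print(action)  # ('pick', 'region_0')
```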
In the first part of the thesis we demonstrate that access to a programmatic
description of the task provides a strong inductive bias which facilitates the learning
of structured task related representations of the environment. In order to do so, we first
consider a collaborative human-robot interaction setup and propose a framework for
Grounding and Learning Instances through Demonstration and Eye tracking (GLIDE)
which enables robots to learn symbolic representations of the environment from few
demonstrations. In order to relax the constraints on the task encoding program which
GLIDE assumes, we introduce the perceptor gradients algorithm and prove that it can
be applied with any task encoding program.
In the second part of the thesis we investigate the complementary problem of
inducing task encoding programs, assuming that a symbolic representation of the
environment is available. To this end, we propose the p-machine – a novel program
induction framework which combines standard enumerative search techniques with a
stochastic gradient descent optimiser in order to obtain an efficient program synthesiser.
We show that the induction of task encoding programs is applicable to various
problems such as learning physics laws, inspecting neural networks and learning in
human-robot interaction setups.
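The combination of enumerative search with gradient-based parameter fitting can be illustrated with a small sketch. This is not the p-machine itself, only the general recipe it builds on: the candidate templates, learning rate, and synthetic data below are assumptions for the example.

```python
# Illustrative sketch: enumerate discrete program templates, fit each one's
# continuous parameters by gradient descent, and keep the best scorer.
import numpy as np

TEMPLATES = {
    "linear":         lambda x, a, b: a * x + b,
    "quadratic":      lambda x, a, b: a * x ** 2 + b,
    "inverse_square": lambda x, a, b: a / x ** 2 + b,
}

def fit_parameters(f, xs, ys, steps=2000, lr=0.1, eps=1e-5):
    """Fit (a, b) by gradient descent on mean squared error, using central
    finite differences in place of an autodiff gradient."""
    theta = np.array([1.0, 0.0])
    loss = lambda p: np.mean((f(xs, *p) - ys) ** 2)
    for _ in range(steps):
        grad = np.array([
            (loss(theta + eps * np.eye(2)[i]) - loss(theta - eps * np.eye(2)[i])) / (2 * eps)
            for i in range(2)
        ])
        theta = theta - lr * grad
    return theta, loss(theta)

def induce_program(xs, ys):
    """Enumerate candidate templates, fit each one, and keep the best scorer."""
    best = None
    for name, f in TEMPLATES.items():
        theta, err = fit_parameters(f, xs, ys)
        if best is None or err < best[2]:
            best = (name, theta, err)
    return best

xs = np.linspace(0.5, 1.5, 50)
ys = 9.8 * xs ** 2 + 3.0          # synthetic data following a quadratic "law"
print(induce_program(xs, ys))      # the quadratic template should win
```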
A Data-driven Approach Towards Human-robot Collaborative Problem Solving in a Shared Space
We are developing a system for human-robot communication that enables people
to communicate with robots in a natural way and is focused on solving problems
in a shared space. Our strategy for developing this system is fundamentally
data-driven: we use data from multiple input sources and train key components
with various machine learning techniques. We developed a web application that
collects data on how two humans communicate to accomplish a task, as well as a
mobile laboratory instrumented to collect the same kind of data in a physically
shared space. The data from
these systems will be used to train and fine-tune the second stage of our
system, in which the robot will be simulated through software. A physical robot
will be used in the final stage of our project. We describe these instruments,
a test suite, and performance metrics designed to evaluate and automate the
data-gathering process, as well as to evaluate an initial data set.
Comment: 2017 AAAI Fall Symposium on Natural Communication for Human-Robot Collaboration.
From explanation to synthesis: Compositional program induction for learning from demonstration
Hybrid systems are a compact and natural mechanism with which to address
problems in robotics. This work introduces an approach to learning hybrid
systems from demonstrations, with an emphasis on extracting models that are
explicitly verifiable and easily interpreted by robot operators. We fit a
sequence of controllers using sequential importance sampling under a generative
switching proportional controller task model. Here, we parameterise controllers
using a proportional gain and a visually verifiable joint angle goal. Inference
under this model is challenging, but we address this by introducing an
attribution prior extracted from a neural end-to-end visuomotor control model.
Given the sequence of controllers comprising a task, we simplify the trace
using grammar parsing strategies, taking advantage of the sequence
compositionality, before grounding the controllers by training perception
networks to predict goals given images. Using this approach, we are
successfully able to induce a program for a visuomotor reaching task involving
loops and conditionals from a single demonstration and a neural end-to-end
model. In addition, we are able to discover the program used for a tower
building task. We argue that computer program-like control systems are more
interpretable than alternative end-to-end learning approaches, and that hybrid
systems inherently allow for better generalisation across task configurations.
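The switching proportional-controller model the abstract refers to can be sketched roughly as follows; the simulated single joint, the gains, goals, and tolerance are illustrative assumptions rather than the paper's actual task model.

```python
# Illustrative sketch of a sequence of proportional controllers: each is
# parameterised by a gain and a joint-angle goal, and control switches to the
# next controller once the current goal is reached.

def proportional_control(theta, goal, gain):
    """One step of a proportional controller: command is gain * error."""
    return gain * (goal - theta)

def run_controller_sequence(theta, controllers, dt=0.05, tolerance=0.01, max_steps=10000):
    """Execute controllers in order, switching when the current goal is reached."""
    trajectory = [theta]
    for gain, goal in controllers:
        for _ in range(max_steps):
            if abs(goal - theta) < tolerance:
                break                              # goal reached, switch controller
            theta += dt * proportional_control(theta, goal, gain)
            trajectory.append(theta)
    return trajectory

# Example task: move the joint to 1.2 rad, then back to 0.3 rad.
controllers = [(4.0, 1.2), (4.0, 0.3)]
trajectory = run_controller_sequence(theta=0.0, controllers=controllers)
print(len(trajectory), round(trajectory[-1], 3))   # final angle close to 0.3
```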
Interpretable Latent Spaces for Learning from Demonstration
Effective human-robot interaction, such as in robot learning from human
demonstration, requires the learning agent to be able to ground abstract
concepts (such as those contained within instructions) in a corresponding
high-dimensional sensory input stream from the world. Models such as deep
neural networks, with high capacity through their large parameter spaces, can
be used to compress the high-dimensional sensory data to lower dimensional
representations. These low-dimensional representations facilitate symbol
grounding, but do not by themselves guarantee that the representation will be
human-interpretable. We propose a method which utilises the grouping of
user-defined symbols and their corresponding sensory observations in order to
align the learnt compressed latent representation with the semantic notions
contained in the abstract labels. We demonstrate this through experiments with
both simulated and real-world object data, showing that such alignment can be
achieved in a process of physical symbol grounding.
Comment: 12 pages, 6 figures. Accepted at the Conference on Robot Learning (CoRL) 2018, Zurich, Switzerland.
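A rough sketch of the label-alignment idea, assuming a small PyTorch autoencoder and synthetic data rather than the paper's actual model: a standard reconstruction loss is combined with a term that pulls together the latent codes of observations sharing a user-defined symbol. Dimensions, the loss weight, and the data are illustrative assumptions.

```python
# Illustrative sketch: autoencoder trained with reconstruction loss plus a
# label-alignment term over user-defined symbols (not the paper's model).
import torch
import torch.nn as nn

class LabelAlignedAutoencoder(nn.Module):
    def __init__(self, input_dim=32, latent_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 16), nn.ReLU(), nn.Linear(16, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, input_dim))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def alignment_loss(z, labels):
    """Mean squared distance of each latent code to the centroid of its label group."""
    loss = z.new_zeros(())
    for label in labels.unique():
        group = z[labels == label]
        loss = loss + ((group - group.mean(dim=0)) ** 2).mean()
    return loss / len(labels.unique())

# Synthetic observations: two "symbols" (e.g. user-defined colour words), each
# generating a different cluster of sensory vectors.
torch.manual_seed(0)
x = torch.cat([torch.randn(64, 32) + 2.0, torch.randn(64, 32) - 2.0])
labels = torch.cat([torch.zeros(64, dtype=torch.long), torch.ones(64, dtype=torch.long)])

model = LabelAlignedAutoencoder()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(500):
    z, reconstruction = model(x)
    loss = nn.functional.mse_loss(reconstruction, x) + 0.1 * alignment_loss(z, labels)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
print(float(loss))  # combined reconstruction + alignment loss after training
```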