Model Learning for Look-ahead Exploration in Continuous Control
We propose an exploration method that incorporates look-ahead search over
basic learnt skills and their dynamics, and use it for reinforcement learning
(RL) of manipulation policies. Our skills are multi-goal policies learned in
isolation in simpler environments using existing multi-goal RL formulations,
analogous to options or macro-actions. Coarse skill dynamics, i.e., the state
transition caused by a (complete) skill execution, are learnt and are unrolled
forward during look-ahead search. Policy search benefits from temporal
abstraction during exploration, yet itself operates over low-level primitive
actions, so the resulting policies do not suffer from the suboptimality and
inflexibility caused by coarse skill chaining. We show that the proposed
exploration strategy results in effective learning of complex manipulation
policies faster than current state-of-the-art RL methods, and converges to
better policies than methods that use options or parameterized skills as
building blocks of the policy itself, as opposed to guiding exploration.
Comment: This is a pre-print of our paper which is accepted in AAAI 201
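To make the look-ahead mechanism concrete, here is a minimal sketch of search
over coarse skill dynamics, assuming skill models are callables mapping a
state to the post-skill state and that a scoring function is available; the
names and the exhaustive enumeration are illustrative assumptions, not the
paper's implementation:

```python
import itertools

def lookahead_explore(state, skill_models, value_fn, depth=2):
    """Enumerate skill sequences up to `depth`, unroll each skill's learned
    coarse dynamics model (state -> post-skill state), and return the first
    skill of the best-scoring sequence.

    skill_models: dict mapping skill id -> callable f(s) -> s'
    value_fn:     scores a predicted state, e.g. estimated return plus
                  a novelty bonus
    """
    best_score, best_first = float("-inf"), None
    for seq in itertools.product(skill_models, repeat=depth):
        s = state
        for skill in seq:
            s = skill_models[skill](s)  # one coarse step = full skill run
        score = value_fn(s)
        if score > best_score:
            best_score, best_first = score, seq[0]
    return best_first
```

The selected skill only seeds exploration; the policy being trained still
emits low-level primitive actions, which is what avoids the suboptimality of
hard skill chaining.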
Constraining the Size Growth of the Task Space with Socially Guided Intrinsic Motivation using Demonstrations
This paper presents an algorithm for learning a highly redundant inverse
model in continuous and non-preset environments. Our Socially Guided Intrinsic
Motivation by Demonstrations (SGIM-D) algorithm combines the advantages of both
social learning and intrinsic motivation, to specialise in a wide range of
skills, while lessening its dependence on the teacher. SGIM-D is evaluated on
a fishing skill learning experiment.
Comment: IJCAI Workshop on Agents Learning Interactively from Human Teachers
(ALIHT), Barcelona, Spain (2011)
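As a rough illustration of how social guidance and intrinsic motivation can
be interleaved, consider the sketch below. The method names and the fixed
mixing probability are hypothetical stand-ins for SGIM-D's actual
strategy-selection mechanism:

```python
import random

def sgimd_episode(learner, teacher, p_demo=0.1):
    """One learning episode: occasionally imitate a teacher demonstration
    (social learning); otherwise self-explore a goal chosen by competence
    progress (intrinsic motivation). All attribute names are illustrative.
    """
    if random.random() < p_demo:
        goal, trajectory = teacher.demonstrate()      # social guidance
    else:
        goal = learner.sample_goal_by_progress()      # intrinsic motivation
        trajectory = learner.attempt(goal)
    learner.update_inverse_model(trajectory, goal)    # refine inverse model
```

Mixing the two sources is what lets the learner cover a wide task space while
asking less and less of the teacher.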
Learning Dynamic Robot-to-Human Object Handover from Human Feedback
Object handover is a basic, but essential capability for robots interacting
with humans in many applications, e.g., caring for the elderly and assisting
workers in manufacturing workshops. It appears deceptively simple, as humans
perform object handover almost flawlessly. The success of humans, however,
belies the complexity of object handover as collaborative physical interaction
between two agents with limited communication. This paper presents a learning
algorithm for dynamic object handover, for example, when a robot hands over
water bottles to marathon runners passing by the water station. We formulate
the problem as contextual policy search, in which the robot learns object
handover by interacting with the human. A key challenge here is to learn the
latent reward of the handover task under noisy human feedback. Preliminary
experiments show that the robot learns to hand over a water bottle naturally
and that it adapts to the dynamics of human motion. One challenge for the
future is to combine the model-free learning algorithm with a model-based
planning approach and enable the robot to adapt to human preferences and
object characteristics, such as shape, weight, and surface texture.
Comment: Appears in the Proceedings of the International Symposium on Robotics
Research (ISRR) 201
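One way to picture the latent-reward estimation: treat each noisy human
rating as a stochastic observation of an underlying reward that is linear in
handover features. The logistic model below is a stand-in sketch under that
assumption, not the estimator used in the paper:

```python
import numpy as np

def fit_latent_reward(features, labels, iters=500, lr=0.1):
    """Estimate a latent reward vector w from noisy binary feedback.

    features: (n, d) array describing each handover attempt
    labels:   (n,) array of 0/1 human judgements, modelled as noisy
              observations of whether the latent reward was high
    """
    w = np.zeros(features.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-features @ w))            # approval prob.
        w += lr * features.T @ (labels - p) / len(labels)  # gradient ascent
    return w
```

A contextual policy search would then score candidate handover parameters
against the fitted reward for the current context (e.g., the runner's speed)
and shift its sampling distribution toward high-scoring candidates.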
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
Simulations are attractive environments for training agents as they provide
an abundant source of data and alleviate certain safety concerns during the
training process. But the behaviours developed by agents in simulation are
often specific to the characteristics of the simulator. Due to modeling error,
strategies that are successful in simulation may not transfer to their real
world counterparts. In this paper, we demonstrate a simple method to bridge
this "reality gap". By randomizing the dynamics of the simulator during
training, we are able to develop policies that are capable of adapting to very
different dynamics, including ones that differ significantly from the dynamics
on which the policies were trained. This adaptivity enables the policies to
generalize to the dynamics of the real world without any training on the
physical system. Our approach is demonstrated on an object pushing task using a
robotic arm. Despite being trained exclusively in simulation, our policies are
able to maintain a similar level of performance when deployed on a real robot,
reliably moving an object to a desired location from random initial
configurations. We explore the impact of various design decisions and show that
the resulting policies are robust to significant calibration error.
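The training loop behind dynamics randomization is easy to sketch. The
parameter names, ranges, and simulator API below are stand-ins for
illustration; in practice one randomizes quantities such as masses, friction,
damping, and latency through the simulator's own interface:

```python
import numpy as np

class Sim:
    """Stand-in for a physics simulator with settable dynamics parameters."""
    def __init__(self):
        self.params = {}

    def set_param(self, name, value):
        self.params[name] = value

def randomize_dynamics(sim, rng):
    # Resample dynamics at the start of every episode so the policy is
    # trained on a distribution of simulators rather than a single one.
    sim.set_param("mass_scale", rng.uniform(0.5, 2.0))
    sim.set_param("friction",   rng.uniform(0.5, 1.5))
    sim.set_param("damping",    rng.uniform(0.8, 1.2))
    sim.set_param("latency_s",  rng.uniform(0.0, 0.04))

rng = np.random.default_rng(0)
sim = Sim()
for episode in range(1000):
    randomize_dynamics(sim, rng)
    # ... collect a rollout in `sim` and update the policy here ...
```

Because the sampled dynamics change every episode, the policy cannot overfit
to any single simulator and must instead cope with the whole distribution,
which is what carries over to the unmodeled real system.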
Intrinsic Motivation and Mental Replay enable Efficient Online Adaptation in Stochastic Recurrent Networks
Autonomous robots need to interact with unknown, unstructured and changing
environments, constantly facing novel challenges. Continuous online adaptation
for lifelong learning, and sample-efficient mechanisms for adapting to changes
in the environment, the constraints, the tasks, or the robot itself, are
therefore crucial. In this work, we propose a novel framework for
probabilistic online motion planning with online adaptation based on a
bio-inspired stochastic recurrent neural network. By using learning signals
which mimic the intrinsic motivation signal of cognitive dissonance, together
with a mental replay strategy that intensifies experiences, the stochastic
recurrent network can learn from few physical interactions and adapt to novel
environments in seconds. We evaluate our online planning and adaptation
framework on an anthropomorphic KUKA LWR arm. The rapid online adaptation is
shown by learning unknown workspace constraints sample-efficiently from few
physical interactions while following given waypoints.
Comment: accepted in Neural Network
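A toy sketch of the mental-replay idea: keep recent experiences and
re-present the surprising ones (high "cognitive dissonance", e.g. prediction
error) more often, so a few physical interactions yield many updates. The
class and weighting scheme are assumptions for illustration:

```python
import random
from collections import deque

class MentalReplay:
    """Tiny buffer that replays recent experiences, biased toward
    high-dissonance (surprising) ones. Illustrative only."""

    def __init__(self, size=100):
        self.buffer = deque(maxlen=size)

    def add(self, transition, dissonance):
        self.buffer.append((transition, max(dissonance, 1e-6)))

    def sample(self, k=8):
        # Surprising experiences are replayed more often, intensifying
        # the few physical interactions the robot actually had.
        transitions, weights = zip(*self.buffer)
        return random.choices(transitions, weights=weights, k=k)
```

Each sampled batch then drives another update of the network, amortizing the
cost of real-world interaction.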