Overcoming Exploration in Reinforcement Learning with Demonstrations
Exploration in environments with sparse rewards has been a persistent problem
in reinforcement learning (RL). Many tasks are natural to specify with a sparse
reward, and manually shaping a reward function can result in suboptimal
performance. However, finding a non-zero reward is exponentially more difficult
with increasing task horizon or action dimensionality. This puts many
real-world tasks out of practical reach of RL methods. In this work, we use
demonstrations to overcome the exploration problem and successfully learn to
perform long-horizon, multi-step robotics tasks with continuous control such as
stacking blocks with a robot arm. Our method, which builds on top of Deep
Deterministic Policy Gradients and Hindsight Experience Replay, provides an
order of magnitude of speedup over RL on simulated robotics tasks. It is simple
to implement and makes only the additional assumption that we can collect a
small set of demonstrations. Furthermore, our method is able to solve tasks not
solvable by either RL or behavior cloning alone, and often ends up
outperforming the demonstrator policy. Comment: 8 pages, ICRA 2018
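One core idea the abstract describes is combining a small set of demonstrations with off-policy RL. A common way to realize this (a generic sketch, not the authors' implementation; the function name `sample_mixed_batch` and the fixed demo fraction are illustrative assumptions) is to keep demonstrations in a separate replay buffer and reserve a fixed fraction of every training batch for them, so demonstration transitions keep appearing even as the agent's own buffer grows:

```python
import random

def sample_mixed_batch(agent_buffer, demo_buffer, batch_size, demo_fraction=0.25):
    """Sample a training batch mixing agent experience with demonstrations.

    A fixed fraction of each batch comes from the small, static demo
    buffer, so demonstration transitions remain present throughout
    training even as the agent buffer grows much larger.
    """
    n_demo = min(int(batch_size * demo_fraction), len(demo_buffer))
    n_agent = batch_size - n_demo
    batch = random.sample(demo_buffer, n_demo) + random.sample(agent_buffer, n_agent)
    random.shuffle(batch)
    return batch

# Toy usage: transitions are (state, action, reward, next_state) tuples.
agent_buffer = [("s%d" % i, "a", 0.0, "s%d" % (i + 1)) for i in range(100)]
demo_buffer = [("d%d" % i, "a*", 1.0, "d%d" % (i + 1)) for i in range(10)]
batch = sample_mixed_batch(agent_buffer, demo_buffer, batch_size=16)
```

In a full DDPG-style learner, the demonstration portion of the batch would additionally feed an auxiliary imitation term alongside the usual critic and actor updates.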
Multi-Robot Transfer Learning: A Dynamical System Perspective
Multi-robot transfer learning allows a robot to use data generated by a
second, similar robot to improve its own behavior. The potential advantages are
reduced training time and reduced exposure to the unavoidable risks of the
training phase. Transfer learning algorithms aim to find an optimal transfer
map between different robots. In this paper, we investigate, through a
theoretical study of single-input single-output (SISO) systems, the properties
of such optimal transfer maps. We first show that the optimal transfer learning
map is, in general, a dynamic system. The main contribution of the paper is to
provide an algorithm for determining the properties of this optimal dynamic map
including its order and regressors (i.e., the variables it depends on). The
proposed algorithm does not require detailed knowledge of the robots' dynamics,
but relies on basic system properties easily obtainable through simple
experimental tests. We validate the proposed algorithm experimentally through
an example of transfer learning between two different quadrotor platforms.
Experimental results show that an optimal dynamic map, with correct properties
obtained from our proposed algorithm, achieves 60-70% reduction of transfer
learning error compared to the cases when the data is directly transferred or
transferred using an optimal static map. Comment: 7 pages, 6 figures, accepted at the 2017 IEEE/RSJ International
Conference on Intelligent Robots and Systems
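The abstract's contrast between a static and a dynamic transfer map can be illustrated numerically. The sketch below (my own illustration, not the paper's algorithm; the synthetic lagged-and-scaled target signal is an assumption) fits a finite-impulse-response map of a chosen order by least squares, where order 0 recovers a static gain. When the target robot's response lags the source, only the dynamic map fits well:

```python
import numpy as np

def fit_transfer_map(y_src, y_tgt, order):
    """Least-squares fit of a dynamic (FIR) transfer map of a given order.

    Predicts the target robot's output from the current and the `order`
    most recent source outputs:
        y_tgt[k] ~ theta_0 * y_src[k] + ... + theta_order * y_src[k - order]
    order=0 reduces to a static map (a single gain).
    """
    rows = len(y_src) - order
    # Column i holds y_src[k - i] for k = order, ..., len(y_src) - 1.
    X = np.column_stack(
        [y_src[order - i : order - i + rows] for i in range(order + 1)]
    )
    theta, *_ = np.linalg.lstsq(X, y_tgt[order:], rcond=None)
    return theta

def map_error(y_src, y_tgt, theta):
    """Mean squared prediction error of a fitted transfer map."""
    order = len(theta) - 1
    rows = len(y_src) - order
    X = np.column_stack(
        [y_src[order - i : order - i + rows] for i in range(order + 1)]
    )
    return float(np.mean((X @ theta - y_tgt[order:]) ** 2))

# Synthetic example: the target output is a scaled, one-step-delayed
# copy of the source output, so the true transfer map is dynamic.
rng = np.random.default_rng(0)
y_src = rng.standard_normal(200)
y_tgt = np.concatenate(([0.0], 2.0 * y_src[:-1]))
theta_static = fit_transfer_map(y_src, y_tgt, order=0)
theta_dynamic = fit_transfer_map(y_src, y_tgt, order=1)
```

Here the order-1 dynamic map drives the error essentially to zero while the static gain cannot, mirroring the gap the experiments report between direct/static transfer and the correctly structured dynamic map.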