Socially Compliant Navigation through Raw Depth Inputs with Generative Adversarial Imitation Learning
We present an approach for mobile robots to learn to navigate in dynamic
environments with pedestrians via raw depth inputs, in a socially compliant
manner. To achieve this, we adopt a generative adversarial imitation learning
(GAIL) strategy, which improves upon a pre-trained behavior cloning policy. Our
approach overcomes a disadvantage of previous methods, which depend heavily on
full knowledge of the locations and velocities of nearby pedestrians. Obtaining
this state requires specific sensors, and extracting it from raw sensory input
can consume considerable computation time. In contrast, our proposed GAIL-based
model operates directly on raw depth inputs and plans in real time. Experiments
show that our GAIL-based approach greatly improves the safety and efficiency of
mobile robot behavior over pure behavior cloning. The real-world deployment
also shows that
our method is capable of guiding autonomous vehicles to navigate in a socially
compliant manner directly through raw depth inputs. In addition, we release a
simulation plugin for modeling pedestrian behaviors based on the social force
model.
Comment: ICRA 2018 camera-ready version. 7 pages, video link:
https://www.youtube.com/watch?v=0hw0GD3lkA
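The abstract above mentions a simulation plugin based on the social force model. As a rough illustration of that model only, the sketch below computes a driving force toward a goal plus an exponential repulsion from nearby pedestrians. It is a simplified textbook form, not the released plugin: the function name `social_force` and all parameter values (`desired_speed`, `tau`, `strength`, `falloff`, `radius`) are assumptions chosen for illustration.

```python
import math


def social_force(pos, vel, goal, others, desired_speed=1.3, tau=0.5,
                 strength=2.0, falloff=0.3, radius=0.3):
    """Return the 2D force per unit mass acting on one pedestrian.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper or its plugin.
    """
    # Driving term: relax the current velocity toward the desired
    # velocity, which points at the goal, over relaxation time tau.
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    dist_goal = math.hypot(gx, gy) or 1.0  # guard against division by zero
    fx = (desired_speed * gx / dist_goal - vel[0]) / tau
    fy = (desired_speed * gy / dist_goal - vel[1]) / tau

    # Repulsive term: each neighbor pushes this pedestrian away with a
    # magnitude that decays exponentially with distance.
    for ox, oy in others:
        dx, dy = pos[0] - ox, pos[1] - oy
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            continue  # direction undefined for coincident positions
        magnitude = strength * math.exp((2 * radius - dist) / falloff)
        fx += magnitude * dx / dist
        fy += magnitude * dy / dist
    return fx, fy
```

With no neighbors, a pedestrian at rest is simply accelerated toward the goal; placing a neighbor directly ahead reduces the forward component of the force.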
Overcoming Exploration in Reinforcement Learning with Demonstrations
Exploration in environments with sparse rewards has been a persistent problem
in reinforcement learning (RL). Many tasks are natural to specify with a sparse
reward, and manually shaping a reward function can result in suboptimal
performance. However, finding a non-zero reward becomes exponentially more
difficult as the task horizon or action dimensionality increases. This puts many
real-world tasks out of practical reach of RL methods. In this work, we use
demonstrations to overcome the exploration problem and successfully learn to
perform long-horizon, multi-step robotics tasks with continuous control such as
stacking blocks with a robot arm. Our method, which builds on top of Deep
Deterministic Policy Gradients and Hindsight Experience Replay, provides an
order-of-magnitude speedup over RL on simulated robotics tasks. It is simple
to implement and makes only the additional assumption that we can collect a
small set of demonstrations. Furthermore, our method is able to solve tasks not
solvable by either RL or behavior cloning alone, and often ends up
outperforming the demonstrator policy.
Comment: 8 pages, ICRA 201
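The claim that sparse rewards become exponentially harder to find as the horizon grows can be made concrete with a toy calculation. The sketch below assumes a deliberately simplified setting that is not from the paper: a discrete action set, exactly one correct action per step, and a reward only if every step of the episode is correct. The function names are hypothetical.

```python
def random_success_probability(num_actions, horizon):
    """Chance that uniform random actions solve the task in one episode.

    Toy assumption: one correct action out of num_actions at each of
    horizon steps, with reward only if all steps are correct.
    """
    return (1.0 / num_actions) ** horizon


def expected_episodes(num_actions, horizon):
    """Expected number of random episodes until the first reward.

    The number of trials until the first success is geometric, so its
    mean is 1 / p = num_actions ** horizon.
    """
    return num_actions ** horizon
```

Under these assumptions, 4 actions and a 10-step task already require about a million random episodes on average before the first reward, which is the kind of gap that a small set of demonstrations is meant to close.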