Asymmetric Actor Critic for Image-Based Robot Learning
Deep reinforcement learning (RL) has proven a powerful technique in many
sequential decision-making domains. However, robotics poses many challenges for
RL; most notably, training on a physical system can be expensive and dangerous,
which has sparked significant interest in learning control policies using a
physics simulator. While several recent works have shown promising results in
transferring policies trained in simulation to the real world, they often do
not fully utilize the advantages of working with a simulator. In this work, we
exploit the full state observability in the simulator to train better policies
which take as input only partial observations (RGBD images). We do this by
employing an actor-critic training algorithm in which the critic is trained on
full states while the actor (or policy) gets rendered images as input. We show
experimentally on a range of simulated tasks that using these asymmetric inputs
significantly improves performance. Finally, we combine this method with domain
randomization and show real robot experiments for several tasks like picking,
pushing, and moving a block. We achieve this simulation-to-real-world transfer
without training on any real-world data.

Comment: Videos of experiments can be found at http://www.goo.gl/b57WT
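The asymmetric training scheme described above can be sketched in a few lines. This is a minimal numpy illustration, not the paper's implementation: the linear networks, the dimensionalities, and the Gaussian policy are assumptions made for clarity. The structural point it shows is that the critic consumes the full simulator state while the actor only ever sees the (flattened) rendered observation.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, OBS_DIM, ACTION_DIM = 8, 64, 2  # hypothetical sizes
SIGMA = 0.1                                # fixed exploration noise

# Critic sees the full simulator state; actor sees only the rendered image.
W_critic = rng.normal(scale=0.1, size=STATE_DIM)
W_actor = rng.normal(scale=0.1, size=(OBS_DIM, ACTION_DIM))

def value(state):
    """Value estimate from the full (privileged) state."""
    return float(state @ W_critic)

def act(obs):
    """Sample an action from a Gaussian policy over the image observation."""
    mean = obs @ W_actor
    return mean + SIGMA * rng.normal(size=ACTION_DIM), mean

def asymmetric_update(state, obs, action, mean, reward, next_state,
                      gamma=0.99, lr=1e-3):
    """One TD step: the advantage computed from full states trains the critic
    and drives a policy-gradient update of the image-based actor."""
    global W_critic, W_actor
    td_error = reward + gamma * value(next_state) - value(state)
    W_critic += lr * td_error * state                      # critic: full state
    grad_logp = np.outer(obs, (action - mean) / SIGMA**2)  # linear-Gaussian score
    W_actor += lr * td_error * grad_logp                   # actor: image only
    return td_error
```

In the paper the actor input is an RGBD rendering and both networks are deep; everything here is linear so that the asymmetry of inputs, which is the point of the method, stands out.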
Attention-Privileged Reinforcement Learning
Image-based Reinforcement Learning is known to suffer from poor sample
efficiency and generalisation to unseen visuals such as distractors
(task-independent aspects of the observation space). Visual domain
randomisation encourages transfer by training over visual factors of variation
that may be encountered in the target domain. However, this increases learning
complexity, can negatively impact learning rate and performance, and requires
knowledge of potential variations during deployment. In this paper, we
introduce Attention-Privileged Reinforcement Learning (APRiL) which uses a
self-supervised attention mechanism to significantly alleviate these drawbacks:
by focusing on task-relevant aspects of the observations, attention provides
robustness to distractors as well as significantly increased learning
efficiency. APRiL trains two attention-augmented actor-critic agents: one
purely based on image observations, available across training and transfer
domains; and one with access to privileged information (such as environment
states) available only during training. Experience is shared between both
agents and their attention mechanisms are aligned. The image-based policy can
then be deployed without access to privileged information. We experimentally
demonstrate accelerated and more robust learning on a diverse set of domains,
leading to improved final performance for environments both within and outside
the training distribution.

Comment: Published at Conference on Robot Learning (CoRL) 202
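The attention-alignment step between the two agents can be sketched as follows. This is a hedged numpy sketch, not the APRiL implementation: the 4x4 attention grid, the fixed logit vectors standing in for each agent's attention head, and the plain L2 alignment loss are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
GRID = 4  # spatial attention grid; the size is a hypothetical choice

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Each agent emits attention logits over a GRID x GRID feature map.
logits_img = rng.normal(scale=0.5, size=GRID * GRID)    # image-based agent
logits_state = rng.normal(scale=0.5, size=GRID * GRID)  # privileged agent

def alignment_loss(p, q):
    """L2 distance between the two normalized attention maps."""
    return float(((p - q) ** 2).sum())

def align_step(lr=0.5):
    """Gradient step pulling the image agent's attention toward the
    privileged agent's (gradient taken through the softmax)."""
    global logits_img
    p, q = softmax(logits_img), softmax(logits_state)
    grad_p = 2.0 * (p - q)
    grad_logits = p * (grad_p - (grad_p * p).sum())  # softmax Jacobian-vector product
    logits_img -= lr * grad_logits
    return alignment_loss(softmax(logits_img), q)
```

Repeated `align_step` calls pull the image agent's attention map toward the privileged agent's. In APRiL itself, both attention mechanisms are learned jointly with the actor-critic objectives and experience is shared between the two agents; only the alignment idea is isolated here.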