End-to-End Training of Deep Visuomotor Policies
Policy search methods can allow robots to learn control policies for a wide
range of tasks, but practical applications of policy search often require
hand-engineered components for perception, state estimation, and low-level
control. In this paper, we aim to answer the following question: does training
the perception and control systems jointly end-to-end provide better
performance than training each component separately? To this end, we develop a
method that can be used to learn policies that map raw image observations
directly to torques at the robot's motors. The policies are represented by deep
convolutional neural networks (CNNs) with 92,000 parameters, and are trained
using a partially observed guided policy search method, which transforms policy
search into supervised learning, with supervision provided by a simple
trajectory-centric reinforcement learning method. We evaluate our method on a
range of real-world manipulation tasks that require close coordination between
vision and control, such as screwing a cap onto a bottle, and present simulated
comparisons to a range of prior policy search methods.
Comment: updating with revisions for the JMLR final version.
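The core idea of guided policy search — turning policy search into supervised learning with a trajectory-centric RL method providing the supervision — can be sketched in miniature. This is an illustrative stand-in only: a fixed linear controller plays the role of the trajectory-centric teacher, and a linear least-squares fit stands in for the CNN trained by SGD; none of these are the paper's actual components.

```python
import numpy as np

# Illustrative sketch: guided policy search reduces policy learning to
# supervised regression. A trajectory-centric "teacher" (here a fixed
# linear controller standing in for the local RL optimizer) provides
# target actions; the "policy" (a linear map standing in for the CNN)
# is fit to imitate those targets on the observed states.

rng = np.random.default_rng(0)

def teacher_action(state):
    """Stand-in for the trajectory-centric RL supervisor."""
    K = np.array([[1.0, -0.5], [0.3, 0.8]])
    return K @ state

# Collect supervision: (observation, teacher action) pairs.
states = rng.normal(size=(256, 2))
targets = np.stack([teacher_action(s) for s in states])

# Supervised policy fit (least squares in place of SGD on a CNN).
W, *_ = np.linalg.lstsq(states, targets, rcond=None)

def policy(obs):
    return obs @ W

# The fitted policy closely reproduces the teacher's actions.
err = np.abs(policy(states) - targets).max()
```

In the real method, the teacher is itself re-optimized to stay close to what the policy can represent, so the supervised problem remains feasible.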
Adversarial Feature Training for Generalizable Robotic Visuomotor Control
Deep reinforcement learning (RL) has enabled training action-selection
policies, end-to-end, by learning a function which maps image pixels to action
outputs. However, its application to visuomotor robotic policy training has
been limited because of the challenge of large-scale data collection when
working with physical hardware. A suitable visuomotor policy should perform
well not just for the task-setup it has been trained for, but also for all
varieties of the task, including novel objects at different viewpoints
surrounded by task-irrelevant objects. However, it is impractical for a robotic
setup to collect sufficient interactive samples in an RL framework to
generalize well to novel aspects of a task. In this work, we demonstrate that
by using adversarial training for domain transfer, it is possible to train
visuomotor policies based on RL frameworks, and then transfer the acquired
policy to other novel task domains. We propose to leverage the deep RL
capabilities to learn complex visuomotor skills for uncomplicated task setups,
and then exploit transfer learning to generalize to new task domains provided
only still images of the task in the target domain. We evaluate our method on
two real robotic tasks, picking and pouring, and compare it to a number of
prior works, demonstrating its superiority.
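The adversarial-training idea for domain transfer can be sketched as a min-max game over features. This is a minimal illustration, not the paper's exact losses: a logistic domain discriminator tries to separate simulation features from real-image features, and the encoder is trained against the negated discriminator loss (the gradient-reversal view), pushing both domains toward one feature distribution.

```python
import numpy as np

# Minimal sketch (not the paper's exact objective): adversarial feature
# alignment. A domain discriminator D scores whether a feature came
# from simulation (label 0) or the real world (label 1); the encoder
# is updated to maximize D's loss, aligning the two distributions.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def domain_losses(feat_sim, feat_real, w):
    """Logistic discriminator loss; the encoder loss is its negation."""
    logits_sim = feat_sim @ w    # simulation features, label 0
    logits_real = feat_real @ w  # real features, label 1
    d_loss = -np.mean(np.log(1.0 - sigmoid(logits_sim) + 1e-9)) \
             - np.mean(np.log(sigmoid(logits_real) + 1e-9))
    enc_loss = -d_loss           # gradient-reversal view of the min-max game
    return d_loss, enc_loss

rng = np.random.default_rng(1)
feat_sim = rng.normal(0.0, 1.0, size=(64, 8))   # toy "simulation" features
feat_real = rng.normal(0.5, 1.0, size=(64, 8))  # toy "real" features
w = rng.normal(size=8)
d_loss, enc_loss = domain_losses(feat_sim, feat_real, w)
```

At the encoder's optimum the discriminator cannot do better than chance, which is what lets a policy trained in one domain consume features from the other.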
Accept Synthetic Objects as Real: End-to-End Training of Attentive Deep Visuomotor Policies for Manipulation in Clutter
Recent research demonstrated that it is feasible to end-to-end train
multi-task deep visuomotor policies for robotic manipulation using variations
of learning from demonstration (LfD) and reinforcement learning (RL). In this
paper, we extend the capabilities of end-to-end LfD architectures to object
manipulation in clutter. We start by introducing a data augmentation procedure
called Accept Synthetic Objects as Real (ASOR). Using ASOR we develop two
network architectures: implicit attention ASOR-IA and explicit attention
ASOR-EA. Both architectures use the same training data (demonstrations in
uncluttered environments) as previous approaches. Experimental results show
that ASOR-IA and ASOR-EA succeed in a significant fraction of trials in
cluttered environments where previous approaches never succeed. In addition, we
find that both ASOR-IA and ASOR-EA outperform previous approaches even in
uncluttered environments, with ASOR-EA in clutter performing better than the
previous best baseline in an uncluttered environment.
Comment: 6 pages, 5 figures.
Data-efficient visuomotor policy training using reinforcement learning and generative models
We present a data-efficient framework for solving visuomotor sequential
decision-making problems which exploits the combination of reinforcement
learning (RL) and latent variable generative models. Our framework trains deep
visuomotor policies by introducing an action latent variable such that the
feed-forward policy search can be divided into three parts: (i) training a
sub-policy that outputs a distribution over the action latent variable given a
state of the system, (ii) unsupervised training of a generative model that
outputs a sequence of motor actions conditioned on the latent action variable,
and (iii) supervised training of the deep visuomotor policy in an end-to-end
fashion. Our approach enables safe exploration and alleviates the
data-inefficiency problem as it exploits prior knowledge about valid sequences
of motor actions. Moreover, we provide a set of measures for evaluation of
generative models such that we are able to predict the performance of the RL
policy training prior to the actual training on a physical robot. We define two
novel measures of disentanglement and local linearity for assessing the quality
of latent representations, and complement them with existing measures for
assessment of the learned distribution. We experimentally determine the
characteristics of different generative models that have the most influence on
performance of the final policy training on a robotic picking task.
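The three-part decomposition around an action latent variable can be made concrete with linear maps standing in for the learned networks (an illustration only; the actual sub-policy and generative model are deep networks trained as the abstract describes).

```python
import numpy as np

# Sketch of the decomposition, with linear maps in place of networks:
# (i)   a sub-policy maps the state to a latent action variable z,
# (ii)  a generative decoder maps z to a sequence of motor actions,
# (iii) composing the two gives the end-to-end visuomotor policy.

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 4))     # sub-policy: state (4,) -> latent z (3,)
D = rng.normal(size=(5, 2, 3))  # decoder: z -> 5 timesteps of 2-D actions

def sub_policy(state):
    return A @ state            # (i) mean of the distribution over z

def decode(z):
    return D @ z                # (ii) action sequence, shape (5, 2)

def visuomotor_policy(state):
    return decode(sub_policy(state))  # (iii) end-to-end composition

actions = visuomotor_policy(np.ones(4))
```

Exploration happens in the low-dimensional latent space, so every sampled z decodes to a valid motor sequence — the source of the safety and data-efficiency claims.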
Reinforcement and Imitation Learning for Diverse Visuomotor Skills
We propose a model-free deep reinforcement learning method that leverages a
small amount of demonstration data to assist a reinforcement learning agent. We
apply this approach to robotic manipulation tasks and train end-to-end
visuomotor policies that map directly from RGB camera inputs to joint
velocities. We demonstrate that our approach can solve a wide variety of
visuomotor tasks, for which engineering a scripted controller would be
laborious. In experiments, our reinforcement and imitation agent achieves
significantly better performances than agents trained with reinforcement
learning or imitation learning alone. We also illustrate that these policies,
trained with large visual and dynamics variations, can achieve preliminary
successes in zero-shot sim2real transfer. A brief visual description of this
work can be viewed at https://youtu.be/EDl8SQUNjj0
Comment: 13 pages, 6 figures, published in RSS 201
Pay attention! - Robustifying a Deep Visuomotor Policy through Task-Focused Attention
Several recent studies have demonstrated the promise of deep visuomotor
policies for robot manipulator control. Despite impressive progress, these
systems are known to be vulnerable to physical disturbances, such as accidental
or adversarial bumps that make them drop the manipulated object. They also tend
to be distracted by visual disturbances such as objects moving in the robot's
field of view, even if the disturbance does not physically prevent the
execution of the task. In this paper, we propose an approach for augmenting a
deep visuomotor policy trained through demonstrations with Task Focused visual
Attention (TFA). The manipulation task is specified with a natural language
text such as `move the red bowl to the left'. This allows the visual attention
component to concentrate on the current object that the robot needs to
manipulate. We show that even in benign environments, the TFA allows the policy
to consistently outperform a variant with no attention mechanism. More
importantly, the new policy is significantly more robust: it regularly recovers
from severe physical disturbances (such as bumps causing it to drop the object)
from which the baseline policy, i.e. with no visual attention, almost never
recovers. In addition, we show that the proposed policy performs correctly in
the presence of a wide class of visual disturbances, exhibiting a behavior
reminiscent of human selective visual attention experiments. Our proposed
approach consists of a VAE-GAN network which encodes the visual input and feeds
it to a Motor network that moves the robot joints. Also, our approach benefits
from a teacher network for the TFA that leverages textual input command to
robustify the visual encoder against various types of disturbances.
Universal Planning Networks
A key challenge in complex visuomotor control is learning abstract
representations that are effective for specifying goals, planning, and
generalization. To this end, we introduce universal planning networks (UPN).
UPNs embed differentiable planning within a goal-directed policy. This planning
computation unrolls a forward model in a latent space and infers an optimal
action plan through gradient descent trajectory optimization. The
plan-by-gradient-descent process and its underlying representations are learned
end-to-end to directly optimize a supervised imitation learning objective. We
find that the representations learned are not only effective for goal-directed
visual imitation via gradient-based trajectory optimization, but can also
provide a metric for specifying goals using images. The learned representations
can be leveraged to specify distance-based rewards to reach new target states
for model-free reinforcement learning, resulting in substantially more
effective learning when solving new tasks described via image-based goals. We
were able to achieve successful transfer of visuomotor planning strategies
across robots with significantly different morphologies and actuation
capabilities.
Comment: videos available at https://sites.google.com/view/upn-public/hom
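The plan-by-gradient-descent computation inside a UPN can be sketched with a fixed linear latent dynamics model in place of the learned forward model (an assumption for illustration; UPNs learn the dynamics and encoder end-to-end, and backpropagate analytically rather than via finite differences).

```python
import numpy as np

# Sketch of planning by gradient descent in a latent space: unroll
# z_{t+1} = F z_t + G a_t (a toy linear model standing in for the
# learned forward model) and descend on the squared distance between
# the final latent state and the goal's latent encoding.

F = np.array([[1.0, 0.1], [0.0, 1.0]])  # latent dynamics (assumed)
G = np.array([[0.0], [0.1]])            # action effect (assumed)

def rollout(z0, plan):
    z = z0
    for a in plan:
        z = F @ z + G @ a
    return z

def plan_loss(z0, plan, z_goal):
    d = rollout(z0, plan) - z_goal
    return float(d @ d)

def improve_plan(z0, plan, z_goal, lr=0.5, steps=200, eps=1e-4):
    plan = plan.copy()
    for _ in range(steps):
        grad = np.zeros_like(plan)
        base = plan_loss(z0, plan, z_goal)
        for i in np.ndindex(plan.shape):   # finite-difference gradient
            p = plan.copy()
            p[i] += eps
            grad[i] = (plan_loss(z0, p, z_goal) - base) / eps
        plan -= lr * grad
    return plan

z0 = np.zeros(2)
z_goal = np.array([1.0, 0.0])
init_plan = np.zeros((10, 1))
opt_plan = improve_plan(z0, init_plan, z_goal)
```

The key point the abstract makes is that the latent space itself is shaped so this inner-loop optimization imitates experts well — which is also why latent distances make useful reward signals.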
Deep Intrinsically Motivated Continuous Actor-Critic for Efficient Robotic Visuomotor Skill Learning
In this paper, we present a new intrinsically motivated actor-critic
algorithm for learning continuous motor skills directly from raw visual input.
Our neural architecture is composed of a critic and an actor network. Both
networks receive the hidden representation of a deep convolutional autoencoder
which is trained to reconstruct the visual input, while the centre-most hidden
representation is also optimized to estimate the state value. Separately, an
ensemble of predictive world models generates, based on its learning progress,
an intrinsic reward signal which is combined with the extrinsic reward to guide
the exploration of the actor-critic learner. Our approach is more
data-efficient and inherently more stable than the existing actor-critic
methods for continuous control from pixel data. We evaluate our algorithm for
the task of learning robotic reaching and grasping skills on a realistic
physics simulator and on a humanoid robot. The results show that the control
policies learned with our approach can achieve better performance than the
compared state-of-the-art and baseline algorithms in both dense-reward and
challenging sparse-reward settings.
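The learning-progress-based intrinsic reward can be sketched numerically. The specific form (positive part of the drop in mean ensemble prediction error) and the mixing weight beta are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

# Sketch of the intrinsic-motivation signal: an ensemble of predictive
# world models tracks its prediction errors; the *reduction* in mean
# error (learning progress) becomes an intrinsic bonus that is added
# to the extrinsic task reward. beta is an assumed mixing weight.

def intrinsic_reward(prev_errors, curr_errors):
    """Learning progress = mean reduction in ensemble prediction error."""
    return max(0.0, float(np.mean(prev_errors) - np.mean(curr_errors)))

def combined_reward(extrinsic, prev_errors, curr_errors, beta=0.5):
    return extrinsic + beta * intrinsic_reward(prev_errors, curr_errors)

prev = np.array([0.9, 1.1, 1.0])  # ensemble errors before a model update
curr = np.array([0.5, 0.7, 0.6])  # ensemble errors after the update
r = combined_reward(1.0, prev, curr)
```

Rewarding progress rather than raw error steers exploration toward regions the world models are actively learning, which is what helps in the sparse-reward settings the abstract mentions.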
A Data-Efficient Framework for Training and Sim-to-Real Transfer of Navigation Policies
Learning effective visuomotor policies for robots purely from data is
challenging, but also appealing since a learning-based system should not
require manual tuning or calibration. In the case of a robot operating in a
real environment the training process can be costly, time-consuming, and even
dangerous since failures are common at the start of training. For this reason,
it is desirable to be able to leverage \textit{simulation} and
\textit{off-policy} data to the extent possible to train the robot. In this
work, we introduce a robust framework that plans in simulation and transfers
well to the real environment. Our model incorporates a gradient-descent based
planning module, which, given the initial image and goal image, encodes the
images to a lower dimensional latent state and plans a trajectory to reach the
goal. The model, consisting of the encoder and planner modules, is trained
through a meta-learning strategy in simulation first. We subsequently perform
adversarial domain transfer on the encoder by using a bank of unlabelled but
random images from the simulation and real environments to enable the encoder
to map images from the real and simulated environments to a similarly
distributed latent representation. By fine-tuning the entire model (encoder +
planner) with far fewer real-world expert demonstrations, we show successful
planning performance on different navigation tasks.
Comment: under review at ICRA 201
Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies
How much does having visual priors about the world (e.g. the fact that the
world is 3D) assist in learning to perform downstream motor tasks (e.g.
delivering a package)? We study this question by integrating a generic
perceptual skill set (e.g. a distance estimator, an edge detector, etc.) within
a reinforcement learning framework--see Figure 1. This skill set (hereafter
mid-level perception) provides the policy with a more processed state of the
world compared to raw images.
We find that using a mid-level perception confers significant advantages over
training end-to-end from scratch (i.e. not leveraging priors) in
navigation-oriented tasks. Agents are able to generalize to situations where
the from-scratch approach fails and training becomes significantly more sample
efficient. However, we show that realizing these gains requires careful
selection of the mid-level perceptual skills. Therefore, we refine our findings
into an efficient max-coverage feature set that can be adopted in lieu of raw
images. We perform our study in completely separate buildings for training and
testing and compare against visually blind baseline policies and
state-of-the-art feature learning methods.
Comment: see project website, demos, and code at http://perceptual.acto
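The mid-level-perception setup can be sketched as a fixed perceptual transform inserted between pixels and the policy. Below, a toy horizontal-gradient "edge detector" stands in for the pretrained mid-level estimators, and a trivial policy consumes its output; both are assumptions purely to show where the skill set sits in the pipeline.

```python
import numpy as np

# Sketch of mid-level perception: instead of feeding raw pixels to the
# policy, a fixed perceptual skill (a toy edge detector here, standing
# in for pretrained depth/edge/etc. estimators) produces a more
# processed state, which the policy then consumes.

def edge_features(image):
    """Toy horizontal-gradient 'edge detector' as a mid-level skill."""
    return np.abs(np.diff(image.astype(float), axis=1))

def policy(features):
    """Stand-in policy: steer toward the column with the strongest edge."""
    return int(np.argmax(features.sum(axis=0)))

img = np.zeros((4, 6))
img[:, 3:] = 1.0  # a vertical edge between columns 2 and 3
action = policy(edge_features(img))
```

Because the perceptual module is fixed and generic, the policy network is smaller and sees a lower-variance input, which is the intuition behind the sample-efficiency and generalization gains reported.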