16,432 research outputs found
Learning Contact-Rich Manipulation Skills with Guided Policy Search
Autonomous learning of object manipulation skills can enable robots to
acquire rich behavioral repertoires that scale to the variety of objects found
in the real world. However, current motion skill learning methods typically
restrict the behavior to a compact, low-dimensional representation, limiting
its expressiveness and generality. In this paper, we extend a recently
developed policy search method \cite{la-lnnpg-14} and use it to learn a range
of dynamic manipulation behaviors with highly general policy representations,
without using known models or example demonstrations. Our approach learns a set
of trajectories for the desired motion skill by using iteratively refitted
time-varying linear models, and then unifies these trajectories into a single
control policy that can generalize to new situations. To enable this method to
run on a real robot, we introduce several improvements that reduce the sample
count and automate parameter selection. We show that our method can acquire
fast, fluent behaviors after only minutes of interaction time, and can learn
robust controllers for complex tasks, including putting together a toy
airplane, stacking tight-fitting lego blocks, placing wooden rings onto
tight-fitting pegs, inserting a shoe tree into a shoe, and screwing bottle caps
onto bottles
Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning
Reinforcement learning (RL) algorithms for real-world robotic applications
need a data-efficient learning process and the ability to handle complex,
unknown dynamical systems. These requirements are handled well by model-based
and model-free RL approaches, respectively. In this work, we aim to combine the
advantages of these two types of methods in a principled manner. By focusing on
time-varying linear-Gaussian policies, we enable a model-based algorithm based
on the linear quadratic regulator (LQR) that can be integrated into the
model-free framework of path integral policy improvement (PI2). We can further
combine our method with guided policy search (GPS) to train arbitrary
parameterized policies such as deep neural networks. Our simulation and
real-world experiments demonstrate that this method can solve challenging
manipulation tasks with comparable or better performance than model-free
methods while maintaining the sample efficiency of model-based methods. A video
presenting our results is available at
https://sites.google.com/site/icml17pilqrComment: Paper accepted to the International Conference on Machine Learning
(ICML) 201
Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search
In principle, reinforcement learning and policy search methods can enable
robots to learn highly complex and general skills that may allow them to
function amid the complexity and diversity of the real world. However, training
a policy that generalizes well across a wide range of real-world conditions
requires far greater quantity and diversity of experience than is practical to
collect with a single robot. Fortunately, it is possible for multiple robots to
share their experience with one another, and thereby, learn a policy
collectively. In this work, we explore distributed and asynchronous policy
learning as a means to achieve generalization and improved training times on
challenging, real-world manipulation tasks. We propose a distributed and
asynchronous version of Guided Policy Search and use it to demonstrate
collective policy learning on a vision-based door opening task using four
robots. We show that it achieves better generalization, utilization, and
training times than the single robot alternative.Comment: Submitted to the IEEE International Conference on Robotics and
Automation 201
Deep Predictive Policy Training using Reinforcement Learning
Skilled robot task learning is best implemented by predictive action policies
due to the inherent latency of sensorimotor processes. However, training such
predictive policies is challenging as it involves finding a trajectory of motor
activations for the full duration of the action. We propose a data-efficient
deep predictive policy training (DPPT) framework with a deep neural network
policy architecture which maps an image observation to a sequence of motor
activations. The architecture consists of three sub-networks referred to as the
perception, policy and behavior super-layers. The perception and behavior
super-layers force an abstraction of visual and motor data trained with
synthetic and simulated training samples, respectively. The policy super-layer
is a small sub-network with fewer parameters that maps data in-between the
abstracted manifolds. It is trained for each task using methods for policy
search reinforcement learning. We demonstrate the suitability of the proposed
architecture and learning framework by training predictive policies for skilled
object grasping and ball throwing on a PR2 robot. The effectiveness of the
method is illustrated by the fact that these tasks are trained using only about
180 real robot attempts with qualitative terminal rewards.Comment: This work is submitted to IEEE/RSJ International Conference on
Intelligent Robots and Systems 2017 (IROS2017
- …