Black-Box Data-efficient Policy Search for Robotics
The most data-efficient algorithms for reinforcement learning (RL) in
robotics are based on uncertain dynamical models: after each episode, they
first learn a dynamical model of the robot, then they use an optimization
algorithm to find a policy that maximizes the expected return given the model
and its uncertainties. It is often believed that this optimization can be
tractable only if analytical, gradient-based algorithms are used; however,
these algorithms require using specific families of reward functions and
policies, which greatly limits the flexibility of the overall approach. In this
paper, we introduce a novel model-based RL algorithm, called Black-DROPS
(Black-box Data-efficient RObot Policy Search) that: (1) does not impose any
constraint on the reward function or the policy (they are treated as
black-boxes), (2) is as data-efficient as the state-of-the-art algorithm for
data-efficient RL in robotics, and (3) is as fast as (or faster than) analytical
approaches when several cores are available. The key idea is to replace the
gradient-based optimization algorithm with a parallel, black-box algorithm that
takes into account the model uncertainties. We demonstrate the performance of
our new algorithm on two standard control benchmark problems (in simulation)
and a low-cost robotic manipulator (with a real robot).
Comment: Accepted at the IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS) 2017; Code at
http://github.com/resibots/blackdrops; Video at http://youtu.be/kTEyYiIFGP
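The key idea in the abstract above (score each candidate policy by rolling it out on an uncertain model, then hand the scores to a black-box optimizer) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the 1-D toy model, the linear policy, the quadratic reward, and all function names are hypothetical, and a simple (1+λ) evolution strategy with averaged Monte Carlo rollouts stands in for the optimizer actually used in Black-DROPS.

```python
import random

def model_step(x, u, rng, noise_std=0.05):
    """Hypothetical learned model: predictive mean x + u, Gaussian predictive uncertainty."""
    return rng.gauss(x + u, noise_std)

def policy(theta, x):
    """Black-box policy; here a linear feedback law, but any callable would do."""
    return theta * x

def reward(x):
    """Black-box reward; here the negative squared distance to the origin."""
    return -x * x

def estimated_return(theta, rng, horizon=5, n_rollouts=10, x0=1.0):
    """Monte Carlo estimate of the expected return under the uncertain model."""
    total = 0.0
    for _ in range(n_rollouts):
        x = x0
        for _ in range(horizon):
            # Sample the next state from the model's predictive distribution.
            x = model_step(x, policy(theta, x), rng)
            total += reward(x)
    return total / n_rollouts

def black_box_search(theta0=0.0, generations=30, offspring=8, step=0.3, seed=0):
    """(1+lambda) evolution strategy: a stand-in black-box optimizer (assumption)."""
    rng = random.Random(seed)
    best = theta0
    for _ in range(generations):
        candidates = [best] + [best + rng.gauss(0.0, step) for _ in range(offspring)]
        # The parent is re-evaluated every generation so that one lucky noisy
        # score cannot keep a poor parameter alive.
        scores = [estimated_return(c, rng) for c in candidates]
        best = candidates[scores.index(max(scores))]
    return best
```

In this toy problem the noise-free optimum is `theta = -1`, which drives the state to the origin in one step. The candidate evaluations inside one generation do not depend on each other, which is the property that lets a black-box search of this kind be spread across several cores.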
Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics
The most data-efficient algorithms for reinforcement learning in robotics are
model-based policy search algorithms, which alternate between learning a
dynamical model of the robot and optimizing a policy to maximize the expected
return given the model and its uncertainties. Among the few proposed
approaches, the recently introduced Black-DROPS algorithm exploits a black-box
optimization algorithm to achieve both high data-efficiency and good
computation times when several cores are used; nevertheless, like all
model-based policy search approaches, Black-DROPS does not scale to high
dimensional state/action spaces. In this paper, we introduce a new model
learning procedure in Black-DROPS that leverages parameterized black-box priors
to (1) scale up to high-dimensional systems, and (2) be robust to large
inaccuracies of the prior information. We demonstrate the effectiveness of our
approach with the "pendubot" swing-up task in simulation and with a physical
hexapod robot (48D state space, 18D action space) that has to walk forward as
fast as possible. The results show that our new algorithm is more
data-efficient than previous model-based policy search algorithms (with and
without priors) and that it can allow a physical 6-legged robot to learn new
gaits in only 16 to 30 seconds of interaction time.
Comment: Accepted at ICRA 2018; 8 pages, 4 figures, 2 algorithms, 1 table;
Video at https://youtu.be/HFkZkhGGzTo ; Spotlight ICRA presentation at
https://youtu.be/_MZYDhfWeL
Fast Model Identification via Physics Engines for Data-Efficient Policy Search
This paper presents a method for identifying mechanical parameters of robots
or objects, such as their mass and friction coefficients. Key features are the
use of off-the-shelf physics engines and the adaptation of a Bayesian
optimization technique towards minimizing the number of real-world experiments
needed for model-based reinforcement learning. The proposed framework
reproduces in a physics engine experiments performed on a real robot and
optimizes the model's mechanical parameters so as to match real-world
trajectories. The optimized model is then used for learning a policy in
simulation, before real-world deployment. It is well understood, however, that
it is hard to exactly reproduce real trajectories in simulation. Moreover, a
near-optimal policy can be frequently found with an imperfect model. Therefore,
this work proposes a strategy for identifying a model that is just good enough
to approximate the value of a locally optimal policy with a certain confidence,
instead of wasting effort on identifying the most accurate model. Evaluations,
performed both in simulation and on a real robotic manipulation task, indicate
that the proposed strategy results in an overall time-efficient, integrated
model identification and learning solution, which significantly improves the
data-efficiency of existing policy search algorithms.
Comment: IJCAI 1
Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search
One of the most interesting features of Bayesian optimization for direct
policy search is that it can leverage priors (e.g., from simulation or from
previous tasks) to accelerate learning on a robot. In this paper, we are
interested in situations for which several priors exist but we do not know in
advance which one best fits the current situation. We tackle this problem by
introducing a novel acquisition function, called Most Likely Expected
Improvement (MLEI), that combines the likelihood of the priors and the expected
improvement. We evaluate this new acquisition function on a transfer learning
task for a 5-DOF planar arm and on a possibly damaged, 6-legged robot that has
to learn to walk on flat ground and on stairs, with priors corresponding to
different stairs and different kinds of damage. Our results show that MLEI
effectively identifies and exploits the priors, even when there is no obvious
match between the current situations and the priors.
Comment: Accepted at ICRA 2018; 8 pages, 4 figures, 1 algorithm; Video at
https://youtu.be/xo8mUIZTvNE ; Spotlight ICRA presentation
https://youtu.be/iiVaV-U6Kq
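The abstract above says only that MLEI combines the likelihood of the priors with the expected improvement. One plausible reading, sketched below, weights the standard expected improvement under each prior by how likely the observed data are under that prior, and keeps the best-scoring prior. The product, the max over priors, and all function and parameter names are assumptions for illustration, not the paper's stated definition.

```python
import math

def expected_improvement(mean, std, best_so_far):
    """Standard expected improvement (maximization) for a Gaussian prediction."""
    if std <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf

def most_likely_expected_improvement(predictions, likelihoods, best_so_far):
    """Score one candidate point under several priors (hypothetical helper).

    predictions: one (mean, std) pair per prior, predicted at the candidate.
    likelihoods: likelihood of the data observed so far under each prior.
    Returns (score, index of the prior that produced it).
    """
    scores = [
        lik * expected_improvement(mean, std, best_so_far)
        for (mean, std), lik in zip(predictions, likelihoods)
    ]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return scores[best_index], best_index
```

Under this reading, a prior that promises a large improvement but explains the observations poorly is discounted, so the search gravitates toward priors that both match the data and still predict room for improvement.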
Bayesian Optimization Using Domain Knowledge on the ATRIAS Biped
Controllers in robotics often consist of expert-designed heuristics, which
can be hard to tune in higher dimensions. It is typical to use simulation to
learn these parameters, but controllers learned in simulation often don't
transfer to hardware. This necessitates optimization directly on hardware.
However, collecting data on hardware can be expensive. This has led to a recent
interest in adapting data-efficient learning techniques to robotics. One
popular method is Bayesian Optimization (BO), a sample-efficient black-box
optimization scheme, but its performance typically degrades in higher
dimensions. We aim to overcome this problem by incorporating domain knowledge
to reduce dimensionality in a meaningful way, with a focus on bipedal
locomotion. In previous work, we proposed a transformation based on knowledge
of human walking that projected a 16-dimensional controller to a 1-dimensional
space. In simulation, this showed enhanced sample efficiency when optimizing
human-inspired neuromuscular walking controllers on a humanoid model. In this
paper, we present a generalized feature transform applicable to non-humanoid
robot morphologies and evaluate it on the ATRIAS bipedal robot -- in simulation
and on hardware. We present three different walking controllers; two are
evaluated on the real robot. Our results show that this feature transform
captures important aspects of walking and accelerates learning on hardware and
simulation, as compared to traditional BO.
Comment: 8 pages, submitted to IEEE International Conference on Robotics and
Automation 201