Empirical Evaluation of Contextual Policy Search with a Comparison-based Surrogate Model and Active Covariance Matrix Adaptation
Contextual policy search (CPS) is a class of multi-task reinforcement
learning algorithms that is particularly useful for robotic applications. A
recent state-of-the-art method is Contextual Covariance Matrix Adaptation
Evolution Strategies (C-CMA-ES). It is based on the standard black-box
optimization algorithm CMA-ES. There are two useful extensions of CMA-ES that
we will transfer to C-CMA-ES and evaluate empirically: ACM-ES, which uses a
comparison-based surrogate model, and aCMA-ES, which uses an active update of
the covariance matrix. We will show that the improvements these methods bring
can be impressive in terms of sample efficiency, although this is no longer
relevant for the robotic domain.
Comment: Supplementary material for poster paper accepted at GECCO 2019;
https://doi.org/10.1145/3319619.332193
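To make the "active update of the covariance matrix" concrete, the following is a minimal NumPy sketch of a simplified rank-mu covariance update with an active (negative-weight) part in the spirit of aCMA-ES. The function name, weighting scheme, and damping constant are illustrative assumptions, not the exact update evaluated in the paper.

    import numpy as np

    def active_cma_update(C, y_sorted, mu, c_mu=0.2):
        # y_sorted: normalized search steps (x_i - mean) / sigma, sorted from
        # best to worst, shape (lambda, n). The best mu steps pull C toward
        # good directions; the worst mu push it away (the "active" part).
        w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
        w /= w.sum()  # positive recombination weights summing to 1
        pos = sum(wi * np.outer(y, y) for wi, y in zip(w, y_sorted[:mu]))
        neg = sum(wi * np.outer(y, y) for wi, y in zip(w, y_sorted[::-1][:mu]))
        # The negative term can break positive definiteness; real aCMA-ES
        # scales it carefully, here it is simply damped by 0.5.
        return (1.0 - c_mu) * C + c_mu * (pos - 0.5 * neg)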
Better Optimism By Bayes: Adaptive Planning with Rich Models
The computational costs of inference and planning have confined Bayesian
model-based reinforcement learning to one of two dismal fates: powerful
Bayes-adaptive planning but only for simplistic models, or powerful, Bayesian
non-parametric models but using simple, myopic planning strategies such as
Thompson sampling. We ask whether it is feasible and truly beneficial to
combine rich probabilistic models with a closer approximation to fully Bayesian
planning. First, we use a collection of counterexamples to show formal problems
with the over-optimism inherent in Thompson sampling. Then we leverage
state-of-the-art techniques in efficient Bayes-adaptive planning and
non-parametric Bayesian methods to perform qualitatively better than both
existing conventional algorithms and Thompson sampling on two contextual
bandit-like problems.
Comment: 11 pages, 11 figures
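For contrast with Bayes-adaptive planning, here is a minimal sketch of the myopic Thompson-sampling baseline on a plain Bernoulli bandit, the simplest relative of the contextual bandit problems used in the paper. The Beta(1,1) priors, horizon, and arm means are illustrative choices, not the paper's experimental setup.

    import numpy as np

    def thompson_bernoulli(true_means, horizon, rng=np.random.default_rng(0)):
        k = len(true_means)
        s, f = np.ones(k), np.ones(k)  # Beta posterior parameters per arm
        total = 0
        for _ in range(horizon):
            arm = int(np.argmax(rng.beta(s, f)))  # one posterior sample per arm
            r = rng.random() < true_means[arm]    # Bernoulli reward
            s[arm] += r
            f[arm] += 1 - r
            total += r
        return total

    print(thompson_bernoulli([0.3, 0.5, 0.7], 1000))

Planning here is myopic: each action maximizes a single posterior sample with no lookahead over how the choice will change future beliefs, which is the kind of over-optimistic behavior the paper's counterexamples probe.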
A survey on policy search algorithms for learning robot controllers in a handful of trials
Most policy search algorithms require thousands of training episodes to find
an effective policy, which is often infeasible with a physical robot. This
survey article focuses on the opposite extreme of the spectrum: how can a
robot adapt with only a handful of trials (a dozen) and a few minutes? By
analogy with the word "big-data", we refer to this challenge as "micro-data
reinforcement learning". We show that a first strategy is to leverage prior
knowledge on the policy structure (e.g., dynamic movement primitives), on the
policy parameters (e.g., demonstrations), or on the dynamics (e.g.,
simulators). A second strategy is to create data-driven surrogate models of the
expected reward (e.g., Bayesian optimization) or the dynamical model (e.g.,
model-based policy search), so that the policy optimizer queries the model
instead of the real system. Overall, all successful micro-data algorithms
combine these two strategies by varying the kind of model and prior knowledge.
The current scientific challenges essentially revolve around scaling up to
complex robots (e.g., humanoids), designing generic priors, and optimizing the
computing time.
Comment: 21 pages, 3 figures, 4 algorithms; accepted at IEEE Transactions on Robotics
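As a concrete instance of the second strategy (a data-driven surrogate model of the expected reward), the following is a minimal Bayesian-optimization loop over policy parameters using scikit-learn's Gaussian process regressor. The episode_return function is a hypothetical stand-in for a real robot rollout, and the RBF length scale and UCB acquisition rule are illustrative choices.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def episode_return(theta):
        # Stand-in for an expensive rollout on the physical system.
        return -np.sum((theta - 0.3) ** 2)

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(5, 2))  # a handful of initial trials
    y = np.array([episode_return(t) for t in X])
    for _ in range(10):  # micro-data budget: a dozen or so episodes
        gp = GaussianProcessRegressor(kernel=RBF(0.5)).fit(X, y)
        cand = rng.uniform(-1, 1, size=(256, 2))
        m, s = gp.predict(cand, return_std=True)
        theta = cand[np.argmax(m + 1.96 * s)]  # optimistic (UCB) acquisition
        X = np.vstack([X, theta])
        y = np.append(y, episode_return(theta))
    print(X[np.argmax(y)], y.max())

The optimizer queries the cheap GP surrogate hundreds of times per episode but the real system only once, which is what makes the approach viable in the micro-data regime.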
Active model learning and diverse action sampling for task and motion planning
The objective of this work is to augment the basic abilities of a robot by
learning to use new sensorimotor primitives to enable the solution of complex
long-horizon problems. Solving long-horizon problems in complex domains
requires flexible generative planning that can combine primitive abilities in
novel combinations to solve problems as they arise in the world. In order to
plan to combine primitive actions, we must have models of the preconditions and
effects of those actions: under what circumstances will executing this
primitive achieve some particular effect in the world?
We use, and develop novel improvements on, state-of-the-art methods for
active learning and sampling. We use Gaussian process methods for learning the
conditions of operator effectiveness from small numbers of expensive training
examples collected by experimentation on a robot. We develop adaptive sampling
methods for generating diverse elements of continuous sets (such as robot
configurations and object poses) during planning for solving a new task, so
that planning is as efficient as possible. We demonstrate these methods in an
integrated system, combining newly learned models with an efficient
continuous-space robot task and motion planner to learn to solve long-horizon
problems more efficiently than was previously possible.
Comment: Proceedings of the 2018 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS), Madrid, Spain.
https://www.youtube.com/playlist?list=PLoWhBFPMfSzDbc8CYelsbHZa1d3uz-W_
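The diverse-sampling idea can be illustrated with a greedy farthest-point heuristic over candidate configurations. This is a simple stand-in for the paper's adaptive sampling procedure; the function name and the toy pose data are assumptions for illustration.

    import numpy as np

    def diverse_subset(candidates, k, rng=np.random.default_rng(0)):
        # Greedy farthest-point selection: repeatedly add the candidate
        # farthest from everything chosen so far, yielding a spread-out set.
        chosen = [candidates[rng.integers(len(candidates))]]
        while len(chosen) < k:
            dists = np.min(
                [np.linalg.norm(candidates - c, axis=1) for c in chosen], axis=0
            )
            chosen.append(candidates[np.argmax(dists)])
        return np.array(chosen)

    poses = np.random.default_rng(1).uniform(size=(500, 3))  # e.g. object poses
    print(diverse_subset(poses, 5))

Feeding a planner diverse samples of this kind avoids wasting expensive feasibility checks on near-duplicate configurations.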