1,501 research outputs found
Importance mixing: Improving sample reuse in evolutionary policy search methods
Deep neuroevolution methods, that is, evolutionary policy search methods based on
deep neural networks, have recently emerged as competitors to deep reinforcement
learning algorithms due to their better parallelization capabilities. However,
these methods still suffer from far worse sample efficiency. In this paper we
investigate whether a mechanism known as "importance mixing" can significantly
improve their sample efficiency. We provide a didactic presentation of
importance mixing and we explain how it can be extended to reuse more samples.
Then, from an empirical comparison based on a simple benchmark, we show that,
although importance mixing does improve sample efficiency, it remains far from
the sample efficiency of deep reinforcement learning, while being more stable.
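As a rough illustration of the mechanism named in this abstract, the sketch below shows one common formulation of importance mixing for diagonal Gaussian search distributions. It is an assumption-laden sketch, not the paper's code: the function names, the `(mean, std)` tuple layout, and the default `alpha` are hypothetical, and the acceptance rules follow the usual textbook statement (keep an old sample with probability min(1, (1 - alpha) * p_new / p_old), then fill up with fresh samples accepted with probability max(alpha, 1 - p_old / p_new)).

```python
import math
import random


def log_pdf(z, mean, std):
    """Log-density of a diagonal Gaussian at point z."""
    return sum(
        -0.5 * ((zi - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        for zi, m, s in zip(z, mean, std)
    )


def sample(mean, std, rng=random):
    """Draw one point from a diagonal Gaussian."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def importance_mixing(old_pop, old, new, alpha=0.01, rng=random):
    """Build a population for `new`, reusing samples that were drawn from `old`.

    `old` and `new` are (mean, std) tuples (a hypothetical layout).
    Returns (population, number_of_reused_samples).
    """
    n = len(old_pop)
    pop = []
    # Step 1: keep an old sample with probability min(1, (1 - alpha) * p_new / p_old).
    for z in old_pop:
        log_ratio = log_pdf(z, *new) - log_pdf(z, *old)
        ratio = math.exp(min(log_ratio, 50.0))  # cap the exponent to avoid overflow
        if rng.random() < min(1.0, (1.0 - alpha) * ratio):
            pop.append(z)
    n_reused = len(pop)
    # Step 2: fill up with fresh samples from the new distribution, each
    # accepted with probability max(alpha, 1 - p_old / p_new).
    while len(pop) < n:
        z = sample(*new, rng)
        log_ratio = log_pdf(z, *old) - log_pdf(z, *new)
        if rng.random() < max(alpha, 1.0 - math.exp(min(log_ratio, 50.0))):
            pop.append(z)
    return pop, n_reused
```

Only the samples added in step 2 need a new fitness evaluation, which is where the sample reuse comes from: when the old and new distributions are close, most of the population is carried over.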
Black-Box Data-efficient Policy Search for Robotics
The most data-efficient algorithms for reinforcement learning (RL) in
robotics are based on uncertain dynamical models: after each episode, they
first learn a dynamical model of the robot, then they use an optimization
algorithm to find a policy that maximizes the expected return given the model
and its uncertainties. It is often believed that this optimization can be
tractable only if analytical, gradient-based algorithms are used; however,
these algorithms require using specific families of reward functions and
policies, which greatly limits the flexibility of the overall approach. In this
paper, we introduce a novel model-based RL algorithm, called Black-DROPS
(Black-box Data-efficient RObot Policy Search) that: (1) does not impose any
constraint on the reward function or the policy (they are treated as
black-boxes), (2) is as data-efficient as the state-of-the-art algorithm for
data-efficient RL in robotics, and (3) is as fast (or faster) than analytical
approaches when several cores are available. The key idea is to replace the
gradient-based optimization algorithm with a parallel, black-box algorithm that
takes into account the model uncertainties. We demonstrate the performance of
our new algorithm on two standard control benchmark problems (in simulation)
and a low-cost robotic manipulator (with a real robot).
Comment: Accepted at the IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS) 2017; Code at
http://github.com/resibots/blackdrops; Video at http://youtu.be/kTEyYiIFGP
Objective Improvement in Information-Geometric Optimization
Information-Geometric Optimization (IGO) is a unified framework of stochastic
algorithms for optimization problems. Given a family of probability
distributions, IGO turns the original optimization problem into a new
maximization problem on the parameter space of the probability distributions.
IGO updates the parameter of the probability distribution along the natural
gradient, taken with respect to the Fisher metric on the parameter manifold,
aiming at maximizing an adaptive transform of the objective function. IGO
recovers several known algorithms as particular instances: for the family of
Bernoulli distributions IGO recovers PBIL, for the family of Gaussian
distributions the pure rank-mu CMA-ES update is recovered, and for exponential
families in expectation parametrization the cross-entropy/ML method is
recovered. This article provides a theoretical justification for the IGO
framework, by proving that any step size not greater than 1 guarantees monotone
improvement over the course of optimization, in terms of q-quantile values of
the objective function f. The range of admissible step sizes is independent of
f and its domain. We extend the result to cover the case of different step
sizes for blocks of the parameters in the IGO algorithm. Moreover, we prove
that expected fitness improves over time when fitness-proportional selection is
applied, in which case the RPP algorithm is recovered.
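To make the Bernoulli case mentioned in this abstract concrete, here is a minimal sketch of a PBIL-style update in which the parameter vector moves a fraction `dt` of the way towards the mean of the best samples. The function names, the truncation-selection weights (equal weight on the `n_best` top samples), and the clamping of probabilities away from 0 and 1 are assumptions made for illustration; the clamp in particular is a practical safeguard and not part of the framework described above.

```python
import random


def onemax(x):
    """Toy objective to maximize: the number of ones in a bit string."""
    return sum(x)


def igo_bernoulli_step(theta, f, n_samples=50, n_best=10, dt=0.2, rng=random):
    """One PBIL-style update of the Bernoulli parameters `theta` (maximizing f).

    `dt` plays the role of the step size and is kept at or below 1.
    """
    # Sample bit strings from the current product-of-Bernoullis distribution.
    samples = [[1 if rng.random() < p else 0 for p in theta] for _ in range(n_samples)]
    # Rank-based selection: only the ordering of f-values matters.
    samples.sort(key=f, reverse=True)
    elite = samples[:n_best]
    new_theta = []
    for j, p in enumerate(theta):
        elite_mean = sum(x[j] for x in elite) / n_best
        p = p + dt * (elite_mean - p)
        # Assumption: clamp so that no bit becomes permanently fixed.
        new_theta.append(min(max(p, 0.02), 0.98))
    return new_theta


if __name__ == "__main__":
    rng = random.Random(1)
    theta = [0.5] * 20
    for _ in range(200):
        theta = igo_bernoulli_step(theta, onemax, rng=rng)
    print([round(p, 2) for p in theta])
```

Because the update uses only the ranking of the samples, replacing `onemax` by any strictly increasing transformation of it leaves the behaviour unchanged.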
Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles
We present a canonical way to turn any smooth parametric family of
probability distributions on an arbitrary search space $X$ into a
continuous-time black-box optimization method on $X$, the
\emph{information-geometric optimization} (IGO) method. Invariance as a design
principle minimizes the number of arbitrary choices. The resulting \emph{IGO
flow} conducts the natural gradient ascent of an adaptive, time-dependent,
quantile-based transformation of the objective function. It makes no
assumptions on the objective function to be optimized.
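For orientation, the flow described in this paragraph is commonly written as follows. This is a sketch in assumed notation (it is not quoted from the listing): $\theta$ parametrizes the distributions $P_\theta$ on $X$, $\widetilde{\nabla}_\theta$ denotes the natural gradient, $w$ is a non-increasing selection function on $[0,1]$, $f$ is to be minimized, and ties in $f$ are ignored.

```latex
% Quantile-based, time-dependent rewriting of the objective
W^f_{\theta}(x) = w\bigl( \Pr\nolimits_{x' \sim P_{\theta}} \bigl( f(x') < f(x) \bigr) \bigr)

% IGO flow: natural-gradient ascent of the expected transformed objective
\frac{d\theta^t}{dt}
  = \widetilde{\nabla}_{\theta} \int W^f_{\theta^t}(x) \, P_{\theta}(dx) \Big|_{\theta = \theta^t}
  = \int W^f_{\theta^t}(x) \, \widetilde{\nabla}_{\theta} \ln P_{\theta}(x) \Big|_{\theta = \theta^t} \, P_{\theta^t}(dx)
```

Since $W^f_\theta$ depends on $f$ only through the ranking of points under $P_\theta$, composing $f$ with an increasing function leaves the flow unchanged.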
The IGO method produces explicit IGO algorithms through time discretization.
It naturally recovers versions of known algorithms and offers a systematic way
to derive new ones. The cross-entropy method is recovered in a particular case,
and can be extended into a smoothed, parametrization-independent maximum
likelihood update (IGO-ML). For Gaussian distributions on $\mathbb{R}^d$, IGO
is related to natural evolution strategies (NES) and recovers a version of the
CMA-ES algorithm. For Bernoulli distributions on $\{0,1\}^d$, we recover the
PBIL algorithm. From restricted Boltzmann machines, we obtain a novel algorithm
for optimization on $\{0,1\}^d$. All these algorithms are unified under a
single information-geometric optimization framework.
Thanks to its intrinsic formulation, the IGO method achieves invariance under
reparametrization of the search space $X$, under a change of parameters of the
probability distributions, and under increasing transformations of the
objective function.
Theory strongly suggests that IGO algorithms have minimal loss in diversity
during optimization, provided the initial diversity is high. First experiments
using restricted Boltzmann machines confirm this insight. Thus IGO seems to
provide, from information theory, an elegant way to spontaneously explore
several valleys of a fitness landscape in a single run.
Comment: Final published version
A view of Estimation of Distribution Algorithms through the lens of Expectation-Maximization
We show that a large class of Estimation of Distribution Algorithms,
including, but not limited to, Covariance Matrix Adaptation, can be written as a
Monte Carlo Expectation-Maximization algorithm, and as exact EM in the limit of
infinite samples. Because EM sits on a rigorous statistical foundation and has
been thoroughly analyzed, this connection provides a new coherent framework
with which to reason about EDAs.
Linear Convergence of Comparison-based Step-size Adaptive Randomized Search via Stability of Markov Chains
In this paper, we consider comparison-based adaptive stochastic algorithms
for solving numerical optimisation problems. We consider a specific subclass of
algorithms that we call comparison-based step-size adaptive randomized search
(CB-SARS), where the state variables at a given iteration are a vector of the
search space and a positive parameter, the step-size, typically controlling the
overall standard deviation of the underlying search distribution. We investigate
the linear convergence of CB-SARS on \emph{scaling-invariant} objective
functions. Scaling-invariant functions preserve the ordering of points with
respect to their function value when the points are scaled with the same
positive parameter (the scaling is done w.r.t. a fixed reference point). This
class of functions includes norms composed with strictly increasing functions
as well as many non-quasi-convex and non-continuous functions. On
scaling-invariant functions, we show the existence of a homogeneous Markov
chain, as a consequence of natural invariance properties of CB-SARS (essentially
scale-invariance and invariance to strictly increasing transformation of the
objective function). We then derive sufficient conditions for \emph{global
linear convergence} of CB-SARS, expressed in terms of different stability
conditions of the normalised homogeneous Markov chain (irreducibility,
positivity, Harris recurrence, geometric ergodicity) and thus define a general
methodology for proving global linear convergence of CB-SARS algorithms
on scaling-invariant functions. As a by-product we provide a connexion between
comparison-based adaptive stochastic algorithms and Markov chain Monte Carlo
algorithms.
Comment: SIAM Journal on Optimization, Society for Industrial and Applied
Mathematics, 201
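A familiar algorithm that fits the description of a comparison-based step-size adaptive randomized search is the (1+1)-ES with a one-fifth-success-style rule: its state is a point and a step-size, and it uses the objective only through comparisons. The sketch below is illustrative only; the function names and the constants 1.5 and 1.5 ** -0.25 are assumptions chosen so that the step-size is stationary at a success rate of about one in five.

```python
import math
import random


def sphere(x):
    """Squared Euclidean norm, a scaling-invariant function w.r.t. the origin."""
    return sum(xi * xi for xi in x)


def one_plus_one_es(f, x, sigma, iterations, rng=random):
    """(1+1)-ES with a one-fifth-success-style step-size rule (minimization).

    Returns the f-value of the incumbent after each iteration.
    """
    fx = f(x)
    history = [fx]
    for _ in range(iterations):
        # Sample one candidate around the incumbent, scaled by the step-size.
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fy = f(y)
        if fy <= fx:
            # Success: accept the candidate and enlarge the step-size.
            x, fx = y, fy
            sigma *= 1.5
        else:
            # Failure: keep the incumbent and shrink the step-size slightly.
            sigma *= 1.5 ** -0.25
        history.append(fx)
    return history


if __name__ == "__main__":
    hist = one_plus_one_es(sphere, [1.0] * 5, 1.0, 2000, rng=random.Random(2))
    # On the sphere, log(f) decreases roughly linearly with the iteration count.
    print([round(math.log10(v), 1) for v in hist[::400]])
```

Because only the outcome of `fy <= fx` is used, running the same code on a strictly increasing transformation of `sphere` should produce the same sequence of points, up to floating-point ties.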