1,501 research outputs found

    Importance mixing: Improving sample reuse in evolutionary policy search methods

    Full text link
    Deep neuroevolution, that is evolutionary policy search methods based on deep neural networks, have recently emerged as a competitor to deep reinforcement learning algorithms due to their better parallelization capabilities. However, these methods still suffer from a far worse sample efficiency. In this paper we investigate whether a mechanism known as "importance mixing" can significantly improve their sample efficiency. We provide a didactic presentation of importance mixing and we explain how it can be extended to reuse more samples. Then, from an empirical comparison based on a simple benchmark, we show that, though it actually provides better sample efficiency, it is still far from the sample efficiency of deep reinforcement learning, though it is more stable

    Black-Box Data-efficient Policy Search for Robotics

    Get PDF
    The most data-efficient algorithms for reinforcement learning (RL) in robotics are based on uncertain dynamical models: after each episode, they first learn a dynamical model of the robot, then they use an optimization algorithm to find a policy that maximizes the expected return given the model and its uncertainties. It is often believed that this optimization can be tractable only if analytical, gradient-based algorithms are used; however, these algorithms require using specific families of reward functions and policies, which greatly limits the flexibility of the overall approach. In this paper, we introduce a novel model-based RL algorithm, called Black-DROPS (Black-box Data-efficient RObot Policy Search) that: (1) does not impose any constraint on the reward function or the policy (they are treated as black-boxes), (2) is as data-efficient as the state-of-the-art algorithm for data-efficient RL in robotics, and (3) is as fast (or faster) than analytical approaches when several cores are available. The key idea is to replace the gradient-based optimization algorithm with a parallel, black-box algorithm that takes into account the model uncertainties. We demonstrate the performance of our new algorithm on two standard control benchmark problems (in simulation) and a low-cost robotic manipulator (with a real robot).Comment: Accepted at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017; Code at http://github.com/resibots/blackdrops; Video at http://youtu.be/kTEyYiIFGP

    Objective Improvement in Information-Geometric Optimization

    Get PDF
    Information-Geometric Optimization (IGO) is a unified framework of stochastic algorithms for optimization problems. Given a family of probability distributions, IGO turns the original optimization problem into a new maximization problem on the parameter space of the probability distributions. IGO updates the parameter of the probability distribution along the natural gradient, taken with respect to the Fisher metric on the parameter manifold, aiming at maximizing an adaptive transform of the objective function. IGO recovers several known algorithms as particular instances: for the family of Bernoulli distributions IGO recovers PBIL, for the family of Gaussian distributions the pure rank-mu CMA-ES update is recovered, and for exponential families in expectation parametrization the cross-entropy/ML method is recovered. This article provides a theoretical justification for the IGO framework, by proving that any step size not greater than 1 guarantees monotone improvement over the course of optimization, in terms of q-quantile values of the objective function f. The range of admissible step sizes is independent of f and its domain. We extend the result to cover the case of different step sizes for blocks of the parameters in the IGO algorithm. Moreover, we prove that expected fitness improves over time when fitness-proportional selection is applied, in which case the RPP algorithm is recovered

    Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles

    Get PDF
    We present a canonical way to turn any smooth parametric family of probability distributions on an arbitrary search space XX into a continuous-time black-box optimization method on XX, the \emph{information-geometric optimization} (IGO) method. Invariance as a design principle minimizes the number of arbitrary choices. The resulting \emph{IGO flow} conducts the natural gradient ascent of an adaptive, time-dependent, quantile-based transformation of the objective function. It makes no assumptions on the objective function to be optimized. The IGO method produces explicit IGO algorithms through time discretization. It naturally recovers versions of known algorithms and offers a systematic way to derive new ones. The cross-entropy method is recovered in a particular case, and can be extended into a smoothed, parametrization-independent maximum likelihood update (IGO-ML). For Gaussian distributions on Rd\mathbb{R}^d, IGO is related to natural evolution strategies (NES) and recovers a version of the CMA-ES algorithm. For Bernoulli distributions on {0,1}d\{0,1\}^d, we recover the PBIL algorithm. From restricted Boltzmann machines, we obtain a novel algorithm for optimization on {0,1}d\{0,1\}^d. All these algorithms are unified under a single information-geometric optimization framework. Thanks to its intrinsic formulation, the IGO method achieves invariance under reparametrization of the search space XX, under a change of parameters of the probability distributions, and under increasing transformations of the objective function. Theory strongly suggests that IGO algorithms have minimal loss in diversity during optimization, provided the initial diversity is high. First experiments using restricted Boltzmann machines confirm this insight. Thus IGO seems to provide, from information theory, an elegant way to spontaneously explore several valleys of a fitness landscape in a single run.Comment: Final published versio

    A view of Estimation of Distribution Algorithms through the lens of Expectation-Maximization

    Full text link
    We show that a large class of Estimation of Distribution Algorithms, including, but not limited to, Covariance Matrix Adaption, can be written as a Monte Carlo Expectation-Maximization algorithm, and as exact EM in the limit of infinite samples. Because EM sits on a rigorous statistical foundation and has been thoroughly analyzed, this connection provides a new coherent framework with which to reason about EDAs

    Linear Convergence of Comparison-based Step-size Adaptive Randomized Search via Stability of Markov Chains

    Get PDF
    In this paper, we consider comparison-based adaptive stochastic algorithms for solving numerical optimisation problems. We consider a specific subclass of algorithms that we call comparison-based step-size adaptive randomized search (CB-SARS), where the state variables at a given iteration are a vector of the search space and a positive parameter, the step-size, typically controlling the overall standard deviation of the underlying search distribution.We investigate the linear convergence of CB-SARS on\emph{scaling-invariant} objective functions. Scaling-invariantfunctions preserve the ordering of points with respect to their functionvalue when the points are scaled with the same positive parameter (thescaling is done w.r.t. a fixed reference point). This class offunctions includes norms composed with strictly increasing functions aswell as many non quasi-convex and non-continuousfunctions. On scaling-invariant functions, we show the existence of ahomogeneous Markov chain, as a consequence of natural invarianceproperties of CB-SARS (essentially scale-invariance and invariance tostrictly increasing transformation of the objective function). We thenderive sufficient conditions for \emph{global linear convergence} ofCB-SARS, expressed in terms of different stability conditions of thenormalised homogeneous Markov chain (irreducibility, positivity, Harrisrecurrence, geometric ergodicity) and thus define a general methodologyfor proving global linear convergence of CB-SARS algorithms onscaling-invariant functions. As a by-product we provide aconnexion between comparison-based adaptive stochasticalgorithms and Markov chain Monte Carlo algorithms.Comment: SIAM Journal on Optimization, Society for Industrial and Applied Mathematics, 201
    corecore