
    Meta-Learning by the Baldwin Effect

    The scope of the Baldwin effect was recently called into question by two papers that closely examined the seminal work of Hinton and Nowlan. To date, there has been no demonstration of its necessity on empirically challenging tasks. Here we show that the Baldwin effect is capable of evolving few-shot supervised and reinforcement learning mechanisms by shaping the hyperparameters and the initial parameters of deep learning algorithms. Furthermore, it can genetically accommodate strong learning biases on the same set of problems as a recent machine learning algorithm, MAML (Model-Agnostic Meta-Learning), which uses second-order gradients instead of evolution to learn a set of reference parameters (initial weights) that allow rapid adaptation to tasks sampled from a distribution. Whilst in simple cases MAML is more data-efficient than the Baldwin effect, the Baldwin effect is more general in that it does not require gradients to be backpropagated to the reference parameters or hyperparameters, and it permits effectively any number of gradient updates in the inner loop. The Baldwin effect learns strong learning-dependent biases, rather than purely genetically accommodating fixed behaviours in a learning-independent manner.
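    As a rough illustration of the Baldwinian outer loop the abstract describes (evolution shaping initial parameters and a hyperparameter, with ordinary gradient descent as the inner loop and no gradients reaching the genome), here is a minimal sketch on a toy regression task family. The task distribution, model size, and ES details are illustrative assumptions, not the paper's setup.

```python
# Illustrative sketch (not the paper's code): Baldwinian meta-learning of the
# initial parameters and learning rate of a tiny model. The outer loop is
# evolutionary (no gradients reach the genome); the inner loop is plain SGD
# on a sampled task. Task family, model size and ES details are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """A toy task distribution: fit y = a*x + b from a few examples."""
    a, b = rng.uniform(-2, 2), rng.uniform(-1, 1)
    x = rng.uniform(-1, 1, size=10)
    return x, a * x + b

def inner_loop_fitness(genome, n_steps=5):
    """Lifetime learning: a few SGD steps from the inherited initial weights.
    Fitness is the negative loss *after* learning (Baldwin effect), so the
    genome is rewarded for being easy to adapt, not for fixed behaviour."""
    w, c, log_lr = genome
    lr = np.exp(log_lr)                      # the learning rate is a gene too
    x, y = sample_task()
    x_tr, y_tr, x_te, y_te = x[:5], y[:5], x[5:], y[5:]
    for _ in range(n_steps):
        err = w * x_tr + c - y_tr
        w -= lr * np.mean(err * x_tr)        # gradient of 0.5 * MSE
        c -= lr * np.mean(err)
    return -np.mean((w * x_te + c - y_te) ** 2)

pop = rng.normal(size=(64, 3))               # genomes: (w0, c0, log_lr)
for gen in range(50):
    fit = np.array([np.mean([inner_loop_fitness(g) for _ in range(8)]) for g in pop])
    parents = pop[np.argsort(fit)[-16:]]     # truncation selection
    pop = np.repeat(parents, 4, axis=0) + 0.05 * rng.normal(size=(64, 3))
    if gen % 10 == 0:
        print(f"gen {gen:3d}  mean post-learning fitness {fit.mean():.4f}")
```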

    Importance mixing: Improving sample reuse in evolutionary policy search methods

    Deep neuroevolution, that is, evolutionary policy search methods based on deep neural networks, has recently emerged as a competitor to deep reinforcement learning algorithms due to its better parallelization capabilities. However, these methods still suffer from far worse sample efficiency. In this paper we investigate whether a mechanism known as "importance mixing" can significantly improve their sample efficiency. We provide a didactic presentation of importance mixing and explain how it can be extended to reuse more samples. Then, through an empirical comparison on a simple benchmark, we show that although importance mixing does improve sample efficiency, the resulting methods remain far less sample-efficient than deep reinforcement learning, though more stable.
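    The "importance mixing" mechanism referred to above has a standard form for Gaussian search distributions: samples drawn under the previous generation's distribution are re-accepted according to a density ratio, and only the remainder of the population is freshly sampled and evaluated. A minimal sketch of that rule follows; the dimension, population size, and minimal refresh rate alpha are arbitrary illustrative choices, not taken from the paper.

```python
# Illustrative sketch (not the paper's code) of the standard importance-mixing
# rule for an isotropic Gaussian search distribution: old samples are reused
# for the current generation when the density ratio allows it, so fewer fresh
# fitness evaluations are needed. All constants are arbitrary choices here.
import numpy as np

rng = np.random.default_rng(1)

def log_density(x, mean, std):
    """Log-density of an isotropic Gaussian N(mean, std^2 * I)."""
    d = x.shape[-1]
    return (-0.5 * np.sum((x - mean) ** 2, axis=-1) / std**2
            - d * np.log(std) - 0.5 * d * np.log(2 * np.pi))

def importance_mixing(old_samples, old_mean, new_mean, std, alpha=0.1):
    """Return reused samples plus the fresh samples needed to refill the population."""
    n = len(old_samples)
    ratio = np.exp(log_density(old_samples, new_mean, std)
                   - log_density(old_samples, old_mean, std))
    keep = rng.uniform(size=n) < np.minimum(1.0, (1 - alpha) * ratio)
    reused = old_samples[keep]

    fresh = []
    while len(reused) + len(fresh) < n:       # rejection-fill from the new distribution
        x = new_mean + std * rng.normal(size=old_samples.shape[1])
        inv_ratio = np.exp(log_density(x, old_mean, std)
                           - log_density(x, new_mean, std))
        if rng.uniform() < max(alpha, 1.0 - inv_ratio):
            fresh.append(x)
    return reused, np.array(fresh)

old_mean, new_mean, std = np.zeros(10), 0.3 * np.ones(10), 0.5
old = old_mean + std * rng.normal(size=(200, 10))
reused, fresh = importance_mixing(old, old_mean, new_mean, std)
print(f"reused {len(reused)} of 200 samples; only {len(fresh)} new evaluations needed")
```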

    A Generalized Markov-Chain Modelling Approach to (1,λ)-ES Linear Optimization: Technical Report

    Several recent publications investigated Markov-chain modelling of linear optimization by a (1,λ)-ES, considering both unconstrained and linearly constrained optimization, and both constant and varying step size. All of them assume normality of the involved random steps, and while this is consistent with a black-box scenario, information on the function to be optimized (e.g. separability) may be exploited by the use of another distribution. The objective of our contribution is to complement previous studies carried out with normal steps, and to give sufficient conditions on the distribution of the random steps for the success of a constant step-size (1,λ)-ES on the simple problem of a linear function with a linear constraint. The decomposition of a multidimensional distribution into its marginals and the copula combining them is applied to the new distributional assumptions, with particular attention paid to distributions with Archimedean copulas.
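    A small simulation can make the studied setting concrete: a constant step-size (1,λ)-ES on a linear function with a single linear constraint handled by resampling infeasible offspring, tracking the distance to the constraint normalized by the step size, which is the kind of state variable such Markov-chain analyses follow. The sketch below uses normal steps and arbitrary constants; the report's generalized distributional assumptions (marginals plus copula) are not reproduced.

```python
# Illustrative simulation (not the report's analysis) of a constant step-size
# (1,lambda)-ES minimizing a linear function under one linear constraint,
# with infeasible offspring resampled. Dimension, angle, lambda and the
# number of iterations are assumptions made for the sake of the example.
import numpy as np

rng = np.random.default_rng(2)

dim, lam, sigma = 2, 10, 1.0
theta = 0.6                                        # angle between gradient and constraint normal
grad = np.array([1.0, 0.0])                        # minimize f(x) = grad . x
normal = np.array([np.cos(theta), np.sin(theta)])  # feasible iff normal . x <= 0

x = -normal                                        # feasible starting parent
for t in range(200):
    offspring = []
    for _ in range(lam):
        while True:                                # resample until the offspring is feasible
            y = x + sigma * rng.normal(size=dim)
            if normal @ y <= 0:
                offspring.append(y)
                break
    x = min(offspring, key=lambda y: grad @ y)     # (1,lambda) selection on f
    if t % 40 == 0:
        delta = -(normal @ x) / sigma              # normalized distance to the constraint
        print(f"t={t:4d}  f={grad @ x:8.2f}  delta={delta:.3f}")
```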

    Exploring Evolution Strategies for Reinforcement Learning in the Obstacle Tower Environment

    Dissertation presented as partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. In 2017, OpenAI demonstrated that it was possible to train an AI agent using Evolution Strategies (ES), and that the results rivaled standard Reinforcement Learning (RL) techniques on modern benchmarks. Their research effectively showed that Evolution Strategies is a viable alternative to traditional Reinforcement Learning techniques, and that it bypasses many of Reinforcement Learning's inconveniences, notably the use of backpropagation. The Obstacle Tower environment aims to set a new Reinforcement Learning benchmark by challenging Artificial Intelligence (AI) agents to traverse 3-dimensional, procedurally generated levels using a real-time 3-dimensional physics system. The environment tests an agent's ability to generalize by requiring it to optimize aspects that are common in many Reinforcement Learning environments but rarely combined in the same environment: vision, planning, and control. In this research, the original implementation of OpenAI's Evolution Strategies algorithm was applied for the first time to the Obstacle Tower environment to assess how well it performs in a more complex environment, where the agent's generalization ability is critical. Additionally, in the interest of exploring Evolution Strategies in this environment, common Genetic Algorithm selection and mutation techniques were developed and applied to try to improve the performance of the original Evolution Strategies implementation. Crossover techniques were not explored during this research, as they are rarely applied in Evolution Strategies. The results show that although the basic implementation of Evolution Strategies does not perform well in the complex Obstacle Tower environment, it is possible to improve its performance by applying different evolution methods borrowed from Genetic Algorithms (GAs), which belong to the same family of algorithms as Evolution Strategies.
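    For context on the baseline algorithm the thesis starts from, the following is a minimal sketch of an OpenAI-style Evolution Strategies update with mirrored sampling and centered-rank fitness shaping. The toy objective merely stands in for an episode return from the Obstacle Tower environment, and all constants are illustrative assumptions rather than values from the dissertation.

```python
# Minimal sketch of an OpenAI-style Evolution Strategies update: Gaussian
# parameter perturbations, mirrored sampling, centered-rank fitness shaping,
# and a gradient-ascent step on the mean parameters. The toy fitness function
# stands in for an episode return; all constants are assumptions.
import numpy as np

rng = np.random.default_rng(3)

def episode_return(params):
    """Placeholder for rolling out a policy; here just a smooth toy objective."""
    return -np.sum((params - 0.5) ** 2)

def centered_ranks(f):
    """Map raw returns to ranks in [-0.5, 0.5] (rank-based fitness shaping)."""
    ranks = np.empty_like(f)
    ranks[np.argsort(f)] = np.arange(len(f))
    return ranks / (len(f) - 1) - 0.5

n_params, pop_size, sigma, lr = 100, 50, 0.1, 0.05
theta = np.zeros(n_params)
for gen in range(200):
    eps = rng.normal(size=(pop_size, n_params))
    eps = np.concatenate([eps, -eps])                  # mirrored sampling
    returns = np.array([episode_return(theta + sigma * e) for e in eps])
    shaped = centered_ranks(returns)
    grad = (shaped @ eps) / (len(eps) * sigma)         # score-function gradient estimate
    theta += lr * grad                                 # ascend the estimated gradient
    if gen % 50 == 0:
        print(f"gen {gen:3d}  mean return {returns.mean():.4f}")
```

    In the thesis, truncation-style selection and alternative mutation schemes from Genetic Algorithms are layered on top of this kind of update; crossover is left out, as the abstract notes.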

    Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles

    We present a canonical way to turn any smooth parametric family of probability distributions on an arbitrary search space X into a continuous-time black-box optimization method on X, the information-geometric optimization (IGO) method. Invariance as a design principle minimizes the number of arbitrary choices. The resulting IGO flow conducts the natural gradient ascent of an adaptive, time-dependent, quantile-based transformation of the objective function. It makes no assumptions on the objective function to be optimized. The IGO method produces explicit IGO algorithms through time discretization. It naturally recovers versions of known algorithms and offers a systematic way to derive new ones. The cross-entropy method is recovered in a particular case, and can be extended into a smoothed, parametrization-independent maximum likelihood update (IGO-ML). For Gaussian distributions on R^d, IGO is related to natural evolution strategies (NES) and recovers a version of the CMA-ES algorithm. For Bernoulli distributions on {0,1}^d, we recover the PBIL algorithm. From restricted Boltzmann machines, we obtain a novel algorithm for optimization on {0,1}^d. All these algorithms are unified under a single information-geometric optimization framework. Thanks to its intrinsic formulation, the IGO method achieves invariance under reparametrization of the search space X, under a change of parameters of the probability distributions, and under increasing transformations of the objective function. Theory strongly suggests that IGO algorithms have minimal loss in diversity during optimization, provided the initial diversity is high. First experiments using restricted Boltzmann machines confirm this insight. Thus IGO seems to provide, from information theory, an elegant way to spontaneously explore several valleys of a fitness landscape in a single run.
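    As a concrete instance of the framework, the sketch below shows a discretized IGO step for Bernoulli distributions on {0,1}^d, which the abstract notes recovers a PBIL-type update: quantile-based weights are computed from ranks, followed by a natural-gradient step in the expectation parametrization. The onemax objective, selection quantile, and step size are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a discretized IGO update for Bernoulli distributions on
# {0,1}^d: sample from the current distribution, convert objective values to
# quantile-based weights, and take a natural-gradient step on the expectation
# parameters p. Objective, quantile and step size are assumptions.
import numpy as np

rng = np.random.default_rng(4)

def objective(x):
    """Toy objective on {0,1}^d to be maximized: number of ones (onemax)."""
    return x.sum(axis=1)

d, n, eta, q = 20, 50, 0.1, 0.2               # dimension, samples, step size, quantile
p = np.full(d, 0.5)                           # Bernoulli parameters (expectation coordinates)
for t in range(100):
    x = (rng.uniform(size=(n, d)) < p).astype(float)
    f = objective(x)
    ranks = np.empty(n)
    ranks[np.argsort(-f)] = np.arange(n)      # rank 0 for the best sample
    w = np.where(ranks < q * n, 1.0 / (q * n), 0.0)   # quantile-based selection weights
    w /= w.sum()
    # Natural-gradient step in the expectation parametrization of the Bernoulli family:
    p = p + eta * (w @ (x - p))
    p = np.clip(p, 0.01, 0.99)                # keep the distribution non-degenerate
    if t % 25 == 0:
        print(f"t={t:3d}  mean objective {f.mean():.2f}  mean p {p.mean():.3f}")
```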