Meta-Learning by the Baldwin Effect
The scope of the Baldwin effect was recently called into question by two papers that closely examined the seminal work of Hinton and Nowlan. To date there has been no demonstration of its necessity in empirically challenging tasks. Here we show that the Baldwin effect is capable of evolving few-shot supervised and reinforcement learning mechanisms by shaping the hyperparameters and the initial parameters of deep learning algorithms. Furthermore, it can genetically accommodate strong learning biases on the same set of problems as a recent machine learning algorithm, MAML (Model-Agnostic Meta-Learning), which uses second-order gradients instead of evolution to learn a set of reference parameters (initial weights) that allow rapid adaptation to tasks sampled from a distribution. Whilst in simple cases MAML is more data-efficient than the Baldwin effect, the Baldwin effect is more general in that it does not require gradients to be backpropagated to the reference parameters or hyperparameters, and it permits effectively any number of gradient updates in the inner loop. The Baldwin effect learns strong learning-dependent biases, rather than purely genetically accommodating fixed behaviours in a learning-independent manner.
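As a rough illustration of the mechanism the abstract describes, the following sketch (a minimal Python toy, not the authors' experimental setup) pairs an outer evolutionary loop over inherited initial weights with an inner gradient-descent loop; fitness is measured after lifetime learning and the learned weights are never written back to the genome, so selection favours learnability rather than fixed behaviour. The task family, model, and hyperparameters are hypothetical stand-ins.

import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    # Hypothetical task family: 1-D linear regression with random slope/offset.
    a, b = rng.uniform(-1, 1, size=2)
    x = rng.uniform(-1, 1, size=(20, 1))
    feats = np.hstack([x, np.ones_like(x)])              # input plus bias feature
    return feats, a * x + b

def inner_loop(w, task, lr=0.1, steps=5):
    # Lifetime learning: a few gradient steps starting from the inherited weights.
    X, y = task
    for _ in range(steps):
        w = w - lr * (2 * X.T @ (X @ w - y) / len(X))    # gradient of mean squared error
    return np.mean((X @ w - y) ** 2)                     # post-learning loss

def fitness(genome, n_tasks=10):
    # Baldwinian evaluation: learned weights are only scored, never inherited.
    return -np.mean([inner_loop(genome, sample_task()) for _ in range(n_tasks)])

pop = [rng.normal(size=(2, 1)) for _ in range(32)]       # genomes = initial weights
for gen in range(50):
    parents = sorted(pop, key=fitness, reverse=True)[:8] # select for learnability
    pop = [p + 0.05 * rng.normal(size=p.shape) for p in parents for _ in range(4)]

Hyperparameters such as the inner-loop learning rate could be placed on the genome as well, which is the sense in which the Baldwin effect shapes both hyperparameters and initial parameters.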
Importance mixing: Improving sample reuse in evolutionary policy search methods
Deep neuroevolution, that is, evolutionary policy search based on deep neural networks, has recently emerged as a competitor to deep reinforcement learning algorithms thanks to its better parallelization capabilities. However, these methods still suffer from far worse sample efficiency. In this paper we investigate whether a mechanism known as "importance mixing" can significantly improve their sample efficiency. We provide a didactic presentation of importance mixing and explain how it can be extended to reuse more samples. Then, through an empirical comparison on a simple benchmark, we show that although importance mixing does improve sample efficiency, the resulting methods remain far less sample-efficient than deep reinforcement learning, while being more stable.
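For concreteness, a minimal sketch of the importance mixing rule for an isotropic Gaussian search distribution is given below, following the standard two-step rejection scheme: keep old samples with probability min(1, (1 - alpha) p_new/p_old), then top up from the new distribution. The densities, dimensions, and the minimal refresh rate alpha are illustrative assumptions, not the paper's exact setup.

import numpy as np

rng = np.random.default_rng(1)

def log_gauss(x, mu, sigma):
    # Log-density of the isotropic Gaussian N(mu, sigma^2 I).
    d = x.shape[-1]
    return (-0.5 * np.sum((x - mu) ** 2, axis=-1) / sigma ** 2
            - d * np.log(sigma) - 0.5 * d * np.log(2 * np.pi))

def importance_mixing(old_pop, mu_old, s_old, mu_new, s_new, alpha=0.1):
    # Step 1: keep old samples that remain likely under the new distribution.
    n = len(old_pop)
    ratio = np.exp(log_gauss(old_pop, mu_new, s_new) - log_gauss(old_pop, mu_old, s_old))
    new_pop = list(old_pop[rng.random(n) < np.minimum(1.0, (1 - alpha) * ratio)])
    # Step 2: draw fresh samples, accepting where the old distribution under-covers.
    while len(new_pop) < n:
        x = mu_new + s_new * rng.normal(size=mu_new.shape)
        r = np.exp(log_gauss(x, mu_old, s_old) - log_gauss(x, mu_new, s_new))
        if rng.random() < max(alpha, 1.0 - r):
            new_pop.append(x)
    return np.array(new_pop[:n])

The sample-efficiency gain comes from the kept samples: their fitness values were already computed in the previous generation and need not be re-evaluated.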
A Generalized Markov-Chain Modelling Approach to $(1,\lambda)$-ES Linear Optimization: Technical Report
Several recent publications investigated Markov-chain modelling of linear optimization by a $(1,\lambda)$-ES, considering both unconstrained and linearly constrained optimization, and both constant and varying step size. All of them assume normality of the involved random steps, and while this is consistent with a black-box scenario, information on the function to be optimized (e.g. separability) may be exploited by the use of another distribution. The objective of our contribution is to complement previous studies realized with normal steps, and to give sufficient conditions on the distribution of the random steps for the success of a constant step-size $(1,\lambda)$-ES on the simple problem of a linear function with a linear constraint. The decomposition of a multidimensional distribution into its marginals and the copula combining them is applied to the new distributional assumptions, with particular attention paid to distributions with Archimedean copulas.
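A small simulation of the process these Markov-chain models describe may help fix ideas: a constant step-size $(1,\lambda)$-ES maximizing a linear function under a linear constraint, with infeasible offspring resampled. This is a hedged sketch under assumed problem geometry (constraint normal at angle theta to the gradient) and normal steps; the paper's question is which other step distributions, described via their marginals and copula, still yield positive progress.

import numpy as np

rng = np.random.default_rng(2)

def linear_es(lam=10, sigma=1.0, steps=1000, theta=np.pi / 4):
    # Maximize f(x) = x[0] subject to the linear constraint g(x) = x @ nvec <= 0.
    nvec = np.array([np.cos(theta), np.sin(theta)])  # constraint normal
    x = np.zeros(2)
    progress = []
    for _ in range(steps):
        offspring = []
        while len(offspring) < lam:
            y = x + sigma * rng.normal(size=2)       # normal random steps
            if y @ nvec <= 0:                        # resample infeasible offspring
                offspring.append(y)
        best = max(offspring, key=lambda y: y[0])    # (1,lambda) selection on f
        progress.append(best[0] - x[0])
        x = best
    return np.mean(progress)                         # estimated progress rate

print(linear_es())  # positive mean progress indicates success of the ES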
Exploring Evolution Strategies for Reinforcement Learning in the Obstacle Tower Environment
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics.
In 2017 OpenAI demonstrated that it was possible to train an AI agent by using Evolution Strategies (ES), and that the results rivalled standard Reinforcement Learning (RL) techniques on modern benchmarks. Their research effectively showed that Evolution Strategies is a viable alternative to traditional Reinforcement Learning techniques, one that bypasses many of Reinforcement Learning's inconveniences, notably the use of backpropagation.
The Obstacle Tower environment aims to set a new Reinforcement Learning benchmark by challenging Artificial Intelligence (AI) agents to traverse 3-dimensional, procedurally generated levels using a real-time 3-dimensional physics system. The environment tests an agent's ability to generalize by requiring it to optimize aspects that are common in many Reinforcement Learning environments but rarely combined in the same environment: vision, planning, and control.
In this research, the original implementation of OpenAI's Evolution Strategies algorithm was applied for the first time to the Obstacle Tower environment, to assess how well it performs in a more complex environment where the agent's generalization ability is critical. Additionally, in the interest of exploring Evolution Strategies in this environment, common Genetic Algorithm (GA) selection and mutation techniques were developed and applied to try to improve the performance of the original Evolution Strategies implementation. Crossover techniques were not explored during this research, as they are rarely applied in Evolution Strategies. The results show that although the basic implementation of Evolution Strategies does not perform well in the complex Obstacle Tower environment, its performance can be improved by applying different evolution methods borrowed from Genetic Algorithms, which belong to the same family of algorithms as Evolution Strategies.
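For reference, the core of the OpenAI-style Evolution Strategies update used as the starting point here can be sketched compactly: perturb the policy parameters with Gaussian noise, evaluate returns, and move the parameters along the return-weighted noise average, with antithetic sampling and rank-based fitness shaping as in Salimans et al. (2017). The toy objective below is a hypothetical stand-in for an episode return from the Obstacle Tower environment.

import numpy as np

rng = np.random.default_rng(3)

def episode_return(theta):
    # Stand-in for a rollout in the environment; higher is better.
    return -np.sum((theta - 1.0) ** 2)

def centered_ranks(returns):
    # Rank-based fitness shaping into [-0.5, 0.5].
    ranks = returns.argsort().argsort().astype(float)
    return ranks / (len(returns) - 1) - 0.5

def es_step(theta, sigma=0.1, lr=0.02, n_pairs=50):
    eps = rng.normal(size=(n_pairs,) + theta.shape)
    noise = np.concatenate([eps, -eps])              # antithetic pairs
    returns = np.array([episode_return(theta + sigma * e) for e in noise])
    shaped = centered_ranks(returns)
    grad = (shaped[:, None] * noise).sum(axis=0) / (len(noise) * sigma)
    return theta + lr * grad                         # ascend the estimated gradient

theta = np.zeros(5)
for _ in range(200):
    theta = es_step(theta)
print(theta)  # approaches the optimum at 1.0 on this toy objective

The GA selection and mutation techniques investigated in the dissertation are modifications applied on top of this basic scheme.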
Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles
We present a canonical way to turn any smooth parametric family of
probability distributions on an arbitrary search space $X$ into a continuous-time black-box optimization method on $X$, the
\emph{information-geometric optimization} (IGO) method. Invariance as a design
principle minimizes the number of arbitrary choices. The resulting \emph{IGO
flow} conducts the natural gradient ascent of an adaptive, time-dependent,
quantile-based transformation of the objective function. It makes no
assumptions on the objective function to be optimized.
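In symbols, following the paper's notation up to minor simplification, the IGO flow performs natural gradient ascent on the expected quantile-based rewriting $W^f_{\theta^t}$ of the objective $f$:
\[
\frac{d\theta^t}{dt}
= \int_X W^f_{\theta^t}(x)\, \widetilde{\nabla}_{\theta} \ln P_{\theta}(x)\, P_{\theta^t}(dx),
\qquad
\widetilde{\nabla}_{\theta} = I^{-1}(\theta)\, \frac{\partial}{\partial \theta},
\]
where $I(\theta)$ is the Fisher information matrix. Time discretization with step $\delta t$, samples $x_1,\dots,x_N \sim P_{\theta^t}$, and rank-based weights $\widehat{w}_i$ in place of $W^f_{\theta^t}(x_i)$ yield the generic IGO update
\[
\theta^{t+\delta t} = \theta^t + \delta t \sum_{i=1}^{N} \widehat{w}_i\,
\widetilde{\nabla}_{\theta} \ln P_{\theta}(x_i)\Big|_{\theta = \theta^t}.
\]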
The IGO method produces explicit IGO algorithms through time discretization.
It naturally recovers versions of known algorithms and offers a systematic way
to derive new ones. The cross-entropy method is recovered in a particular case,
and can be extended into a smoothed, parametrization-independent maximum
likelihood update (IGO-ML). For Gaussian distributions on $\mathbb{R}^d$, IGO is related to natural evolution strategies (NES) and recovers a version of the CMA-ES algorithm. For Bernoulli distributions on $\{0,1\}^d$, we recover the PBIL algorithm. From restricted Boltzmann machines, we obtain a novel algorithm for optimization on $\{0,1\}^d$. All these algorithms are unified under a
single information-geometric optimization framework.
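As a concrete instance, the sketch below implements an IGO-style update for Bernoulli distributions on $\{0,1\}^d$, which reduces to a PBIL-like rule: the vector of bit-wise probabilities is pulled toward the weighted mean of the best-ranked samples, with the learning rate playing the role of the discretization step $\delta t$. The OneMax objective and all constants are illustrative choices, not the paper's experiments.

import numpy as np

rng = np.random.default_rng(4)

def igo_bernoulli(f, d=30, n=50, dt=0.1, steps=200, mu=10):
    theta = np.full(d, 0.5)                              # theta[i] = P(bit i = 1)
    for _ in range(steps):
        x = (rng.random((n, d)) < theta).astype(float)   # sample the population
        order = np.argsort([-f(xi) for xi in x])         # rank, best first
        w = np.zeros(n)
        w[order[:mu]] = 1.0 / mu                         # quantile-based weights
        # Natural-gradient step for Bernoulli parameters: move theta toward
        # the weighted mean of the selected samples (the PBIL-style update).
        theta = (1 - dt) * theta + dt * (w @ x)
        theta = np.clip(theta, 0.01, 0.99)               # keep probabilities proper
    return theta

print(igo_bernoulli(lambda x: x.sum()).round(2))  # OneMax: entries approach 0.99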
Thanks to its intrinsic formulation, the IGO method achieves invariance under
reparametrization of the search space $X$, under a change of parameters of the
probability distributions, and under increasing transformations of the
objective function.
Theory strongly suggests that IGO algorithms have minimal loss in diversity
during optimization, provided the initial diversity is high. First experiments
using restricted Boltzmann machines confirm this insight. Thus IGO seems to
provide, from information theory, an elegant way to spontaneously explore
several valleys of a fitness landscape in a single run.