Scaling MAP-Elites to Deep Neuroevolution
Quality-Diversity (QD) algorithms, and MAP-Elites (ME) in particular, have
proven very useful for a broad range of applications including enabling real
robots to recover quickly from joint damage, solving strongly deceptive maze
tasks or evolving robot morphologies to discover new gaits. However, present
implementations of MAP-Elites and other QD algorithms seem to be limited to
low-dimensional controllers with far fewer parameters than modern deep neural
network models. In this paper, we propose to leverage the efficiency of
Evolution Strategies (ES) to scale MAP-Elites to high-dimensional controllers
parameterized by large neural networks. We design and evaluate a new hybrid
algorithm called MAP-Elites with Evolution Strategies (ME-ES) for post-damage
recovery in a difficult high-dimensional control task where traditional ME
fails. Additionally, we show that ME-ES performs efficient exploration, on par
with state-of-the-art exploration algorithms in high-dimensional control tasks
with strongly deceptive rewards.
Comment: Accepted to GECCO 2020.
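To make the hybrid concrete, below is a minimal sketch of an ME-ES-style
loop: an ES gradient estimate improves a single controller for a few steps,
the result is inserted into a MAP-Elites archive, and the search restarts
from a randomly chosen elite. Only the fitness-exploitation case is shown
(the paper also alternates with novelty-based objectives), and the helper
names (`evaluate`, returning a fitness and a behavior descriptor, and
`cell_index`, mapping a descriptor to an archive cell) are assumptions,
not the paper's API.

```python
import numpy as np

def es_step(theta, evaluate, sigma=0.02, pop=50, lr=0.01):
    """One ES update: estimate the fitness gradient from Gaussian
    perturbations and take an ascent step (OpenAI-ES style)."""
    noise = np.random.randn(pop, theta.size)
    fitness = np.array([evaluate(theta + sigma * n)[0] for n in noise])
    weights = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    grad = weights @ noise / (pop * sigma)
    return theta + lr * grad

def me_es(evaluate, cell_index, dim, iterations=100, es_steps=10):
    archive = {}  # cell -> (theta, fitness)
    theta = 0.1 * np.random.randn(dim)
    for _ in range(iterations):
        for _ in range(es_steps):
            theta = es_step(theta, evaluate)
        fitness, descriptor = evaluate(theta)
        cell = cell_index(descriptor)
        if cell not in archive or fitness > archive[cell][1]:
            archive[cell] = (theta.copy(), fitness)
        # restart the ES from a randomly chosen elite in the archive
        keys = list(archive)
        theta = archive[keys[np.random.randint(len(keys))]][0].copy()
    return archive
```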
Safe Mutations for Deep and Recurrent Neural Networks through Output Gradients
While neuroevolution (evolving neural networks) has a successful track record
across a variety of domains from reinforcement learning to artificial life, it
is rarely applied to large, deep neural networks. A central reason is that
while random mutation generally works in low dimensions, a random perturbation
of thousands or millions of weights is likely to break existing functionality,
providing no learning signal even if some individual weight changes were
beneficial. This paper proposes a solution by introducing a family of safe
mutation (SM) operators that aim within the mutation operator itself to find a
degree of change that does not alter network behavior too much, but still
facilitates exploration. Importantly, these SM operators do not require any
additional interactions with the environment. The most effective SM variant
capitalizes on the intriguing opportunity to scale the degree of mutation of
each individual weight according to the sensitivity of the network's outputs to
that weight, which requires computing the gradient of outputs with respect to
the weights (instead of the gradient of error, as in conventional deep
learning). This safe mutation through gradients (SM-G) operator dramatically
increases the ability of a simple genetic algorithm-based neuroevolution method
to find solutions in high-dimensional domains that require deep and/or
recurrent neural networks (which tend to be particularly brittle to mutation),
including domains that require processing raw pixels. By improving our ability
to evolve deep neural networks, this new safer approach to mutation expands the
scope of domains amenable to neuroevolution.
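As a rough illustration of the gradient-based variant, the sketch below
scales each weight's random perturbation inversely to how strongly the
network's outputs respond to that weight. For brevity it differentiates
the summed outputs, whereas the paper's SM-G variants accumulate
per-output gradient magnitudes; treat this as a simplified sketch for a
PyTorch policy, with `mut_power` as an illustrative mutation scale.

```python
import torch

def safe_mutation_grad(policy, inputs, mut_power=0.05):
    """Perturb weights with magnitudes scaled by output sensitivity.
    Note the gradient is of the *outputs* w.r.t. the weights, not of
    an error signal as in conventional deep learning."""
    policy.zero_grad()
    policy(inputs).sum().backward()
    with torch.no_grad():
        for p in policy.parameters():
            sensitivity = p.grad.abs() + 1e-8
            delta = mut_power * torch.randn_like(p)
            p.add_(delta / sensitivity)  # sensitive weights move less
```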
Efficient Exploration using Model-Based Quality-Diversity with Gradients
Exploration is a key challenge in Reinforcement Learning, especially in
long-horizon, deceptive and sparse-reward environments. For such applications,
population-based approaches have proven effective. Methods such as
Quality-Diversity address this by encouraging novel solutions and producing
a diversity of behaviours. However, these methods are driven either by
undirected sampling (i.e. mutations) or by approximated gradients (i.e.
Evolution Strategies) in the parameter space, which makes them highly
sample-inefficient. In this paper, we propose a model-based Quality-Diversity
approach. It extends existing QD methods to use gradients for efficient
exploitation and leverage perturbations in imagination for efficient
exploration. Our approach optimizes all members of a population simultaneously
to maintain both performance and diversity efficiently by leveraging the
effectiveness of QD algorithms as good data generators to train deep models. We
demonstrate that it maintains the divergent search capabilities of
population-based approaches on tasks with deceptive rewards while significantly
improving their sample efficiency and the quality of their solutions.
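One way to picture the approach: transitions collected by the population
train a dynamics model, candidate perturbations are screened cheaply in
imagination, and only the most promising ones are spent on real-environment
evaluations. The skeleton below is a sketch under stated assumptions; every
callable (`perturb`, `imagine`, `evaluate_real`, and the archive and buffer
interfaces) is a hypothetical placeholder, not the paper's API.

```python
def model_based_qd_iteration(archive, buffer, model, perturb, imagine,
                             evaluate_real, n_candidates=32, n_real=4):
    model.fit(buffer)  # the QD population doubles as a data generator
    parent = archive.sample()
    candidates = [perturb(parent) for _ in range(n_candidates)]
    # cheap screening in imagination; expensive evaluation for the best only
    ranked = sorted(candidates, key=lambda c: imagine(model, c), reverse=True)
    for candidate in ranked[:n_real]:
        fitness, descriptor, transitions = evaluate_real(candidate)
        buffer.extend(transitions)
        archive.maybe_add(candidate, fitness, descriptor)
```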
Evolutionary Reinforcement Learning: A Survey
Reinforcement learning (RL) is a machine learning approach that trains agents
to maximize cumulative rewards through interactions with environments. The
integration of RL with deep learning has recently resulted in impressive
achievements in a wide range of challenging tasks, including board games,
arcade games, and robot control. Despite these successes, there remain several
crucial challenges, including brittle convergence properties caused by
sensitive hyperparameters, difficulties in temporal credit assignment with long
time horizons and sparse rewards, a lack of diverse exploration, especially in
continuous search space scenarios, difficulties in credit assignment in
multi-agent reinforcement learning, and conflicting objectives for rewards.
Evolutionary computation (EC), which maintains a population of learning agents,
has demonstrated promising performance in addressing these limitations. This
article presents a comprehensive survey of state-of-the-art methods for
integrating EC into RL, referred to as evolutionary reinforcement learning
(EvoRL). We categorize EvoRL methods according to key research fields in RL,
including hyperparameter optimization, policy search, exploration, reward
shaping, meta-RL, and multi-objective RL. We then discuss future research
directions in terms of efficient methods, benchmarks, and scalable platforms.
This survey serves as a resource for researchers and practitioners interested
in the field of EvoRL, highlighting the important challenges and opportunities
for future research. With the help of this survey, researchers and
practitioners can develop more efficient methods and tailored benchmarks for
EvoRL, further advancing this promising cross-disciplinary research field.
The Quality-Diversity Transformer: Generating Behavior-Conditioned Trajectories with Decision Transformers
In the context of neuroevolution, Quality-Diversity algorithms have proven
effective in generating repertoires of diverse and efficient policies by
relying on the definition of a behavior space. A natural goal induced by the
creation of such a repertoire is trying to achieve behaviors on demand, which
can be done by running the corresponding policy from the repertoire. However,
in uncertain environments, two problems arise. First, policies can lack
robustness and repeatability, meaning that multiple episodes under slightly
different conditions often result in very different behaviors. Second, due to
the discrete nature of the repertoire, solutions vary discontinuously. Here we
present a new approach to achieve behavior-conditioned trajectory generation
based on two mechanisms: First, MAP-Elites Low-Spread (ME-LS), which constrains
the selection of solutions to those that are the most consistent in the
behavior space. Second, the Quality-Diversity Transformer (QDT), a
Transformer-based model conditioned on continuous behavior descriptors, which
trains on a dataset generated by policies from a ME-LS repertoire and learns to
autoregressively generate sequences of actions that achieve target behaviors.
Results show that ME-LS produces consistent and robust policies, and that its
combination with the QDT yields a single policy capable of achieving diverse
behaviors on demand with high accuracy.
Comment: 10+7 pages.
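To illustrate the conditioning mechanism, here is a toy behavior-conditioned
sequence model in the spirit of the QDT: the target behavior descriptor is
embedded as a prefix token, and a causal Transformer predicts one action per
state token. All dimensions and layer sizes are illustrative, not the
paper's architecture.

```python
import torch
import torch.nn as nn

class BehaviorConditionedPolicy(nn.Module):
    """Toy sketch: prepend an embedded behavior descriptor to the state
    sequence and autoregressively predict actions."""
    def __init__(self, state_dim, action_dim, desc_dim, d_model=64):
        super().__init__()
        self.embed_desc = nn.Linear(desc_dim, d_model)
        self.embed_state = nn.Linear(state_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, action_dim)

    def forward(self, descriptor, states):
        # token sequence: [descriptor, s_1, ..., s_T]
        tokens = torch.cat([self.embed_desc(descriptor).unsqueeze(1),
                            self.embed_state(states)], dim=1)
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.encoder(tokens, mask=causal)
        return self.head(h[:, 1:])  # one action prediction per state token
```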
LLMatic: Neural Architecture Search via Large Language Models and Quality-Diversity Optimization
Large Language Models (LLMs) have emerged as powerful tools capable of
accomplishing a broad spectrum of tasks. Their abilities span numerous areas,
and one area where they have made a significant impact is in the domain of code
generation. In this context, we view LLMs as mutation and crossover tools.
Meanwhile, Quality-Diversity (QD) algorithms are known to discover diverse and
robust solutions. By merging the code-generating abilities of LLMs with the
diversity and robustness of QD solutions, we introduce LLMatic, a Neural
Architecture Search (NAS) algorithm. While LLMs struggle to conduct NAS
directly through prompts, LLMatic uses a procedural approach, leveraging QD for
prompts and network architecture to create diverse and highly performant
networks. We test LLMatic on the CIFAR-10 image classification benchmark,
demonstrating that it can produce competitive networks with just a small
number of searches, even without prior knowledge of the benchmark domain or
exposure to any previous top-performing models for the benchmark.
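The mutation step might look like the sketch below, in which an LLM rewrites
a network definition; `llm` stands in for any text-completion callable, and
the prompt wording is illustrative, not taken from the paper.

```python
def llm_mutate(llm, network_code, temperature=0.8):
    """Use an LLM as a mutation operator over network source code."""
    prompt = (
        "Below is a PyTorch network definition.\n"
        f"{network_code}\n"
        "Write a modified version of this network that changes its "
        "architecture (layers, widths, or connections) while keeping "
        "the same input and output shapes.\n"
    )
    return llm(prompt, temperature=temperature)  # mutated network code
```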
Uncertain Quality-Diversity: Evaluation methodology and new methods for Quality-Diversity in Uncertain Domains
Quality-Diversity optimisation (QD) has proven to yield promising results
across a broad set of applications. However, QD approaches struggle in the
presence of uncertainty in the environment, as it impacts their ability to
quantify the true performance and novelty of solutions. This problem has been
highlighted multiple times independently in previous literature. In this work,
we propose to uniformise the view on this problem through four main
contributions. First, we formalise a common framework for uncertain domains:
the Uncertain QD setting, a special case of QD in which fitness and descriptors
for each solution are no longer fixed values but distributions over possible
values. Second, we propose a new methodology to evaluate Uncertain QD
approaches, relying on a new per-generation sampling budget and a set of
existing and new metrics specifically designed for Uncertain QD. Third, we
propose three new Uncertain QD algorithms: Archive-sampling,
Parallel-Adaptive-sampling and Deep-Grid-sampling. We propose these approaches
taking into account recent advances in the QD community toward the use of
hardware acceleration that enable large numbers of parallel evaluations and
make sampling an affordable approach to uncertainty. Our final and fourth
contribution is to use this new framework and the associated comparison methods
to benchmark existing and novel approaches. We demonstrate once again the
limitation of MAP-Elites in uncertain domains and highlight the performance of
the existing Deep-Grid approach, and of our new algorithms. The goal of this
framework and methods is to become an instrumental benchmark for future works
considering Uncertain QD.
Comment: Submitted to Transactions on Evolutionary Computation.
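As a minimal sketch of the idea behind Archive-sampling, the function below
re-evaluates every stored solution several times and re-inserts it using its
mean fitness and mean descriptor, which large-scale parallel evaluation
makes affordable. `evaluate` and `cell_index` are assumed helpers, and the
per-generation sampling budget here is simply `n_samples` evaluations per
elite.

```python
import numpy as np

def resample_archive(archive, evaluate, cell_index, n_samples=8):
    """Rebuild the archive from repeated, noisy re-evaluations."""
    new_archive = {}  # cell -> (theta, mean fitness)
    for theta, _ in archive.values():
        results = [evaluate(theta) for _ in range(n_samples)]
        fitness = np.mean([r[0] for r in results])            # mean fitness
        descriptor = np.mean([r[1] for r in results], axis=0)  # mean descriptor
        cell = cell_index(descriptor)
        if cell not in new_archive or fitness > new_archive[cell][1]:
            new_archive[cell] = (theta, fitness)
    return new_archive
```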
Empirical analysis of PGA-MAP-Elites for neuroevolution in uncertain domains
Quality-Diversity algorithms, among them MAP-Elites, have emerged as powerful
alternatives to performance-only optimisation approaches as they enable
generating collections of diverse and high-performing solutions to an
optimisation problem. However, they are often limited to low-dimensional
search spaces and deterministic environments. The recently introduced Policy
Gradient Assisted MAP-Elites (PGA-MAP-Elites) algorithm overcomes this
limitation by pairing the traditional genetic operator of MAP-Elites with a
gradient-based operator inspired by Deep Reinforcement Learning. This new
operator guides mutations toward high-performing solutions using policy
gradients. In this work, we propose an in-depth study of PGA-MAP-Elites. We
demonstrate the benefits of policy gradients on the performance of the
algorithm and the reproducibility of the generated solutions when considering
uncertain domains. We first show that PGA-MAP-Elites is highly performant in
both deterministic and uncertain high-dimensional environments, decorrelating
the two challenges it tackles. Second, we show that in addition to
outperforming all the considered baselines, the collections of solutions
generated by PGA-MAP-Elites are highly reproducible in uncertain
environments, approaching the reproducibility of solutions found by
Quality-Diversity approaches built specifically for uncertain applications.
Finally, we propose an ablation and in-depth analysis of the dynamics of the
policy-gradient-based variation. We demonstrate that the policy-gradient
variation operator is decisive in guaranteeing the performance of
PGA-MAP-Elites but is only essential during the early stage of the process,
where it finds high-performing regions of the search space.
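A minimal sketch of the two variation operators follows, assuming a PyTorch
policy, a TD3-style critic with signature `critic(states, actions)`, and a
batch of `states` from a replay buffer; the mutation scale, learning rate,
and step count are illustrative placeholders, not the paper's settings.

```python
import copy
import torch

def ga_variation(parent, sigma=0.05):
    """Genetic operator: isotropic Gaussian mutation of the weights."""
    child = copy.deepcopy(parent)
    with torch.no_grad():
        for p in child.parameters():
            p.add_(sigma * torch.randn_like(p))
    return child

def pg_variation(parent, critic, states, lr=1e-3, steps=10):
    """Policy-gradient operator: ascend the (fixed) critic's value of the
    policy's actions, as in TD3's deterministic actor update."""
    child = copy.deepcopy(parent)
    critic.requires_grad_(False)  # the critic is held fixed here
    opt = torch.optim.Adam(child.parameters(), lr=lr)
    for _ in range(steps):
        loss = -critic(states, child(states)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return child
```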