251 research outputs found
Safe Mutations for Deep and Recurrent Neural Networks through Output Gradients
While neuroevolution (evolving neural networks) has a successful track record
across a variety of domains from reinforcement learning to artificial life, it
is rarely applied to large, deep neural networks. A central reason is that
while random mutation generally works in low dimensions, a random perturbation
of thousands or millions of weights is likely to break existing functionality,
providing no learning signal even if some individual weight changes were
beneficial. This paper proposes a solution by introducing a family of safe
mutation (SM) operators that aim within the mutation operator itself to find a
degree of change that does not alter network behavior too much, but still
facilitates exploration. Importantly, these SM operators do not require any
additional interactions with the environment. The most effective SM variant
capitalizes on the intriguing opportunity to scale the degree of mutation of
each individual weight according to the sensitivity of the network's outputs to
that weight, which requires computing the gradient of outputs with respect to
the weights (instead of the gradient of error, as in conventional deep
learning). This safe mutation through gradients (SM-G) operator dramatically
increases the ability of a simple genetic algorithm-based neuroevolution method
to find solutions in high-dimensional domains that require deep and/or
recurrent neural networks (which tend to be particularly brittle to mutation),
including domains that require processing raw pixels. By improving our ability
to evolve deep neural networks, this new safer approach to mutation expands the
scope of domains amenable to neuroevolution.
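A minimal sketch of the idea behind the gradient-based safe mutation (in the spirit of SM-G-SUM), assuming a small PyTorch policy network and a batch of reference states; the names (policy, states, sigma) and the exact sensitivity formula are illustrative, not taken from the paper's released code.

```python
# Sketch: perturb each weight inversely to the sensitivity of the network's
# outputs to that weight, estimated from gradients of outputs w.r.t. weights.
import torch

def safe_mutation_gradients(policy, states, sigma=0.1, eps=1e-8):
    params = [p for p in policy.parameters() if p.requires_grad]
    outputs = policy(states)                      # shape: (batch, n_outputs)

    # Accumulate squared gradients of each output dimension w.r.t. each weight.
    sens = [torch.zeros_like(p) for p in params]
    for k in range(outputs.shape[1]):
        grads = torch.autograd.grad(outputs[:, k].sum(), params, retain_graph=True)
        for s, g in zip(sens, grads):
            s += g ** 2
    sens = [torch.sqrt(s) + eps for s in sens]    # per-weight output sensitivity

    # Scale a Gaussian perturbation per weight by 1 / sensitivity, so weights
    # the outputs depend on strongly are changed less than insensitive ones.
    with torch.no_grad():
        for p, s in zip(params, sens):
            p += sigma * torch.randn_like(p) / s
```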
ES Is More Than Just a Traditional Finite-Difference Approximator
An evolution strategy (ES) variant based on a simplification of a natural
evolution strategy recently attracted attention because it performs
surprisingly well in challenging deep reinforcement learning domains. It
searches for neural network parameters by generating perturbations to the
current set of parameters, checking their performance, and moving in the
aggregate direction of higher reward. Because it resembles a traditional
finite-difference approximation of the reward gradient, it can naturally be
confused with one. However, this ES optimizes for a different gradient than
just reward: It optimizes for the average reward of the entire population,
thereby seeking parameters that are robust to perturbation. This difference can
channel ES into distinct areas of the search space relative to gradient
descent, and also consequently to networks with distinct properties. This
unique robustness-seeking property, and its consequences for optimization, are
demonstrated in several domains. They include humanoid locomotion, where
networks from policy gradient-based reinforcement learning are significantly
less robust to parameter perturbation than ES-based policies solving the same
task. While the implications of such robustness and robustness-seeking remain
open to further study, this work's main contribution is to highlight such
differences and their potential importance.
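A minimal numpy sketch of the ES variant discussed above (the simplified natural evolution strategy): it perturbs the current parameters, evaluates the perturbed population, and moves in the aggregate direction of higher reward, so the quantity being ascended is the population-averaged reward rather than the reward at the current point. The function and hyperparameter names are illustrative placeholders.

```python
import numpy as np

def es_step(theta, reward_fn, npop=50, sigma=0.1, alpha=0.01, rng=None):
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal((npop, theta.size))        # population of perturbations
    rewards = np.array([reward_fn(theta + sigma * e) for e in eps])
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Weight each perturbation by its (normalized) reward and step in the
    # aggregate direction; this implicitly favors perturbation-robust optima.
    return theta + alpha / (npop * sigma) * eps.T @ advantages
```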
Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space
We focus on the challenge of finding a diverse collection of quality
solutions on complex continuous domains. While quality diversity (QD)
algorithms like Novelty Search with Local Competition (NSLC) and MAP-Elites are
designed to generate a diverse range of solutions, these algorithms require a
large number of evaluations for exploration of continuous spaces. Meanwhile,
variants of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) are
among the best-performing derivative-free optimizers in single-objective
continuous domains. This paper proposes a new QD algorithm called Covariance
Matrix Adaptation MAP-Elites (CMA-ME). Our new algorithm combines the
self-adaptation techniques of CMA-ES with archiving and mapping techniques for
maintaining diversity in QD. Results from experiments based on standard
continuous optimization benchmarks show that CMA-ME finds better-quality
solutions than MAP-Elites; similarly, results on the strategic game Hearthstone
show that CMA-ME finds both a higher overall quality and broader diversity of
strategies than both CMA-ES and MAP-Elites. Overall, CMA-ME more than doubles
the performance of MAP-Elites using standard QD performance metrics. These
results suggest that QD algorithms augmented by operators from state-of-the-art
optimization algorithms can yield high-performing methods for simultaneously
exploring and optimizing continuous search spaces, with significant
applications to design, testing, and reinforcement learning among other
domains.
Comment: Accepted to GECCO 202
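A minimal sketch of the CMA-ME idea, assuming the `cma` package for the CMA-ES machinery and a toy objective/behavior pair; the grid archive and the ranking below are a deliberate simplification of the paper's improvement emitter (which also prioritizes newly discovered cells and restarts on stagnation).

```python
import numpy as np
import cma

def behavior(x):                 # illustrative 2-D behavior descriptor
    return np.clip(x[:2], -1.0, 1.0)

def fitness(x):                  # illustrative objective (maximized)
    return -np.sum(x ** 2)

archive = {}                                     # grid cell -> (fitness, solution)
es = cma.CMAEvolutionStrategy(np.zeros(10), 0.5)

for _ in range(200):
    candidates = es.ask()                        # CMA-ES proposes solutions
    improvements = []
    for x in candidates:
        x = np.asarray(x)
        f, b = fitness(x), behavior(x)
        cell = tuple(np.floor((b + 1.0) * 10).astype(int))   # discretize behavior
        best = archive.get(cell)
        delta = f - best[0] if best else f       # improvement over the cell's elite
        if best is None or f > best[0]:
            archive[cell] = (f, x)               # MAP-Elites style replacement
        improvements.append(delta)
    # Adapt CMA-ES toward archive improvement (cma minimizes, so negate).
    es.tell(candidates, [-d for d in improvements])
```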
Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents
Evolution strategies (ES) are a family of black-box optimization algorithms
able to train deep neural networks roughly as well as Q-learning and policy
gradient methods on challenging deep reinforcement learning (RL) problems, but
are much faster (e.g. hours vs. days) because they parallelize better. However,
many RL problems require directed exploration because they have reward
functions that are sparse or deceptive (i.e. contain local optima), and it is
unknown how to encourage such exploration with ES. Here we show that algorithms
that have been invented to promote directed exploration in small-scale evolved
neural networks via populations of exploring agents, specifically novelty
search (NS) and quality diversity (QD) algorithms, can be hybridized with ES to
improve its performance on sparse or deceptive deep RL tasks, while retaining
scalability. Our experiments confirm that the resultant new algorithms, NS-ES
and two QD algorithms, NSR-ES and NSRA-ES, avoid local optima encountered by ES
to achieve higher performance on Atari and simulated robots learning to walk
around a deceptive trap. This paper thus introduces a family of fast, scalable
algorithms for reinforcement learning that are capable of directed exploration.
It also adds this new family of exploration algorithms to the RL toolbox and
raises the interesting possibility that analogous algorithms with multiple
simultaneous paths of exploration might also combine well with existing RL
algorithms outside ES.
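A minimal numpy sketch of the NS-ES / NSR-ES idea: score each perturbed policy by novelty (distance of its behavior to an archive of past behaviors), or by a blend of novelty and reward, and use that score in the ES update. This is a single-agent simplification (the paper maintains a meta-population of novelty-seeking agents), and behavior_fn, reward_fn, and the weighting w are illustrative placeholders.

```python
import numpy as np

def novelty(bc, archive, k=10):
    if not archive:
        return 1.0
    dists = np.sort(np.linalg.norm(np.array(archive) - bc, axis=1))
    return dists[:k].mean()                      # mean distance to k nearest neighbors

def nsr_es_step(theta, reward_fn, behavior_fn, archive, w=0.5,
                npop=50, sigma=0.05, alpha=0.01, rng=None):
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal((npop, theta.size))
    scores = []
    for e in eps:
        cand = theta + sigma * e
        r, n = reward_fn(cand), novelty(behavior_fn(cand), archive)
        scores.append(w * r + (1.0 - w) * n)     # w = 0 recovers pure novelty search
    scores = np.array(scores)
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)
    archive.append(behavior_fn(theta))           # grow the behavior archive
    return theta + alpha / (npop * sigma) * eps.T @ scores
```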
- …