2,089 research outputs found

Safe Mutations for Deep and Recurrent Neural Networks through Output Gradients

Author: Chen Jay
Clune Jeff
Lehman Joel
Stanley Kenneth O.
Publication date: 01/05/2018

While neuroevolution (evolving neural networks) has a successful track record across a variety of domains from reinforcement learning to artificial life, it is rarely applied to large, deep neural networks. A central reason is that while random mutation generally works in low dimensions, a random perturbation of thousands or millions of weights is likely to break existing functionality, providing no learning signal even if some individual weight changes were beneficial. This paper proposes a solution by introducing a family of safe mutation (SM) operators that aim within the mutation operator itself to find a degree of change that does not alter network behavior too much, but still facilitates exploration. Importantly, these SM operators do not require any additional interactions with the environment. The most effective SM variant capitalizes on the intriguing opportunity to scale the degree of mutation of each individual weight according to the sensitivity of the network's outputs to that weight, which requires computing the gradient of outputs with respect to the weights (instead of the gradient of error, as in conventional deep learning). This safe mutation through gradients (SM-G) operator dramatically increases the ability of a simple genetic algorithm-based neuroevolution method to find solutions in high-dimensional domains that require deep and/or recurrent neural networks (which tend to be particularly brittle to mutation), including domains that require processing raw pixels. By improving our ability to evolve deep neural networks, this new safer approach to mutation expands the scope of domains amenable to neuroevolution.
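The idea of scaling each weight's mutation by the sensitivity of the outputs to that weight can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the one-layer tanh network, the names `forward`, `output_sensitivity`, and `safe_mutate`, and the `sigma` and `floor` parameters are all assumptions made for the example.

```python
import numpy as np

def forward(W, X):
    """Toy one-layer network (an assumption): outputs = tanh(X @ W.T)."""
    return np.tanh(X @ W.T)

def output_sensitivity(W, X):
    """Mean absolute gradient of the outputs w.r.t. each weight.

    For y_k = tanh(sum_j W[k, j] * x_j):
        d y_k / d W[k, j] = (1 - y_k**2) * x_j
    Note this is the gradient of the outputs, not of an error signal.
    """
    Y = forward(W, X)            # shape (batch, n_out)
    dY = 1.0 - Y ** 2            # tanh derivative, shape (batch, n_out)
    # Per-sample |gradient| has shape (batch, n_out, n_in); average over batch.
    return np.mean(np.abs(dY[:, :, None] * X[:, None, :]), axis=0)

def safe_mutate(W, X, rng, sigma=0.1, floor=1e-2):
    """Gaussian mutation, shrunk for weights the outputs are sensitive to."""
    noise = rng.standard_normal(W.shape)
    # Clamp from below so insensitive weights do not receive huge steps.
    sens = np.maximum(output_sensitivity(W, X), floor)
    return W + sigma * noise / sens

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))      # 3 outputs, 4 inputs
X = rng.standard_normal((16, 4))     # batch of stored inputs, no new rollouts
W_new = safe_mutate(W, X, rng)
```

Only a stored batch of inputs is needed to compute the sensitivities, which matches the abstract's point that no additional interaction with the environment is required.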

arXiv.org e-Print Archive

Crossref

ES Is More Than Just a Traditional Finite-Difference Approximator

Author: Chen Jay
Clune Jeff
Lehman Joel
Stanley Kenneth O.
Publication date: 01/05/2018

An evolution strategy (ES) variant based on a simplification of a natural evolution strategy recently attracted attention because it performs surprisingly well in challenging deep reinforcement learning domains. It searches for neural network parameters by generating perturbations to the current set of parameters, checking their performance, and moving in the aggregate direction of higher reward. Because it resembles a traditional finite-difference approximation of the reward gradient, it can naturally be confused with one. However, this ES optimizes for a different gradient than just reward: It optimizes for the average reward of the entire population, thereby seeking parameters that are robust to perturbation. This difference can channel ES into distinct areas of the search space relative to gradient descent, and also consequently to networks with distinct properties. This unique robustness-seeking property, and its consequences for optimization, are demonstrated in several domains. They include humanoid locomotion, where networks from policy gradient-based reinforcement learning are significantly less robust to parameter perturbation than ES-based policies solving the same task. While the implications of such robustness and robustness-seeking remain open to further study, this work's main contribution is to highlight such differences and their potential importance.
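The perturb, evaluate, and move loop described in the abstract can be sketched as below. This is a toy illustration under stated assumptions: the quadratic `reward` function, the name `es_step`, the hyperparameter values, and the mean-reward baseline (a common variance-reduction choice) are not taken from the paper.

```python
import numpy as np

def reward(theta):
    """Hypothetical toy reward with its maximum at theta = [1, -2]."""
    target = np.array([1.0, -2.0])
    return -np.sum((theta - target) ** 2)

def es_step(theta, rng, pop_size=50, sigma=0.1, lr=0.05):
    """One ES update: sample perturbations, score them, follow the aggregate."""
    eps = rng.standard_normal((pop_size, theta.size))
    rewards = np.array([reward(theta + sigma * e) for e in eps])
    # Subtract the population mean as a baseline (assumption, reduces variance).
    centered = rewards - rewards.mean()
    # Estimates the gradient of the *expected* reward of the perturbed
    # population, i.e. of a Gaussian-smoothed objective, not of reward(theta).
    grad_est = (centered @ eps) / (pop_size * sigma)
    return theta + lr * grad_est

rng = np.random.default_rng(0)
theta = np.zeros(2)
start = reward(theta)
for _ in range(200):
    theta = es_step(theta, rng)
final = reward(theta)
```

Because the update follows the average reward over perturbed copies of `theta`, a larger `sigma` favors regions where nearby parameters also score well, which is the robustness-seeking behavior the abstract contrasts with a plain finite-difference gradient.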

arXiv.org e-Print Archive

Crossref

Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents

Author: Conti Edoardo
Madhavan Vashisht
Such Felipe Petroski
Lehman Joel
Stanley Kenneth O.
Clune Jeff
Publication date: 29/10/2018

Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but are much faster (e.g. hours vs. days) because they parallelize better. However, many RL problems require directed exploration because they have reward functions that are sparse or deceptive (i.e. contain local optima), and it is unknown how to encourage such exploration with ES. Here we show that algorithms that have been invented to promote directed exploration in small-scale evolved neural networks via populations of exploring agents, specifically novelty search (NS) and quality diversity (QD) algorithms, can be hybridized with ES to improve its performance on sparse or deceptive deep RL tasks, while retaining scalability. Our experiments confirm that the resultant new algorithms, NS-ES and two QD algorithms, NSR-ES and NSRA-ES, avoid local optima encountered by ES to achieve higher performance on Atari and simulated robots learning to walk around a deceptive trap. This paper thus introduces a family of fast, scalable algorithms for reinforcement learning that are capable of directed exploration. It also adds this new family of exploration algorithms to the RL toolbox and raises the interesting possibility that analogous algorithms with multiple simultaneous paths of exploration might also combine well with existing RL algorithms outside ES.
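The abstract does not spell out how novelty is scored. One common formulation in novelty search, assumed here, is the mean distance from an agent's behavior descriptor to its k nearest neighbors in an archive of past behaviors. The sketch below also shows a simple blend of reward and novelty in the spirit of the hybrid variants; the names `novelty` and `blended_score`, the `weight` parameter, and the 2-D descriptors are hypothetical.

```python
import numpy as np

def novelty(behavior, archive, k=3):
    """Mean Euclidean distance from `behavior` to its k nearest archive entries."""
    dists = np.linalg.norm(np.asarray(archive, dtype=float) - behavior, axis=1)
    k = min(k, len(dists))
    return float(np.mean(np.sort(dists)[:k]))

def blended_score(reward, nov, weight=0.5):
    """Weighted mix of reward and novelty (weight=1.0 means reward only)."""
    return weight * reward + (1.0 - weight) * nov

# Hypothetical archive of 2-D behavior descriptors, e.g. final (x, y) positions.
archive = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
near = novelty(np.array([0.0, 0.0]), archive, k=2)   # already-visited region
far = novelty(np.array([3.0, 3.0]), archive, k=2)    # unexplored region
```

In an ES setting, a score like this would replace or supplement the task reward when weighting perturbations, so the update is pulled toward behaviors unlike those already in the archive.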

arXiv.org e-Print Archive

FigShare

The effects of graduate training on reasoning: Formal discipline and thinking about everyday life events

Author: Lehman Darrin R.
Lempert Richard O.
Nisbett Richard E.
Publication venue: 'American Psychological Association (APA)'
Publication date: 01/01/1988

Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/92173/1/TheEffectsOfGraduateTraining.pd

CiteSeerX

Crossref

Deep Blue Documents at the University of Michigan

The effects of graduate training on reasoning: Formal discipline and thinking about everyday-life events.

Author: Darrin R. Lehman
Richard E. Nisbett
Richard O. Lempert
Publication venue: 'American Psychological Association (APA)'
Publication date: 01/01/2002

Crossref

Cooperation Among Small Academic Libraries

Author: Lehman James O.
Publication venue: Association of College and Research Libraries. American Library Association
Publication date: 01/11/1969

Published or submitted for publication

Illinois Digital Environment for Access to Learning and Scholarship Repository