Efficient Exploration using Model-Based Quality-Diversity with Gradients
Exploration is a key challenge in Reinforcement Learning, especially in
long-horizon, deceptive and sparse-reward environments. For such applications,
population-based approaches have proven effective. Methods such as
Quality-Diversity address this by encouraging novel solutions and producing a
diversity of behaviours. However, these methods are driven either by
undirected sampling (i.e. mutations) or by approximated gradients (i.e.
Evolution Strategies) in the parameter space, which makes them highly
sample-inefficient. In this paper, we propose a model-based Quality-Diversity
approach. It extends existing QD methods to use gradients for efficient
exploitation and to leverage perturbations in imagination for efficient
exploration. Our approach optimizes all members of a population simultaneously
to maintain both performance and diversity efficiently by leveraging the
effectiveness of QD algorithms as good data generators to train deep models. We
demonstrate that it maintains the divergent search capabilities of
population-based approaches on tasks with deceptive rewards while significantly
improving their sample efficiency and the quality of their solutions.
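
To make the idea concrete, below is a minimal, hypothetical sketch of the loop such a method implies: a MAP-Elites-style archive supplies parents, random perturbations are screened cheaply under a learned model ("in imagination"), and a gradient step provides exploitation. The toy `true_eval`, `model_eval`, and `grad_step` functions and the grid discretisation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_eval(theta):
    """Ground-truth environment rollout (expensive). Toy stand-in."""
    fitness = -np.sum(theta ** 2)     # a deceptive task would look different
    behaviour = theta[:2]             # 2-D behaviour descriptor
    return fitness, behaviour

def model_eval(theta):
    """Imagined rollout under the learned model (cheap). Toy stand-in: here
    we reuse the true function; in practice a neural model trained on the
    archive's rollout data would be queried instead."""
    return true_eval(theta)

def grad_step(theta, lr=0.1):
    """Exploitation: gradient ascent on modelled fitness (analytic here;
    a real implementation would backprop through the learned model)."""
    return theta - lr * 2 * theta     # gradient of -||theta||^2

# MAP-Elites-style archive keyed by discretised behaviour descriptor.
archive = {}

def add_to_archive(theta, fitness, behaviour):
    key = tuple(np.round(behaviour, 1))
    if key not in archive or archive[key][0] < fitness:
        archive[key] = (fitness, theta)

theta0 = rng.normal(size=8)
add_to_archive(theta0, *true_eval(theta0))

for _ in range(200):
    # Pick a random elite; branch via perturbations screened in imagination,
    # plus one gradient-based exploitation candidate.
    _, parent = archive[list(archive)[rng.integers(len(archive))]]
    candidates = [parent + 0.2 * rng.normal(size=parent.shape) for _ in range(8)]
    candidates.append(grad_step(parent))
    # Rank candidates cheaply with the model; evaluate only the best for real.
    best = max(candidates, key=lambda t: model_eval(t)[0])
    add_to_archive(best, *true_eval(best))

print(f"archive size: {len(archive)}")
```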
Open-ended Learning in Symmetric Zero-sum Games
Zero-sum games such as chess and poker are, abstractly, functions that
evaluate pairs of agents, for example labeling them 'winner' and 'loser'. If
the game is approximately transitive, then self-play generates sequences of
agents of increasing strength. However, nontransitive games, such as
rock-paper-scissors, can exhibit strategic cycles, and there is no longer a
clear objective -- we want agents to increase in strength, but against whom is
unclear. In this paper, we introduce a geometric framework for formulating
agent objectives in zero-sum games, in order to construct adaptive sequences of
objectives that yield open-ended learning. The framework allows us to reason
about population performance in nontransitive games, and enables the
development of a new algorithm (rectified Nash response, PSRO_rN) that uses
game-theoretic niching to construct diverse populations of effective agents,
producing a stronger set of agents than existing algorithms. We apply PSRO_rN
to two highly nontransitive resource allocation games and find that PSRO_rN
consistently outperforms the existing alternatives.Comment: ICML 2019, final versio
Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents
Evolution strategies (ES) are a family of black-box optimization algorithms
able to train deep neural networks roughly as well as Q-learning and policy
gradient methods on challenging deep reinforcement learning (RL) problems, but
are much faster (e.g. hours vs. days) because they parallelize better. However,
many RL problems require directed exploration because they have reward
functions that are sparse or deceptive (i.e. contain local optima), and it is
unknown how to encourage such exploration with ES. Here we show that algorithms
that have been invented to promote directed exploration in small-scale evolved
neural networks via populations of exploring agents, specifically novelty
search (NS) and quality diversity (QD) algorithms, can be hybridized with ES to
improve its performance on sparse or deceptive deep RL tasks, while retaining
scalability. Our experiments confirm that the resulting algorithms, NS-ES and
the two QD variants NSR-ES and NSRA-ES, avoid local optima encountered by ES
and achieve higher performance on Atari and on simulated robots learning to walk
around a deceptive trap. This paper thus introduces a family of fast, scalable
algorithms for reinforcement learning that are capable of directed exploration.
It also adds this new family of exploration algorithms to the RL toolbox and
raises the interesting possibility that analogous algorithms with multiple
simultaneous paths of exploration might also combine well with existing RL
algorithms outside ES.
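
Below is a minimal single-agent sketch of the blended update, closest in spirit to NSR-ES (the paper maintains a meta-population of agents, and NSRA-ES adapts the blend weight online rather than fixing it). The `behaviour` and `reward` functions are toy stand-ins for real rollouts.

```python
import numpy as np

rng = np.random.default_rng(0)

def behaviour(theta):
    """Behaviour characterisation, e.g. a robot's final (x, y).
    Toy stand-in: simply the first two parameters."""
    return theta[:2]

def reward(theta):
    """Episodic return from a rollout. Toy stand-in."""
    return -np.sum((theta - 3.0) ** 2)

def novelty(b, archive, k=5):
    """Mean distance to the k nearest behaviours in the archive."""
    if not archive:
        return 0.0
    d = np.sort([np.linalg.norm(b - a) for a in archive])
    return float(np.mean(d[:k]))

def nsr_es_step(theta, archive, sigma=0.1, lr=0.02, n=100, w=0.5):
    """One NSR-ES-style update: the ES gradient estimate scores each
    perturbation by a fixed blend of reward and novelty."""
    eps = rng.normal(size=(n, theta.size))
    scores = np.array([
        w * reward(theta + sigma * e) +
        (1 - w) * novelty(behaviour(theta + sigma * e), archive)
        for e in eps
    ])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)
    return theta + lr / (n * sigma) * eps.T @ scores

theta = rng.normal(size=10)
archive = []
for _ in range(50):
    theta = nsr_es_step(theta, archive, w=0.5)
    archive.append(behaviour(theta))
print("final reward:", round(reward(theta), 3))
```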
Two-Timescale Learning Using Idiotypic Behaviour Mediation For A Navigating Mobile Robot
A combined Short-Term Learning (STL) and Long-Term Learning (LTL) approach to
solving mobile-robot navigation problems is presented and tested in both the
real and virtual domains. The LTL phase consists of rapid simulations that use
a Genetic Algorithm to derive diverse sets of behaviours, encoded as variable
sets of attributes, and the STL phase is an idiotypic Artificial Immune System.
Results from the LTL phase show that sets of behaviours develop very rapidly,
and significantly greater diversity is obtained when multiple autonomous
populations are used, rather than a single one. The architecture is assessed
under various scenarios, including removal of the LTL phase and switching off
the idiotypic mechanism in the STL phase. The comparisons provide substantial
evidence that the best option is the inclusion of both the LTL phase and the
idiotypic system. In addition, this paper shows that structurally different
environments can be used for the two phases without compromising
transferability.

Comment: 40 pages, 12 tables, Journal of Applied Soft Computing
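
For flavour, here is a toy sketch of idiotypic behaviour mediation in the STL phase, loosely following Farmer-style immune-network dynamics: antibody (behaviour) concentrations rise with antigen (sensor) match and inter-antibody stimulation, and fall with suppression and decay. The matrices, constants, and sign conventions are all illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Antibodies are candidate behaviours (e.g. evolved in the LTL/GA phase);
# an antigen is the current sensor reading.
n_behaviours, n_features = 6, 4
paratopes = rng.random((n_behaviours, n_features))  # what each behaviour matches
epitopes = rng.random((n_behaviours, n_features))   # how each behaviour is seen
M = paratopes @ epitopes.T   # M[i, j]: degree to which antibody i recognises j
conc = np.ones(n_behaviours)                        # antibody concentrations

def idiotypic_select(antigen, conc, b1=0.5, b2=0.3, b3=0.4, decay=0.1):
    """One Farmer-style update: recognising others stimulates, being
    recognised suppresses (one common convention); then pick the behaviour
    with the highest concentration-weighted antigen match."""
    match = paratopes @ antigen
    dconc = conc * (b1 * match
                    + b2 * (M @ conc)      # idiotypic stimulation
                    - b3 * (M.T @ conc)    # idiotypic suppression
                    - decay)
    conc = np.clip(conc + 0.05 * dconc, 1e-3, 10.0)
    return int(np.argmax(conc * match)), conc

antigen = rng.random(n_features)
for _ in range(20):
    choice, conc = idiotypic_select(antigen, conc)
print("selected behaviour:", choice)
```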
Automatic Curriculum Learning For Deep RL: A Short Survey
Automatic Curriculum Learning (ACL) has become a cornerstone of recent
successes in Deep Reinforcement Learning (DRL). These methods shape the learning
trajectories of agents by challenging them with tasks adapted to their
capacities. In recent years, they have been used to improve sample efficiency
and asymptotic performance, to organize exploration, to encourage
generalization or to solve sparse reward problems, among others. The ambition
of this work is dual: 1) to present a compact and accessible introduction to
the Automatic Curriculum Learning literature and 2) to draw a bigger picture of
the current state of the art in ACL to encourage the cross-breeding of existing
concepts and the emergence of new ideas.

Comment: Accepted at IJCAI 2020
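
One concrete recipe from this literature is to sample tasks in proportion to absolute learning progress, so the curriculum tracks tasks the agent is currently improving (or regressing) on. The sketch below is a minimal such sampler; the class name, constants, and update rule are illustrative, not taken from any specific surveyed method.

```python
import numpy as np

rng = np.random.default_rng(0)

class LPCurriculum:
    """Samples tasks proportionally to |learning progress| (ALP),
    with a residual uniform-exploration probability."""

    def __init__(self, n_tasks, eps=0.2):
        self.old = np.zeros(n_tasks)   # running mean return per task
        self.lp = np.zeros(n_tasks)    # |learning progress| estimates
        self.eps = eps

    def sample_task(self):
        if rng.random() < self.eps or self.lp.sum() == 0:
            return int(rng.integers(len(self.lp)))
        return int(rng.choice(len(self.lp), p=self.lp / self.lp.sum()))

    def update(self, task, ret):
        # Exponential moving averages of progress and of the return itself.
        self.lp[task] = 0.9 * self.lp[task] + 0.1 * abs(ret - self.old[task])
        self.old[task] = 0.9 * self.old[task] + 0.1 * ret

cur = LPCurriculum(n_tasks=5)
for step in range(1000):
    task = cur.sample_task()
    # Fake returns that improve over time, faster on harder tasks.
    ret = float(np.tanh(step / 300) * (task + 1)) + rng.normal(0, 0.1)
    cur.update(task, ret)
print("sampling weights:", np.round(cur.lp / cur.lp.sum(), 2))
```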