
    Recurrent policy gradients

    Reinforcement learning for partially observable Markov decision problems (POMDPs) is a challenge, as it requires policies with an internal state. Traditional approaches suffer significantly from this shortcoming and usually make strong assumptions about the problem domain, such as perfect system models, state estimators, and a Markovian hidden system. Recurrent neural networks (RNNs) offer a natural framework for policy learning with hidden state and require only a few limiting assumptions. Because they can be trained well using gradient descent, they are well suited to policy gradient approaches. In this paper, we present a policy gradient method, the Recurrent Policy Gradient, a model-free reinforcement learning method aimed at training limited-memory stochastic policies on problems that require long-term memory of past observations. The approach approximates the policy gradient for a recurrent neural network by backpropagating return-weighted characteristic eligibilities through time. Using a "Long Short-Term Memory" RNN architecture, we are able to outperform previous RL methods on three important benchmark tasks. Furthermore, we show that using history-dependent baselines significantly reduces estimation variance, enabling our approach to tackle more challenging, highly stochastic environments.
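    To make the core update concrete, the following is a minimal sketch (an illustration under assumed interfaces, not the paper's code) of a recurrent policy gradient step in PyTorch: an LSTM policy is unrolled over an episode, and a REINFORCE-style loss backpropagates return-weighted log-probability gradients (the characteristic eligibilities) through time when .backward() is called. The network sizes, the episode-data format, and the constant baseline are assumptions.

```python
# Sketch of a recurrent policy gradient update (illustrative, not the authors' code).
import torch
import torch.nn as nn
from torch.distributions import Categorical

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, state=None):
        h, state = self.lstm(obs_seq, state)   # unroll over the whole episode
        return self.head(h), state             # action logits at every time step

def recurrent_pg_loss(policy, observations, actions, rewards, gamma=0.99):
    """Return-weighted eligibilities; calling .backward() on this loss
    backpropagates them through time via the LSTM unrolling."""
    obs = torch.stack(observations).unsqueeze(0)            # (1, T, obs_dim)
    logits, _ = policy(obs)
    log_probs = Categorical(logits=logits[0]).log_prob(torch.as_tensor(actions))
    # Discounted return-to-go R_t for every time step t
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns = torch.tensor(list(reversed(returns)), dtype=torch.float32)
    baseline = returns.mean()   # crude stand-in for a learned, history-dependent baseline
    return -(log_probs * (returns - baseline)).sum()
```

    In practice the gradient would be averaged over many episodes, and the constant baseline above would be replaced by one conditioned on the observation history, which the abstract reports significantly reduces estimation variance.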

    Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents

    Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but they are much faster (e.g. hours vs. days) because they parallelize better. However, many RL problems require directed exploration because they have reward functions that are sparse or deceptive (i.e. contain local optima), and it is unknown how to encourage such exploration with ES. Here we show that algorithms invented to promote directed exploration in small-scale evolved neural networks via populations of exploring agents, specifically novelty search (NS) and quality diversity (QD) algorithms, can be hybridized with ES to improve its performance on sparse or deceptive deep RL tasks, while retaining scalability. Our experiments confirm that the resulting algorithms, NS-ES and two QD variants, NSR-ES and NSRA-ES, avoid the local optima encountered by ES and achieve higher performance on Atari and on simulated robots learning to walk around a deceptive trap. This paper thus introduces a family of fast, scalable algorithms for reinforcement learning that are capable of directed exploration. It also adds this new family of exploration algorithms to the RL toolbox and raises the interesting possibility that analogous algorithms with multiple simultaneous paths of exploration might also combine well with existing RL algorithms outside ES.
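    As an illustration of the core idea (a sketch under assumed interfaces and hyperparameters, not the released code), a single NS-ES-style update can be written as an ordinary ES step in which each parameter perturbation is scored by novelty, i.e. the mean distance from its behavior characterization to the k nearest entries of an archive, rather than by reward. rollout_behavior below is a placeholder for an episode rollout that returns a behavior characterization vector.

```python
# Sketch of one novelty-driven ES update in the spirit of NS-ES (illustrative only).
import numpy as np

def novelty(bc, archive, k=10):
    """Mean distance from behavior characterization `bc` to its k nearest archive entries."""
    if len(archive) == 0:
        return 1.0
    dists = np.linalg.norm(np.asarray(archive) - bc, axis=1)
    return float(np.sort(dists)[:k].mean())

def ns_es_step(theta, rollout_behavior, archive, pop_size=50, sigma=0.05, lr=0.01):
    eps = np.random.randn(pop_size, theta.size)           # parameter perturbations
    scores = np.array([novelty(rollout_behavior(theta + sigma * e), archive)
                       for e in eps])                      # novelty instead of reward
    # Centered rank transform of the scores, as is common in ES, then gradient estimate
    ranks = scores.argsort().argsort() / (pop_size - 1) - 0.5
    grad = (ranks[:, None] * eps).sum(axis=0) / (pop_size * sigma)
    theta = theta + lr * grad
    archive.append(rollout_behavior(theta))                # grow the novelty archive
    return theta
```

    NSR-ES and NSRA-ES then score perturbations by a fixed or adaptively weighted combination of novelty and reward, which is what allows them to escape deceptive local optima while still climbing the reward signal.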

    Hierarchical evolution of robotic controllers for complex tasks

    Evolutionary robotics is a methodology that allows robots to learn how to perform a task by automatically fine-tuning their "brain" (controller). Evolution is one of the most radical and open-ended forms of learning, but it has proven difficult for tasks where complex behavior is necessary (known as the bootstrapping problem). Controllers are usually evolved through computer simulation, and differences between real sensors and actuators and their simulated implementations are unavoidable. These differences prevent evolved controllers from crossing the reality gap, that is, achieving performance on real robotic hardware similar to what they achieve in simulation. In this dissertation, we propose an approach to overcome both the bootstrapping problem and the reality gap. We demonstrate how a controller can be evolved for a complex task through hierarchical evolution of behaviors. We further experiment with combining evolutionary techniques and preprogrammed behaviors. We demonstrate our approach in a task in which a robot has to find and rescue a teammate. The robot starts in a room with obstacles, and the teammate is located in a double T-maze connected to the room. We divide the rescue task into different sub-tasks, evolve controllers for each sub-task, and then combine the resulting controllers in a bottom-up fashion through additional evolutionary runs. The controller achieved a task completion rate of more than 90% both in simulation and on real robotic hardware. The main contributions of our study are the introduction of a novel methodology for evolving controllers for complex tasks, and its demonstration on real robotic hardware.
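    A rough sketch of the bottom-up composition described above (hypothetical structure and names, not the dissertation's actual controller): previously evolved or preprogrammed sub-behaviors are kept frozen, and a higher-level arbitrator, itself the subject of the additional evolutionary runs, selects which sub-behavior drives the robot at each control step.

```python
# Illustrative hierarchical controller: an evolved arbitrator over frozen sub-behaviors.
import numpy as np

class HierarchicalController:
    """Arbitrates among frozen sub-behaviors (e.g. exit_room, solve_maze, return_home)."""
    def __init__(self, sub_behaviors, arbitrator_weights, n_inputs):
        self.sub_behaviors = sub_behaviors   # list of callables: sensor vector -> actuator vector
        # Single-layer arbitrator: one activation score per sub-behavior
        self.W = np.asarray(arbitrator_weights).reshape(len(sub_behaviors), n_inputs)

    def act(self, sensors):
        scores = self.W @ sensors                      # evolved arbitration layer
        chosen = int(np.argmax(scores))                # select one sub-behavior
        return self.sub_behaviors[chosen](sensors)     # delegate actuation to it

# During the additional evolutionary runs, only `arbitrator_weights` would be evolved;
# the sub-behavior controllers themselves stay fixed.
```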

    Evolving unipolar memristor spiking neural networks

    © 2015 Taylor & Francis. Neuromorphic computing – brain-like computing in hardware – typically requires myriad complimentary metal oxide semiconductor spiking neurons interconnected by a dense mesh of nanoscale plastic synapses. Memristors are frequently cited as strong synapse candidates due to their statefulness and potential for low-power implementations. To date, plentiful research has focused on the bipolar memristor synapse, which is capable of incremental weight alterations and can provide adaptive self-organisation under a Hebbian learning scheme. In this paper, we consider the unipolar memristor synapse – a device capable of non-Hebbian switching between only two states (conductive and resistive) through application of a suitable input voltage – and discuss its suitability for neuromorphic systems. A self-adaptive evolutionary process is used to autonomously find highly fit network configurations. Experimentation on two robotics tasks shows that unipolar memristor networks evolve task-solving controllers faster than both bipolar memristor networks and networks containing constant non-plastic connections whilst performing at least comparably