Recurrent policy gradients
Reinforcement learning for partially observable Markov decision problems (POMDPs) is a challenge as it requires policies with an internal state. Traditional approaches suffer significantly from this shortcoming and usually make strong assumptions on the problem domain, such as perfect system models, state-estimators, and a Markovian hidden system. Recurrent neural networks (RNNs) offer a natural framework for dealing with policy learning using hidden state and require only a few limiting assumptions. As they can be trained well using gradient descent, they are suited for policy gradient approaches. In this paper, we present a policy gradient method, the Recurrent Policy Gradient, which constitutes a model-free reinforcement learning method. It is aimed at training limited-memory stochastic policies on problems which require long-term memories of past observations. The approach involves approximating a policy gradient for a recurrent neural network by backpropagating return-weighted characteristic eligibilities through time. Using a “Long Short-Term Memory” RNN architecture, we are able to outperform previous RL methods on three important benchmark tasks. Furthermore, we show that using history-dependent baselines helps reduce estimation variance significantly, thus enabling our approach to tackle more challenging, highly stochastic environments.
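The estimator described above can be illustrated with a minimal numpy sketch (not the paper's implementation): it assumes the per-timestep score functions ∇ log π(a_t | h_t) — the characteristic eligibilities, obtained by backpropagation through the RNN — and history-dependent baselines b_t have already been computed, and combines them with discounted returns-to-go.

```python
import numpy as np

def recurrent_policy_gradient(score_fns, rewards, baselines, gamma=0.99):
    """Return-weighted characteristic eligibilities, summed over time.

    score_fns : list of arrays, grad log pi(a_t | h_t) w.r.t. the RNN
                parameters at each step (the characteristic eligibilities).
    rewards   : per-timestep rewards for one episode.
    baselines : history-dependent baseline b_t at each step; subtracting
                it reduces the variance of the estimate.
    """
    T = len(rewards)
    # Discounted return-to-go: R_t = sum_{k >= t} gamma^(k-t) * r_k
    returns = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        running = rewards[t] + gamma * running
        returns[t] = running
    # Gradient estimate: sum_t (R_t - b_t) * grad log pi(a_t | h_t)
    return sum((returns[t] - baselines[t]) * score_fns[t] for t in range(T))
```

In practice the score functions come from backpropagation through time over the unrolled RNN, and the estimate is averaged over many sampled episodes.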
Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents
Evolution strategies (ES) are a family of black-box optimization algorithms
able to train deep neural networks roughly as well as Q-learning and policy
gradient methods on challenging deep reinforcement learning (RL) problems, but
are much faster (e.g. hours vs. days) because they parallelize better. However,
many RL problems require directed exploration because they have reward
functions that are sparse or deceptive (i.e. contain local optima), and it is
unknown how to encourage such exploration with ES. Here we show that algorithms
that have been invented to promote directed exploration in small-scale evolved
neural networks via populations of exploring agents, specifically novelty
search (NS) and quality diversity (QD) algorithms, can be hybridized with ES to
improve its performance on sparse or deceptive deep RL tasks, while retaining
scalability. Our experiments confirm that the resultant new algorithms, NS-ES
and two QD algorithms, NSR-ES and NSRA-ES, avoid local optima encountered by ES
to achieve higher performance on Atari and simulated robots learning to walk
around a deceptive trap. This paper thus introduces a family of fast, scalable
algorithms for reinforcement learning that are capable of directed exploration.
It also adds this new family of exploration algorithms to the RL toolbox and
raises the interesting possibility that analogous algorithms with multiple
simultaneous paths of exploration might also combine well with existing RL
algorithms outside ES.
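The novelty-search component of these hybrids can be sketched briefly. In this sketch (function names and the fixed-weight blend are our own simplification), novelty is the mean distance from an agent's behavior characterization to its k nearest neighbors in an archive of previously seen behaviors, and NSR-ES optimizes a blend of reward and novelty; NSRA-ES adapts the blend weight during training rather than fixing it.

```python
import numpy as np

def novelty(bc, archive, k=3):
    """Novelty of a behavior characterization (BC): mean Euclidean
    distance to its k nearest neighbors in the archive of BCs."""
    dists = np.sort(np.linalg.norm(archive - bc, axis=1))
    return dists[:k].mean()

def nsr_es_fitness(reward, nov, w=0.5):
    """NSR-ES blends reward and novelty with a weight w in [0, 1];
    NSRA-ES adapts w over time (decreasing it when reward stagnates)."""
    return w * reward + (1.0 - w) * nov
```

Pure NS-ES corresponds to optimizing novelty alone (w = 0); plain ES corresponds to w = 1.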
Hierarchical evolution of robotic controllers for complex tasks
Evolutionary robotics is a methodology that allows robots to learn how to
perform a task by automatically fine-tuning their “brain” (controller). Evolution
is one of the most radical and open-ended forms of learning, but it has proven
difficult for tasks where complex behavior is necessary (known as the bootstrapping
problem). Controllers are usually evolved through computer simulation, and differences
between real sensors and actuators and their simulated implementations
are unavoidable. These differences prevent evolved controllers from crossing the
reality gap, that is, achieving similar performance in real robotic hardware as they
do in simulation.
In this dissertation, we propose an approach to overcome both the bootstrapping
problem and the reality gap. We demonstrate how a controller can be evolved
for a complex task through hierarchical evolution of behaviors. We further experiment
with combining evolutionary techniques and preprogrammed behaviors.
We demonstrate our approach in a task in which a robot has to find and
rescue a teammate. The robot starts in a room with obstacles and the teammate
is located in a double T-maze connected to the room. We divide the rescue task
into different sub-tasks, evolve controllers for each sub-task, and then combine
the resulting controllers in a bottom-up fashion through additional evolutionary
runs. The controller achieved a task completion rate of more than 90% both in
simulation and on real robotic hardware.
The main contributions of our study are the introduction of a novel methodology
for evolving controllers for complex tasks, and its demonstration on real
robotic hardware.
Evolving unipolar memristor spiking neural networks
© 2015 Taylor & Francis. Neuromorphic computing – brain-like computing in hardware – typically requires myriad complementary metal oxide semiconductor spiking neurons interconnected by a dense mesh of nanoscale plastic synapses. Memristors are frequently cited as strong synapse candidates due to their statefulness and potential for low-power implementations. To date, plentiful research has focused on the bipolar memristor synapse, which is capable of incremental weight alterations and can provide adaptive self-organisation under a Hebbian learning scheme. In this paper, we consider the unipolar memristor synapse – a device capable of non-Hebbian switching between only two states (conductive and resistive) through application of a suitable input voltage – and discuss its suitability for neuromorphic systems. A self-adaptive evolutionary process is used to autonomously find highly fit network configurations. Experimentation on two robotics tasks shows that unipolar memristor networks evolve task-solving controllers faster than both bipolar memristor networks and networks containing constant non-plastic connections whilst performing at least comparably.
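A unipolar memristor synapse of the kind described can be modeled, in a deliberately simplified form, as a two-state device that toggles between its conductive and resistive states whenever an input pulse exceeds a switching threshold; the state values, threshold, and toggle rule below are illustrative assumptions, not device parameters from the paper.

```python
class UnipolarMemristorSynapse:
    """Two-state synapse: switches between a conductive (high-weight)
    and a resistive (low-weight) state when the input voltage magnitude
    exceeds a threshold. Sub-threshold inputs leave the state unchanged,
    so the plasticity is non-Hebbian: it depends only on the applied
    voltage, not on correlated pre/post activity."""

    def __init__(self, w_on=1.0, w_off=0.1, v_threshold=0.5):
        self.w_on, self.w_off = w_on, w_off
        self.v_threshold = v_threshold
        self.conductive = False  # start in the resistive state

    def apply_voltage(self, v):
        # A supra-threshold pulse toggles the device between its states.
        if abs(v) >= self.v_threshold:
            self.conductive = not self.conductive

    @property
    def weight(self):
        return self.w_on if self.conductive else self.w_off
```

In the evolved networks, such synapses give the evolutionary process a small discrete search space per connection (effectively on/off plus a switching rule) rather than a continuous weight.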
Causes and consequences of representational drift.
The nervous system learns new associations while maintaining memories over long periods, exhibiting a balance between flexibility and stability. Recent experiments reveal that neuronal representations of learned sensorimotor tasks continually change over days and weeks, even after animals have achieved expert behavioral performance. How is learned information stored to allow consistent behavior despite ongoing changes in neuronal activity? What functions could ongoing reconfiguration serve? We highlight recent experimental evidence for such representational drift in sensorimotor systems, and discuss how this fits into a framework of distributed population codes. We identify recent theoretical work that suggests computational roles for drift and argue that the recurrent and distributed nature of sensorimotor representations permits drift while limiting disruptive effects. We propose that representational drift may create error signals between interconnected brain regions that can be used to keep neural codes consistent in the presence of continual change. These concepts suggest experimental and theoretical approaches to studying both learning and maintenance of distributed and adaptive population codes. This work is supported by the Human Frontier Science Program, ERC grant StG 716643 FLEXNEURO, and NIH grants (NS108410, NS089521, MH107620).