    Recurrent policy gradients

    Reinforcement learning for partially observable Markov decision problems (POMDPs) is challenging because it requires policies with an internal state. Traditional approaches suffer significantly from this shortcoming and usually make strong assumptions about the problem domain, such as perfect system models, state estimators, and a Markovian hidden system. Recurrent neural networks (RNNs) offer a natural framework for policy learning with hidden state and require few limiting assumptions. Because they can be trained well using gradient descent, they are suited to policy gradient approaches. In this paper, we present a policy gradient method, the Recurrent Policy Gradient, which constitutes a model-free reinforcement learning method. It is aimed at training limited-memory stochastic policies on problems that require long-term memory of past observations. The approach involves approximating a policy gradient for a recurrent neural network by backpropagating return-weighted characteristic eligibilities through time. Using a "Long Short-Term Memory" (LSTM) RNN architecture, we are able to outperform previous RL methods on three important benchmark tasks. Furthermore, we show that using history-dependent baselines helps reduce estimation variance significantly, enabling our approach to tackle more challenging, highly stochastic environments.
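    The gradient estimate described above can be illustrated with a deliberately tiny sketch under stated assumptions. It uses a scalar tanh recurrent unit in place of LSTM, a single Bernoulli action with a sigmoid output, undiscounted returns-to-go, and a constant baseline instead of the paper's history-dependent baselines. All function names are hypothetical. `rpg_gradient` backpropagates the return-weighted eligibilities d log pi(a_t | history) / d theta through time. `weighted_log_lik` is the objective whose gradient it should match.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def returns_to_go(rewards):
    """R_t = r_t + r_{t+1} + ... (undiscounted return from step t onward)."""
    out, running = [], 0.0
    for r in reversed(rewards):
        running += r
        out.append(running)
    return out[::-1]


def forward(params, obs):
    """Scalar RNN policy (a stand-in for LSTM, assumed for illustration):
    h_t = tanh(w * h_{t-1} + u * o_t),  P(a_t = 1 | history) = sigmoid(v * h_t)."""
    w, u, v = params
    hs, ps = [0.0], []  # hs[0] is the initial hidden state h_0
    for o in obs:
        h = math.tanh(w * hs[-1] + u * o)
        hs.append(h)
        ps.append(sigmoid(v * h))
    return hs, ps


def weighted_log_lik(params, obs, actions, weights):
    """sum_t weight_t * log pi(a_t | history); its gradient is the policy-gradient estimate."""
    _, ps = forward(params, obs)
    return sum(g * math.log(p if a == 1 else 1.0 - p)
               for p, a, g in zip(ps, actions, weights))


def rpg_gradient(params, obs, actions, weights):
    """Backpropagate return-weighted eligibilities through time (BPTT)."""
    w, u, v = params
    hs, ps = forward(params, obs)
    gw = gu = gv = 0.0
    dh_next = 0.0  # gradient reaching h_t through the recurrence from step t + 1
    for t in reversed(range(len(obs))):
        h, h_prev = hs[t + 1], hs[t]
        # For a sigmoid-Bernoulli policy, d log pi(a | z) / dz = a - p, with z = v * h.
        dz = weights[t] * (actions[t] - ps[t])
        gv += dz * h
        dh = dz * v + dh_next
        dpre = dh * (1.0 - h * h)  # back through tanh
        gw += dpre * h_prev
        gu += dpre * obs[t]
        dh_next = dpre * w
    return gw, gu, gv
```

    For one episode, the weights would be `[R - b for R in returns_to_go(rewards)]` for some baseline `b`. Averaging `rpg_gradient(...)` over episodes and adding it to the parameters gives a gradient-ascent step on expected return.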

    Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents

    Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but are much faster (e.g., hours vs. days) because they parallelize better. However, many RL problems require directed exploration because they have reward functions that are sparse or deceptive (i.e., contain local optima), and it is unknown how to encourage such exploration with ES. Here we show that algorithms invented to promote directed exploration in small-scale evolved neural networks via populations of exploring agents, specifically novelty search (NS) and quality diversity (QD) algorithms, can be hybridized with ES to improve its performance on sparse or deceptive deep RL tasks while retaining scalability. Our experiments confirm that the resulting new algorithms, NS-ES and two QD algorithms, NSR-ES and NSRA-ES, avoid local optima encountered by ES and achieve higher performance on Atari and on simulated robots learning to walk around a deceptive trap. This paper thus introduces a family of fast, scalable algorithms for reinforcement learning that are capable of directed exploration. It also adds this new family of exploration algorithms to the RL toolbox and raises the interesting possibility that analogous algorithms with multiple simultaneous paths of exploration might also combine well with existing RL algorithms outside ES.
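    A minimal sketch of the ingredients, under strong simplifying assumptions. The behavior characterization (BC) of a policy is taken to be its parameter vector itself. Novelty is the mean distance to the k nearest archive entries, and the ES step uses mirrored Gaussian perturbations. The paper's rank normalization, meta-population of agents, and adaptive weighting in NSRA-ES are all omitted. `nsr_score` simply mixes raw reward and novelty with a fixed weight `w`. Every name here is illustrative, not the authors' code.

```python
import math
import random


def novelty(bc, archive, k=3):
    """Mean Euclidean distance from a behavior characterization to its k nearest
    archive entries (the archive must be non-empty)."""
    dists = sorted(math.dist(bc, other) for other in archive)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def nsr_score(theta, archive, reward_fn, w=0.5):
    """NSR-ES-flavoured score: fixed-weight mix of reward and novelty (raw values, not ranks).
    Toy assumption: the behavior characterization of theta is theta itself."""
    return w * reward_fn(theta) + (1.0 - w) * novelty(theta, archive)


def es_step(theta, score_fn, rng, sigma=0.1, alpha=0.05, n=50):
    """One ES update with mirrored Gaussian perturbations; ascends score_fn."""
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(n):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        up = score_fn([t + sigma * e for t, e in zip(theta, eps)])
        down = score_fn([t - sigma * e for t, e in zip(theta, eps)])
        for j in range(dim):
            grad[j] += (up - down) * eps[j]
    return [t + alpha * g / (2 * n * sigma) for t, g in zip(theta, grad)]
```

    For instance, `es_step(theta, lambda th: novelty(th, archive), random.Random(0))` performs one NS-ES-style step. Passing `lambda th: nsr_score(th, archive, reward_fn)` instead gives the reward-plus-novelty variant. In a full loop, the BC of each updated policy would be appended to the archive, so that revisiting old behaviors stops paying off.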

    Hierarchical evolution of robotic controllers for complex tasks

    Evolutionary robotics is a methodology that allows robots to learn how to perform a task by automatically fine-tuning their “brain” (controller). Evolution is one of the most radical and open-ended forms of learning, but it has proven difficult to apply to tasks that require complex behavior (known as the bootstrapping problem). Controllers are usually evolved through computer simulation, and differences between real sensors and actuators and their simulated implementations are unavoidable. These differences prevent evolved controllers from crossing the reality gap, that is, from achieving similar performance on real robotic hardware as in simulation. In this dissertation, we propose an approach to overcome both the bootstrapping problem and the reality gap. We demonstrate how a controller can be evolved for a complex task through hierarchical evolution of behaviors. We further experiment with combining evolutionary techniques and preprogrammed behaviors. We demonstrate our approach in a task in which a robot has to find and rescue a teammate. The robot starts in a room with obstacles, and the teammate is located in a double T-maze connected to the room. We divide the rescue task into different sub-tasks, evolve controllers for each sub-task, and then combine the resulting controllers in a bottom-up fashion through additional evolutionary runs. We test the controllers in simulation and compare their performance on a real robot. The controller achieved a task completion rate of more than 90% both in simulation and on real robotic hardware. The main contributions of our study are the introduction of a novel methodology for evolving controllers for complex tasks and its demonstration on real robotic hardware.
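    The bottom-up composition step can be sketched with a toy. It assumes a 1-D corridor, two hand-coded stand-ins for previously evolved sub-controllers, and an arbitrator with a single evolved parameter (a distance threshold) tuned by a (1+1) evolutionary algorithm. The corridor, the arbitrator design, and every name here are hypothetical. The point is only that the lower-level behaviors stay frozen while an additional evolutionary run tunes how they are combined.

```python
import random

TEAMMATE_POS = 8  # hypothetical teammate cell in a 1-D corridor
MAX_STEPS = 20    # hypothetical time budget per trial


# Hand-coded stand-ins for sub-controllers evolved in earlier, separate runs.
def explore(distance):
    return 1  # move one cell forward


def rescue(distance):
    return 0  # stay with the teammate


SUB_CONTROLLERS = [explore, rescue]


def arbitrator(threshold, distance):
    """Top layer: index of the sub-controller to run (rescue once the teammate is close)."""
    return 1 if distance <= threshold else 0


def fitness(threshold):
    """Negative final distance to the teammate; 0 means the robot stopped exactly there."""
    x = 0
    for _ in range(MAX_STEPS):
        distance = abs(TEAMMATE_POS - x)
        behavior = SUB_CONTROLLERS[arbitrator(threshold, distance)]
        x += behavior(distance)
    return -abs(TEAMMATE_POS - x)


def evolve_arbitrator(rng, generations=200, sigma=1.0, start=5.0):
    """(1+1)-EA on the arbitrator parameter only; the sub-controllers stay frozen."""
    parent, parent_fit = start, fitness(start)
    for _ in range(generations):
        child = parent + rng.gauss(0.0, sigma)
        child_fit = fitness(child)
        if child_fit >= parent_fit:  # accept ties so the search can drift across plateaus
            parent, parent_fit = child, child_fit
    return parent, parent_fit
```

    Calling `evolve_arbitrator(random.Random(1))` searches only over the threshold. Any value in [0, 1) makes the robot stop exactly at the teammate. In the real task, each layer would itself be an evolved controller rather than a hand-written function.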

    Evolving unipolar memristor spiking neural networks

    © 2015 Taylor & Francis. Neuromorphic computing (brain-like computing in hardware) typically requires myriad complementary metal-oxide-semiconductor (CMOS) spiking neurons interconnected by a dense mesh of nanoscale plastic synapses. Memristors are frequently cited as strong synapse candidates due to their statefulness and potential for low-power implementations. To date, much research has focused on the bipolar memristor synapse, which is capable of incremental weight alterations and can provide adaptive self-organisation under a Hebbian learning scheme. In this paper, we consider the unipolar memristor synapse, a device capable of non-Hebbian switching between only two states (conductive and resistive) through application of a suitable input voltage, and discuss its suitability for neuromorphic systems. A self-adaptive evolutionary process is used to autonomously find highly fit network configurations. Experiments on two robotics tasks show that unipolar memristor networks evolve task-solving controllers faster than both bipolar memristor networks and networks containing constant non-plastic connections, whilst performing at least comparably.
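    The two-state behavior described above can be captured in a simplified device model, assuming hypothetical switching thresholds and resistances. Switching depends only on the voltage magnitude, so a pulse of either polarity can set or reset the device. Real unipolar devices also depend on compliance current and switching history, which this sketch ignores. The binary `weight` mapping is likewise an illustrative assumption.

```python
from dataclasses import dataclass

# Hypothetical device parameters, chosen only for illustration.
V_RESET = 0.5  # |V| in [V_RESET, V_SET) switches the device to the resistive (OFF) state
V_SET = 1.0    # |V| >= V_SET switches the device to the conductive (ON) state
R_ON = 1e3     # ohms in the conductive state
R_OFF = 1e6    # ohms in the resistive state


@dataclass
class UnipolarMemristor:
    conductive: bool = False

    def apply(self, voltage: float) -> float:
        """Apply a voltage pulse and return the resulting current.
        Switching depends only on |voltage|, so polarity is irrelevant (unipolar)."""
        magnitude = abs(voltage)
        if magnitude >= V_SET:
            self.conductive = True
        elif magnitude >= V_RESET:
            self.conductive = False
        # |voltage| < V_RESET acts as a read pulse: the state is unchanged.
        return self.current(voltage)

    def current(self, voltage: float) -> float:
        resistance = R_ON if self.conductive else R_OFF
        return voltage / resistance

    def weight(self) -> float:
        """Binary synaptic weight: 1.0 when conductive, 0.0 when resistive."""
        return 1.0 if self.conductive else 0.0
```

    The bipolar synapse described above changes its weight in small increments. This device's state only ever jumps between two values.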