Search CORE

21,697 research outputs found

Evolutionary Algorithms for Reinforcement Learning

Author: Grefenstette J. J.
Moriarty D. E.
Schultz A. C.
Publication venue: 'AI Access Foundation'
Publication date: 01/06/2011
Field of study

There are two distinct approaches to solving reinforcement learning problems, namely, searching in value function space and searching in policy space. Temporal difference methods and evolutionary algorithms are well-known examples of these approaches. Kaelbling, Littman and Moore recently provided an informative survey of temporal difference methods. This article focuses on the application of evolutionary algorithms to the reinforcement learning problem, emphasizing alternative policy representations, credit assignment methods, and problem-specific genetic operators. Strengths and weaknesses of the evolutionary approach to reinforcement learning are presented, along with a survey of representative applications

arXiv.org e-Print Archive

Crossref

Fast Damage Recovery in Robotics with the T-Resilience Algorithm

Author: Cully Antoine
Koos Sylvain
Mouret Jean-Baptiste
Publication venue: 'SAGE Publications'
Publication date: 02/02/2013
Field of study

Damage recovery is critical for autonomous robots that need to operate for a long time without assistance. Most current methods are complex and costly because they require anticipating each potential damage in order to have a contingency plan ready. As an alternative, we introduce the T-resilience algorithm, a new algorithm that allows robots to quickly and autonomously discover compensatory behaviors in unanticipated situations. This algorithm equips the robot with a self-model and discovers new behaviors by learning to avoid those that perform differently in the self-model and in reality. Our algorithm thus does not identify the damaged parts but it implicitly searches for efficient behaviors that do not use them. We evaluate the T-Resilience algorithm on a hexapod robot that needs to adapt to leg removal, broken legs and motor failures; we compare it to stochastic local search, policy gradient and the self-modeling algorithm proposed by Bongard et al. The behavior of the robot is assessed on-board thanks to a RGB-D sensor and a SLAM algorithm. Using only 25 tests on the robot and an overall running time of 20 minutes, T-Resilience consistently leads to substantially better results than the other approaches

arXiv.org e-Print Archive

Crossref

HAL Descartes

Spiral - Imperial College Digital Repository

Hal-Diderot