Micro-Data Learning: The Other End of the Spectrum

Author: Jean-Baptiste Mouret
Publication date: 01/09/2016

Many fields are now snowed under with an avalanche of data, which raises considerable challenges for computer scientists. Meanwhile, robotics (among other fields) can often only use a few dozen data points because acquiring them involves a process that is expensive or time-consuming. How can an algorithm learn with only a few data points?

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Reset-free Trial-and-Error Learning for Robot Damage Recovery

Authors: Konstantinos Chatzilygeroudis, Vassilis Vassiliades, Jean-Baptiste Mouret
Publication venue: Elsevier BV
Publication date: 01/01/2017

The high probability of hardware failures prevents many advanced robots (e.g., legged robots) from being confidently deployed in real-world situations (e.g., post-disaster rescue). Instead of attempting to diagnose the failures, robots could adapt by trial-and-error in order to be able to complete their tasks. In this situation, damage recovery can be seen as a Reinforcement Learning (RL) problem. However, the best RL algorithms for robotics require the robot and the environment to be reset to an initial state after each episode, that is, the robot is not learning autonomously. In addition, most of the RL methods for robotics do not scale well with complex robots (e.g., walking robots) and either cannot be used at all or take too long to converge to a solution (e.g., hours of learning). In this paper, we introduce a novel learning algorithm called "Reset-free Trial-and-Error" (RTE) that (1) breaks the complexity by pre-generating hundreds of possible behaviors with a dynamics simulator of the intact robot, and (2) allows complex robots to quickly recover from damage while completing their tasks and taking the environment into account. We evaluate our algorithm on a simulated wheeled robot, a simulated six-legged robot, and a real six-legged walking robot that are damaged in several ways (e.g., a missing leg, a shortened leg, a faulty motor, etc.) and whose objective is to reach a sequence of targets in an arena. Our experiments show that the robots can recover most of their locomotion abilities in an environment with obstacles, and without any human intervention.

Comment: 18 pages, 16 figures, 3 tables, 6 pseudocodes/algorithms, video at https://youtu.be/IqtyHFrb3BU, code at https://github.com/resibots/chatzilygeroudis_2018_rt

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Thinking Fast and Slow with Deep Learning and Tree Search

Authors: Thomas Anthony, David Barber, Zheng Tian
Publication date: 01/11/2017

Sequential decision-making problems, such as structured prediction, robotic control, and game playing, require a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration (ExIt), a novel reinforcement learning algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search, while a deep neural network generalises those plans. Subsequently, tree search is improved by using the neural network policy to guide search, increasing the strength of new plans. In contrast, standard deep reinforcement learning algorithms rely on a neural network not only to generalise plans, but to discover them too. We show that ExIt outperforms REINFORCE for training a neural network to play the board game Hex, and our final tree search agent, trained tabula rasa, defeats MoHex 1.0, the most recent Olympiad Champion player to be publicly released.

Comment:
v1 to v2:
- Add a value function in MCTS
- Some MCTS hyper-parameters changed
- Repetition of experiments: improved accuracy and errors shown (note the reduction in effect size for the tpt/cat experiment)
- Results from a longer training run, including changes in expert strength in training
- Comparison to MoHex
v3: clarify independence of ExIt and AG0.
v4: see appendix

arXiv.org e-Print Archive

UCL Discovery
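The Expert Iteration loop summarised in the ExIt abstract (an expert plans by tree search, an apprentice generalises those plans, and the apprentice then guides the expert) can be illustrated with a toy sketch. Everything specific here is an assumption rather than the paper's method: the game is single-pile Nim, a lookup table stands in for the deep neural network, and a depth-limited negamax whose leaves are scored by playing out the apprentice's greedy policy replaces the paper's Monte Carlo tree search. For brevity, the expert labels every pile size instead of positions sampled from self-play. All names (`Apprentice`, `expert_target`, `expert_iteration`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy game (not from the paper): single-pile Nim.
# A move removes 1, 2 or 3 stones; whoever takes the last stone wins.
ACTIONS = (1, 2, 3)


def legal_actions(stones):
    return [a for a in ACTIONS if a <= stones]


class Apprentice:
    """Lookup-table stand-in for ExIt's neural network: one action distribution per pile size."""

    def __init__(self):
        self.table = {}

    def policy(self, stones):
        acts = legal_actions(stones)
        # States the apprentice has never been trained on get a uniform distribution.
        return self.table.get(stones, {a: 1.0 / len(acts) for a in acts})

    def greedy(self, stones):
        probs = self.policy(stones)
        # max() returns the first maximum, so ties go to the smallest action.
        return max(legal_actions(stones), key=lambda a: probs[a])

    def train(self, dataset):
        # Imitation step: average the expert's target distributions for each state.
        sums = defaultdict(lambda: defaultdict(float))
        counts = defaultdict(int)
        for stones, target in dataset:
            counts[stones] += 1
            for a, p in target.items():
                sums[stones][a] += p
        self.table = {s: {a: v / counts[s] for a, v in d.items()} for s, d in sums.items()}


def playout_value(stones, apprentice):
    """Play the apprentice's greedy policy for both sides; +1 if the player to move wins."""
    sign = 1  # +1 while the original player is to move, -1 while the opponent is
    while stones > 0:
        stones -= apprentice.greedy(stones)
        sign = -sign
    # The side now "to move" faces an empty pile, i.e. has lost.
    return -sign


def negamax(stones, depth, apprentice):
    """Depth-limited negamax; leaves are scored by an apprentice playout."""
    if stones == 0:
        return -1  # the previous player took the last stone
    if depth == 0:
        return playout_value(stones, apprentice)
    return max(-negamax(stones - a, depth - 1, apprentice) for a in legal_actions(stones))


def expert_target(stones, depth, apprentice):
    """Expert 'plan': a uniform distribution over the best-scoring actions."""
    acts = legal_actions(stones)
    values = {a: -negamax(stones - a, depth - 1, apprentice) for a in acts}
    best = max(values.values())
    winners = [a for a in acts if values[a] == best]
    return {a: (1.0 / len(winners) if a in winners else 0.0) for a in acts}


def expert_iteration(iterations, max_stones=8, depth=2):
    apprentice = Apprentice()
    for _ in range(iterations):
        # Expert phase: apprentice-guided search labels every pile size (a simplification).
        dataset = [(s, expert_target(s, depth, apprentice)) for s in range(1, max_stones + 1)]
        # Apprentice phase: generalise the expert's plans.
        apprentice.train(dataset)
    return apprentice


trained = expert_iteration(iterations=3)
print([trained.greedy(s) for s in (5, 6, 7)])  # → [1, 2, 3]
```

With a search depth of only 2, the first iteration's expert cannot see past a pile of 4, so after one iteration the apprentice still plays 1 from a pile of 6. Because the improved apprentice then scores the leaves more accurately, the same shallow search finds the winning moves, and after three iterations the apprentice leaves a multiple of 4 stones for the opponent. This mirrors, in miniature, the abstract's claim that the network policy strengthens subsequent plans.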