Micro-Data Learning: The Other End of the Spectrum
Many fields are now snowed under with an avalanche of data, which raises
considerable challenges for computer scientists. Meanwhile, robotics (among
other fields) can often only use a few dozen data points because acquiring them
involves a process that is expensive or time-consuming. How can an algorithm
learn with only a few data points?
Reset-free Trial-and-Error Learning for Robot Damage Recovery
The high probability of hardware failures prevents many advanced robots
(e.g., legged robots) from being confidently deployed in real-world situations
(e.g., post-disaster rescue). Instead of attempting to diagnose the failures,
robots could adapt by trial-and-error in order to be able to complete their
tasks. In this situation, damage recovery can be seen as a Reinforcement
Learning (RL) problem. However, the best RL algorithms for robotics require the
robot and the environment to be reset to an initial state after each episode,
that is, the robot is not learning autonomously. In addition, most of the RL
methods for robotics do not scale well with complex robots (e.g., walking
robots) and either cannot be used at all or take too long to converge to a
solution (e.g., hours of learning). In this paper, we introduce a novel
learning algorithm called "Reset-free Trial-and-Error" (RTE) that (1) breaks
the complexity by pre-generating hundreds of possible behaviors with a dynamics
simulator of the intact robot, and (2) allows complex robots to quickly recover
from damage while completing their tasks and taking the environment into
account. We evaluate our algorithm on a simulated wheeled robot, a simulated
six-legged robot, and a real six-legged walking robot that are damaged in
several ways (e.g., a missing leg, a shortened leg, faulty motor, etc.) and
whose objective is to reach a sequence of targets in an arena. Our experiments
show that the robots can recover most of their locomotion abilities in an
environment with obstacles, and without any human intervention.
Comment: 18 pages, 16 figures, 3 tables, 6 pseudocodes/algorithms, video at
https://youtu.be/IqtyHFrb3BU, code at
https://github.com/resibots/chatzilygeroudis_2018_rt
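
The two ingredients named in the abstract above (a repertoire of behaviors pre-generated in simulation, then trial-and-error on the damaged robot without resets) can be illustrated with a toy sketch. This is not the authors' implementation: the eight-heading repertoire, the `damaged_execute` damage model, and the greedy behavior choice are all assumptions made for illustration, standing in for the simulator, the real robot, and the planner respectively.

```python
import math

def simulated_repertoire(n_behaviors=8, step=0.5):
    """Predicted (dx, dy) per behavior id, as a simulator of the intact
    robot might report it. Hypothetical: one behavior per heading."""
    return {
        i: (step * math.cos(2 * math.pi * i / n_behaviors),
            step * math.sin(2 * math.pi * i / n_behaviors))
        for i in range(n_behaviors)
    }

def damaged_execute(predicted, veer_deg=40.0, shrink=0.7):
    """Hypothetical damage model: every behavior veers left by veer_deg
    and covers only a fraction `shrink` of the predicted distance."""
    a = math.radians(veer_deg)
    dx, dy = predicted
    return (shrink * (dx * math.cos(a) - dy * math.sin(a)),
            shrink * (dx * math.sin(a) + dy * math.cos(a)))

def reach_target(target, tolerance=0.3, max_steps=500):
    prior = simulated_repertoire()   # what the simulator predicted
    model = dict(prior)              # current belief, corrected by trials
    x, y = 0.0, 0.0
    steps = 0
    while math.dist((x, y), target) >= tolerance and steps < max_steps:
        # Pick the behavior whose believed outcome ends closest to the target.
        best = min(
            model,
            key=lambda b: math.dist((x + model[b][0], y + model[b][1]), target),
        )
        dx, dy = damaged_execute(prior[best])
        # No reset: the next trial starts from wherever the robot ended up.
        x, y = x + dx, y + dy
        # Replace the simulated prediction with the observed outcome.
        model[best] = (dx, dy)
        steps += 1
    return (x, y), steps, model

position, steps, model = reach_target((3.0, 0.0))
print(position, steps)
```

The sketch corrects only the behaviors it has actually tried and plans one step ahead; it makes no attempt to generalize observations across behaviors or to account for obstacles.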
Thinking Fast and Slow with Deep Learning and Tree Search
Sequential decision making problems, such as structured prediction, robotic
control, and game playing, require a combination of planning policies and
generalisation of those plans. In this paper, we present Expert Iteration
(ExIt), a novel reinforcement learning algorithm which decomposes the problem
into separate planning and generalisation tasks. Planning new policies is
performed by tree search, while a deep neural network generalises those plans.
Subsequently, tree search is improved by using the neural network policy to
guide search, increasing the strength of new plans. In contrast, standard deep
Reinforcement Learning algorithms rely on a neural network not only to
generalise plans, but to discover them too. We show that ExIt outperforms
REINFORCE for training a neural network to play the board game Hex, and our
final tree search agent, trained tabula rasa, defeats MoHex 1.0, the most
recent Olympiad Champion player to be publicly released.
Comment: v1 to v2: - Add a value function in MCTS - Some MCTS hyper-parameters
changed - Repetition of experiments: improved accuracy and errors shown.
(note the reduction in effect size for the tpt/cat experiment) - Results from
a longer training run, including changes in expert strength in training -
Comparison to MoHex. v3: clarify independence of ExIt and AG0. v4: see
appendix
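
The loop described in the ExIt abstract above, where a search-based expert produces improved moves, an apprentice imitates them, and the apprentice in turn strengthens the next round of search, can be sketched on a toy game. Everything here is an assumption chosen for brevity rather than the paper's setup: the game is a subtraction game (take 1 to 3 stones, taking the last stone wins) instead of Hex, a depth-limited exhaustive search stands in for tree search, a lookup table stands in for the neural network, and the apprentice contributes by playing out the leaf positions.

```python
import random

MOVES = (1, 2, 3)

def legal(n):
    return [m for m in MOVES if m <= n]

def apprentice_move(policy, n, rng):
    # Fast policy: table lookup, falling back to a random legal move.
    return policy.get(n) or rng.choice(legal(n))

def rollout(policy, n, rng):
    """Play the game out with the apprentice on both sides.
    Returns +1 if the player to move at n wins, otherwise -1."""
    sign = 1
    while n > 0:
        n -= apprentice_move(policy, n, rng)
        sign = -sign
    return -sign  # whoever took the last stone wins

def search_value(policy, n, depth, rng):
    """Value of pile n for the player to move: exhaustive lookahead to
    `depth`, with apprentice rollouts evaluating the leaves."""
    if n == 0:
        return -1  # the opponent just took the last stone
    if depth == 0:
        return rollout(policy, n, rng)
    return max(-search_value(policy, n - m, depth - 1, rng) for m in legal(n))

def expert_move(policy, n, depth, rng):
    # Slow policy: search on top of the current apprentice.
    return max(legal(n),
               key=lambda m: -search_value(policy, n - m, depth - 1, rng))

def expert_iteration(max_pile=12, iterations=8, depth=2, seed=0):
    rng = random.Random(seed)
    policy = {}  # apprentice: state -> move
    for _ in range(iterations):
        # Expert improvement: search every state using the current apprentice.
        targets = {n: expert_move(policy, n, depth, rng)
                   for n in range(1, max_pile + 1)}
        # Imitation learning: the tabular apprentice memorizes the expert moves.
        policy = targets
    return policy

policy = expert_iteration()
print(policy)
```

Because the search only looks two moves ahead, the first iterations are reliable only for small piles; each round of imitation extends the range of piles on which the rollouts, and therefore the next round of search, can be trusted.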