Reset-free Trial-and-Error Learning for Robot Damage Recovery
The high probability of hardware failures prevents many advanced robots
(e.g., legged robots) from being confidently deployed in real-world situations
(e.g., post-disaster rescue). Instead of attempting to diagnose the failures,
robots could adapt by trial-and-error in order to be able to complete their
tasks. In this situation, damage recovery can be seen as a Reinforcement
Learning (RL) problem. However, the best RL algorithms for robotics require the
robot and the environment to be reset to an initial state after each episode,
that is, the robot is not learning autonomously. In addition, most of the RL
methods for robotics do not scale well with complex robots (e.g., walking
robots) and either cannot be used at all or take too long to converge to a
solution (e.g., hours of learning). In this paper, we introduce a novel
learning algorithm called "Reset-free Trial-and-Error" (RTE) that (1) breaks
the complexity by pre-generating hundreds of possible behaviors with a dynamics
simulator of the intact robot, and (2) allows complex robots to quickly recover
from damage while completing their tasks and taking the environment into
account. We evaluate our algorithm on a simulated wheeled robot, a simulated
six-legged robot, and a real six-legged walking robot that are damaged in
several ways (e.g., a missing leg, a shortened leg, faulty motor, etc.) and
whose objective is to reach a sequence of targets in an arena. Our experiments
show that the robots can recover most of their locomotion abilities in an
environment with obstacles, and without any human intervention.
Comment: 18 pages, 16 figures, 3 tables, 6 pseudocodes/algorithms; video at
https://youtu.be/IqtyHFrb3BU, code at
https://github.com/resibots/chatzilygeroudis_2018_rt
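The core loop of RTE, as described above, can be sketched in a few lines: pick the pre-generated behaviour whose corrected prediction best reaches the target, execute it on the damaged robot, and learn a correction from the observed outcome. This is a minimal illustrative stand-in, not the authors' implementation: RTE proper uses a Gaussian process for the corrections and Monte Carlo tree search for planning, while here the repertoire, the damage model, and the greedy selection are all invented for the example.

```python
import math

# Hypothetical pre-generated repertoire: each behaviour maps to the
# (x, y) displacement the *intact* robot achieved in simulation.
repertoire = {i: (math.cos(i / 8.0), math.sin(i / 8.0)) for i in range(50)}

# Corrections learned online on the damaged robot (observed minus
# predicted displacement). RTE fits a Gaussian process here; a simple
# per-behaviour offset is the most basic stand-in.
corrections = {}

def predicted_outcome(b):
    px, py = repertoire[b]
    cx, cy = corrections.get(b, (0.0, 0.0))
    return (px + cx, py + cy)

def select_behaviour(target):
    # Greedily pick the behaviour whose corrected prediction lands
    # closest to the target (RTE proper plans several steps ahead).
    tx, ty = target
    return min(repertoire, key=lambda b: (predicted_outcome(b)[0] - tx) ** 2
                                       + (predicted_outcome(b)[1] - ty) ** 2)

def execute_on_damaged_robot(b):
    # Stand-in for the real robot: the damage halves the x displacement.
    px, py = repertoire[b]
    return (0.5 * px, py)

target = (0.4, 0.9)
for _ in range(3):  # trial-and-error episodes, no resets needed
    b = select_behaviour(target)
    observed = execute_on_damaged_robot(b)
    px, py = repertoire[b]
    corrections[b] = (observed[0] - px, observed[1] - py)
```

After a few trials the corrected predictions match the damaged robot's actual displacements for the behaviours that were tried, which is what lets the planner route around the damage.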
Learning to Walk Autonomously via Reset-Free Quality-Diversity
Quality-Diversity (QD) algorithms can discover large and complex behavioural
repertoires consisting of both diverse and high-performing skills. However, the
generation of behavioural repertoires has mainly been limited to simulation
environments instead of real-world learning. This is because existing QD
algorithms need large numbers of evaluations as well as episodic resets, which
require manual human supervision and interventions. This paper proposes
Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous
learning for robotics in open-ended environments. We build on Dynamics-Aware
Quality-Diversity (DA-QD) and introduce a behaviour selection policy that
leverages the diversity of the imagined repertoire and environmental
information to intelligently select behaviours that can act as automatic
resets. We demonstrate this through a task of learning to walk within defined
training zones with obstacles. Our experiments show that we can learn full
repertoires of legged locomotion controllers autonomously without manual resets
with high sample efficiency in spite of harsh safety constraints. Finally,
using an ablation of different target objectives, we show that it is important
for RF-QD to have diverse types of solutions available to the behaviour
selection policy, rather than only solutions optimised for a specific
objective. Videos and code
available at https://sites.google.com/view/rf-qd
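The behaviour selection idea above can be sketched as follows: from the imagined repertoire, prefer behaviours predicted to stay inside the training zone, and when none qualifies, fall back to the behaviour that heads back towards the zone centre, which plays the "automatic reset" role. This is a minimal sketch under invented assumptions (a four-behaviour repertoire, a circular zone, exact displacement predictions), not the RF-QD policy itself.

```python
# Hypothetical imagined repertoire: behaviour name -> predicted (dx, dy).
repertoire = {"forward": (1.0, 0.0), "back": (-1.0, 0.0),
              "left": (0.0, 1.0), "right": (0.0, -1.0)}

ZONE_RADIUS = 2.0  # safe training zone around the origin (assumed)

def select_behaviour(position):
    x, y = position
    # Behaviours whose predicted endpoint stays inside the zone.
    safe = {}
    for name, (dx, dy) in repertoire.items():
        nx, ny = x + dx, y + dy
        if nx * nx + ny * ny <= ZONE_RADIUS ** 2:
            safe[name] = (nx, ny)
    # Prefer a safe behaviour; with none available, pick whichever moves
    # closest to the zone centre. Having diverse behaviours is what makes
    # this automatic-reset fallback possible at all.
    pool = safe if safe else {n: (x + d[0], y + d[1])
                              for n, d in repertoire.items()}
    return min(pool, key=lambda n: pool[n][0] ** 2 + pool[n][1] ** 2)
```

Near the zone boundary, only the behaviours heading back inside remain selectable, so training continues without a manual reset.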
Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics
The most data-efficient algorithms for reinforcement learning in robotics are
model-based policy search algorithms, which alternate between learning a
dynamical model of the robot and optimizing a policy to maximize the expected
return given the model and its uncertainties. Among the few proposed
approaches, the recently introduced Black-DROPS algorithm exploits a black-box
optimization algorithm to achieve both high data-efficiency and good
computation times when several cores are used; nevertheless, like all
model-based policy search approaches, Black-DROPS does not scale to high
dimensional state/action spaces. In this paper, we introduce a new model
learning procedure in Black-DROPS that leverages parameterized black-box priors
to (1) scale up to high-dimensional systems, and (2) be robust to large
inaccuracies of the prior information. We demonstrate the effectiveness of our
approach with the "pendubot" swing-up task in simulation and with a physical
hexapod robot (48D state space, 18D action space) that has to walk forward as
fast as possible. The results show that our new algorithm is more
data-efficient than previous model-based policy search algorithms (with and
without priors) and that it can allow a physical 6-legged robot to learn new
gaits in only 16 to 30 seconds of interaction time.
Comment: Accepted at ICRA 2018; 8 pages, 4 figures, 2 algorithms, 1 table;
Video at https://youtu.be/HFkZkhGGzTo ; Spotlight ICRA presentation at
https://youtu.be/_MZYDhfWeL
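The model-learning idea described above, a tuned parameterized prior plus a learned residual, can be sketched in a few lines. This is an illustrative stand-in, not the Black-DROPS implementation: the real algorithm uses Gaussian processes with the prior as the mean function, whereas here the dynamics, the prior family, and the constant residual model are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth dynamics of the real robot: x' = 1.7 * x.
real_step = lambda x: 1.7 * x

# Parameterized black-box prior (e.g., from a simulator): x' = theta * x.
prior_step = lambda x, theta: theta * x

# A few transitions observed on the real system.
X = rng.uniform(-1.0, 1.0, size=20)
Y = real_step(X)

# (1) Tune the prior's parameter to the observed data (least squares),
# which is what lets the prior absorb large inaccuracies.
theta = float(np.sum(X * Y) / np.sum(X * X))

# (2) Model the residual Y - prior(X, theta). Black-DROPS fits a Gaussian
# process to this residual; the mean residual is the simplest stand-in.
residual_mean = float(np.mean(Y - prior_step(X, theta)))

def learned_model(x):
    return prior_step(x, theta) + residual_mean
```

Because the prior carries most of the structure, only the low-dimensional parameter vector and the residual have to be learned from real data, which is what makes the approach scale to a 48D/18D hexapod.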
Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search
One of the most interesting features of Bayesian optimization for direct
policy search is that it can leverage priors (e.g., from simulation or from
previous tasks) to accelerate learning on a robot. In this paper, we are
interested in situations for which several priors exist but we do not know in
advance which one best fits the current situation. We tackle this problem by
introducing a novel acquisition function, called Most Likely Expected
Improvement (MLEI), that combines the likelihood of the priors and the expected
improvement. We evaluate this new acquisition function on a transfer learning
task for a 5-DOF planar arm and on a possibly damaged, 6-legged robot that has
to learn to walk on flat ground and on stairs, with priors corresponding to
different stairs and different kinds of damages. Our results show that MLEI
effectively identifies and exploits the priors, even when there is no obvious
match between the current situations and the priors.
Comment: Accepted at ICRA 2018; 8 pages, 4 figures, 1 algorithm; Video at
https://youtu.be/xo8mUIZTvNE ; Spotlight ICRA presentation at
https://youtu.be/iiVaV-U6Kq
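The combination of prior likelihood and expected improvement can be illustrated with a toy sketch. This is one plausible reading of the abstract, weighting each prior's expected improvement by the data likelihood under that prior, and is not the paper's exact MLEI formula; the observations, candidate priors, and noise level are all invented for the example.

```python
import math

# Observations on the current (possibly damaged) robot.
xs = [0.0, 1.0, 2.0]
ys = [0.1, 1.1, 2.0]

# Candidate priors: mean functions from simulation or previous tasks.
priors = {"identity": lambda x: x, "flat": lambda x: 0.0}

SIGMA_NOISE = 0.3  # assumed observation noise

def log_likelihood(prior):
    # Gaussian likelihood of the observed data around the prior mean.
    return sum(-0.5 * ((y - prior(x)) / SIGMA_NOISE) ** 2
               - math.log(SIGMA_NOISE * math.sqrt(2 * math.pi))
               for x, y in zip(xs, ys))

def expected_improvement(mu, sigma, best):
    # Standard closed-form EI for a Gaussian posterior N(mu, sigma^2).
    if sigma <= 0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))
    return (mu - best) * cdf + sigma * pdf

def mlei(x, prior, sigma=0.5):
    # Weight EI under each prior by that prior's data likelihood, so a
    # prior that contradicts the observations contributes little.
    best = max(ys)
    return math.exp(log_likelihood(prior)) * expected_improvement(
        prior(x), sigma, best)

best_prior = max(priors, key=lambda name: mlei(3.0, priors[name]))
```

Here the "identity" prior matches the data far better than the "flat" one, so it dominates the acquisition value even though both assign positive expected improvement.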
FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing
We present a system that enables an autonomous small-scale RC car to drive
aggressively from visual observations using reinforcement learning (RL). Our
system, FastRLAP (faster lap), trains autonomously in the real world, without
human interventions, and without requiring any simulation or expert
demonstrations. Our system integrates a number of important components to make
this possible: we initialize the representations for the RL policy and value
function from a large prior dataset of other robots navigating in other
environments (at low speed), which provides a navigation-relevant
representation. From here, a sample-efficient online RL method uses a single
low-speed user-provided demonstration to determine the desired driving course,
extracts a set of navigational checkpoints, and autonomously practices driving
through these checkpoints, resetting automatically on collision or failure.
Perhaps surprisingly, we find that with appropriate initialization and choice
of algorithm, our system can learn to drive over a variety of racing courses
with less than 20 minutes of online training. The resulting policies exhibit
emergent aggressive driving skills, such as timing braking and acceleration
around turns and avoiding areas which impede the robot's motion, approaching
the performance of a human driver using a similar first-person interface over
the course of training.
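The checkpoint-and-reset mechanics described above can be sketched as follows. This is a minimal stand-in, not the FastRLAP system: the demo path, the subsampling rule, the reach threshold, and the reward values are all invented for illustration, and the real system learns from visual observations with a pretrained representation rather than from ground-truth positions.

```python
# Hypothetical single low-speed user demonstration: (x, y) positions.
demo = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0),
        (1.0, 2.0), (0.0, 2.0), (0.0, 1.0)]

def extract_checkpoints(path, every=2):
    # Subsample the demonstration into navigational checkpoints.
    return path[::every]

def dist2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

class CheckpointTask:
    """Reward for reaching checkpoints in order; a collision triggers an
    automatic reset to the previous checkpoint (no human intervention)."""

    REACHED = 0.25  # squared-distance threshold for a checkpoint (assumed)

    def __init__(self, checkpoints):
        self.checkpoints = checkpoints
        self.next_idx = 0

    def step(self, position, collided=False):
        if collided:
            # Automatic reset: fall back to the last reached checkpoint.
            self.next_idx = max(self.next_idx - 1, 0)
            return -1.0
        if dist2(position, self.checkpoints[self.next_idx]) < self.REACHED:
            self.next_idx = (self.next_idx + 1) % len(self.checkpoints)
            return 1.0
        return 0.0

checkpoints = extract_checkpoints(demo)
task = CheckpointTask(checkpoints)
```

Cycling through the checkpoint list with `%` is what turns the single demonstration into an endlessly repeatable practicing loop.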
Damage recovery for robot controllers and simulators evolved using bootstrapped neuro-simulation
Robots are becoming increasingly complex. This has made manually designing the software responsible for controlling these robots (controllers) challenging, leading to the creation of the field of evolutionary robotics (ER). The ER approach aims to automatically evolve robot controllers and morphologies by utilising concepts from biological evolution. ER techniques use evolutionary algorithms (EAs) to evolve populations of controllers - a process that requires the evaluation of a large number of controllers. Performing these evaluations on a real-world robot is infeasibly time-consuming and poses a risk of damage to the robot. Simulators address this issue by allowing controllers to be evaluated on a virtual robot.

Traditional methods of controller evolution in simulation encounter two major issues. Firstly, physics simulators are complex to create and often very computationally expensive. Secondly, the reality gap arises when controllers are evolved in simulators that cannot simulate the real world accurately enough, due to simplifications or small inaccuracies in the simulation, so that controllers evolved in simulation fail to transfer effectively to reality.

Bootstrapped Neuro-Simulation (BNS) is an ER algorithm that aims to address the issues inherent in the use of simulators. The algorithm concurrently creates a simulator and evolves a population of controllers. The process starts with an initially random population of controllers and an untrained simulator neural network (SNN), a type of robot simulator that uses artificial neural networks (ANNs) to simulate a robot's behaviour. Controllers are continually selected for evaluation in the real world, and the data from these real-world evaluations is used to train the SNN. BNS is a relatively new algorithm that has not yet been explored in depth.
An investigation was therefore conducted into BNS's ability to evolve closed-loop controllers. BNS was successful in evolving such controllers, and various adaptations to the algorithm were investigated for their ability to improve the evolution of closed-loop controllers. In addition, the factors with the greatest impact on BNS's effectiveness were reported.

Damage recovery has been the focus of a great deal of research, because the progression of the field of robotics means that robots no longer operate only in the safe environments that they once did. Robots are now put to use in areas as inaccessible as the surface of Mars, where repairs by a human are impossible. Various methods of damage recovery have previously been proposed and evaluated, but none focused on BNS. In this research, it was hypothesised that BNS's constantly learning nature would allow it to recover from damage, as it would continue to use new information about the state of the real robot to evolve new controllers capable of functioning on the damaged robot. BNS was found to possess the hypothesised damage recovery ability.

The algorithm was evaluated through the evolution of controllers for simple navigation and light-following tasks for a wheeled robot, as well as a locomotion task for a complex legged robot. Various adaptations to the algorithm were then evaluated through extensive parameter investigations in simulation, showing varying levels of effectiveness. These results were further confirmed through real-world evaluations of the adaptations and effective parameter values on a real robot. Both a simple and a more complex robot morphology were investigated.
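The concurrent loop at the heart of BNS, evaluating controllers in the real world, training the simulator on that data, and evolving the population inside the simulator, can be sketched as follows. This is a deliberately tiny stand-in under stated assumptions: the "robot" is a scalar linear system, the SNN is replaced by an online least-squares line, controllers are single parameters, and the real-world probe is chosen at random rather than by BNS's selection scheme.

```python
import random

random.seed(1)

def real_robot(controller):
    # Stand-in for a real-world evaluation: the robot's observed
    # behaviour under this (scalar) controller parameter.
    return 3.0 * controller - 1.0

class Simulator:
    """Trained from real-world data. BNS uses a simulator neural network
    (SNN); an online-fitted linear model is the simplest stand-in."""

    def __init__(self):
        self.data = []

    def train(self, controller, behaviour):
        self.data.append((controller, behaviour))

    def predict(self, controller):
        if len(self.data) < 2:
            return 0.0  # untrained simulator knows nothing yet
        n = len(self.data)
        sx = sum(c for c, _ in self.data)
        sy = sum(b for _, b in self.data)
        sxx = sum(c * c for c, _ in self.data)
        sxy = sum(c * b for c, b in self.data)
        den = n * sxx - sx * sx
        if den == 0:
            return sy / n
        a = (n * sxy - sx * sy) / den
        return a * controller + (sy - a * sx) / n

def fitness(behaviour, target=2.0):
    # Task: produce a behaviour close to the target value.
    return -abs(behaviour - target)

sim = Simulator()
population = [random.uniform(-2.0, 2.0) for _ in range(8)]

for generation in range(20):
    # (1) Evaluate one controller on the real robot; train the simulator.
    probe = random.choice(population)
    sim.train(probe, real_robot(probe))
    # (2) Evolve the population using the simulator for all evaluations,
    # keeping the predicted-best controller (elitism) and mutating the rest.
    population.sort(key=lambda c: fitness(sim.predict(c)), reverse=True)
    population = [population[0]] + [p + random.gauss(0, 0.2)
                                    for p in population[:len(population) - 1]]

# Final check on the real system (not part of the training loop).
best = max(population, key=lambda c: fitness(real_robot(c)))
```

Only one real-world evaluation happens per generation, while the population is evolved entirely in the learned simulator; the same mechanism is what lets BNS keep adapting after damage, since post-damage evaluations simply retrain the simulator.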