
    Reset-free Trial-and-Error Learning for Robot Damage Recovery

    The high probability of hardware failures prevents many advanced robots (e.g., legged robots) from being confidently deployed in real-world situations (e.g., post-disaster rescue). Instead of attempting to diagnose the failures, robots could adapt by trial-and-error in order to be able to complete their tasks. In this situation, damage recovery can be seen as a Reinforcement Learning (RL) problem. However, the best RL algorithms for robotics require the robot and the environment to be reset to an initial state after each episode, that is, the robot is not learning autonomously. In addition, most of the RL methods for robotics do not scale well with complex robots (e.g., walking robots) and either cannot be used at all or take too long to converge to a solution (e.g., hours of learning). In this paper, we introduce a novel learning algorithm called "Reset-free Trial-and-Error" (RTE) that (1) breaks the complexity by pre-generating hundreds of possible behaviors with a dynamics simulator of the intact robot, and (2) allows complex robots to quickly recover from damage while completing their tasks and taking the environment into account. We evaluate our algorithm on a simulated wheeled robot, a simulated six-legged robot, and a real six-legged walking robot that are damaged in several ways (e.g., a missing leg, a shortened leg, a faulty motor) and whose objective is to reach a sequence of targets in an arena. Our experiments show that the robots can recover most of their locomotion abilities in an environment with obstacles, and without any human intervention.
    Comment: 18 pages, 16 figures, 3 tables, 6 pseudocodes/algorithms, video at https://youtu.be/IqtyHFrb3BU, code at https://github.com/resibots/chatzilygeroudis_2018_rt
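
    To make the two-stage idea above concrete, here is a minimal Python sketch of a pre-generated behaviour repertoire being reused for reset-free adaptation. It is illustrative only: the controller parameterisation, the cell discretisation, and the simulate_intact / execute_on_robot / outcome_model interfaces are hypothetical stand-ins, and the paper's actual repertoire generation (a quality-diversity search) and online planning are more sophisticated.

```python
# Hedged sketch of the RTE idea: (1) pre-generate diverse behaviours with a
# simulator of the intact robot, (2) adapt online without resets by correcting
# outcome predictions from real-world executions.  Interfaces are placeholders.
import numpy as np

def build_repertoire(simulate_intact, n_candidates=10_000):
    """Offline stage: sample many controllers in the intact-robot simulator and
    keep, per outcome 'cell' (here: rounded 2D displacement), the best one found.
    A crude stand-in for the quality-diversity search used in the paper."""
    repertoire = {}  # cell -> (controller, performance)
    for _ in range(n_candidates):
        controller = np.random.uniform(-1.0, 1.0, size=36)  # hypothetical gait parameters
        displacement, performance = simulate_intact(controller)
        cell = tuple(np.round(displacement, 1))
        if cell not in repertoire or performance > repertoire[cell][1]:
            repertoire[cell] = (controller, performance)
    return repertoire

def adapt_without_resets(repertoire, execute_on_robot, targets, outcome_model):
    """Online stage: the (possibly damaged) robot keeps working on its task.
    Each step it picks the behaviour whose *predicted* outcome best advances the
    current target, executes it, and updates the model with what actually
    happened -- no environment reset in between."""
    for target in targets:
        while not target.reached():
            cell, (controller, _) = max(
                repertoire.items(),
                key=lambda item: target.progress(outcome_model.predict(item[0])))
            observed_outcome = execute_on_robot(controller)
            outcome_model.update(cell, observed_outcome)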

    Learning to Walk Autonomously via Reset-Free Quality-Diversity

    Quality-Diversity (QD) algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills. However, the generation of behavioural repertoires has mainly been limited to simulation environments instead of real-world learning. This is because existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions. This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments. We build on Dynamics-Aware Quality-Diversity (DA-QD) and introduce a behaviour selection policy that leverages the diversity of the imagined repertoire and environmental information to intelligently select behaviours that can act as automatic resets. We demonstrate this through a task of learning to walk within defined training zones with obstacles. Our experiments show that we can learn full repertoires of legged locomotion controllers autonomously without manual resets with high sample efficiency, in spite of harsh safety constraints. Finally, using an ablation of different target objectives, we show that it is important for RF-QD that the behaviour selection policy can draw on diverse types of solutions rather than only solutions optimised for a specific objective. Videos and code available at https://sites.google.com/view/rf-qd
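
    As a rough illustration of a behaviour selection policy that doubles as an automatic reset, the sketch below scores repertoire behaviours differently depending on whether the robot is safely inside its training zone or drifting out of it. The zone geometry, the 0.7 threshold, and the novelty scoring are hypothetical choices for illustration, not the policy actually used in RF-QD.

```python
# Hedged sketch: pick a repertoire behaviour that explores when safe and steers
# the robot back into the training zone when not (an "automatic reset").
import numpy as np

def select_behaviour(repertoire, robot_xy, zone_center, zone_radius, novelty):
    """repertoire: {descriptor (predicted xy displacement): controller}.
    novelty(descriptor) -> float, higher for less-visited behaviours (assumed)."""
    dist_from_center = np.linalg.norm(robot_xy - zone_center)
    scored = []
    for descriptor, controller in repertoire.items():
        predicted_xy = robot_xy + np.asarray(descriptor)
        if dist_from_center < 0.7 * zone_radius:
            score = novelty(descriptor)                          # safe: favour novel skills
        else:
            score = -np.linalg.norm(predicted_xy - zone_center)  # unsafe: head back inside
        scored.append((score, controller))
    return max(scored, key=lambda s: s[0])[1]
```

    The ablation mentioned in the abstract suggests why the recovery branch benefits from a repertoire holding many kinds of displacement behaviours rather than only controllers optimised for a single objective.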

    Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics

    The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. Among the few proposed approaches, the recently introduced Black-DROPS algorithm exploits a black-box optimization algorithm to achieve both high data-efficiency and good computation times when several cores are used; nevertheless, like all model-based policy search approaches, Black-DROPS does not scale to high-dimensional state/action spaces. In this paper, we introduce a new model learning procedure in Black-DROPS that leverages parameterized black-box priors to (1) scale up to high-dimensional systems, and (2) be robust to large inaccuracies of the prior information. We demonstrate the effectiveness of our approach with the "pendubot" swing-up task in simulation and with a physical hexapod robot (48D state space, 18D action space) that has to walk forward as fast as possible. The results show that our new algorithm is more data-efficient than previous model-based policy search algorithms (with and without priors) and that it can allow a physical 6-legged robot to learn new gaits in only 16 to 30 seconds of interaction time.
    Comment: Accepted at ICRA 2018; 8 pages, 4 figures, 2 algorithms, 1 table; Video at https://youtu.be/HFkZkhGGzTo ; Spotlight ICRA presentation at https://youtu.be/_MZYDhfWeL
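
    The abstract's core idea, learning a dynamics model as a tunable black-box prior plus a learned correction, can be sketched as below. This is a simplified reading: the prior parameters and the residual model are fitted separately here with off-the-shelf tools (SciPy, scikit-learn), whereas the paper's procedure and its Gaussian process formulation differ in the details; prior_sim and the four prior parameters are assumptions for illustration.

```python
# Hedged sketch: dynamics model = parameterized black-box prior + GP residual.
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor

def fit_dynamics_model(prior_sim, data_sa, data_next):
    """prior_sim(state_action, params) -> predicted next state (black-box simulator).
    data_sa: array of observed state-action vectors; data_next: observed next states.
    (1) tune the prior's parameters to the real data, (2) fit a GP on the residuals
    so the model stays accurate even where the prior is wrong."""
    def prior_error(params):
        pred = np.array([prior_sim(sa, params) for sa in data_sa])
        return float(np.mean((pred - data_next) ** 2))

    best_params = minimize(prior_error, x0=np.zeros(4),       # 4 = hypothetical prior parameters
                           method="Nelder-Mead").x
    residuals = data_next - np.array([prior_sim(sa, best_params) for sa in data_sa])
    gp = GaussianProcessRegressor().fit(data_sa, residuals)

    def model(sa):
        mean_resid, std = gp.predict([sa], return_std=True)   # uncertainty feeds policy search
        return prior_sim(sa, best_params) + mean_resid[0], std[0]
    return model
```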

    Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search

    One of the most interesting features of Bayesian optimization for direct policy search is that it can leverage priors (e.g., from simulation or from previous tasks) to accelerate learning on a robot. In this paper, we are interested in situations for which several priors exist but we do not know in advance which one best fits the current situation. We tackle this problem by introducing a novel acquisition function, called Most Likely Expected Improvement (MLEI), that combines the likelihood of the priors and the expected improvement. We evaluate this new acquisition function on a transfer learning task for a 5-DOF planar arm and on a possibly damaged, 6-legged robot that has to learn to walk on flat ground and on stairs, with priors corresponding to different stairs and different kinds of damages. Our results show that MLEI effectively identifies and exploits the priors, even when there is no obvious match between the current situation and the priors.
    Comment: Accepted at ICRA 2018; 8 pages, 4 figures, 1 algorithm; Video at https://youtu.be/xo8mUIZTvNE ; Spotlight ICRA presentation at https://youtu.be/iiVaV-U6Kq
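
    One plausible way to combine prior likelihood with expected improvement, consistent with the description above but not necessarily the paper's exact MLEI formula, is to weight the expected improvement computed under each candidate prior by how well that prior explains the data collected so far. The fit_gp interface below is a hypothetical placeholder.

```python
# Hedged sketch: acquisition = (marginal likelihood of prior) x (expected improvement
# under that prior's GP posterior); pick the candidate point maximising it.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_y):
    sigma = max(sigma, 1e-9)
    z = (mu - best_y) / sigma
    return (mu - best_y) * norm.cdf(z) + sigma * norm.pdf(z)

def mlei_next_point(candidates, priors, observations, best_y, fit_gp):
    """priors: list of prior mean functions (e.g., simulated performance maps).
    fit_gp(prior, observations) -> posterior with .predict(x) -> (mu, sigma)
    and .log_marginal_likelihood() (assumed interface)."""
    best_score, best_x = -np.inf, None
    for prior in priors:
        gp = fit_gp(prior, observations)
        weight = np.exp(gp.log_marginal_likelihood())   # how well this prior explains the data
        for x in candidates:
            mu, sigma = gp.predict(x)
            score = weight * expected_improvement(mu, sigma, best_y)
            if score > best_score:
                best_score, best_x = score, x
    return best_x   # next policy parameters to try on the robot
```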

    FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing

    We present a system that enables an autonomous small-scale RC car to drive aggressively from visual observations using reinforcement learning (RL). Our system, FastRLAP (faster lap), trains autonomously in the real world, without human intervention, and without requiring any simulation or expert demonstrations. Our system integrates a number of important components to make this possible: we initialize the representations for the RL policy and value function from a large prior dataset of other robots navigating in other environments (at low speed), which provides a navigation-relevant representation. From here, a sample-efficient online RL method uses a single low-speed user-provided demonstration to determine the desired driving course, extracts a set of navigational checkpoints, and autonomously practices driving through these checkpoints, resetting automatically on collision or failure. Perhaps surprisingly, we find that with appropriate initialization and choice of algorithm, our system can learn to drive over a variety of racing courses with less than 20 minutes of online training. The resulting policies exhibit emergent aggressive driving skills, such as timing braking and acceleration around turns and avoiding areas which impede the robot's motion, approaching the performance of a human driver using a similar first-person interface over the course of training.
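
    The autonomous practicing loop described above, with checkpoints extracted from a single demonstration, can be summarised with the schematic below. The environment interface, the 1 m checkpoint threshold, and the recovery behaviour are illustrative assumptions, not details of the FastRLAP system.

```python
# Hedged schematic of reset-free practicing: chase checkpoints lap after lap,
# update the policy online, and recover automatically on collision or when stuck.
import numpy as np

def practice_laps(env, policy, replay_buffer, checkpoints, update_policy, max_steps=100_000):
    goal_idx = 0
    obs = env.get_observation()                      # placeholder env interface
    for _ in range(max_steps):
        goal = checkpoints[goal_idx]
        action = policy(obs, goal)
        next_obs, collided = env.step(action)
        reached = np.linalg.norm(env.position() - goal) < 1.0   # hypothetical 1 m threshold
        replay_buffer.add(obs, action, next_obs, goal, reached, collided)
        update_policy(replay_buffer)                 # sample-efficient online RL update
        if reached:
            goal_idx = (goal_idx + 1) % len(checkpoints)        # next checkpoint, no human reset
        if collided or env.is_stuck():
            env.recover()                            # automatic reset: back up and re-orient
        obs = next_obs
```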

    Damage recovery for robot controllers and simulators evolved using bootstrapped neuro-simulation

    Robots are becoming increasingly complex. This has made manually designing the software responsible for controlling these robots (controllers) challenging, leading to the creation of the field of evolutionary robotics (ER). The ER approach aims to automatically evolve robot controllers and morphologies by utilising concepts from biological evolution. ER techniques use evolutionary algorithms (EAs) to evolve populations of controllers, a process that requires the evaluation of a large number of controllers. Performing these evaluations on a real-world robot is infeasibly time-consuming and risks damaging the robot. Simulators offer a solution by allowing controllers to be evaluated on a virtual robot. Traditional methods of controller evolution in simulation encounter two major issues. Firstly, physics simulators are complex to create and are often very computationally expensive. Secondly, the reality gap arises when controllers are evolved in simulators that cannot model the real world accurately enough, because of simplifications or small inaccuracies in the simulation, causing controllers evolved in simulation to transfer poorly to reality.

    Bootstrapped Neuro-Simulation (BNS) is an ER algorithm that aims to address the issues inherent in the use of simulators. The algorithm concurrently creates a simulator and evolves a population of controllers. The process starts with an initially random population of controllers and an untrained simulator neural network (SNN), a type of robot simulator that utilises artificial neural networks (ANNs) to simulate a robot's behaviour. Controllers are continually selected for evaluation in the real world, and the data from these real-world evaluations is used to train the SNN used for controller evaluation. BNS is a relatively new algorithm that has not yet been explored in depth. An investigation was therefore conducted into BNS's ability to evolve closed-loop controllers. BNS was successful in evolving such controllers, and various adaptations to the algorithm were investigated for their ability to improve the evolution of closed-loop controllers. In addition, the factors that had the greatest impact on BNS's effectiveness were identified and reported on.

    Damage recovery is an area that has been the focus of a great deal of research, because the progression of the field of robotics means that robots no longer operate only in the safe environments that they once did. Robots are now put to use in areas as inaccessible as the surface of Mars, where repairs by a human are impossible. Various methods of damage recovery have previously been proposed and evaluated, but none focused on BNS as a method of damage recovery. In this research, it was hypothesised that BNS's constantly learning nature would allow it to recover from damage, as it would continue to use new information about the state of the real robot to evolve new controllers capable of functioning on the damaged robot. BNS was found to possess the hypothesised damage recovery ability. The algorithm was evaluated by evolving controllers for simple navigation and light-following tasks on a wheeled robot, as well as a locomotion task on a complex legged robot. Various adaptations to the algorithm were then evaluated through extensive parameter investigations in simulation, showing varying levels of effectiveness. These results were further confirmed by evaluating the adaptations and effective parameter values on a real robot. Both a simple and a more complex robot morphology were investigated.
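
    The bootstrapped neuro-simulation loop sketched in the abstract, alternating between real-world evaluations that train the simulator neural network and evolution inside that learned simulator, might look roughly like the following. The snn, real_robot, and evolve interfaces are hypothetical placeholders rather than the thesis's exact algorithm; the same continual retraining is what lets the loop track a damaged robot.

```python
# Hedged sketch of Bootstrapped Neuro-Simulation: the learned simulator (SNN)
# and the controller population are improved concurrently from real-world data.
import random

def bootstrapped_neuro_simulation(population, snn, real_robot, evolve, generations=200):
    """snn.train(dataset) fits the simulator ANN; snn.estimate_fitness(controller)
    scores a controller in the learned simulator; evolve(pop, fitnesses) returns
    the next population (all assumed interfaces)."""
    dataset = []
    for _ in range(generations):
        # Evaluate one controller on the real robot; its sensed behaviour becomes
        # training data, so the SNN keeps tracking the robot -- including damage.
        controller = random.choice(population)
        dataset.extend(real_robot.run(controller))   # (command, sensed-response) pairs
        snn.train(dataset)
        # Evolve the population using fitness estimated in the learned simulator.
        fitnesses = [snn.estimate_fitness(c) for c in population]
        population = evolve(population, fitnesses)
    return population
```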