Online hyper-evolution of controllers in multirobot systems
In this paper, we introduce online hyper-evolution (OHE) to accelerate and increase the performance of online evolution of robotic controllers. Robots executing OHE use the different sources of feedback information traditionally associated with controller evaluation to find effective evolutionary algorithms and controllers online during task execution. We present two approaches: OHE-fitness, which uses the fitness score of controllers as the criterion to select promising algorithms over time, and OHE-diversity, which relies on the behavioural diversity of controllers for algorithm selection. Both OHE-fitness and OHE-diversity are distributed across groups of robots that evolve in parallel. We assess the performance of OHE-fitness and of OHE-diversity in two foraging tasks with differing complexity, and in five configurations of a dynamic phototaxis task with varying evolutionary pressures. Results show that our OHE approaches: (i) outperform multiple state-of-the-art algorithms as they facilitate controllers with superior performance and faster evolution of solutions, and (ii) can increase effectiveness at different stages of evolution by combining the benefits of multiple algorithms over time. Overall, our study shows that OHE is an effective new paradigm for the synthesis of controllers for robots.
Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics
The most data-efficient algorithms for reinforcement learning in robotics are
model-based policy search algorithms, which alternate between learning a
dynamical model of the robot and optimizing a policy to maximize the expected
return given the model and its uncertainties. Among the few proposed
approaches, the recently introduced Black-DROPS algorithm exploits a black-box
optimization algorithm to achieve both high data-efficiency and good
computation times when several cores are used; nevertheless, like all
model-based policy search approaches, Black-DROPS does not scale to high
dimensional state/action spaces. In this paper, we introduce a new model
learning procedure in Black-DROPS that leverages parameterized black-box priors
to (1) scale up to high-dimensional systems, and (2) be robust to large
inaccuracies of the prior information. We demonstrate the effectiveness of our
approach with the "pendubot" swing-up task in simulation and with a physical
hexapod robot (48D state space, 18D action space) that has to walk forward as
fast as possible. The results show that our new algorithm is more
data-efficient than previous model-based policy search algorithms (with and
without priors) and that it can allow a physical 6-legged robot to learn new
gaits in only 16 to 30 seconds of interaction time.
Comment: Accepted at ICRA 2018; 8 pages, 4 figures, 2 algorithms, 1 table;
Video at https://youtu.be/HFkZkhGGzTo ; Spotlight ICRA presentation at
https://youtu.be/_MZYDhfWeL
Meta Reinforcement Learning with Latent Variable Gaussian Processes
Learning from small data sets is critical in many practical applications
where data collection is time consuming or expensive, e.g., robotics, animal
experiments or drug design. Meta learning is one way to increase the data
efficiency of learning algorithms by generalizing learned concepts from a set
of training tasks to unseen, but related, tasks. Often, this relationship
between tasks is hard coded or relies in some other way on human expertise. In
this paper, we frame meta learning as a hierarchical latent variable model and
infer the relationship between tasks automatically from data. We apply our
framework in a model-based reinforcement learning setting and show that our
meta-learning model effectively generalizes to novel tasks by identifying how
new tasks relate to prior ones from minimal data. This results in up to a 60%
reduction in the average interaction time needed to solve tasks compared to
strong baselines.
Comment: 11 pages, 7 figures
Evolutionary online behaviour learning and adaptation in real robots
Online evolution of behavioural control on real robots is an open-ended approach to autonomous learning and adaptation: robots have the potential to automatically learn new tasks and to adapt to changes in environmental conditions, or to failures in sensors and/or actuators. However, studies have so far almost exclusively been carried out in simulation because evolution in real hardware has required several days or weeks to produce capable robots. In this article, we successfully evolve neural network-based controllers in real robotic hardware to solve two single-robot tasks and one collective robotics task. Controllers are evolved either from random solutions or from solutions pre-evolved in simulation. In all cases, capable solutions are found in a timely manner (1 h or less). Results show that more accurate simulations may lead to higher-performing controllers, and that completing the optimization process in real robots is meaningful, even if solutions found in simulation differ from solutions in reality. We furthermore demonstrate for the first time the adaptive capabilities of online evolution in real robotic hardware, including robots able to overcome faults injected in the motors of multiple units simultaneously, and to modify their behaviour in response to changes in the task requirements. We conclude by assessing the contribution of each algorithmic component to the performance of the underlying evolutionary algorithm.
Active Inference for Integrated State-Estimation, Control, and Learning
This work presents an approach for control, state-estimation and learning
model (hyper)parameters for robotic manipulators. It is based on the active
inference framework, prominent in computational neuroscience as a theory of the
brain, where behaviour arises from minimizing variational free-energy. The
robotic manipulator shows adaptive and robust behaviour compared to
state-of-the-art methods. Additionally, we show the exact relationship to
classic methods such as PID control. Finally, we show that by learning a
temporal parameter and model variances, our approach can deal with unmodelled
dynamics, damp oscillations, and remain robust against disturbances and poor
initial parameters. The approach is validated on the 'Franka Emika Panda' 7 DoF
manipulator.
Comment: 7 pages, 6 figures, accepted for presentation at the International
Conference on Robotics and Automation (ICRA) 202