Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics
The most data-efficient algorithms for reinforcement learning in robotics are
model-based policy search algorithms, which alternate between learning a
dynamical model of the robot and optimizing a policy to maximize the expected
return given the model and its uncertainties. Among the few proposed
approaches, the recently introduced Black-DROPS algorithm exploits a black-box
optimization algorithm to achieve both high data-efficiency and good
computation times when several cores are used; nevertheless, like all
model-based policy search approaches, Black-DROPS does not scale to high
dimensional state/action spaces. In this paper, we introduce a new model
learning procedure in Black-DROPS that leverages parameterized black-box priors
to (1) scale up to high-dimensional systems, and (2) be robust to large
inaccuracies of the prior information. We demonstrate the effectiveness of our
approach with the "pendubot" swing-up task in simulation and with a physical
hexapod robot (48D state space, 18D action space) that has to walk forward as
fast as possible. The results show that our new algorithm is more
data-efficient than previous model-based policy search algorithms (with and
without priors) and that it can allow a physical 6-legged robot to learn new
gaits in only 16 to 30 seconds of interaction time.
Comment: Accepted at ICRA 2018; 8 pages, 4 figures, 2 algorithms, 1 table;
Video at https://youtu.be/HFkZkhGGzTo ; Spotlight ICRA presentation at
https://youtu.be/_MZYDhfWeL
Quantifying the Evolutionary Self Structuring of Embodied Cognitive Networks
We outline a possible theoretical framework for the quantitative modeling of
networked embodied cognitive systems. We note that: 1) information
self-structuring through sensory-motor coordination does not deterministically occur
in the R^n vector space, a generic multivariable space, but in SE(3), the group
structure of the possible motions of a body in space; 2) it happens in a
stochastic open ended environment. These observations may simplify, at the
price of a certain abstraction, the modeling and the design of self
organization processes based on the maximization of some informational
measures, such as mutual information. Furthermore, by providing closed form or
computationally lighter algorithms, they may significantly reduce the
computational burden of their implementation. We propose a modeling framework
which aims to give new tools for the design of networks of new artificial self
organizing, embodied and intelligent agents and the reverse engineering of
natural ones. At this point, it remains largely a theoretical conjecture, and
whether this model will be useful in practice has still to be experimentally
verified.
Interactive Co-Design of Form and Function for Legged Robots using the Adjoint Method
Our goal is to make robotics more accessible to casual users by reducing the
domain knowledge required in designing and building robots. Towards this goal,
we present an interactive computational design system that enables users to
design legged robots with desired morphologies and behaviors by specifying
higher level descriptions. The core of our method is a design optimization
technique that reasons about the structure and motion of a robot in a coupled
manner in order to achieve user-specified robot behavior and performance. We
are inspired by recent works that also aim to jointly optimize a robot's form
and function. However, through efficient computation of necessary design
changes, our approach enables us to keep the user in the loop for interactive
applications. We evaluate our system in simulation by automatically improving
robot designs for multiple scenarios. Starting with initial user designs that
are physically infeasible or inadequate to perform the user-desired task, we
show optimized designs that meet user specifications, all while ensuring an
interactive design flow.
Comment: 8 pages; added link of the accompanying vide
Reset-free Trial-and-Error Learning for Robot Damage Recovery
The high probability of hardware failures prevents many advanced robots
(e.g., legged robots) from being confidently deployed in real-world situations
(e.g., post-disaster rescue). Instead of attempting to diagnose the failures,
robots could adapt by trial-and-error in order to be able to complete their
tasks. In this situation, damage recovery can be seen as a Reinforcement
Learning (RL) problem. However, the best RL algorithms for robotics require the
robot and the environment to be reset to an initial state after each episode,
that is, the robot is not learning autonomously. In addition, most of the RL
methods for robotics do not scale well with complex robots (e.g., walking
robots) and either cannot be used at all or take too long to converge to a
solution (e.g., hours of learning). In this paper, we introduce a novel
learning algorithm called "Reset-free Trial-and-Error" (RTE) that (1) breaks
the complexity by pre-generating hundreds of possible behaviors with a dynamics
simulator of the intact robot, and (2) allows complex robots to quickly recover
from damage while completing their tasks and taking the environment into
account. We evaluate our algorithm on a simulated wheeled robot, a simulated
six-legged robot, and a real six-legged walking robot that are damaged in
several ways (e.g., a missing leg, a shortened leg, or a faulty motor) and
whose objective is to reach a sequence of targets in an arena. Our experiments
show that the robots can recover most of their locomotion abilities in an
environment with obstacles, and without any human intervention.
Comment: 18 pages, 16 figures, 3 tables, 6 pseudocodes/algorithms, video at
https://youtu.be/IqtyHFrb3BU, code at
https://github.com/resibots/chatzilygeroudis_2018_rt
Evolution of central pattern generators for the control of a five-link bipedal walking mechanism
Central pattern generators (CPGs), with a basis in neurophysiological
studies, are a type of neural network for the generation of rhythmic motion.
While CPGs are being increasingly used in robot control, most applications are
hand-tuned for a specific task and it is acknowledged in the field that generic
methods and design principles for creating individual networks for a given task
are lacking. This study presents an approach where the connectivity and
oscillatory parameters of a CPG network are determined by an evolutionary
algorithm with fitness evaluations in a realistic simulation with accurate
physics. We apply this technique to a five-link planar walking mechanism to
demonstrate its feasibility and performance. In addition, to see whether
results from simulation can be acceptably transferred to real robot hardware,
the best evolved CPG network is also tested on a real mechanism. Our results
also confirm that the biologically inspired CPG model is well suited for legged
locomotion, since a diverse range of networks has been observed to
succeed in fitness simulations during evolution.
Comment: 11 pages, 9 figures; substantial revision of content, organization,
and quantitative result