Behavioral Repertoire Learning in Robotics
Antoine Cully ([email protected]) and Jean-Baptiste Mouret ([email protected]), ISIR, Université Pierre et Marie Curie-Paris 6, CNRS UMR 7222, 4 place Jussieu, F-75252, Paris Cedex 05, France
Learning in robotics typically involves choosing a simple goal (e.g. walking) and assessing the performance of each controller with regard to this task (e.g. walking speed). However, learning advanced, input-driven controllers (e.g. walking in each direction) requires testing each controller on a large sample of the possible input signals. This costly process makes it difficult to learn useful low-level controllers in robotics. Here we introduce BR-Evolution, a new evolutionary learning technique that generates a behavioral repertoire by taking advantage of the candidate solutions that are usually discarded. Instead of evolving a single, general controller, BR-Evolution evolves a collection of simple controllers, one for each variant of the target behavior; to distinguish similar controllers, it uses a performance objective that allows it to produce a collection of diverse but high-performing behaviors. We evaluated this new technique by evolving gait controllers for a simulated hexapod robot. Results show that a single run of the evolutionary algorithm quickly finds a collection of controllers that allows the robot to reach each point of the reachable space. Overall, BR-Evolution opens a new kind of learning algorithm that simultaneously optimizes all the achievable behaviors of a robot.
Unsupervised Feature Learning through Divergent Discriminative Feature Accumulation
Unlike unsupervised approaches such as autoencoders that learn to reconstruct
their inputs, this paper introduces an alternative approach to unsupervised
feature learning called divergent discriminative feature accumulation (DDFA)
that instead continually accumulates features that make novel discriminations
among the training set. Thus DDFA features are inherently discriminative from
the start even though they are trained without knowledge of the ultimate
classification problem. Interestingly, DDFA also continues to add new features
indefinitely (so it does not depend on a hidden layer size), is not based on
minimizing error, and is inherently divergent instead of convergent, thereby
providing a unique direction of research for unsupervised feature learning. In
this paper the quality of its learned features is demonstrated on the MNIST
dataset, where its performance confirms that indeed DDFA is a viable technique
for learning useful features.
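The accumulation rule described above (keep a candidate feature only if the way it splits the training set is sufficiently novel relative to the features kept so far) can be sketched as follows. This toy version samples random linear features rather than evolving them as the paper does; the function name, the binary split criterion, and the novelty threshold are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ddfa_accumulate(X, n_candidates=500, min_novelty=0.1, rng=None):
    """Toy DDFA-style feature accumulation (illustrative sketch).

    A candidate feature is kept if its binary response pattern over the
    training set X differs from every accumulated pattern by at least
    min_novelty (normalized Hamming distance)."""
    rng = rng or np.random.default_rng(0)
    feats, patterns = [], []
    for _ in range(n_candidates):
        w = rng.standard_normal(X.shape[1])        # random linear feature
        p = (X @ w > 0).astype(np.uint8)           # how it splits the data
        # accumulate only features that make a novel discrimination
        if all((p != q).mean() >= min_novelty for q in patterns):
            feats.append(w)
            patterns.append(p)
    return np.array(feats)
```

Note that features are added open-endedly as long as novel discriminations keep appearing; nothing here minimizes a reconstruction or classification error, which mirrors the divergent character the abstract describes.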
Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics
The most data-efficient algorithms for reinforcement learning in robotics are
model-based policy search algorithms, which alternate between learning a
dynamical model of the robot and optimizing a policy to maximize the expected
return given the model and its uncertainties. Among the few proposed
approaches, the recently introduced Black-DROPS algorithm exploits a black-box
optimization algorithm to achieve both high data-efficiency and good
computation times when several cores are used; nevertheless, like all
model-based policy search approaches, Black-DROPS does not scale to high
dimensional state/action spaces. In this paper, we introduce a new model
learning procedure in Black-DROPS that leverages parameterized black-box priors
to (1) scale up to high-dimensional systems, and (2) be robust to large
inaccuracies of the prior information. We demonstrate the effectiveness of our
approach with the "pendubot" swing-up task in simulation and with a physical
hexapod robot (48D state space, 18D action space) that has to walk forward as
fast as possible. The results show that our new algorithm is more
data-efficient than previous model-based policy search algorithms (with and
without priors) and that it can allow a physical 6-legged robot to learn new
gaits in only 16 to 30 seconds of interaction time.
Comment: Accepted at ICRA 2018; 8 pages, 4 figures, 2 algorithms, 1 table; video at https://youtu.be/HFkZkhGGzTo ; spotlight ICRA presentation at https://youtu.be/_MZYDhfWeL
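The core idea of being robust to an inaccurate parameterized prior can be illustrated with a minimal residual model: fit the difference between observed transitions and the prior's predictions, then use prior-plus-residual as the dynamics model inside policy search. This is only a sketch of that idea under simplifying assumptions (a linear residual fit by least squares); the actual algorithm uses far more expressive probabilistic models, and all names here are illustrative.

```python
import numpy as np

def learn_residual(transitions, prior, params):
    """Fit a linear residual on top of a parameterized prior model:
    next_state approx= prior(s, a, params) + [s, a] @ W.
    (Illustrative sketch; the linear form is an assumption.)"""
    inputs = np.array([np.concatenate([s, a]) for s, a, _ in transitions])
    resid = np.array([s_next - prior(s, a, params)
                      for s, a, s_next in transitions])
    W, *_ = np.linalg.lstsq(inputs, resid, rcond=None)
    # return the corrected dynamics model: prior prediction + learned residual
    return lambda s, a: prior(s, a, params) + np.concatenate([s, a]) @ W
```

Even when the prior's parameters are wrong, the residual absorbs the mismatch, which is the sense in which the method tolerates "large inaccuracies of the prior information."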
Using Centroidal Voronoi Tessellations to Scale Up the Multi-dimensional Archive of Phenotypic Elites Algorithm
The recently introduced Multi-dimensional Archive of Phenotypic Elites
(MAP-Elites) is an evolutionary algorithm capable of producing a large archive
of diverse, high-performing solutions in a single run. It works by discretizing
a continuous feature space into unique regions according to the desired
discretization per dimension. While simple, this algorithm has a main drawback:
it cannot scale to high-dimensional feature spaces, since the number of regions
increases exponentially with the number of dimensions. In this paper, we address
this limitation by introducing a simple extension of MAP-Elites that has a
constant, pre-defined number of regions irrespective of the dimensionality of
the feature space. Our main insight is that methods from computational geometry
could partition a high-dimensional space into well-spread geometric regions. In
particular, our algorithm uses a centroidal Voronoi tessellation (CVT) to
divide the feature space into a desired number of regions; it then places every
generated individual in its closest region, replacing a less fit one if the
region is already occupied. We demonstrate the effectiveness of the new
"CVT-MAP-Elites" algorithm in high-dimensional feature spaces through
comparisons against MAP-Elites in maze navigation and hexapod locomotion tasks.
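The two stages described above (building the CVT, then the elite-insertion rule) can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the CVT is approximated with a few Lloyd iterations over uniform samples, and the mutation scheme, sample counts, and function names are arbitrary choices.

```python
import numpy as np

def cvt_map_elites(evaluate, dim_x, dim_feat, n_regions=64,
                   n_evals=2000, sigma=0.1, seed=0):
    """Minimal CVT-MAP-Elites sketch.

    evaluate(x) -> (fitness, feature_vector in [0, 1]^dim_feat)."""
    rng = np.random.default_rng(seed)
    # (1) Approximate a centroidal Voronoi tessellation with Lloyd's
    # algorithm over points sampled uniformly in the feature space.
    samples = rng.random((20000, dim_feat))
    centroids = samples[rng.choice(len(samples), n_regions, replace=False)]
    for _ in range(15):
        labels = np.linalg.norm(
            samples[:, None] - centroids[None], axis=2).argmin(axis=1)
        for k in range(n_regions):
            pts = samples[labels == k]
            if len(pts):
                centroids[k] = pts.mean(axis=0)
    # (2) MAP-Elites loop: one elite per Voronoi region.
    archive = {}  # region index -> (fitness, solution)
    for _ in range(n_evals):
        if archive:  # mutate a randomly chosen elite
            _, parent = archive[rng.choice(list(archive))]
            x = parent + sigma * rng.standard_normal(dim_x)
        else:        # bootstrap with a random solution
            x = rng.random(dim_x)
        fitness, feat = evaluate(x)
        # place the individual in its closest region; replace a less fit one
        k = int(np.linalg.norm(centroids - feat, axis=1).argmin())
        if k not in archive or fitness > archive[k][0]:
            archive[k] = (fitness, x)
    return centroids, archive
```

The key property the paper exploits is visible here: the archive size is bounded by `n_regions` regardless of `dim_feat`, whereas a per-dimension grid would grow exponentially.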
Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents
Evolution strategies (ES) are a family of black-box optimization algorithms
able to train deep neural networks roughly as well as Q-learning and policy
gradient methods on challenging deep reinforcement learning (RL) problems, but
are much faster (e.g. hours vs. days) because they parallelize better. However,
many RL problems require directed exploration because they have reward
functions that are sparse or deceptive (i.e. contain local optima), and it is
unknown how to encourage such exploration with ES. Here we show that algorithms
that have been invented to promote directed exploration in small-scale evolved
neural networks via populations of exploring agents, specifically novelty
search (NS) and quality diversity (QD) algorithms, can be hybridized with ES to
improve its performance on sparse or deceptive deep RL tasks, while retaining
scalability. Our experiments confirm that the resultant new algorithms, NS-ES
and two QD algorithms, NSR-ES and NSRA-ES, avoid local optima encountered by ES
to achieve higher performance on Atari and simulated robots learning to walk
around a deceptive trap. This paper thus introduces a family of fast, scalable
algorithms for reinforcement learning that are capable of directed exploration.
It also adds this new family of exploration algorithms to the RL toolbox and
raises the interesting possibility that analogous algorithms with multiple
simultaneous paths of exploration might also combine well with existing RL
algorithms outside ES.
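A single NS-ES update, which estimates a search gradient that ascends novelty rather than reward, might look like the following sketch. Hyperparameters, names, and the archive handling are illustrative; the paper's implementation is distributed across many workers and handles much larger populations.

```python
import numpy as np

def ns_es_step(theta, behavior_fn, archive, sigma=0.1, alpha=0.01,
               pop=50, k=5, rng=None):
    """One novelty-search-driven ES update (illustrative sketch).

    behavior_fn(theta) -> behaviour characterization (1-D np.ndarray);
    a perturbation's novelty = mean distance to its k nearest
    neighbours in the archive of previously seen behaviours."""
    rng = rng or np.random.default_rng(0)
    arch = np.asarray(archive)
    eps = rng.standard_normal((pop, theta.size))
    novelty = np.empty(pop)
    for i in range(pop):
        b = behavior_fn(theta + sigma * eps[i])
        dists = np.linalg.norm(arch - b, axis=1)
        novelty[i] = np.sort(dists)[:k].mean()
    # centered rank normalization, as in standard ES gradient estimates
    ranks = novelty.argsort().argsort() / (pop - 1) - 0.5
    grad = ranks @ eps / (pop * sigma)
    theta = theta + alpha * grad
    archive.append(behavior_fn(theta))  # grow the novelty archive
    return theta
```

Replacing `novelty` with a weighted sum of novelty and reward turns this step into the NSR-ES/NSRA-ES variants the abstract mentions; that is exactly the "hybridized with ES" ingredient.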
Discovering Unsupervised Behaviours from Full-State Trajectories
Improving open-ended learning capabilities is a promising approach to enable
robots to face the unbounded complexity of the real-world. Among existing
methods, the ability of Quality-Diversity algorithms to generate large
collections of diverse and high-performing skills is instrumental in this
context. However, most of those algorithms rely on a hand-coded behavioural
descriptor to characterise the diversity, hence requiring prior knowledge about
the considered tasks. In this work, we propose an additional analysis of
Autonomous Robots Realising their Abilities, a Quality-Diversity algorithm that
autonomously finds behavioural characterisations. We evaluate this approach on
a simulated robotic environment, where the robot has to autonomously discover
its abilities from its full-state trajectories. All algorithms were applied to
three tasks: navigation, moving forward with a high velocity, and performing
half-rolls. The experimental results show that the algorithm under study
discovers autonomously collections of solutions that are diverse with respect
to all tasks. More specifically, the analysed approach autonomously finds
policies that make the robot move to diverse positions, but also utilise its
legs in diverse ways, and even perform half-rolls.
Comment: Published at the Workshop on Agent Learning in Open-Endedness (ALOE) at ICLR 2022. arXiv admin note: substantial text overlap with arXiv:2204.0982
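The idea of replacing a hand-coded behavioural descriptor with one learned from full-state trajectories can be illustrated with a simple dimensionality reduction. The method under study learns an encoder; PCA here is only a minimal stand-in to show the interface, and the function name is an assumption.

```python
import numpy as np

def learn_descriptors(trajectories, n_dims=2):
    """Toy unsupervised behavioural characterisation: project flattened
    full-state trajectories onto their top principal components.
    (Illustrative stand-in for a learned encoder.)"""
    X = np.array([t.ravel() for t in trajectories])  # one row per episode
    X = X - X.mean(axis=0)                           # center the data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_dims].T  # one low-dim descriptor per trajectory
```

A Quality-Diversity loop can then use these learned descriptors in place of hand-coded ones, so no prior knowledge of the tasks is required to define diversity.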
Efficient Learning of Locomotion Skills through the Discovery of Diverse Environmental Trajectory Generator Priors
Data-driven, learning-based methods have recently been particularly successful
at learning robust locomotion controllers for a variety of unstructured
terrains. Prior work has shown that incorporating good locomotion priors in the
form of trajectory generators (TGs) is effective at efficiently learning
complex locomotion skills. However, defining a good, single TG becomes
increasingly challenging as tasks/environments grow more complex, since it
requires extensive tuning and risks reducing the effectiveness of the prior.
In this paper, we present Evolved Environmental Trajectory
Generators (EETG), a method that learns a diverse set of specialised locomotion
priors using Quality-Diversity algorithms while maintaining a single policy
within the Policies Modulating TG (PMTG) architecture. The results demonstrate
that EETG enables a quadruped robot to successfully traverse a wide range of
environments, such as slopes, stairs, rough terrain, and balance beams. Our
experiments show that learning a diverse set of specialized TG priors is
significantly (5 times) more efficient than using a single, fixed prior when
dealing with a wide range of environments.
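The PMTG architecture referenced above, in which a fixed trajectory generator provides a prior and a learned policy both modulates it and adds a residual action, can be sketched for a single scalar actuator. All shapes and names here are illustrative assumptions, not the EETG code; in EETG, the `tg_params` would be one specialised prior drawn from the Quality-Diversity archive for the current environment.

```python
import numpy as np

def pmtg_step(phase, tg_params, policy, obs, dt=0.02):
    """One control step of a PMTG-style architecture (toy sketch).

    The trajectory generator is a sine over the gait phase; the policy
    outputs a frequency modulation and an additive residual action."""
    amp, freq = tg_params                       # a prior (e.g. found by QD)
    tg_action = amp * np.sin(2 * np.pi * phase)
    d_freq, residual = policy(obs, tg_action)   # policy modulates the TG
    phase = (phase + (freq + d_freq) * dt) % 1.0
    return tg_action + residual, phase
```

With a zero policy this reduces to the open-loop prior, which is why a good prior makes learning efficient; swapping `tg_params` per environment is the "diverse set of specialised priors" the abstract argues for.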