QD-RL: Efficient Mixing of Quality and Diversity in Reinforcement Learning
We propose a novel reinforcement learning algorithm, QD-RL, that incorporates
the strengths of off-policy RL algorithms into Quality Diversity (QD)
approaches. Quality-Diversity methods contribute structural biases by
decoupling the search for diversity from the search for high return, resulting
in efficient management of the exploration-exploitation trade-off. However,
these approaches generally suffer from sample inefficiency as they call upon
evolutionary techniques. QD-RL removes this limitation by relying on off-policy
RL algorithms. More precisely, we train a population of off-policy deep RL
agents to simultaneously maximize diversity inside the population and the
return of the agents. QD-RL selects agents from the diversity-return Pareto
Front, resulting in stable and efficient population updates. Our experiments on
the Ant-Maze environment show that QD-RL can solve challenging exploration and
control problems with deceptive rewards while being more than 15 times more
sample-efficient than its evolutionary counterparts.
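The core selection step described above, picking agents from the diversity-return Pareto front, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the function name and the toy scores are assumptions.

```python
import numpy as np

def pareto_front(diversity, returns):
    """Indices of agents on the diversity-return Pareto front:
    agents for which no other agent is at least as good on both
    criteria and strictly better on one."""
    points = np.stack([diversity, returns], axis=1)
    n = len(points)
    on_front = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(points[j] >= points[i]) and np.any(points[j] > points[i]):
                on_front[i] = False
                break
    return np.flatnonzero(on_front)

# Toy population: agent 2 is dominated by agent 3 on both criteria.
div = np.array([0.9, 0.5, 0.2, 0.7])
ret = np.array([1.0, 3.0, 2.0, 2.5])
print(pareto_front(div, ret))  # -> [0 1 3]
```

Selecting parents from this front (rather than by return alone) is what keeps population updates balanced between exploration and exploitation.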
Gradient-Informed Quality Diversity for the Illumination of Discrete Spaces
Quality Diversity (QD) algorithms have been proposed to search for a large
collection of both diverse and high-performing solutions instead of a single
set of local optima. While early QD algorithms view the objective and
descriptor functions as black-box functions, novel tools have been introduced
to use gradient information to accelerate the search and improve overall
performance of those algorithms over continuous input spaces. However, a broad
range of applications involve discrete spaces, such as drug discovery or image
generation. Exploring those spaces is challenging as they are combinatorially
large and gradients cannot be used in the same manner as in continuous spaces.
We introduce map-elites with a Gradient-Informed Discrete Emitter (ME-GIDE),
which extends QD optimisation with differentiable functions over discrete
search spaces. ME-GIDE leverages the gradient information of the objective and
descriptor functions with respect to its discrete inputs to propose
gradient-informed updates that guide the search towards a diverse set of high
quality solutions. We evaluate our method on challenging benchmarks including
protein design and discrete latent space illumination and find that our method
outperforms state-of-the-art QD algorithms in all benchmarks.
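One way to picture a gradient-informed update over a discrete space, in the spirit described above, is to score every single-token substitution by its first-order predicted improvement and sample a mutation through a softmax. The sketch below is a hypothetical illustration under that assumption; the function name, the temperature parameter, and the proposal rule are not taken from the paper.

```python
import numpy as np

def gradient_informed_proposal(seq, grad, rng, temperature=1.0):
    """Sample one substitution for a discrete sequence, guided by the
    gradient of the objective w.r.t. a one-hot encoding of the inputs.

    seq:  (n_pos,) integer tokens
    grad: (n_pos, n_symbols) gradient w.r.t. the one-hot input
    """
    n_pos, n_sym = grad.shape
    onehot = np.zeros((n_pos, n_sym), dtype=bool)
    onehot[np.arange(n_pos), seq] = True
    # First-order estimate of the objective change for each substitution.
    delta = grad - grad[np.arange(n_pos), seq][:, None]
    delta[onehot] = -np.inf  # forbid no-op "substitutions"
    probs = np.exp((delta - np.max(delta)) / temperature)
    probs /= probs.sum()
    flat = rng.choice(n_pos * n_sym, p=probs.ravel())
    pos, sym = divmod(flat, n_sym)
    new_seq = seq.copy()
    new_seq[pos] = sym
    return new_seq

rng = np.random.default_rng(0)
seq = np.array([0, 1])
grad = np.array([[0.0, 1.0, 5.0], [0.0, 0.0, 0.0]])
print(gradient_informed_proposal(seq, grad, rng))
```

Because the proposal remains stochastic, the emitter can still escape regions where the first-order estimate is misleading, which matters in combinatorially large spaces such as protein sequences.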
PASTA: Pretrained Action-State Transformer Agents
Self-supervised learning has brought about a revolutionary paradigm shift in
various computing domains, including NLP, vision, and biology. Recent
approaches involve pre-training transformer models on vast amounts of unlabeled
data, serving as a starting point for efficiently solving downstream tasks. In
the realm of reinforcement learning, researchers have recently adapted these
approaches by developing models pre-trained on expert trajectories, enabling
them to address a wide range of tasks, from robotics to recommendation systems.
However, existing methods mostly rely on intricate pre-training objectives
tailored to specific downstream applications. This paper presents a
comprehensive investigation of models we refer to as Pretrained Action-State
Transformer Agents (PASTA). Our study uses a unified methodology and covers an
extensive set of general downstream tasks including behavioral cloning, offline
RL, sensor failure robustness, and dynamics change adaptation. Our goal is to
systematically compare various design choices and provide valuable insights to
practitioners for building robust models. Key highlights of our study include
tokenization at the action and state component level, using fundamental
pre-training objectives like next token prediction, training models across
diverse domains simultaneously, and using parameter efficient fine-tuning
(PEFT). The developed models in our study contain fewer than 10 million
parameters and the application of PEFT enables fine-tuning of fewer than 10,000
parameters during downstream adaptation, allowing a broad community to use
these models and reproduce our experiments. We hope that this study will
encourage further research into the use of transformers with first-principles
design choices to represent RL trajectories and contribute to robust policy
learning.
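Tokenization at the action- and state-component level, one of the design choices highlighted above, can be sketched as discretizing each scalar component into its own token. This is a minimal sketch under assumed bounds and bin count; the function name and the toy trajectory are illustrative, not the paper's code.

```python
def component_tokenize(trajectory, n_bins=64, low=-1.0, high=1.0):
    """Map each scalar component of every state and action to its own
    discrete token by uniform binning over [low, high].

    trajectory: list of (state, action) pairs of scalar components.
    """
    tokens = []
    for state, action in trajectory:
        for value in list(state) + list(action):
            clipped = min(max(value, low), high)
            bin_id = int((clipped - low) / (high - low) * (n_bins - 1))
            tokens.append(bin_id)
    return tokens

traj = [([0.0, 0.5], [-1.0]), ([1.0, -0.5], [0.2])]
print(component_tokenize(traj, n_bins=64))  # -> [31, 47, 0, 63, 15, 37]
```

A sequence of such tokens can then be trained with a plain next-token-prediction objective, which is precisely the kind of fundamental pre-training objective the study advocates.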
The Quality-Diversity Transformer: Generating Behavior-Conditioned Trajectories with Decision Transformers
In the context of neuroevolution, Quality-Diversity algorithms have proven
effective in generating repertoires of diverse and efficient policies by
relying on the definition of a behavior space. A natural goal induced by the
creation of such a repertoire is trying to achieve behaviors on demand, which
can be done by running the corresponding policy from the repertoire. However,
in uncertain environments, two problems arise. First, policies can lack
robustness and repeatability, meaning that multiple episodes under slightly
different conditions often result in very different behaviors. Second, due to
the discrete nature of the repertoire, solutions vary discontinuously. Here we
present a new approach to achieve behavior-conditioned trajectory generation
based on two mechanisms: First, MAP-Elites Low-Spread (ME-LS), which constrains
the selection of solutions to those that are the most consistent in the
behavior space. Second, the Quality-Diversity Transformer (QDT), a
Transformer-based model conditioned on continuous behavior descriptors, which
trains on a dataset generated by policies from a ME-LS repertoire and learns to
autoregressively generate sequences of actions that achieve target behaviors.
Results show that ME-LS produces consistent and robust policies, and that its
combination with the QDT yields a single policy capable of achieving diverse
behaviors on demand with high accuracy.
Information processing in long delay memory-guided saccades: further insights from TMS
The performance of memory-guided saccades with two different delays (3 s and 30 s of memorisation) was studied in eight subjects. Single-pulse transcranial magnetic stimulation (TMS) was applied simultaneously over the left and right dorsolateral prefrontal cortex (DLPFC) 1 s after target presentation. For both delays, stimulation significantly increased the percentage of amplitude errors in memory-guided saccades. Furthermore, the interfering effect of TMS was significantly greater in the short-delay than in the long-delay paradigm. The results are discussed in the context of a mixed model of spatial working memory control with two components: first, serial information processing, with a predominant role of the DLPFC during the early period of memorisation, and second, parallel information processing, independent of the DLPFC, operating during longer delays.
« Avancer par nappes »: from the history of the iron and steel industry to the history of packaging, by way of industrial archaeology.
Hello, Denis Woronoff. It is a pleasure for us to ask you about your work and your research. Can you first tell us what led you to become a historian? I have always been a historian. As far back as I can reach into my childhood, it was the only thing that captivated me. Having a Russian name and a recently naturalized father no doubt pushed me to find out where I came from. At one point I dreamed of being a journalist, then I very nearly became a philosopher, impressed at the Ecole alsac..
Assessing Quality-Diversity Neuro-Evolution Algorithms Performance in Hard Exploration Problems
A fascinating aspect of nature lies in its ability to produce a collection of
organisms that are all high-performing in their niche. Quality-Diversity (QD)
methods are evolutionary algorithms inspired by this observation, that obtained
great results in many applications, from wing design to robot adaptation.
Recently, several works demonstrated that these methods could be applied to
perform neuro-evolution to solve control problems in large search spaces. In
such problems, diversity can be a target in itself. Diversity can also be a way
to enhance exploration in tasks exhibiting deceptive reward signals. While the
first aspect has been studied in depth in the QD community, the latter remains
less explored in the literature. Exploration is at the heart of several domains
that tackle control problems, such as Reinforcement Learning, and QD methods
are promising candidates to overcome the associated challenges. Therefore, we
believe that standardized benchmarks exhibiting control problems in high
dimension with exploration difficulties are of interest to the QD community. In
this paper, we highlight three candidate benchmarks and explain why they appear
relevant for systematic evaluation of QD algorithms. We also provide
open-source implementations in Jax allowing practitioners to run fast and
numerous experiments on few compute resources.
Comment: GECCO 2022 Workshop on Quality Diversity Algorithm Benchmark
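The QD algorithms these benchmarks target follow the MAP-Elites pattern: maintain an archive of niches, mutate elites, and keep a child only if it improves the fitness of its niche. The sketch below is a deliberately minimal illustration in plain NumPy, not the paper's Jax implementation; the function name, the 1-D descriptor grid, and the Gaussian mutation are assumptions.

```python
import numpy as np

def map_elites(evaluate, n_cells=10, n_dim=4, iters=1000, seed=0):
    """Minimal MAP-Elites loop.

    evaluate: genotype -> (fitness, descriptor in [0, 1))
    Returns an archive mapping cell index -> (fitness, genotype).
    """
    rng = np.random.default_rng(seed)
    archive = {}
    for _ in range(iters):
        if archive:
            # Mutate a uniformly chosen elite.
            parent = archive[rng.choice(list(archive))][1]
            child = parent + 0.1 * rng.standard_normal(n_dim)
        else:
            child = rng.standard_normal(n_dim)
        fitness, desc = evaluate(child)
        cell = min(int(desc * n_cells), n_cells - 1)
        # Keep the child only if its niche is empty or it improves it.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, child)
    return archive

# Toy problem: maximize -||x||^2 with descriptor |tanh(x_0)|.
toy = lambda x: (-float(np.sum(x ** 2)), float(abs(np.tanh(x[0]))))
archive = map_elites(toy, iters=300)
print(len(archive))
```

In hard-exploration settings with deceptive rewards, it is the archive's pressure toward descriptor coverage, rather than the fitness signal alone, that drives progress, which is why such benchmarks stress-test QD methods specifically.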