The Quality-Diversity Transformer: Generating Behavior-Conditioned Trajectories with Decision Transformers
In the context of neuroevolution, Quality-Diversity algorithms have proven
effective in generating repertoires of diverse and efficient policies by
relying on the definition of a behavior space. A natural goal induced by the
creation of such a repertoire is trying to achieve behaviors on demand, which
can be done by running the corresponding policy from the repertoire. However,
in uncertain environments, two problems arise. First, policies can lack
robustness and repeatability, meaning that multiple episodes under slightly
different conditions often result in very different behaviors. Second, due to
the discrete nature of the repertoire, solutions vary discontinuously. Here we
present a new approach to achieve behavior-conditioned trajectory generation
based on two mechanisms: First, MAP-Elites Low-Spread (ME-LS), which constrains
the selection of solutions to those that are the most consistent in the
behavior space. Second, the Quality-Diversity Transformer (QDT), a
Transformer-based model conditioned on continuous behavior descriptors, which
trains on a dataset generated by policies from a ME-LS repertoire and learns to
autoregressively generate sequences of actions that achieve target behaviors.
Results show that ME-LS produces consistent and robust policies, and that its
combination with the QDT yields a single policy capable of achieving diverse
behaviors on demand with high accuracy.
Comment: 10+7 page
Combinatorial Optimization with Policy Adaptation using Latent Space Search
Combinatorial Optimization underpins many real-world applications and yet,
designing performant algorithms to solve these complex, typically NP-hard,
problems remains a significant research challenge. Reinforcement Learning (RL)
provides a versatile framework for designing heuristics across a broad spectrum
of problem domains. However, despite notable progress, RL has not yet
supplanted industrial solvers as the go-to solution. Current approaches
emphasize pre-training heuristics that construct solutions but often rely on
search procedures with limited variance, such as stochastically sampling
numerous solutions from a single policy or employing computationally expensive
fine-tuning of the policy on individual problem instances. Building on the
intuition that performant search at inference time should be anticipated during
pre-training, we propose COMPASS, a novel RL approach that parameterizes a
distribution of diverse and specialized policies conditioned on a continuous
latent space. We evaluate COMPASS across three canonical problems - Travelling
Salesman, Capacitated Vehicle Routing, and Job-Shop Scheduling - and
demonstrate that our search strategy (i) outperforms state-of-the-art
approaches on 11 standard benchmarking tasks and (ii) generalizes better,
surpassing all other approaches on a set of 18 procedurally transformed
instance distributions.
Comment: Accepted at NeurIPS 2023. Small updates in results reporte
Assessing Quality-Diversity Neuro-Evolution Algorithms Performance in Hard Exploration Problems
A fascinating aspect of nature lies in its ability to produce a collection of
organisms that are all high-performing in their niche. Quality-Diversity (QD)
methods are evolutionary algorithms, inspired by this observation, that have
obtained strong results in many applications, from wing design to robot adaptation.
Recently, several works demonstrated that these methods could be applied to
perform neuro-evolution to solve control problems in large search spaces. In
such problems, diversity can be a target in itself. Diversity can also be a way
to enhance exploration in tasks exhibiting deceptive reward signals. While the
first aspect has been studied in depth in the QD community, the latter has
received less attention in the literature. Exploration is at the heart of
several domains trying to solve control problems, such as Reinforcement
Learning, and QD methods are promising candidates to overcome the associated
challenges. Therefore, we believe that standardized benchmarks exhibiting
high-dimensional control problems with exploration difficulties are of interest
to the QD community. In
this paper, we highlight three candidate benchmarks and explain why they appear
relevant for systematic evaluation of QD algorithms. We also provide
open-source implementations in JAX, allowing practitioners to run many fast
experiments with limited compute resources.
Comment: GECCO 2022 Workshop on Quality Diversity Algorithm Benchmark
Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery
Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for
training neural policies to solve complex control tasks. However, these
policies tend to overfit to the exact specifications of the task and
environment they were trained on, and thus do not perform well when conditions
deviate slightly or when composed hierarchically to solve even more complex
tasks. Recent work has shown that training a mixture of policies, as opposed to
a single one, that are driven to explore different regions of the state-action
space can address this shortcoming by generating a diverse set of behaviors,
referred to as skills, that can be collectively used to great effect in
adaptation tasks or for hierarchical planning. This is typically realized by
including a diversity term - often derived from information theory - in the
objective function optimized by RL. However, these approaches often require
careful hyperparameter tuning to be effective. In this work, we demonstrate
that less widely used neuroevolution methods, specifically Quality Diversity
(QD), are a competitive alternative to information-theory-augmented RL for
skill discovery. An extensive empirical evaluation compares eight
state-of-the-art methods on the basis of (i) metrics directly evaluating the
skills' diversity, (ii) the skills' performance on adaptation tasks, and (iii)
the skills' performance when used as primitives for hierarchical planning. QD
methods are found to provide equal, and sometimes improved, performance whilst
being less sensitive to hyperparameters and more scalable. As no single method
is found to provide near-optimal performance across all environments, there is
a rich scope for further research which we support by proposing future
directions and providing optimized open-source implementations.
Base-editing-mediated dissection of a γ-globin cis-regulatory element for the therapeutic reactivation of fetal hemoglobin expression
Sickle cell disease and β-thalassemia affect the production of the adult β-hemoglobin chain. The clinical severity is lessened by mutations that cause fetal γ-globin expression in adult life (i.e., the hereditary persistence of fetal hemoglobin). Mutations clustering ~200 nucleotides upstream of the HBG transcriptional start sites either reduce binding of the LRF repressor or recruit the KLF1 activator. Here, we use base editing to generate a variety of mutations in the -200 region of the HBG promoters, including potent combinations of four to eight γ-globin-inducing mutations. Editing of patient hematopoietic stem/progenitor cells is safe, leads to fetal hemoglobin reactivation and rescues the pathological phenotype. Creation of a KLF1 activator binding site is the most potent strategy - even in long-term repopulating hematopoietic stem/progenitor cells. Compared with a Cas9-nuclease approach, base editing avoids the generation of insertions, deletions and large genomic rearrangements and results in higher γ-globin levels. Our results demonstrate that base editing of HBG promoters is a safe, universal strategy for treating β-hemoglobinopathies.
SeaPearl: A Constraint Programming Solver Guided by Reinforcement Learning
The design of efficient and generic algorithms for solving combinatorial
optimization problems has been an active field of research for many years.
Standard exact solving approaches are based on a clever and complete
enumeration of the solution set. A critical and non-trivial design choice with
such methods is the branching strategy, directing how the search is performed.
The last decade has shown an increasing interest in the design of machine
learning-based heuristics to solve combinatorial optimization problems. The
goal is to leverage knowledge from historical data to solve similar new
instances of a problem. Used alone, such heuristics can provide approximate
solutions efficiently, but can neither prove optimality nor provide bounds on
their solutions. Recent works have shown that reinforcement learning can be
successfully used for driving the search phase of constraint programming (CP)
solvers. However, it has also been shown that this hybridization is challenging
to build, as standard CP frameworks do not natively include machine learning
mechanisms, leading to several sources of inefficiency. This paper presents the
proof of concept for SeaPearl, a new CP solver implemented in Julia, that
supports machine learning routines in order to learn branching decisions using
reinforcement learning. Support for modeling the learning component is also
provided. We illustrate the modeling and solution performance of this new
solver on two problems. Although not yet competitive with industrial solvers,
SeaPearl aims to provide a flexible and open-source framework in order to
facilitate future research in the hybridization of constraint programming and
machine learning.
Novel lentiviral vectors for gene therapy of sickle cell disease combining gene addition and gene silencing strategies
Sickle cell disease (SCD) is due to a mutation in the β-globin gene causing production of the toxic sickle hemoglobin (HbS; α2βS2). Transplantation of autologous hematopoietic stem and progenitor cells (HSPCs) transduced with lentiviral vectors (LVs) expressing an anti-sickling β-globin (βAS) is a promising treatment; however, it is only partially effective, and patients still present elevated HbS levels. Here, we developed a bifunctional LV expressing βAS3-globin and an artificial microRNA (amiRNA) specifically downregulating βS-globin expression with the aim of reducing HbS levels and favoring βAS3 incorporation into Hb tetramers. Efficient transduction of SCD HSPCs by the bifunctional LV led to a substantial decrease of βS-globin transcripts in HSPC-derived erythroid cells, a significant reduction of HbS+ red cells, and effective correction of the sickling phenotype, outperforming βAS gene addition and BCL11A gene silencing strategies. The bifunctional LV showed a standard integration profile, and neither HSPC viability, engraftment, and multilineage differentiation nor the erythroid transcriptome and miRNAome were affected by the treatment, confirming the safety of this therapeutic strategy. In conclusion, the combination of gene addition and gene silencing strategies can improve the efficacy of current LV-based therapeutic approaches without increasing the mutagenic vector load, thus representing a novel treatment for SCD.
Combination of lentiviral and genome editing technologies for the treatment of sickle cell disease
Editing a γ-globin repressor binding site restores fetal hemoglobin synthesis and corrects the sickle cell disease phenotype
Sickle cell disease (SCD) is caused by a single amino acid change in the adult hemoglobin (Hb) β chain that causes Hb polymerization and red blood cell (RBC) sickling. The co-inheritance of mutations causing fetal γ-globin production in adult life (hereditary persistence of fetal Hb, HPFH) reduces the clinical severity of SCD. HPFH mutations in the HBG γ-globin promoters disrupt binding sites for the repressors BCL11A and LRF. We used CRISPR-Cas9 to mimic HPFH mutations in the HBG promoters by generating insertions and deletions, leading to disruption of known and putative repressor binding sites. Editing of the LRF-binding site in patient-derived hematopoietic stem/progenitor cells (HSPCs) resulted in γ-globin derepression and correction of the sickling phenotype. Xenotransplantation of HSPCs treated with gRNAs targeting the LRF-binding site showed a high editing efficiency in repopulating HSPCs. This study identifies the LRF-binding site as a potent target for genome-editing treatment of SCD.