
    The Quality-Diversity Transformer: Generating Behavior-Conditioned Trajectories with Decision Transformers

    In the context of neuroevolution, Quality-Diversity algorithms have proven effective in generating repertoires of diverse and efficient policies by relying on the definition of a behavior space. A natural goal induced by the creation of such a repertoire is achieving behaviors on demand, which can be done by running the corresponding policy from the repertoire. However, in uncertain environments, two problems arise. First, policies can lack robustness and repeatability, meaning that multiple episodes under slightly different conditions often result in very different behaviors. Second, due to the discrete nature of the repertoire, solutions vary discontinuously. Here we present a new approach to behavior-conditioned trajectory generation based on two mechanisms: first, MAP-Elites Low-Spread (ME-LS), which constrains the selection of solutions to those that are the most consistent in the behavior space; second, the Quality-Diversity Transformer (QDT), a Transformer-based model conditioned on continuous behavior descriptors, which trains on a dataset generated by policies from a ME-LS repertoire and learns to autoregressively generate sequences of actions that achieve target behaviors. Results show that ME-LS produces consistent and robust policies, and that its combination with the QDT yields a single policy capable of achieving diverse behaviors on demand with high accuracy.
    Comment: 10+7 pages
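    The behavior-conditioned, autoregressive decoding described above can be sketched in miniature. This is a hypothetical illustration, not the authors' implementation: the linear map `W` stands in for the trained Transformer, and the interface (descriptor in, action sequence out) is assumed.

```python
import numpy as np

def generate_trajectory(descriptor, horizon, W, action_dim=2):
    """Autoregressively emit actions conditioned on a target behavior.

    W is a stand-in for learned weights, of shape
    (action_dim, len(descriptor) + action_dim); the QDT instead uses a
    Transformer over a tokenized (descriptor, state, action) context.
    """
    last_action = np.zeros(action_dim)
    actions = []
    for _ in range(horizon):
        # Condition each step on the target descriptor and the
        # previously generated action, as in autoregressive decoding.
        x = np.concatenate([descriptor, last_action])
        last_action = np.tanh(W @ x)
        actions.append(last_action)
    return np.stack(actions)
```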

    Combinatorial Optimization with Policy Adaptation using Latent Space Search

    Combinatorial Optimization underpins many real-world applications and yet, designing performant algorithms to solve these complex, typically NP-hard, problems remains a significant research challenge. Reinforcement Learning (RL) provides a versatile framework for designing heuristics across a broad spectrum of problem domains. However, despite notable progress, RL has not yet supplanted industrial solvers as the go-to solution. Current approaches emphasize pre-training heuristics that construct solutions, but often rely on search procedures with limited variance, such as stochastically sampling numerous solutions from a single policy or employing computationally expensive fine-tuning of the policy on individual problem instances. Building on the intuition that performant search at inference time should be anticipated during pre-training, we propose COMPASS, a novel RL approach that parameterizes a distribution of diverse and specialized policies conditioned on a continuous latent space. We evaluate COMPASS across three canonical problems - Travelling Salesman, Capacitated Vehicle Routing, and Job-Shop Scheduling - and demonstrate that our search strategy (i) outperforms state-of-the-art approaches on 11 standard benchmarking tasks and (ii) generalizes better, surpassing all other approaches on a set of 18 procedurally transformed instance distributions.
    Comment: Accepted at NeurIPS 2023. Small updates in results reported.
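    The inference-time search over a continuous latent space of specialized policies can be sketched as a simple hill-climbing loop. The `evaluate` interface and all parameters below are assumptions for illustration, not COMPASS's actual training or search procedure.

```python
import numpy as np

def latent_search(evaluate, latent_dim=3, n_iters=50, pop=8, sigma=0.2, seed=0):
    """Search the latent space for the best-performing conditioned policy.

    evaluate(z) scores the policy conditioned on latent vector z on the
    problem instance at hand (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    best_z = rng.normal(size=latent_dim)
    best_score = evaluate(best_z)
    for _ in range(n_iters):
        # Sample a population of candidates around the current best latent.
        cands = best_z + sigma * rng.normal(size=(pop, latent_dim))
        scores = [evaluate(z) for z in cands]
        i = int(np.argmax(scores))
        if scores[i] > best_score:  # keep only improving moves
            best_z, best_score = cands[i], scores[i]
    return best_z, best_score
```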

    Assessing Quality-Diversity Neuro-Evolution Algorithms Performance in Hard Exploration Problems

    A fascinating aspect of nature lies in its ability to produce a collection of organisms that are all high-performing in their niche. Quality-Diversity (QD) methods are evolutionary algorithms inspired by this observation that have obtained great results in many applications, from wing design to robot adaptation. Recently, several works demonstrated that these methods can be applied to perform neuro-evolution to solve control problems in large search spaces. In such problems, diversity can be a target in itself; it can also be a way to enhance exploration in tasks exhibiting deceptive reward signals. While the first aspect has been studied in depth in the QD community, the latter remains less explored in the literature. Exploration is at the heart of several domains trying to solve control problems, such as Reinforcement Learning, and QD methods are promising candidates to overcome the associated challenges. Therefore, we believe that standardized benchmarks exhibiting high-dimensional control problems with exploration difficulties are of interest to the QD community. In this paper, we highlight three candidate benchmarks and explain why they appear relevant for the systematic evaluation of QD algorithms. We also provide open-source implementations in JAX, allowing practitioners to run fast and numerous experiments on few compute resources.
    Comment: GECCO 2022 Workshop on Quality Diversity Algorithm Benchmarks
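    For readers unfamiliar with the family of algorithms being benchmarked, a toy MAP-Elites loop (the canonical QD algorithm) might look as follows. The `evaluate` interface and one-dimensional cell discretization are hypothetical simplifications; the paper's JAX implementations are vectorized rather than written this way.

```python
import numpy as np

def map_elites(evaluate, n_cells=10, iters=200, dim=2, seed=0):
    """Toy MAP-Elites loop: keep the best solution found per behavior cell.

    evaluate(x) returns (fitness, behavior in [0, 1]) -- a hypothetical
    interface standing in for a benchmark control task.
    """
    rng = np.random.default_rng(seed)
    archive = {}  # cell index -> (fitness, solution)
    for _ in range(iters):
        if archive:
            # Select a random elite and mutate it.
            _, parent = archive[rng.choice(list(archive))]
            x = parent + 0.1 * rng.normal(size=dim)
        else:
            x = rng.normal(size=dim)
        fitness, behavior = evaluate(x)
        cell = min(int(behavior * n_cells), n_cells - 1)
        # Insert into the archive only if the cell is empty or improved.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, x)
    return archive
```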

    Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery

    Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks. However, these policies tend to overfit to the exact specifications of the task and environment they were trained on, and thus do not perform well when conditions deviate slightly or when composed hierarchically to solve even more complex tasks. Recent work has shown that training a mixture of policies, as opposed to a single one, that are driven to explore different regions of the state-action space can address this shortcoming by generating a diverse set of behaviors, referred to as skills, that can be collectively used to great effect in adaptation tasks or for hierarchical planning. This is typically realized by including a diversity term - often derived from information theory - in the objective function optimized by RL. However, these approaches often require careful hyperparameter tuning to be effective. In this work, we demonstrate that less widely used neuroevolution methods, specifically Quality Diversity (QD), are a competitive alternative to information-theory-augmented RL for skill discovery. Through an extensive empirical evaluation comparing eight state-of-the-art methods on the basis of (i) metrics directly evaluating the skills' diversity, (ii) the skills' performance on adaptation tasks, and (iii) the skills' performance when used as primitives for hierarchical planning, QD methods are found to provide equal, and sometimes improved, performance whilst being less sensitive to hyperparameters and more scalable. As no single method provides near-optimal performance across all environments, there is rich scope for further research, which we support by proposing future directions and providing optimized open-source implementations.

    Base-editing-mediated dissection of a γ-globin cis-regulatory element for the therapeutic reactivation of fetal hemoglobin expression

    Sickle cell disease and β-thalassemia affect the production of the adult β-hemoglobin chain. The clinical severity is lessened by mutations that cause fetal γ-globin expression in adult life (i.e., the hereditary persistence of fetal hemoglobin). Mutations clustering ~200 nucleotides upstream of the HBG transcriptional start sites either reduce binding of the LRF repressor or recruit the KLF1 activator. Here, we use base editing to generate a variety of mutations in the -200 region of the HBG promoters, including potent combinations of four to eight γ-globin-inducing mutations. Editing of patient hematopoietic stem/progenitor cells is safe, leads to fetal hemoglobin reactivation and rescues the pathological phenotype. Creation of a KLF1 activator binding site is the most potent strategy - even in long-term repopulating hematopoietic stem/progenitor cells. Compared with a Cas9-nuclease approach, base editing avoids the generation of insertions, deletions and large genomic rearrangements and results in higher γ-globin levels. Our results demonstrate that base editing of HBG promoters is a safe, universal strategy for treating β-hemoglobinopathies.

    SeaPearl: A Constraint Programming Solver Guided by Reinforcement Learning

    The design of efficient and generic algorithms for solving combinatorial optimization problems has been an active field of research for many years. Standard exact solving approaches are based on a clever and complete enumeration of the solution set. A critical and non-trivial design choice with such methods is the branching strategy, directing how the search is performed. The last decade has shown an increasing interest in the design of machine learning-based heuristics to solve combinatorial optimization problems. The goal is to leverage knowledge from historical data to solve similar new instances of a problem. Used alone, such heuristics can only provide approximate solutions efficiently, but cannot prove optimality nor bounds on their solution. Recent works have shown that reinforcement learning can be successfully used for driving the search phase of constraint programming (CP) solvers. However, it has also been shown that this hybridization is challenging to build, as standard CP frameworks do not natively include machine learning mechanisms, leading to several sources of inefficiency. This paper presents the proof of concept for SeaPearl, a new CP solver implemented in Julia that supports machine learning routines in order to learn branching decisions using reinforcement learning. Support for modeling the learning component is also provided. We illustrate the modeling and solution performance of this new solver on two problems. Although not yet competitive with industrial solvers, SeaPearl aims to provide a flexible and open-source framework to facilitate future research in the hybridization of constraint programming and machine learning.
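    As a rough illustration of what a learned branching heuristic does inside such a solver, consider the sketch below. SeaPearl itself is written in Julia and its agent operates on richer graph-based features, so the names, features, and interface here are hypothetical stand-ins.

```python
import numpy as np

def pick_branch_variable(domains, value_net):
    """Choose the next variable to branch on using a learned scorer.

    domains maps variable name -> set of remaining values;
    value_net(features) is a hypothetical learned scoring function
    standing in for an RL agent's policy or value network.
    """
    # Only variables whose domain is not yet reduced to one value
    # are candidates for branching.
    unbound = {v: d for v, d in domains.items() if len(d) > 1}
    if not unbound:
        return None  # all variables are fixed; nothing to branch on

    def score(var):
        # Simple per-variable features: domain size and its reciprocal.
        n = len(unbound[var])
        return value_net(np.array([n, 1.0 / n]))

    return max(unbound, key=score)
```

A classical hand-written heuristic (e.g. smallest-domain-first) corresponds to a fixed `value_net`; the point of the RL hybridization is to learn that scoring function from solved instances instead.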

    Novel lentiviral vectors for gene therapy of sickle cell disease combining gene addition and gene silencing strategies

    Sickle cell disease (SCD) is due to a mutation in the β-globin gene causing production of the toxic sickle hemoglobin (HbS; α2βS2). Transplantation of autologous hematopoietic stem and progenitor cells (HSPCs) transduced with lentiviral vectors (LVs) expressing an anti-sickling β-globin (βAS) is a promising treatment; however, it is only partially effective, and patients still present elevated HbS levels. Here, we developed a bifunctional LV expressing βAS3-globin and an artificial microRNA (amiRNA) specifically downregulating βS-globin expression, with the aim of reducing HbS levels and favoring βAS3 incorporation into Hb tetramers. Efficient transduction of SCD HSPCs by the bifunctional LV led to a substantial decrease of βS-globin transcripts in HSPC-derived erythroid cells, a significant reduction of HbS+ red cells, and effective correction of the sickling phenotype, outperforming βAS gene addition and BCL11A gene silencing strategies. The bifunctional LV showed a standard integration profile, and neither HSPC viability, engraftment, and multilineage differentiation nor the erythroid transcriptome and miRNAome were affected by the treatment, confirming the safety of this therapeutic strategy. In conclusion, the combination of gene addition and gene silencing strategies can improve the efficacy of current LV-based therapeutic approaches without increasing the mutagenic vector load, thus representing a novel treatment for SCD.

    Editing a γ-globin repressor binding site restores fetal hemoglobin synthesis and corrects the sickle cell disease phenotype

    Sickle cell disease (SCD) is caused by a single amino acid change in the adult hemoglobin (Hb) β chain that causes Hb polymerization and red blood cell (RBC) sickling. The co-inheritance of mutations causing fetal γ-globin production in adult life (hereditary persistence of fetal Hb, HPFH) reduces the clinical severity of SCD. HPFH mutations in the HBG γ-globin promoters disrupt binding sites for the repressors BCL11A and LRF. We used CRISPR-Cas9 to mimic HPFH mutations in the HBG promoters by generating insertions and deletions, leading to disruption of known and putative repressor binding sites. Editing of the LRF-binding site in patient-derived hematopoietic stem/progenitor cells (HSPCs) resulted in γ-globin derepression and correction of the sickling phenotype. Xenotransplantation of HSPCs treated with gRNAs targeting the LRF-binding site showed a high editing efficiency in repopulating HSPCs. This study identifies the LRF-binding site as a potent target for genome-editing treatment of SCD.