65 research outputs found

    Gradient-Informed Quality Diversity for the Illumination of Discrete Spaces

    Quality Diversity (QD) algorithms have been proposed to search for a large collection of both diverse and high-performing solutions instead of a single set of local optima. While early QD algorithms view the objective and descriptor functions as black-box functions, novel tools have been introduced to use gradient information to accelerate the search and improve the overall performance of those algorithms over continuous input spaces. However, a broad range of applications involve discrete spaces, such as drug discovery or image generation. Exploring those spaces is challenging as they are combinatorially large and gradients cannot be used in the same manner as in continuous spaces. We introduce MAP-Elites with a Gradient-Informed Discrete Emitter (ME-GIDE), which extends QD optimisation with differentiable functions over discrete search spaces. ME-GIDE leverages the gradient information of the objective and descriptor functions with respect to its discrete inputs to propose gradient-informed updates that guide the search towards a diverse set of high-quality solutions. We evaluate our method on challenging benchmarks including protein design and discrete latent space illumination and find that our method outperforms state-of-the-art QD algorithms in all benchmarks.
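The gradient-informed discrete update described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy objective, its analytic gradient, the score matrix `W`, and the `temperature` parameter are all assumptions made for the sketch. The idea shown is the one named in the abstract: use the gradient with respect to the discrete (one-hot) input to rank candidate single-token substitutions, then sample a mutation in proportion to the estimated improvement.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_objective(x_onehot, W):
    # toy differentiable objective: sum of per-position token scores
    return float(np.sum(x_onehot * W))

def toy_gradient(x_onehot, W):
    # analytic gradient of the toy objective w.r.t. the one-hot input
    return W

def gradient_informed_mutation(x_tokens, W, temperature=1.0):
    """Propose one substitution, sampled in proportion to a first-order
    estimate of how much each single-token change improves the objective."""
    L, K = W.shape
    x_onehot = np.eye(K)[x_tokens]
    g = toy_gradient(x_onehot, W)
    # first-order (Taylor) estimate of the score change of substituting
    # token k at position i: delta[i, k] ~= g[i, k] - g[i, current_token]
    delta = g - g[np.arange(L), x_tokens][:, None]
    delta[np.arange(L), x_tokens] = -np.inf   # forbid no-op substitutions
    scores = delta.ravel() / temperature
    probs = np.exp(scores - scores.max())     # max-shift for stability
    probs /= probs.sum()
    i, k = divmod(rng.choice(L * K, p=probs), K)
    out = x_tokens.copy()
    out[i] = k
    return out
```

At low temperature the proposal concentrates on the substitution with the largest estimated gain; at high temperature it approaches the uniform random mutation of gradient-free QD emitters.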

    PASTA: Pretrained Action-State Transformer Agents

    Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches involve pre-training transformer models on vast amounts of unlabeled data, serving as a starting point for efficiently solving downstream tasks. In the realm of reinforcement learning, researchers have recently adapted these approaches by developing models pre-trained on expert trajectories, enabling them to address a wide range of tasks, from robotics to recommendation systems. However, existing methods mostly rely on intricate pre-training objectives tailored to specific downstream applications. This paper presents a comprehensive investigation of models we refer to as Pretrained Action-State Transformer Agents (PASTA). Our study uses a unified methodology and covers an extensive set of general downstream tasks including behavioral cloning, offline RL, sensor failure robustness, and dynamics change adaptation. Our goal is to systematically compare various design choices and provide valuable insights to practitioners for building robust models. Key highlights of our study include tokenization at the action and state component level, using fundamental pre-training objectives like next token prediction, training models across diverse domains simultaneously, and using parameter-efficient fine-tuning (PEFT). The developed models in our study contain fewer than 10 million parameters, and the application of PEFT enables fine-tuning of fewer than 10,000 parameters during downstream adaptation, allowing a broad community to use these models and reproduce our experiments. We hope that this study will encourage further research into the use of transformers with first-principles design choices to represent RL trajectories and contribute to robust policy learning.
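The component-level tokenization highlighted above can be sketched as follows. This is a hedged illustration, not the PASTA codebase: the uniform binning scheme, the `n_bins` default, and the `[low, high]` value range are assumptions. The point it demonstrates is that every scalar component of every state and action becomes its own discrete token, interleaved into one sequence suitable for next-token prediction.

```python
import numpy as np

def tokenize_trajectory(states, actions, n_bins=4, low=-1.0, high=1.0):
    """Component-level tokenization sketch: discretize each scalar
    component of each state/action into a bin index, interleaving the
    sequence as s_0 components, a_0 components, s_1 components, ..."""
    # interior bin edges; np.digitize maps each value to an index in [0, n_bins - 1]
    edges = np.linspace(low, high, n_bins + 1)[1:-1]
    tokens = []
    for s, a in zip(states, actions):
        tokens.extend(np.digitize(s, edges).tolist())
        tokens.extend(np.digitize(a, edges).tolist())
    return tokens
```

A next-token-prediction objective can then be trained directly on the resulting flat integer sequence, with no task-specific pre-training target.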

    The Quality-Diversity Transformer: Generating Behavior-Conditioned Trajectories with Decision Transformers

    In the context of neuroevolution, Quality-Diversity algorithms have proven effective in generating repertoires of diverse and efficient policies by relying on the definition of a behavior space. A natural goal induced by the creation of such a repertoire is trying to achieve behaviors on demand, which can be done by running the corresponding policy from the repertoire. However, in uncertain environments, two problems arise. First, policies can lack robustness and repeatability, meaning that multiple episodes under slightly different conditions often result in very different behaviors. Second, due to the discrete nature of the repertoire, solutions vary discontinuously. Here we present a new approach to achieve behavior-conditioned trajectory generation based on two mechanisms. First, MAP-Elites Low-Spread (ME-LS), which constrains the selection of solutions to those that are the most consistent in the behavior space. Second, the Quality-Diversity Transformer (QDT), a Transformer-based model conditioned on continuous behavior descriptors, which trains on a dataset generated by policies from a ME-LS repertoire and learns to autoregressively generate sequences of actions that achieve target behaviors. Results show that ME-LS produces consistent and robust policies, and that its combination with the QDT yields a single policy capable of achieving diverse behaviors on demand with high accuracy.
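The behavior-conditioned autoregressive generation described above can be sketched as a rollout loop. This is a schematic, not the QDT architecture: `model` is a stand-in for the trained Transformer, and the `toy_model` used below is a purely hypothetical closed-form substitute. What the sketch shows is the conditioning pattern: at every step the generator consumes the continuous target behavior descriptor plus the actions produced so far, and emits the next action.

```python
import numpy as np

def generate_actions(model, behavior_descriptor, horizon):
    """Autoregressive rollout: condition each step on the target behavior
    descriptor and on the prefix of actions generated so far."""
    actions = []
    for _ in range(horizon):
        actions.append(model(behavior_descriptor, actions))
    return actions

# hypothetical stand-in for a trained model: drift from the descriptor mean
toy_model = lambda bd, past: float(np.mean(bd)) - 0.1 * len(past)
```

In the real system the same single network would be queried with different behavior descriptors to obtain different behaviors on demand, replacing the discrete lookup in the repertoire.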

    A competitive integration model of exogenous and endogenous eye movements

    We present a model of the eye movement system in which the programming of an eye movement is the result of the competitive integration of information in the superior colliculi (SC). This brain area receives input from occipital cortex, the frontal eye fields, and the dorsolateral prefrontal cortex, on the basis of which it computes the location of the next saccadic target. Two critical assumptions in the model are that cortical inputs are not only excitatory, but can also inhibit saccades to specific locations, and that the SC continue to influence the trajectory of a saccade while it is being executed. With these assumptions, we account for many neurophysiological and behavioral findings from eye movement research. Interactions within the saccade map are shown to account for effects of distractors on saccadic reaction time (SRT) and saccade trajectory, including the global effect and oculomotor capture. In addition, the model accounts for express saccades, the gap effect, saccadic reaction times for antisaccades, and recorded responses from neurons in the SC and frontal eye fields in these tasks. © The Author(s) 2010
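The competitive integration described above can be sketched as a simple dynamical system. This is a heavily simplified illustration, not the published model: the 1-D saccade map, the global-inhibition gain of 0.5, the Euler step, and the input vectors are all assumptions. It shows the two critical ingredients named in the abstract: cortical input that can both excite and inhibit specific map locations, and competition within the map that selects the saccade target.

```python
import numpy as np

def settle_saccade_map(excitation, inhibition, steps=200, dt=0.05):
    """Minimal competitive-integration sketch: units on a 1-D saccade map
    receive excitatory and inhibitory cortical input and compete through
    global lateral inhibition; the winning unit is the saccade target."""
    a = np.zeros_like(excitation, dtype=float)
    for _ in range(steps):
        lateral = a.sum() - a                      # inhibition from all other units
        da = excitation - inhibition - 0.5 * lateral - a
        a = np.clip(a + dt * da, 0.0, None)        # firing rates stay non-negative
    return int(np.argmax(a))
```

Adding inhibitory input at a location can hand the competition to a different unit even when the excitatory input there is unchanged, which is the mechanism the model uses to suppress saccades to specific locations.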

    Development and analysis of a meshless method for first-order hyperbolic systems


    Role of group I muscle afferents in the control of posture and movement in humans


    CAD-Free Soft Handle Parameterization Tool for Adjoint-Based Optimization Methods

    Soft Handle CAD-free parameterization tool (basic idea): aims to keep a rich design space while enforcing smoothness on the resulting shape. Handles (parameters) are selected appropriately, and shape changes are driven gracefully by the movement of the handles while smoothness of the shape is preserved.