4,722 research outputs found

    A Hitchhiker's Guide to Statistical Comparisons of Reinforcement Learning Algorithms

    Consistently checking the statistical significance of experimental results is the first mandatory step towards reproducible science. This paper presents a hitchhiker's guide to rigorous comparisons of reinforcement learning algorithms. After introducing the concepts of statistical testing, we review the relevant statistical tests and compare them empirically in terms of false positive rate and statistical power as a function of the sample size (number of seeds) and effect size. We further investigate the robustness of these tests to violations of the most common hypotheses (normal distributions, same distributions, equal variances). Besides simulations, we compare empirical distributions obtained by running Soft Actor-Critic and Twin-Delayed Deep Deterministic Policy Gradient on Half-Cheetah. We conclude by providing guidelines and code to perform rigorous comparisons of RL algorithm performances. Comment: 8 pages + supplementary material
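The empirical comparison described in this abstract (false positive rate as a function of the number of seeds) can be illustrated with a small simulation. The sketch below is not the paper's code: it uses a permutation test as a stand-in, since the abstract does not list which tests are reviewed, and the function names, the Gaussian performance model and the default parameters are all assumptions made for illustration.

```python
import random
import statistics


def permutation_test(x, y, rng, n_perm=200):
    """Two-sided permutation test on the difference of means; returns a p-value."""
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(x)]) - statistics.fmean(pooled[len(x):]))
        if diff >= observed:
            count += 1
    # The +1 terms count the observed labelling itself, keeping the p-value valid.
    return (count + 1) / (n_perm + 1)


def false_positive_rate(n_seeds, n_trials=200, alpha=0.05, seed=0):
    """Fraction of trials in which two identical 'algorithms' are declared different."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        # Both samples come from the same distribution (assumed Gaussian),
        # so every rejection is a false positive.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_seeds)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_seeds)]
        if permutation_test(x, y, rng) < alpha:
            rejections += 1
    return rejections / n_trials
```

Calling `false_positive_rate(n_seeds)` for several values of `n_seeds` gives a rough curve of the error rate against sample size; a well-calibrated test should stay close to or below `alpha`.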

    CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning

    In open-ended environments, autonomous learning agents must set their own goals and build their own curriculum through intrinsically motivated exploration. They may consider a large diversity of goals, aiming to discover what is controllable in their environments, and what is not. Because some goals might prove easy and some impossible, agents must actively select which goal to practice at any moment, to maximize their overall mastery of the set of learnable goals. This paper proposes CURIOUS, an algorithm that leverages 1) a modular Universal Value Function Approximator with hindsight learning to achieve a diversity of goals of different kinds within a single policy and 2) an automated curriculum learning mechanism that biases the attention of the agent towards goals maximizing the absolute learning progress. Agents focus sequentially on goals of increasing complexity, and focus back on goals that are being forgotten. Experiments conducted in a new modular-goal robotic environment show the resulting developmental self-organization of a learning curriculum, and demonstrate properties of robustness to distracting goals, forgetting and changes in body properties. Comment: Accepted at ICML 201
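The curriculum mechanism mentioned in this abstract, biasing goal selection towards high absolute learning progress, can be sketched roughly as follows. This is a minimal illustration and not the CURIOUS implementation: the windowed estimate of learning progress, the `epsilon` mixing with a uniform distribution, and all function names are assumptions.

```python
def absolute_learning_progress(successes, window=4):
    """|recent success rate - older success rate| over two adjacent windows."""
    if len(successes) < 2 * window:
        return 0.0  # not enough history yet to estimate progress
    recent = successes[-window:]
    older = successes[-2 * window:-window]
    return abs(sum(recent) / window - sum(older) / window)


def goal_sampling_probabilities(histories, epsilon=0.2, window=4):
    """Probability of practising each goal, biased towards high absolute progress.

    `histories` holds one list of 0/1 outcomes per goal (hypothetical format).
    A fraction `epsilon` of the probability mass stays uniform so that goals
    with no measured progress are still tried occasionally.
    """
    lps = [absolute_learning_progress(h, window) for h in histories]
    total = sum(lps)
    n = len(histories)
    if total == 0.0:
        return [1.0 / n] * n
    return [epsilon / n + (1 - epsilon) * lp / total for lp in lps]
```

Because the progress measure is an absolute value, a goal whose success rate is dropping receives as much attention as one whose success rate is rising, which matches the abstract's remark that agents focus back on goals that are being forgotten.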

    A well-balanced finite volume scheme for 1D hemodynamic simulations

    We are interested in simulating blood flow in arteries with variable elasticity using a one-dimensional model. We present a well-balanced finite volume scheme based on recent developments in the context of the shallow water equations. We thus obtain a mass-conservative scheme that also preserves equilibria of Q=0. This numerical method is tested on analytical tests. Comment: 6 pages. Summary in French (translated): We are interested in simulating blood flow in arteries whose walls have variable elasticity. This is modeled with a one-dimensional model. We present a "well-balanced finite volume" scheme based on recent developments in solving the Saint-Venant system. We thus obtain a scheme that preserves the fluid volume as well as the equilibria at rest: Q=0. The scheme introduced is tested on analytical solutions.

    Désordres parlementaires

    The separation between representatives and the represented is embodied in the topography of the places of representation. It is signified in particular at the doors of Parliament. Thus, for Moisie Ostrogorski (1903: 573), cited by Bernard Manin (1996: 263), the capacity of public opinion to inspire and control leaders between two elections is expressed in the freedom and unpredictability of its manifestation "up to the door of Parliament". The disturbance of public order that may result contrasts with the codified character of ordinary exchanges inside the chambers. The contrasting relationship to order and violence on either side of the doors of Parliament is thus an essential aspect of the institutionalization of assemblies and, beyond that, of the autonomy of leaders within representative government. For example, although the right of petition has long been recognized in Parliament, Article 147-2 of the Assembly's rules of procedure stipulates that "a petition brought or transmitted by a gathering formed on the public highway may not be received by the President, nor placed on the desk". The autonomy of parliamentary arenas is, however, never fully secured. It is, as the word "institution" suggests, in the making, the result of tensions from which it more or less manages to emerge. More precisely, the capacity of parliaments to become autonomous is hampered by their embedding in a broader political order and by the selection of their members through popular elections. Thus, just as parliamentary debate sees a "self-referential" grammar of discussion sit alongside a critical grammar organizing "a structural opening-up of the sitting" (Heurtin 1999: 267-268), the parliamentary public space seems caught in a permanent tension between the affirmation of a specific order and its overflowing. [First paragraph]

    Challenges in experimental data integration within genome-scale metabolic models

    A report of the meeting "Challenges in experimental data integration within genome-scale metabolic models", Institut Henri Poincaré, Paris, October 10-11 2009, organized by the CNRS-MPG joint program in Systems Biology. Comment: 5 pages

    How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments

    Consistently checking the statistical significance of experimental results is one of the mandatory methodological steps to address the so-called "reproducibility crisis" in deep reinforcement learning. In this tutorial paper, we explain how the number of random seeds relates to the probabilities of statistical errors. For both the t-test and the bootstrap confidence interval test, we recall theoretical guidelines to determine the number of random seeds one should use to provide a statistically significant comparison of the performance of two algorithms. Finally, we discuss the influence of deviations from the assumptions usually made by statistical tests. We show that they can lead to inaccurate evaluations of statistical errors and provide guidelines to counter these negative effects. We make our code available to perform the tests.
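As an illustration of the bootstrap confidence interval test named in this abstract, the sketch below computes a percentile-bootstrap interval for the difference in mean performance between two algorithms, each run with several random seeds. It is not the authors' released code; the percentile method, the function names and the defaults (`n_boot`, `alpha`) are assumptions.

```python
import random
import statistics


def bootstrap_ci_diff(perf_a, perf_b, n_boot=5000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(perf_a) - mean(perf_b).

    perf_a and perf_b hold one final-performance value per random seed.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each algorithm's seeds with replacement.
        resample_a = rng.choices(perf_a, k=len(perf_a))
        resample_b = rng.choices(perf_b, k=len(perf_b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def significantly_different(perf_a, perf_b, **kwargs):
    """Declare a difference only when the interval excludes zero."""
    lower, upper = bootstrap_ci_diff(perf_a, perf_b, **kwargs)
    return not (lower <= 0.0 <= upper)
```

With only a handful of seeds the resampled distribution is coarse, so an interval like this can be unreliable; that sensitivity to the number of seeds is the question the abstract raises.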

    Modifications of the rainforest frugivore community are associated with reduced seed removal at the community level

    Tropical rainforests worldwide are under increasing pressure from human activities, which are altering key ecosystem processes such as plant-animal interactions. However, while the direct impact of anthropogenic disturbance on animal communities has been well studied, the consequences of such defaunation for mutualistic interactions such as seed dispersal remain chiefly understood at the plant species level. We asked whether communities of endozoochorous tree species had altered seed removal in forests affected by hunting and logging and whether this could be related to modifications of the frugivore community. At two contrasting forest sites in French Guiana, Nouragues (protected) and Montagne de Kaw (hunted and partly logged), we focused on four families of animal-dispersed trees (Sapotaceae, Myristicaceae, Burseraceae and Fabaceae), which represent 88 % of all endozoochorous trees that were fruiting at the time and location of the study. We assessed the abundance of the seed dispersers and predators of these four focal families by conducting diurnal distance sampling along line transects. Densities of several key seed dispersers such as large-bodied primates were greatly reduced at Montagne de Kaw, where the specialist frugivore Ateles paniscus is probably extinct. In parallel, we estimated seed removal rates from fruit and seed counts conducted in one-square-meter quadrats placed on the ground beneath fruiting trees. Seed removal rates dropped from 77 % at Nouragues to 47 % at Montagne de Kaw, confirming that the loss of frugivores associated with human disturbance impacts seed removal at the community level. In contrast to Sapotaceae, whose seeds are dispersed by mammals only, weaker declines in seed removal for Burseraceae and Myristicaceae suggest that some compensation may occur for these bird- and mammal-dispersed families, possibly because of the high abundance of toucans at the disturbed site. The defaunation process currently occurring across many tropical forests could dramatically reduce the diversity of entire communities of animal-dispersed trees through seed removal limitation.

    A 2D/3D Discrete Duality Finite Volume Scheme. Application to ECG simulation

    This paper presents a 2D/3D discrete duality finite volume method for solving heterogeneous and anisotropic elliptic equations on very general unstructured meshes. The scheme is based on the definition of discrete divergence and gradient operators that fulfill a duality property mimicking the Green formula. As a consequence, the discrete problem is proved to be well-posed, symmetric and positive-definite. Standard numerical tests are performed in 2D and 3D, and the results are discussed and compared with those of P1 finite elements. Finally, the method is used to solve a problem arising in biomathematics: the electrocardiogram simulation on a 2D mesh obtained from segmented medical images.

    Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey

    Building autonomous machines that can explore open-ended environments, discover possible interactions and build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by autotelic agents: intrinsically motivated learning agents that can learn to represent, generate, select and solve their own problems. In recent years, the convergence of developmental approaches with deep reinforcement learning (RL) methods has been leading to the emergence of a new field: developmental reinforcement learning. Developmental RL is concerned with the use of deep RL algorithms to tackle a developmental problem: the intrinsically motivated acquisition of open-ended repertoires of skills. The self-generation of goals requires the learning of compact goal encodings as well as their associated goal-achievement functions. This raises new challenges compared to standard RL algorithms originally designed to tackle pre-defined sets of goals using external reward signals. The present paper introduces developmental RL and proposes a computational framework based on goal-conditioned RL to tackle the intrinsically motivated skills acquisition problem. It proceeds to present a typology of the various goal representations used in the literature, before reviewing existing methods to learn to represent and prioritize goals in autonomous systems. We finally close the paper by discussing some open challenges in the quest of intrinsically motivated skills acquisition.

    Unsupervised Learning of Goal Spaces for Intrinsically Motivated Goal Exploration

    Intrinsically motivated goal exploration algorithms enable machines to discover repertoires of policies that produce a diversity of effects in complex environments. These exploration algorithms have been shown to allow real world robots to acquire skills such as tool use in high-dimensional continuous state and action spaces. However, they have so far assumed that self-generated goals are sampled in a specifically engineered feature space, limiting their autonomy. In this work, we propose to use deep representation learning algorithms to learn an adequate goal space. This is a developmental 2-stage approach: first, in a perceptual learning stage, deep learning algorithms use passive raw sensor observations of world changes to learn a corresponding latent space; then goal exploration happens in a second stage by sampling goals in this latent space. We present experiments where a simulated robot arm interacts with an object, and we show that exploration algorithms using such learned representations can match the performance obtained using engineered representations.