A Hitchhiker's Guide to Statistical Comparisons of Reinforcement Learning Algorithms
Consistently checking the statistical significance of experimental results is
the first mandatory step towards reproducible science. This paper presents a
hitchhiker's guide to rigorous comparisons of reinforcement learning
algorithms. After introducing the concepts of statistical testing, we review
the relevant statistical tests and compare them empirically in terms of false
positive rate and statistical power as a function of the sample size (number of
seeds) and effect size. We further investigate the robustness of these tests to
violations of the most common hypotheses (normal distributions, same
distributions, equal variances). Besides simulations, we compare empirical
distributions obtained by running Soft Actor-Critic and Twin Delayed Deep
Deterministic Policy Gradient on Half-Cheetah. We conclude by providing
guidelines and code to perform rigorous comparisons of RL algorithm
performances. Comment: 8 pages + supplementary material
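To make the idea of comparing two algorithms across seeds concrete, here is a minimal sketch of a two-sided permutation test on the difference of mean performance. It is only an illustration of the kind of test such a guide covers, not the authors' released code; the function name `permutation_test` and the per-seed scores are hypothetical.

```python
import random


def permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference of means.

    scores_a, scores_b: final performances, one value per random seed
    (hypothetical inputs). Returns an estimated p-value.
    """
    rng = random.Random(seed)
    n_a = len(scores_a)
    n_b = len(scores_b)
    observed = abs(sum(scores_a) / n_a - sum(scores_b) / n_b)

    pooled = list(scores_a) + list(scores_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the algorithm labels are exchangeable,
        # so we reassign them at random and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            at_least_as_extreme += 1

    # The +1 correction counts the observed labelling as one permutation,
    # which keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical per-seed returns for two algorithms (5 seeds each).
algo_a = [10.0, 11.0, 12.0, 13.0, 14.0]
algo_b = [20.0, 21.0, 22.0, 23.0, 24.0]
p_value = permutation_test(algo_a, algo_b)
print(f"p-value: {p_value:.4f}")
```

A permutation test is used here because it needs no distributional assumption and runs on the standard library alone; whether it is the best choice for a given sample size and effect size is exactly the question the abstract says the paper studies.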
CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning
In open-ended environments, autonomous learning agents must set their own
goals and build their own curriculum through an intrinsically motivated
exploration. They may consider a large diversity of goals, aiming to discover
what is controllable in their environments, and what is not. Because some goals
might prove easy and some impossible, agents must actively select which goal to
practice at any moment, to maximize their overall mastery on the set of
learnable goals. This paper proposes CURIOUS, an algorithm that leverages 1) a
modular Universal Value Function Approximator with hindsight learning to
achieve a diversity of goals of different kinds within a unique policy and 2)
an automated curriculum learning mechanism that biases the attention of the
agent towards goals maximizing the absolute learning progress. Agents focus
sequentially on goals of increasing complexity, and focus back on goals that
are being forgotten. Experiments conducted in a new modular-goal robotic
environment show the resulting developmental self-organization of a learning
curriculum, and demonstrate properties of robustness to distracting goals,
forgetting and changes in body properties. Comment: Accepted at ICML 201
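The curriculum mechanism described above, which biases attention towards goals with high absolute learning progress, can be sketched as follows. This is a simplified illustration under stated assumptions, not the CURIOUS implementation: the window-based progress estimate, the epsilon-uniform mixing, and all names (`absolute_learning_progress`, `selection_probabilities`) are assumptions made for this example.

```python
def absolute_learning_progress(outcomes, window=4):
    """Absolute change in success rate between the two most recent windows.

    outcomes: list of 0/1 results of past attempts at one goal
    (hypothetical bookkeeping). Returns 0.0 until two full windows exist.
    """
    if len(outcomes) < 2 * window:
        return 0.0
    recent = outcomes[-window:]
    older = outcomes[-2 * window:-window]
    # Taking the absolute value means a goal that is being forgotten
    # (falling success rate) attracts attention just like an improving one.
    return abs(sum(recent) / window - sum(older) / window)


def selection_probabilities(lp_values, epsilon=0.2):
    """Mix a uniform distribution with one proportional to learning progress."""
    n = len(lp_values)
    total = sum(lp_values)
    if total == 0:
        # No measurable progress anywhere: fall back to uniform sampling.
        return [1.0 / n] * n
    return [epsilon / n + (1 - epsilon) * lp / total for lp in lp_values]


# Hypothetical histories: a mastered goal, an improving goal, a forgotten goal.
histories = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0, 0, 0],
]
lps = [absolute_learning_progress(h) for h in histories]
probs = selection_probabilities(lps)
print(lps, probs)
```

In this sketch the mastered goal keeps only the small uniform share, while the improving and the forgotten goal share the rest, which mirrors the behaviour the abstract describes of focusing back on goals that are being forgotten.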
A well-balanced finite volume scheme for 1D hemodynamic simulations
We are interested in simulating blood flow in arteries with variable
elasticity with a one dimensional model. We present a well-balanced finite
volume scheme based on the recent developments in shallow water equations
context. We thus get a mass conservative scheme which also preserves equilibria
of Q=0. This numerical method is tested on analytical test cases. Comment: 6 pages.
Abstract in French (translated): We are interested in simulating blood flow in
arteries whose walls have variable elasticity. This is modeled with a
one-dimensional model. We present a well-balanced finite volume scheme
based on recent developments for solving the Saint-Venant system. We thus
obtain a scheme that preserves the volume of fluid as well as the equilibria
at rest: Q=0. The scheme introduced is tested on analytical solutions
Désordres parlementaires
The separation between representatives and the represented is embodied in the topography of the places of representation. It is signified most notably at the doors of Parliament. Thus, for Moisie Ostrogorski (1903: 573), cited by Bernard Manin (1996: 263), the capacity of public opinion to inspire and control leaders between two elections is reflected in the freedom and unpredictability of its expression "right up to the door of Parliament". The disturbance of public order that may result contrasts with the codified character of ordinary exchanges inside the chambers. The contrasting relationship to order and violence on each side of the doors of Parliament is thus an essential aspect of the institutionalization of assemblies and, beyond that, of the autonomy of leaders within representative government. For example, while the right of petition has long been recognized in Parliament, article 147-2 of the rules of procedure of the Assembly stipulates that "a petition brought or transmitted by a gathering formed on the public highway may not be received by the President, nor laid on the table". The autonomy of parliamentary arenas is, however, never secured. It is, as the word "institution" suggests, in the making, the result of the tensions out of which it more or less manages to emerge. More precisely, the capacity of parliaments to become autonomous is hindered by their embedding in a broader political order and by the selection of their members by means of popular elections. Thus, just as parliamentary debate sees a "self-referential" grammar of discussion coexist with a critical grammar organizing "a structural opening-up of the sitting" (Heurtin 1999: 267-268), the parliamentary public space seems caught in a permanent tension between the assertion of a specific order and its overflowing. [First paragraph
Challenges in experimental data integration within genome-scale metabolic models
A report of the meeting "Challenges in experimental data integration within
genome-scale metabolic models", Institut Henri Poincaré, Paris, October 10-11
2009, organized by the CNRS-MPG joint program in Systems Biology. Comment: 5 pages
How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments
Consistently checking the statistical significance of experimental results is
one of the mandatory methodological steps to address the so-called
"reproducibility crisis" in deep reinforcement learning. In this tutorial
paper, we explain how the number of random seeds relates to the probabilities
of statistical errors. For both the t-test and the bootstrap confidence
interval test, we recall theoretical guidelines to determine the number of
random seeds one should use to provide a statistically significant comparison
of the performance of two algorithms. Finally, we discuss the influence of
deviations from the assumptions usually made by statistical tests. We show that
they can lead to inaccurate evaluations of statistical errors and provide
guidelines to counter these negative effects. We make our code available to
perform the tests
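The link between the number of seeds and the probabilities of statistical errors can be illustrated with the textbook normal-approximation formula for a two-sided, two-sample comparison of means with equal variances: n = 2 ((z_{1-alpha/2} + z_{power}) * sigma / epsilon)^2 seeds per algorithm. The sketch below is that generic approximation, not necessarily the exact procedure of the paper; the function name `seeds_per_algorithm` and its parameters are assumptions made for this example.

```python
import math
from statistics import NormalDist


def seeds_per_algorithm(effect_size, std, alpha=0.05, power=0.8):
    """Approximate number of seeds needed per algorithm.

    effect_size: smallest difference in mean performance worth detecting.
    std: assumed common standard deviation of performance across seeds.
    alpha: tolerated false positive rate (type-I error, two-sided).
    power: desired probability of detecting the effect (1 - type-II error).
    Uses a normal approximation, so it slightly underestimates the sample
    size a t-test would need when the result is small.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * std / effect_size) ** 2
    return math.ceil(n)


# Hypothetical settings: effects of one and of half a standard deviation.
n_large_effect = seeds_per_algorithm(effect_size=1.0, std=1.0)
n_medium_effect = seeds_per_algorithm(effect_size=0.5, std=1.0)
print(n_large_effect, n_medium_effect)
```

With these hypothetical settings the approximation asks for 16 seeds per algorithm to detect a one-standard-deviation effect and 63 for half a standard deviation, which shows how quickly the required number of seeds grows as the effect shrinks.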
Modifications of the rainforest frugivore community are associated with reduced seed removal at the community level
Tropical rainforests worldwide are under increasing pressure from human activities, which are altering key ecosystem processes such as plant-animal interactions. However, while the direct impact of anthropogenic disturbance on animal communities has been well studied, the consequences of such defaunation for mutualistic interactions such as seed dispersal remain chiefly understood at the plant species level. We asked whether communities of endozoochorous tree species had altered seed removal in forests affected by hunting and logging and whether this could be related to modifications of the frugivore community. At two contrasting forest sites in French Guiana, Nouragues (protected) and Montagne de Kaw (hunted and partly logged), we focused on four families of animal-dispersed trees (Sapotaceae, Myristicaceae, Burseraceae and Fabaceae), which represent 88 % of all endozoochorous trees that were fruiting at the time and location of the study. We assessed the abundance of the seed dispersers and predators of these four focal families by conducting diurnal distance sampling along line transects. Densities of several key seed dispersers such as large-bodied primates were greatly reduced at Montagne de Kaw, where the specialist frugivore Ateles paniscus is probably extinct. In parallel, we estimated seed removal rates from fruit and seed counts conducted in one-square-meter quadrats placed on the ground beneath fruiting trees. Seed removal rates dropped from 77 % at Nouragues to 47 % at Montagne de Kaw, confirming that the loss of frugivores associated with human disturbance impacts seed removal at the community level. In contrast to Sapotaceae, whose seeds are dispersed by mammals only, weaker declines in seed removal for Burseraceae and Myristicaceae suggest that some compensation may occur for these bird- and mammal-dispersed families, possibly because of the high abundance of toucans at the disturbed site.
The defaunation process currently occurring across many tropical forests could dramatically reduce the diversity of entire communities of animal-dispersed trees through seed removal limitation
A 2D/3D Discrete Duality Finite Volume Scheme. Application to ECG simulation
This paper presents a 2D/3D discrete duality finite volume method for solving heterogeneous and anisotropic elliptic equations on very general unstructured meshes. The scheme is based on the definition of discrete divergence and gradient operators that fulfill a duality property mimicking the Green formula. As a consequence, the discrete problem is proved to be well-posed, symmetric and positive-definite. Standard numerical tests are performed in 2D and 3D, and the results are discussed and compared with those of P1 finite elements. Finally, the method is used to solve a problem arising in biomathematics: the electrocardiogram simulation on a 2D mesh obtained from segmented medical images
Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey
Building autonomous machines that can explore open-ended environments,
discover possible interactions and build repertoires of skills is a general
objective of artificial intelligence. Developmental approaches argue that this
can only be achieved by autotelic agents: intrinsically motivated learning
agents that can learn to represent, generate, select and solve their own
problems. In recent years, the convergence of developmental approaches with
deep reinforcement learning (RL) methods has been leading to the emergence of a
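new field: developmental reinforcement learning. Developmental RL is
concerned with the use of deep RL algorithms to tackle a developmental problem
-- the intrinsically motivated acquisition of open-ended repertoires of
skills. The self-generation of goals requires the learning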
of compact goal encodings as well as their associated goal-achievement
functions. This raises new challenges compared to standard RL algorithms
originally designed to tackle pre-defined sets of goals using external reward
signals. The present paper introduces developmental RL and proposes a
computational framework based on goal-conditioned RL to tackle the
intrinsically motivated skills acquisition problem. It proceeds to present a
typology of the various goal representations used in the literature, before
reviewing existing methods to learn to represent and prioritize goals in
autonomous systems. We finally close the paper by discussing some open
challenges in the quest of intrinsically motivated skills acquisition
Unsupervised Learning of Goal Spaces for Intrinsically Motivated Goal Exploration
Intrinsically motivated goal exploration algorithms enable machines to
discover repertoires of policies that produce a diversity of effects in complex
environments. These exploration algorithms have been shown to allow real world
robots to acquire skills such as tool use in high-dimensional continuous state
and action spaces. However, they have so far assumed that self-generated goals
are sampled in a specifically engineered feature space, limiting their
autonomy. In this work, we propose to use deep representation learning
algorithms to learn an adequate goal space. This is a developmental 2-stage
approach: first, in a perceptual learning stage, deep learning algorithms use
passive raw sensor observations of world changes to learn a corresponding
latent space; then goal exploration happens in a second stage by sampling goals
in this latent space. We present experiments where a simulated robot arm
interacts with an object, and we show that exploration algorithms using such
learned representations can match the performance obtained using engineered
representations