
7,733 research outputs found

Parsimonious reasoning in reinforcement learning for better credit assignment

Author: Ma Michel
Publication date: 01/08/2021

This thesis explores long-term credit assignment in reinforcement learning from the perspective of a parsimony inductive bias. In this context, a parsimonious agent seeks to understand its environment using as few variables as possible. In other words, given some credit or blame for some behavior, parsimony forces the agent to assign this credit (or blame) to only a select few latent variables. Before proposing novel methods for parsimonious credit assignment, we introduce previous work on long-term credit assignment in relation to the idea of sparsity. Then, we develop two new ideas for credit assignment in reinforcement learning that are motivated by parsimonious reasoning: one in the model-free setting, and one for model-based learning. To do so, we build upon various parsimony-related concepts from causality, supervised learning, and simulation, and apply them to the Markov Decision Process framework. The first, called counterfactual policy evaluation, considers minor deviations of what could have been given what has been. By restricting the space in which the agent can reason about alternatives, counterfactual policy evaluation is shown to have favorable variance properties for policy evaluation. Counterfactual policy evaluation also offers a new perspective on hindsight, generalizing previous work in hindsight credit assignment. The second contribution of this thesis is a latent-attention-augmented algorithm for model-based reinforcement learning: Latent Sparse Attentive Value Gradients (LSAVG). By fully integrating attention into the structure for policy optimization, we show that LSAVG is able to solve active memory tasks that its model-free counterpart was designed to tackle, without resorting to heuristics or biasing the original estimator.

Dépôt Institutionnel Numérique

The accretion disk in the post period-minimum cataclysmic variable SDSS J080434.20+510349.2

Author: Aviles A.
Garcia-Diaz Ma. T.
Gonzalez-Buitrago D.
Michel R.
Tovmassian G.
Zharikov S.
Publication venue: 'EDP Sciences'
Publication date: 06/11/2012

This study of SDSS0804 is primarily concerned with the double-hump shape in the light curve and its connection with the accretion disk in this bounce-back system. Time-resolved photometric and spectroscopic observations were obtained to analyze the behavior of the system between superoutbursts. A geometric model of a binary system containing a disk with two spiral density waves in its outer annuli was applied to explain the light curve and the Doppler tomography. Observations were carried out during 2008-2009, after the object's magnitude decreased to V~17.7(0.1) from the March 2006 eruption. The light curve clearly shows a sinusoid-like variability with a 0.07 mag amplitude and a 42.48 min periodicity, which is half of the orbital period of the system. In Sept. 2010, the system underwent yet another superoutburst and returned to its quiescent level by the beginning of 2012. This light curve once again showed double humps, but with a significantly smaller amplitude of ~0.01 mag. Other types of variability, such as a "mini-outburst" or SDSS1238-like features, were not detected. Doppler tomograms, obtained from spectroscopic data during the same period of time, show a large accretion disk with uneven brightness, implying the presence of spiral waves. We constructed a geometric model of a bounce-back system containing two spiral density waves in the outer annuli of the disk to reproduce the observed light curves. The Doppler tomograms and the double-hump-shaped light curves in quiescence can be explained by a model system containing a massive (>0.7 Msun) white dwarf with a surface temperature of ~12000 K, a late-type brown dwarf, and an accretion disk with two spirals in its outer annuli. According to this model, the accretion disk should be large, extending to the 2:1 resonance radius, and cool (~2500 K). The inner parts of the disk should be optically thin in the continuum or totally void. Comment: 12 pages, 15 figures, accepted for publication in A&
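The relation between the hump periodicity and the orbital period stated in the abstract can be illustrated with a toy calculation. This is only a sketch, not the authors' geometric disk model: the pure sinusoid, the reading of the 0.07 mag amplitude as peak-to-peak, and the names `relative_brightness` and `count_humps_per_orbit` are all assumptions made for illustration.

```python
import math

HUMP_PERIOD_MIN = 42.48                    # periodicity quoted in the abstract
ORBITAL_PERIOD_MIN = 2 * HUMP_PERIOD_MIN   # abstract: hump period is half the orbital period
PEAK_TO_PEAK_MAG = 0.07                    # assumption: quoted amplitude is peak-to-peak


def relative_brightness(t_min: float) -> float:
    """Toy sinusoid (in mag) with the hump period; not the authors' disk model."""
    return 0.5 * PEAK_TO_PEAK_MAG * math.sin(2 * math.pi * t_min / HUMP_PERIOD_MIN)


def count_humps_per_orbit(samples: int = 1000) -> int:
    """Count local brightness maxima over one orbital period (treated as cyclic)."""
    step = ORBITAL_PERIOD_MIN / samples
    values = [relative_brightness(i * step) for i in range(samples)]
    return sum(
        1
        for i in range(samples)
        if values[i] > values[i - 1] and values[i] > values[(i + 1) % samples]
    )


if __name__ == "__main__":
    print(f"orbital period = {ORBITAL_PERIOD_MIN:.2f} min")
    print(f"humps per orbit = {count_humps_per_orbit()}")
```

A signal whose period is half the orbital period necessarily peaks twice per orbit, which is what "double-hump" refers to here.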

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

SPECTRAL GAP FOR SPHERICALLY SYMMETRIC LOG-CONCAVE PROBABILITY MEASURES, AND BEYOND

Author: Bonnefont Michel
Joulin Aldéric
Ma Yutao
Publication venue: 'Elsevier BV'
Publication date: 01/04/2016

Let $\mu$ be a probability measure on $\mathbb{R}^n$ ($n \geq 2$) with Lebesgue density proportional to $e^{-V(\Vert x\Vert)}$, where $V : \mathbb{R}_+ \to \mathbb{R}$ is a smooth convex potential. We show that the associated spectral gap in $L^2(\mu)$ lies between $(n-1) / \int_{\mathbb{R}^n} \Vert x\Vert^2 \, \mu(dx)$ and $n / \int_{\mathbb{R}^n} \Vert x\Vert^2 \, \mu(dx)$, improving a well-known two-sided estimate due to Bobkov. Our Markovian approach is remarkably simple and is sufficiently robust to be extended beyond the log-concave case, at the price of potentially modifying the underlying dynamics in the energy, leading to weighted Poincaré inequalities. All our results are illustrated by some classical and less classical examples.
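As a sanity check on the two-sided estimate quoted in this abstract, one can work it out for the standard Gaussian measure, a case where the spectral gap is classically known. The symbols $\lambda_1$ (for the spectral gap) and $\gamma_n$ (for the standard Gaussian on $\mathbb{R}^n$) are notation assumed here, not taken from the abstract.

```latex
% Standard Gaussian: V(r) = r^2/2, which is smooth and convex on R_+.
\[
  \gamma_n(dx) = (2\pi)^{-n/2} e^{-\Vert x\Vert^2/2}\,dx ,
  \qquad
  \int_{\mathbb{R}^n} \Vert x\Vert^2 \,\gamma_n(dx) = n .
\]
% Plugging the second moment into the two-sided estimate:
\[
  \frac{n-1}{n} \;\le\; \lambda_1(\gamma_n) \;\le\; \frac{n}{n} = 1 .
\]
% The Gaussian Poincare inequality gives \lambda_1(\gamma_n) = 1,
% so the upper bound is attained and the lower bound is off by 1/n.
```

In this example the two bounds differ only by $1/n$, so the estimate becomes tight as the dimension grows.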

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

HAL-INSA Toulouse

Oskar Bordeaux

A note on spectral gap and weighted Poincaré inequalities for some one-dimensional diffusions

Author: Bonnefont Michel
Joulin Aldéric
Ma Yutao
Publication venue: 'EDP Sciences'
Publication date: 03/06/2016

We present some classical and weighted Poincaré inequalities for some one-dimensional probability measures. This work is the one-dimensional counterpart of a recent study by the authors of a class of spherically symmetric probability measures in dimension larger than 2. Our strategy is based on two main ingredients: on the one hand, the optimal constant in the desired weighted Poincaré inequality is rewritten as the spectral gap of a convenient Markovian diffusion operator, and on the other hand, we use a recent result by the first two authors, which allows this spectral gap to be estimated precisely. In particular, we are able to capture its exact value for some examples.

Scientific Publications of the University of Toulouse II Le Mirail

HAL-INSA Toulouse

Oskar Bordeaux

Balanced Tripartite Entanglement, the Alternating Group A4 and the Lie Algebra $sl(3,C) \oplus u(1)$

Author: Acin
Bosma
Brif
Cahn
Carruthers
Choudhary
Coffman
Duff
Dömötor
Dür
Frappat
Hall
Helgason
Hornos
Kibler
Korbicz
Lévay
Ma
Metod Saniga
Michel Planat
Planat
Péter Lévay
Vourdas
Yang
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009

We discuss three important classes of three-qubit entangled states and their encoding into quantum gates, finite groups and Lie algebras. States of the GHZ and W-type correspond to pure tripartite and bipartite entanglement, respectively. We introduce another generic class B of three-qubit states that have balanced entanglement over two and three parties. We show how to realize the largest crystallographic group $W(E_8)$ in terms of three-qubit gates (with real entries) encoding states of type GHZ or W [M. Planat, Clifford group dipoles and the enactment of Weyl/Coxeter group $W(E_8)$ by entangling gates, Preprint 0904.3691 (quant-ph)]. Then, we describe a peculiar "condensation" of $W(E_8)$ into the four-letter alternating group $A_4$, obtained from a chain of maximal subgroups. Group $A_4$ is realized from two B-type generators and found to correspond to the Lie algebra $sl(3,\mathbb{C})\oplus u(1)$. Possible applications of our findings to particle physics and the structure of the genetic code are also mentioned. Comment: 14 page

arXiv.org e-Print Archive

CiteSeerX

HAL - Université de Franche-Comté

Crossref