Parsimonious reasoning in reinforcement learning for better credit assignment
This thesis explores long-term credit assignment in reinforcement learning from the perspective of a parsimony inductive bias. In this context, a parsimonious agent seeks to understand its environment using as few variables as possible. In other words, when the agent is credited or blamed for some behavior, parsimony forces it to assign this credit (or blame) to only a select few latent variables. Before proposing novel methods for parsimonious credit assignment, we review previous work on long-term credit assignment in relation to the idea of sparsity. We then develop two new ideas for credit assignment in reinforcement learning that are motivated by parsimonious reasoning: one in the model-free setting and one for model-based learning. To do so, we build upon various parsimony-related concepts from causality, supervised learning, and simulation, and apply them to the Markov Decision Process framework.
The first contribution, called counterfactual policy evaluation, considers minor deviations of what could have been given what has been. By restricting the space in which the agent can reason about alternatives, counterfactual policy evaluation is shown to have favorable variance properties for policy evaluation. It also offers a new perspective on hindsight, generalizing previous work in hindsight credit assignment. The second contribution is a latent-attention-augmented algorithm for model-based reinforcement learning: Latent Sparse Attentive Value Gradients (LSAVG). By fully integrating attention into the structure of policy optimization, we show that LSAVG is able to solve active memory tasks that its model-free counterpart was designed to tackle, without resorting to heuristics or biasing the original estimator.
The accretion disk in the post period-minimum cataclysmic variable SDSS J080434.20+510349.2
This study of SDSS0804 is primarily concerned with the double-hump shape in
the light curve and its connection with the accretion disk in this bounce-back
system. Time-resolved photometric and spectroscopic observations were obtained
to analyze the behavior of the system between superoutbursts. A geometric model
of a binary system containing a disk with two spiral density waves in its outer
annuli was applied to explain the light curve and the Doppler tomography. Observations
were carried out during 2008-2009, after the object's magnitude decreased to
V~17.7(0.1) from the March 2006 eruption. The light curve clearly shows a
sinusoid-like variability with a 0.07 mag amplitude and a 42.48 min
periodicity, which is half of the orbital period of the system. In Sept. 2010,
the system underwent yet another superoutburst and returned to its quiescent
level by the beginning of 2012. The light curve once again showed
double humps, but with a significantly smaller amplitude of ~0.01 mag. Other types
of variability like a "mini-outburst" or SDSS1238-like features were not
detected. Doppler tomograms, obtained from spectroscopic data during the same
period of time, show a large accretion disk with uneven brightness, implying
the presence of spiral waves. We constructed a geometric model of a bounce-back
system containing two spiral density waves in the outer annuli of the disk to
reproduce the observed light curves. The Doppler tomograms and the
double-hump-shape light curves in quiescence can be explained by a model system
containing a massive >0.7Msun white dwarf with a surface temperature of
~12000K, a late-type brown dwarf, and an accretion disk with two outer annuli
spirals. According to this model, the accretion disk should be large, extending
to the 2:1 resonance radius, and cool (~2500K). The inner parts of the disk
should be optically thin in the continuum or totally void.
Comment: 12 pages, 15 figures, accepted for publication in A&
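The period arithmetic in this abstract can be checked with a short sketch. The 42.48 min value and the statement that it is half the orbital period come from the abstract; the pure-sinusoid shape and the reading of 0.07 mag as a peak-to-peak amplitude are assumptions made here for illustration only, and the names `HUMP_PERIOD_MIN`, `ORBITAL_PERIOD_MIN`, and `delta_mag` are hypothetical.

```python
from math import cos, pi

# Photometric (double-hump) period quoted in the abstract, in minutes.
HUMP_PERIOD_MIN = 42.48

# The abstract states that this is half the orbital period,
# so one orbit contains two humps.
ORBITAL_PERIOD_MIN = 2 * HUMP_PERIOD_MIN


def delta_mag(t_min, full_amplitude=0.07):
    """Magnitude offset of an idealized double-hump light curve.

    Assumption: the variability is a pure sinusoid and `full_amplitude`
    is a peak-to-peak value; the abstract does not specify either.
    """
    return 0.5 * full_amplitude * cos(2 * pi * t_min / HUMP_PERIOD_MIN)


if __name__ == "__main__":
    print(f"orbital period: {ORBITAL_PERIOD_MIN:.2f} min")
    print(f"offset at t=0: {delta_mag(0.0):+.3f} mag")
```

Under these assumptions the curve repeats twice per orbit, which is the signature the abstract describes as a double hump.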
SPECTRAL GAP FOR SPHERICALLY SYMMETRIC LOG-CONCAVE PROBABILITY MEASURES, AND BEYOND
Let $\mu$ be a probability measure on $\mathbb{R}^n$ ($n \geq 2$) with Lebesgue density proportional to $e^{-V(\Vert x\Vert)}$, where $V : \mathbb{R}_+ \to \mathbb{R}$ is a smooth convex potential. We show that the associated spectral gap in $L^2(\mu)$ lies between $(n-1) / \int_{\mathbb{R}^n} \Vert x\Vert^2 \mu(dx)$ and $n / \int_{\mathbb{R}^n} \Vert x\Vert^2 \mu(dx)$, improving a well-known two-sided estimate due to Bobkov. Our Markovian approach is remarkably simple and is sufficiently robust to be extended beyond the log-concave case, at the price of potentially modifying the underlying dynamics in the energy, leading to weighted Poincaré inequalities. All our results are illustrated by some classical and less classical examples.
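As a sanity check that is not taken from the paper itself, the two-sided bound can be evaluated for the standard Gaussian measure, which corresponds to the assumed choice $V(r) = r^2/2$. The symbol $\lambda_1$ for the spectral gap is notation introduced here.

```latex
% Standard Gaussian on R^n: density proportional to exp(-||x||^2 / 2).
\[
  \int_{\mathbb{R}^n} \Vert x \Vert^2 \, \mu(dx) = n
  \quad\Longrightarrow\quad
  \frac{n-1}{n} \;\le\; \lambda_1 \;\le\; \frac{n}{n} = 1 .
\]
% The Gaussian Poincare inequality gives \lambda_1 = 1 exactly,
% so the upper bound is attained and the lower bound is off by 1/n.
```

In this example the gap between the two bounds is $1/n$, so both become sharp as the dimension grows.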
A note on spectral gap and weighted Poincaré inequalities for some one-dimensional diffusions
We present some classical and weighted Poincaré inequalities for some one-dimensional probability measures. This work is the one-dimensional counterpart of a recent study carried out by the authors for a class of spherically symmetric probability measures in dimension larger than 2. Our strategy is based on two main ingredients: on the one hand, the optimal constant in the desired weighted Poincaré inequality is rewritten as the spectral gap of a convenient Markovian diffusion operator; on the other hand, we use a recent result of the first two authors, which allows us to estimate this spectral gap precisely. In particular, we are able to capture its exact value for some examples.
Balanced Tripartite Entanglement, the Alternating Group A4 and the Lie Algebra
We discuss three important classes of three-qubit entangled states and their
encoding into quantum gates, finite groups and Lie algebras. States of the GHZ
and W-type correspond to pure tripartite and bipartite entanglement,
respectively. We introduce another generic class B of three-qubit states, that
have balanced entanglement over two and three parties. We show how to realize
the largest crystallographic group in terms of three-qubit gates (with
real entries) encoding states of type GHZ or W [M. Planat, "Clifford group
dipoles and the enactment of Weyl/Coxeter group by entangling gates",
Preprint 0904.3691 (quant-ph)]. Then, we describe a peculiar "condensation" of
this group into the four-letter alternating group A4, obtained from a chain of
maximal subgroups. Group A4 is realized from two B-type generators and found
to correspond to the Lie algebra . Possible
applications of our findings to particle physics and the structure of the genetic
code are also mentioned.
Comment: 14 page
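The distinction between GHZ-type and W-type entanglement drawn in this abstract can be illustrated numerically. The sketch below is not from the paper: it uses the standard Coffman-Kundu-Wootters three-tangle and the one-tangle of the first qubit, with hypothetical function names, and it does not attempt to construct the paper's class B states.

```python
from math import sqrt


def three_tangle(a):
    """Coffman-Kundu-Wootters three-tangle of a pure three-qubit state.

    `a` maps bit strings such as "011" to amplitudes; missing keys are 0.
    """
    g = lambda k: a.get(k, 0.0)
    d1 = (g("000") ** 2 * g("111") ** 2 + g("001") ** 2 * g("110") ** 2
          + g("010") ** 2 * g("101") ** 2 + g("100") ** 2 * g("011") ** 2)
    d2 = (g("000") * g("111") * g("011") * g("100")
          + g("000") * g("111") * g("101") * g("010")
          + g("000") * g("111") * g("110") * g("001")
          + g("011") * g("100") * g("101") * g("010")
          + g("011") * g("100") * g("110") * g("001")
          + g("101") * g("010") * g("110") * g("001"))
    d3 = (g("000") * g("110") * g("101") * g("011")
          + g("111") * g("001") * g("010") * g("100"))
    return 4 * abs(d1 - 2 * d2 + 4 * d3)


def one_tangle_first_qubit(a):
    """4 * det(rho_A): entanglement of the first qubit with the other two."""
    g = lambda k: a.get(k, 0.0)
    rest = ("00", "01", "10", "11")
    r00 = sum(abs(g("0" + s)) ** 2 for s in rest)
    r11 = sum(abs(g("1" + s)) ** 2 for s in rest)
    r01 = sum(g("0" + s) * complex(g("1" + s)).conjugate() for s in rest)
    return 4 * (r00 * r11 - abs(r01) ** 2)


# Reference states with real amplitudes.
GHZ = {"000": 1 / sqrt(2), "111": 1 / sqrt(2)}
W = {"001": 1 / sqrt(3), "010": 1 / sqrt(3), "100": 1 / sqrt(3)}

if __name__ == "__main__":
    for name, state in (("GHZ", GHZ), ("W", W)):
        tau = three_tangle(state)
        pairwise = one_tangle_first_qubit(state) - tau  # bipartite share
        print(f"{name}: three-tangle={tau:.3f}, pairwise part={pairwise:.3f}")
```

For the first qubit, the one-tangle splits into a pairwise part plus the three-tangle. The GHZ state carries all of its entanglement in the three-tangle, while the W state has zero three-tangle and only the pairwise part, which matches the abstract's description of pure tripartite versus bipartite entanglement.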