261 research outputs found

Increasing the Action Gap: New Operators for Reinforcement Learning

Author: Bellemare Marc G.
Guez Arthur
Munos Rémi
Ostrovski Georg
Thomas Philip S.
Publication date: 15/12/2015

This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies. This operator can also be applied to discretized continuous space and time problems, and we provide empirical results evidencing superior performance in this context. Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator. As corollaries we provide a proof of optimality for Baird's advantage learning algorithm and derive other gap-increasing operators with interesting properties. We conclude with an empirical study on 60 Atari 2600 games illustrating the strong potential of these new operators.
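
The abstract does not spell out the operators, so the following is only a minimal sketch of the gap-increasing idea. It assumes the commonly cited advantage-learning form T_AL Q(x,a) = T Q(x,a) - alpha [V(x) - Q(x,a)], with V(x) = max_b Q(x,b), and runs it on a hypothetical two-state, two-action MDP. The MDP, the constants `GAMMA` and `alpha`, and all function names are assumptions made for illustration, not taken from the paper.

```python
# Hypothetical deterministic MDP (illustration only).
# P[x][a][y]: probability of moving from state x to state y under action a.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
# R[x][a]: immediate reward for taking action a in state x.
R = [[1.0, 0.0],
     [0.0, 0.5]]
GAMMA = 0.9  # discount factor (assumed)


def bellman(Q):
    """Standard Bellman optimality operator on a tabular Q-function."""
    V = [max(row) for row in Q]
    return [[R[x][a] + GAMMA * sum(p * v for p, v in zip(P[x][a], V))
             for a in range(len(Q[x]))]
            for x in range(len(Q))]


def advantage_learning(Q, alpha=0.5):
    """Assumed advantage-learning operator: T Q - alpha * (V - Q)."""
    TQ = bellman(Q)
    V = [max(row) for row in Q]
    return [[TQ[x][a] - alpha * (V[x] - Q[x][a])
             for a in range(len(Q[x]))]
            for x in range(len(Q))]


def iterate(operator, n_iters=500):
    """Apply an operator repeatedly, starting from Q = 0."""
    Q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_iters):
        Q = operator(Q)
    return Q


def action_gaps(Q):
    """Difference between the best and second-best action value per state."""
    gaps = []
    for row in Q:
        best, second = sorted(row, reverse=True)[:2]
        gaps.append(best - second)
    return gaps


def greedy(Q):
    """Index of the greedy action in each state."""
    return [row.index(max(row)) for row in Q]


Q_bellman = iterate(bellman)
Q_al = iterate(advantage_learning)
print("greedy policies:", greedy(Q_bellman), greedy(Q_al))
print("action gaps:    ", action_gaps(Q_bellman), action_gaps(Q_al))
```

In this toy problem both iterations end with the same greedy policy and the same maximal value in each state, while the gap between the best and second-best action is wider under the assumed advantage-learning operator. That is the property the abstract argues makes greedy policies less sensitive to approximation and estimation errors.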

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

The properties of the Malin 1 galaxy giant disk: A panchromatic view from the NGVS and GUViCS surveys

Author: Boissier S.
Boselli A.
Cote P.
Cuillandre J. -C.
de Paz A. Gil
Ferrarese L.
Gwyn S. D. J.
Koda J.
Madore B. F.
Mateos J. C. Munos
Roediger J.
Roehlly Y.
Publication venue: 'EDP Sciences'
Publication date: 01/01/2016

Low surface brightness galaxies (LSBGs) represent a significant percentage of local galaxies but their formation and evolution remain elusive. They may hold crucial information for our understanding of many key issues (i.e., census of baryonic and dark matter, star formation in the low-density regime, mass function). The most massive examples - the so-called giant LSBGs - can be as massive as the Milky Way, but with this mass being distributed in a much larger disk. Malin 1 is an iconic giant LSBG, perhaps the largest disk galaxy known. We attempt to bring new insights on its structure and evolution on the basis of new images covering a wide range in wavelength. We have computed surface brightness profiles (and average surface brightnesses in 16 regions of interest), in six photometric bands (FUV, NUV, u, g, i, z). We compared these data to various models, testing a variety of assumptions concerning the formation and evolution of Malin 1. We find that the surface brightness and color profiles can be reproduced by a long and quiet star-formation history due to the low surface density; no significant event, such as a collision, is necessary. Such quiet star formation across the giant disk is obtained in a disk model calibrated for the Milky Way, but with an angular momentum approximately 20 times larger. Signs of small variations of the star-formation history are indicated by the diversity of ages found when different regions within the galaxy are intercompared. For the first time, panchromatic images of Malin 1 are used to constrain the stellar populations and the history of this iconic example among giant LSBGs. Based on our model, the extreme disk of Malin 1 is found to have a long history of relatively low star formation (about 2 Msun/yr). Our model allows us to make predictions on its stellar mass and metallicity. Comment: Accepted in Astronomy and Astrophysics.

arXiv.org e-Print Archive

Docta Complutense

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Maximal Entanglement, Collective Coordinates and Tracking the King

Author: Combescure M
Ivanovic I D
Klimov A B
Munos C
Romero J L
M Revzen
Revzen M
Saniga M
Schwinger J
Shirakova S A
Vourdas A
Weyl H
Publication venue: 'IOP Publishing'
Publication date: 29/09/2012

Maximally entangled states (MES) provide a basis for the Hilbert space of two d-dimensional particles, with d a prime $\ne 2$. The MES forming this basis are product states in the collective, center of mass and relative, coordinates. These states are associated (underpinned) with lines of finite geometry whose constituent points are associated with product states carrying Mutually Unbiased Bases (MUB) labels. This representation is shown to be convenient for the study of the Mean King Problem and a variant thereof, termed Tracking the King, which proves to be a novel quantum communication channel. The main topics and notions used are reviewed in an attempt to make the paper self-contained. Comment: 8. arXiv admin note: substantial text overlap with arXiv:1206.3884, arXiv:1206.035

arXiv.org e-Print Archive

Crossref

Case studies and analysis of mine shafts incidents in Europe

Author: Bock Slawomir
Dziura J.
Gajda L.
Lecomte Amélie
Marshall Alec
Munos Niharra Agustin
Prusek S.
Purvis M.
Salmon Romuald
Yang W.
Publication venue: HAL CCSD
Publication date: 24/04/2012

Entry to mine workings is normally gained by means of vertical shafts or horizontal or inclined tunnels called adits. Other mining objects such as fan drifts and wheel pits are often associated with mine shafts. Such mining objects may or may not have been filled, wholly or partially, or otherwise sealed to prevent entry when the mine was abandoned. Nowadays mine entries are usually adequately protected on abandonment to prevent accidental ingress. Many earlier mine entries remain open, however, and may pose a threat to human safety. Within the framework of MISSTER (Mine shafts: improving security and new tools for the evaluation of risks), a European RFCS project (Research Fund for Coal and Steel), a selection of representative cases of mine shafts incidents was reviewed. This work was carried out by INERIS (France), GEOCONTROL (Spain), University of Nottingham and Mine Rescue Service Ltd (United Kingdom), Central Mining Institute and KWSA (Poland). The experience accumulated through this work will allow a fuller determination of risk scenarios associated with mine shafts.

HAL-INERIS

Innovative Partnerships for Drug Discovery against Neglected Diseases

Author: AL Hopkins
B Callan
BH Munos
EA Zerhouni
M Moran
Ming-Wei Wang
Palle H. Jakobsen
S Aksoy
S Nwaka
Solomon Nwaka
T Mboya Okeyo
Timothy G. Geary
Publication venue: Public Library of Science
Publication date: 01/09/2011

Crossref

Directory of Open Access Journals

PubMed Central

Open access and open source in chemistry

Author: A Fenwick
B Munos
C Gaillard
D Butler
H Ledford
Matthew H Todd
R Guha
Richard M Titmuss
S Everts
S Krause
Simon Winchester
TB Kepler
WJ Geldenhuys
WL DeLano
Publication venue: BioMed Central
Publication date: 01/01/2007

Scientific data are being generated and shared at ever-increasing rates. Two new mechanisms for doing this have developed: open access publishing and open source research. We discuss both, with recent examples, highlighting the differences between the two, and the strengths of both.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Regularized fitted Q-iteration: application to planning

Author: A. Antos
B. Schölkopf
D. Ernst
D. Ormoneit
D.-X. Zhou
D.P. Bertsekas
F. Bunea
L. Györfi
N. Srebro
R. Munos
S. Mannor
X. Xu
Y. Engel
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2008

We consider planning in a Markovian decision problem, i.e., the problem of finding a good policy given access to a generative model of the environment. We propose to use fitted Q-iteration with penalized (or regularized) least-squares regression as the regression subroutine to address the problem of controlling model complexity. The algorithm is presented in detail for the case when the function space is a reproducing kernel Hilbert space underlying a user-chosen kernel function. We derive bounds on the quality of the solution and argue that data-dependent penalties can lead to almost optimal performance. A simple example is used to illustrate the benefits of using a penalized procedure.
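
As a rough illustration of the loop described in this abstract, the sketch below runs fitted Q-iteration with an L2-penalized (ridge) least-squares fit as the regression step. It deliberately swaps the paper's reproducing kernel Hilbert space for three fixed polynomial features, and the one-dimensional environment `generative_model`, the constants `GAMMA` and `LAMBDA`, and every function name are hypothetical choices made for this example only.

```python
GAMMA = 0.5        # discount factor (assumed)
LAMBDA = 1e-3      # ridge penalty weight (assumed)
ACTIONS = (-1, 1)  # move left or right


def generative_model(s, a):
    """Hypothetical simulator: step of size 0.1 on [0, 1], reward = next state."""
    s_next = min(1.0, max(0.0, s + 0.1 * a))
    return s_next, s_next


def features(s):
    """Quadratic polynomial features (stand-in for a kernel expansion)."""
    return [1.0, s, s * s]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ridge_fit(X, y, lam):
    """Penalized least squares: minimize ||X w - y||^2 + lam * ||w||^2."""
    d = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(x[i] * t for x, t in zip(X, y)) for i in range(d)]
    return solve(A, b)


def q_value(weights, s, a):
    return sum(w * f for w, f in zip(weights[a], features(s)))


def fitted_q_iteration(states, n_iters):
    """Refit one regressor per action to Bellman targets at each iteration."""
    weights = {a: [0.0, 0.0, 0.0] for a in ACTIONS}
    for _ in range(n_iters):
        new_weights = {}
        for a in ACTIONS:
            X, y = [], []
            for s in states:
                s_next, r = generative_model(s, a)
                target = r + GAMMA * max(q_value(weights, s_next, b) for b in ACTIONS)
                X.append(features(s))
                y.append(target)
            new_weights[a] = ridge_fit(X, y, LAMBDA)
        weights = new_weights
    return weights


states = [i / 20 for i in range(21)]  # sample states on a fixed grid
weights = fitted_q_iteration(states, n_iters=30)
policy = {s: max(ACTIONS, key=lambda a: q_value(weights, s, a))
          for s in (0.25, 0.5, 0.75)}
print(policy)
```

The penalty `LAMBDA` is what controls model complexity here. With a richer feature set or kernel it would matter far more than it does for three polynomial terms.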

CiteSeerX

Crossref

SZTAKI Publication Repository

Commercializing Biomedical Research Through Securitization Techniques

Author: Andrew W Lo
AW Lo
B Huggett
B Munos
C Adams
C Bluhm
HM Markowitz
J DiMasi
Jose-Maria Fernandez
M Goodman
Roger M Stein
S Papadopoulos
SM Paul
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/2012

Biomedical innovation has become riskier, more expensive and more difficult to finance with traditional sources such as private and public equity. Here we propose a financial structure in which a large number of biomedical programs at various stages of development are funded by a single entity to substantially reduce the portfolio's risk. The portfolio entity can finance its activities by issuing debt, a critical advantage because a much larger pool of capital is available for investment in debt versus equity. By employing financial engineering techniques such as securitization, it can raise even greater amounts of more-patient capital. In a simulation using historical data for new molecular entities in oncology from 1990 to 2011, we find that megafunds of $5–15 billion may yield average investment returns of 8.9–11.4% for equity holders and 5–8% for 'research-backed obligation' holders, which are lower than typical venture-capital hurdle rates but attractive to pension funds, insurance companies and other large institutional investors.
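
The risk-reduction argument can be illustrated with a simple binomial calculation. This is only a sketch under strong assumptions: programs are treated as independent with an identical success probability, and the figures used (a 5% success rate, 150 programs) are hypothetical. The simulation described in the abstract relies on historical data and is considerably more detailed.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of at least k successes among n independent programs,
    each succeeding with probability p (binomial tail)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))


# Hypothetical numbers: one program versus a pool of 150 programs.
single = prob_at_least(1, 1, 0.05)
pooled = prob_at_least(1, 150, 0.05)
print(f"at least one success, 1 program:    {single:.4f}")
print(f"at least one success, 150 programs: {pooled:.4f}")
```

Under these assumptions, pooling turns a long-shot bet into a near-certain chance of at least one success, which is the kind of predictability that makes debt financing plausible. Correlation between programs would weaken the effect.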

DSpace@MIT

Crossref

Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path

Author: A. Antos
A. Nobel
András Antos
B. Yu
Csaba Szepesvári
D. Ernst
D. Haussler
D. Ormoneit
D. P. Bertsekas
D. Pollard
E. Cheney
G. Gordon
J. N. Tsitsiklis
L. Devroye
L. Györfi
M. Anthony
M. Carrasco
M. Kuczma
M. Lagoudakis
P. Doukhan
P. Schweitzer
R. A. Howard
R. Bellman
R. Meir
R. Sutton
Rémi Munos
S. Bradtke
S. Meyn
S. Murphy
T. G. Dietterich
Y. Baraud
Y. Davidov
Publication date: 01/01/2008

We consider the problem of finding a near-optimal policy in continuous space, discounted Markovian Decision Problems given the trajectory of some behaviour policy. We study the policy iteration algorithm where in successive iterations the action-value functions of the intermediate policies are obtained by picking a function from some fixed function set (chosen by the user) that minimizes an unbiased finite-sample approximation to a novel loss function that upper-bounds the unmodified Bellman-residual criterion. The main result is a finite-sample, high-probability bound on the performance of the resulting policy that depends on the mixing rate of the trajectory, the capacity of the function set as measured by a novel capacity concept that we call the VC-crossing dimension, the approximation power of the function set and the discounted-average concentrability of the future-state distribution. To the best of our knowledge this is the first theoretical reinforcement learning result for off-policy control learning over continuous state-spaces using a single trajectory.

CiteSeerX

HAL - Lille 3

Crossref

SZTAKI Publication Repository

INRIA a CCSD electronic archive server