261 research outputs found

    Increasing the Action Gap: New Operators for Reinforcement Learning

    This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies. This operator can also be applied to discretized continuous space and time problems, and we provide empirical results evidencing superior performance in this context. Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator. As corollaries we provide a proof of optimality for Baird's advantage learning algorithm and derive other gap-increasing operators with interesting properties. We conclude with an empirical study on 60 Atari 2600 games illustrating the strong potential of these new operators.
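
The gap-increasing idea can be illustrated with a minimal tabular sketch. The abstract does not state any operator explicitly, so the advantage learning form used here, T_AL Q(x,a) = TQ(x,a) - alpha * (max_b Q(x,b) - Q(x,a)), is an assumption based on how Baird's algorithm is usually written. The two-state MDP, the discount `gamma = 0.9`, and `alpha = 0.5` are hypothetical values chosen only for illustration.

```python
# Hypothetical 2-state, 2-action deterministic MDP (not from the paper).
# P[x][a] is a list of (probability, next_state); R[x][a] is the reward.
P = [[[(1.0, 0)], [(1.0, 1)]],
     [[(1.0, 0)], [(1.0, 1)]]]
R = [[1.0, 0.0],
     [0.0, 0.5]]
GAMMA = 0.9   # assumed discount factor
ALPHA = 0.5   # assumed gap-increasing coefficient, must lie in [0, 1)


def bellman(Q):
    """Standard Bellman optimality operator on a tabular Q-function."""
    return [[R[x][a] + GAMMA * sum(p * max(Q[y]) for p, y in P[x][a])
             for a in range(len(Q[x]))]
            for x in range(len(Q))]


def advantage_learning(Q):
    """Assumed advantage learning operator: subtract alpha * (V(x) - Q(x, a))."""
    TQ = bellman(Q)
    return [[TQ[x][a] - ALPHA * (max(Q[x]) - Q[x][a])
             for a in range(len(Q[x]))]
            for x in range(len(Q))]


def iterate(operator, steps=2000):
    """Apply an operator repeatedly, starting from the all-zero Q-function."""
    Q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        Q = operator(Q)
    return Q


def action_gap(q_row):
    """Difference between the best and second-best action values in a state."""
    best, second = sorted(q_row, reverse=True)[:2]
    return best - second


def greedy(Q):
    """Greedy action in each state."""
    return [row.index(max(row)) for row in Q]


Q_std = iterate(bellman)
Q_al = iterate(advantage_learning)
gaps_std = [action_gap(row) for row in Q_std]
gaps_al = [action_gap(row) for row in Q_al]
print("greedy policies:", greedy(Q_std), greedy(Q_al))
print("action gaps:", gaps_std, gaps_al)
```

On this toy problem both iterations induce the same greedy policy, while the action gaps under the assumed operator are larger by a factor of 1 / (1 - alpha): roughly 3.8 and 0.8 compared with 1.9 and 0.4.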

    The properties of the Malin 1 galaxy giant disk: A panchromatic view from the NGVS and GUViCS surveys

    Low surface brightness galaxies (LSBGs) represent a significant percentage of local galaxies but their formation and evolution remain elusive. They may hold crucial information for our understanding of many key issues (e.g., census of baryonic and dark matter, star formation in the low density regime, mass function). The most massive examples, the so-called giant LSBGs, can be as massive as the Milky Way, but with this mass distributed in a much larger disk. Malin 1 is an iconic giant LSBG, perhaps the largest disk galaxy known. We attempt to bring new insights on its structure and evolution on the basis of new images covering a wide range in wavelength. We have computed surface brightness profiles (and average surface brightnesses in 16 regions of interest) in six photometric bands (FUV, NUV, u, g, i, z). We compared these data to various models, testing a variety of assumptions concerning the formation and evolution of Malin 1. We find that the surface brightness and color profiles can be reproduced by a long and quiet star-formation history due to the low surface density; no significant event, such as a collision, is necessary. Such quiet star formation across the giant disk is obtained in a disk model calibrated for the Milky Way, but with an angular momentum approximately 20 times larger. Signs of small variations of the star-formation history are indicated by the diversity of ages found when different regions within the galaxy are intercompared. For the first time, panchromatic images of Malin 1 are used to constrain the stellar populations and the history of this iconic example among giant LSBGs. Based on our model, the extreme disk of Malin 1 is found to have a long history of relatively low star formation (about 2 Msun/yr). Our model allows us to make predictions on its stellar mass and metallicity. Comment: Accepted in Astronomy and Astrophysics

    Maximal Entanglement, Collective Coordinates and Tracking the King

    Maximally entangled states (MES) provide a basis for the Hilbert space of two d-dimensional particles, d = prime ≠ 2. The MES forming this basis are product states in the collective (center-of-mass and relative) coordinates. These states are associated with (underpinned by) lines of finite geometry whose constituent points are associated with product states carrying Mutually Unbiased Bases (MUB) labels. This representation is shown to be convenient for the study of the Mean King Problem and a variant thereof, termed Tracking the King, which proves to be a novel quantum communication channel. The main topics and notions used are reviewed in an attempt to keep the paper self-contained. Comment: 8. arXiv admin note: substantial text overlap with arXiv:1206.3884, arXiv:1206.035

    Case studies and analysis of mine shafts incidents in Europe

    Entry to mine workings is normally gained by means of vertical shafts or horizontal or inclined tunnels called adits. Other mining objects such as fan drifts and wheel pits are often associated with mine shafts. Such mining objects may or may not have been filled, wholly or partially, or otherwise sealed to prevent entry when the mine was abandoned. Nowadays mine entries are usually adequately protected on abandonment to prevent accidental ingress. Many earlier mine entries remain open, however, and may pose a threat to human safety. Within the framework of MISSTER (Mine shafts: improving security and new tools for the evaluation of risks), a European RFCS project (Research Fund for Coal and Steel), a selection of representative cases of mine shafts incidents was reviewed. This work was carried out by INERIS (France), GEOCONTROL (Spain), University of Nottingham and Mine Rescue Service Ltd (United Kingdom), Central Mining Institute and KWSA (Poland). The experience accumulated through this work will allow a fuller determination of risk scenarios associated with mine shafts.

    Open access and open source in chemistry

    Scientific data are being generated and shared at ever-increasing rates. Two new mechanisms for doing this have developed: open access publishing and open source research. We discuss both, with recent examples, highlighting the differences between the two and the strengths of each.

    Regularized fitted Q-iteration: application to planning

    We consider planning in a Markovian decision problem, i.e., the problem of finding a good policy given access to a generative model of the environment. We propose to use fitted Q-iteration with penalized (or regularized) least-squares regression as the regression subroutine to address the problem of controlling model complexity. The algorithm is presented in detail for the case when the function space is a reproducing kernel Hilbert space underlying a user-chosen kernel function. We derive bounds on the quality of the solution and argue that data-dependent penalties can lead to almost optimal performance. A simple example is used to illustrate the benefits of using a penalized procedure.
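
A rough sketch of the procedure this abstract describes, fitted Q-iteration with a penalized least-squares step in a reproducing kernel Hilbert space, might look as follows. Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-dimensional `toy_model`, the Gaussian kernel and its bandwidth, the fixed penalty `lam` (the paper argues for data-dependent penalties), the `lam * n` scaling of the ridge term, and the hand-written linear solver.

```python
import math


def gaussian_kernel(u, v, bandwidth=0.2):
    """Kernel on (state, action) pairs; distinct actions are treated as unrelated."""
    (s1, a1), (s2, a2) = u, v
    if a1 != a2:
        return 0.0
    return math.exp(-((s1 - s2) ** 2) / (2 * bandwidth ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def toy_model(s, a):
    """Hypothetical generative model on [0, 1]: action 1 moves right, 0 moves left."""
    s_next = min(1.0, max(0.0, s + (0.1 if a == 1 else -0.1)))
    reward = 1.0 if s_next >= 0.95 else 0.0
    return reward, s_next


def fitted_q_iteration(samples, actions, gamma=0.9, lam=1e-3, iterations=30):
    """Repeatedly regress Bellman targets with kernel ridge regression."""
    inputs = [(s, a) for s, a, _, _ in samples]
    n = len(inputs)
    K = [[gaussian_kernel(u, v) for v in inputs] for u in inputs]
    # Regularized system (K + lam * n * I) w = targets
    A = [[K[i][j] + (lam * n if i == j else 0.0) for j in range(n)]
         for i in range(n)]

    def q(s, a, weights):
        return sum(w * gaussian_kernel((s, a), u) for w, u in zip(weights, inputs))

    weights = [0.0] * n
    for _ in range(iterations):
        targets = [r + gamma * max(q(s2, b, weights) for b in actions)
                   for _, _, r, s2 in samples]
        weights = solve(A, targets)
    final = weights
    return lambda s, a: q(s, a, final)


ACTIONS = (0, 1)
# Query the generative model on a fixed grid of states for every action.
samples = [(s, a) + toy_model(s, a)
           for s in [i / 10 for i in range(11)] for a in ACTIONS]
q_hat = fitted_q_iteration(samples, ACTIONS)
print(q_hat(0.8, 0), q_hat(0.8, 1))
```

The regression step is the only place where the penalty enters, so swapping in a different penalty or kernel leaves the outer loop unchanged.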

    Commercializing Biomedical Research Through Securitization Techniques

    Biomedical innovation has become riskier, more expensive and more difficult to finance with traditional sources such as private and public equity. Here we propose a financial structure in which a large number of biomedical programs at various stages of development are funded by a single entity to substantially reduce the portfolio's risk. The portfolio entity can finance its activities by issuing debt, a critical advantage because a much larger pool of capital is available for investment in debt versus equity. By employing financial engineering techniques such as securitization, it can raise even greater amounts of more-patient capital. In a simulation using historical data for new molecular entities in oncology from 1990 to 2011, we find that megafunds of $5–15 billion may yield average investment returns of 8.9–11.4% for equity holders and 5–8% for 'research-backed obligation' holders, which are lower than typical venture-capital hurdle rates but attractive to pension funds, insurance companies and other large institutional investors.
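
The risk-reduction argument can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that programs succeed independently with the same probability; the success probability of 5% and the portfolio sizes are hypothetical and are not figures from the paper, whose simulation uses historical oncology data.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of at least k successes among n independent programs,
    each succeeding with probability p (binomial model, an assumption)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


P_SUCCESS = 0.05  # hypothetical per-program success probability

for n_programs in (1, 10, 50, 150):
    print(n_programs, prob_at_least(1, n_programs, P_SUCCESS))
```

Under these assumptions the chance of at least one success rises quickly with the number of programs, which is the intuition behind pooling many programs so that the portfolio can support debt financing. Correlated outcomes would weaken the effect.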

    Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path

    We consider the problem of finding a near-optimal policy in continuous-space, discounted Markovian Decision Problems given the trajectory of some behaviour policy. We study the policy iteration algorithm where in successive iterations the action-value functions of the intermediate policies are obtained by picking a function from some fixed function set (chosen by the user) that minimizes an unbiased finite-sample approximation to a novel loss function that upper-bounds the unmodified Bellman-residual criterion. The main result is a finite-sample, high-probability bound on the performance of the resulting policy that depends on the mixing rate of the trajectory, the capacity of the function set as measured by a novel capacity concept that we call the VC-crossing dimension, the approximation power of the function set and the discounted-average concentrability of the future-state distribution. To the best of our knowledge this is the first theoretical reinforcement learning result for off-policy control learning over continuous state-spaces using a single trajectory.