
    Robust Losses for Learning Value Functions

    Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error. However, squared errors are known to be sensitive to outliers, both skewing the solution of the objective and resulting in high-magnitude and high-variance gradients. To control these high-magnitude updates, typical strategies in RL involve clipping gradients, clipping rewards, rescaling rewards, or clipping errors. While these strategies appear to be related to robust losses such as the Huber loss, they are built on semi-gradient update rules that do not minimize a known loss. In this work, we build on recent insights reformulating squared Bellman errors as a saddlepoint optimization problem and propose a saddlepoint reformulation for a Huber Bellman error and an Absolute Bellman error. We start from a formalization of robust losses, then derive sound gradient-based approaches to minimize these losses in both the online off-policy prediction and control settings. We characterize the solutions of the robust losses, providing insight into the problem settings where the robust losses define notably better solutions than the mean squared Bellman error. Finally, we show that the resulting gradient-based algorithms are more stable, for both prediction and control, with less sensitivity to meta-parameters. Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (2022)
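    As a rough illustration of the outlier point above, and not the authors' saddlepoint method, the sketch below compares the derivative of the squared loss with that of the Huber loss on a single scalar TD error delta = r + gamma * v(s') - v(s). The threshold `tau` and all helper names are hypothetical choices for this example. The squared-loss derivative grows linearly with the error, while the Huber derivative is clipped to the range [-tau, tau].

```python
# Hypothetical helpers for illustration only; not code from the paper.

def td_error(reward: float, gamma: float, v_next: float, v_curr: float) -> float:
    """Sampled one-step TD error: r + gamma * v(s') - v(s)."""
    return reward + gamma * v_next - v_curr


def squared_loss_grad(delta: float) -> float:
    """Derivative of 0.5 * delta**2 with respect to delta: unbounded in |delta|."""
    return delta


def huber_loss(delta: float, tau: float = 1.0) -> float:
    """Huber loss: quadratic for |delta| <= tau, linear beyond (tau is an assumed threshold)."""
    if abs(delta) <= tau:
        return 0.5 * delta * delta
    return tau * (abs(delta) - 0.5 * tau)


def huber_loss_grad(delta: float, tau: float = 1.0) -> float:
    """Derivative of the Huber loss: equals delta inside the band, clipped to +/- tau outside."""
    return max(-tau, min(tau, delta))


if __name__ == "__main__":
    # An outlier reward on a transition where both value estimates are zero.
    delta = td_error(reward=100.0, gamma=0.9, v_next=0.0, v_curr=0.0)
    print(squared_loss_grad(delta), huber_loss_grad(delta))  # prints 100.0 1.0
```

    This only evaluates the loss and its derivative with respect to one sampled error. It does not implement the Huber Bellman error or the saddlepoint reformulation described in the abstract, where the robust loss is defined on Bellman errors rather than on individual sampled TD errors.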

    When is Offline Policy Selection Sample Efficient for Reinforcement Learning?

    Offline reinforcement learning algorithms often require careful hyperparameter tuning. Consequently, before deployment, we need to select amongst a set of candidate policies. As yet, however, there is little understanding of the fundamental limits of this offline policy selection (OPS) problem. In this work we aim to clarify when sample-efficient OPS is possible, primarily by connecting OPS to off-policy policy evaluation (OPE) and Bellman error (BE) estimation. We first show a hardness result: in the worst case, OPS is just as hard as OPE, which we prove by reducing OPE to OPS. As a result, no OPS method can be more sample efficient than OPE in the worst case. We then propose a BE method for OPS, called Identifiable BE Selection (IBES), that has a straightforward method for selecting its own hyperparameters. We highlight that using IBES for OPS generally has more requirements than OPE methods but, when those requirements are satisfied, can be more sample efficient. We conclude with an empirical study comparing OPE and IBES, and by showing the difficulty of OPS on an offline Atari benchmark dataset.

    What You Need to Know about Bar-Code Medication Administration

    Medication errors are the most common type of preventable error. Bar-code medication administration (BCMA) technology was designed to reduce medication administration errors, yet poor system design, poor implementation, and workarounds remain causes of error. This paper reviews the literature on BCMA, identifies a gap in the findings, and identifies three evidence-based practices that could be used to improve system implementation and reduce error. The literature review found that BCMA and system workarounds are well documented and affect patient safety. Based on a critical analysis of 10 studies, we identified gaps in the standardization of BCMA planning, implementation, and sustainability. The themes that emerged from the literature were poor BCMA design and implementation that resulted in workarounds. The three evidence-based strategies proposed to address this gap are evidence-based standardization in planning and implementation, the identification and elimination of workarounds, and hardwiring. An evidence-based checklist evaluates compliance with standard procedures. The Lean model of Jidoka is used to ensure that the machine is adapted to the human workflow. Direct observation provides valuable workflow assessment. An effective BCMA implementation involves careful system design, identification of the workflow issues that cause workarounds, and adapting the machine to nursing needs.

    Reviews

    1977 Tolkien Calendar. Greg and Tim Hildebrandt. Reviewed by Nancy-Lou Patterson. The Lord of the Rings 1977 Calendar. Illustrations by J. R. R. Tolkien, notes by Christopher Tolkien. Reviewed by Nancy-Lou Patterson. Adventure, Mystery, and Romance: Formula Stories as Art and Popular Culture. John G. Cawelti. Reviewed by Joe R. Christopher. Encyclopedia of Mystery and Detection. Chris Steinbrunner and Otto Penzler (eds.). Reviewed by Joe R. Christopher. The Father Christmas Letters. John Ronald Reuel Tolkien. Reviewed by Martha and Laurence Krieg. The Middle-earth Songbook. Ruth Berman and Ken Nahigian (eds.). Reviewed by George Colvin. From Elfland to Poughkeepsie. Ursula K. Le Guin. Reviewed by George Colvin. Camber of Culdi. Katherine Kurtz. Reviewed by George Colvin.