Robust Losses for Learning Value Functions
Most value function learning algorithms in reinforcement learning are based
on the mean squared (projected) Bellman error. However, squared errors are
known to be sensitive to outliers, both skewing the solution of the objective
and resulting in high-magnitude and high-variance gradients. To control these
high-magnitude updates, typical strategies in RL involve clipping gradients,
clipping rewards, rescaling rewards, or clipping errors. While these strategies
appear to be related to robust losses -- like the Huber loss -- they are built
on semi-gradient update rules which do not minimize a known loss. In this work,
we build on recent insights reformulating squared Bellman errors as a
saddlepoint optimization problem and propose a saddlepoint reformulation for a
Huber Bellman error and Absolute Bellman error. We start from a formalization
of robust losses, then derive sound gradient-based approaches to minimize these
losses in both the online off-policy prediction and control settings. We
characterize the solutions of the robust losses, providing insight into the
problem settings where the robust losses define notably better solutions than
the mean squared Bellman error. Finally, we show that the resulting
gradient-based algorithms are more stable, for both prediction and control,
with less sensitivity to meta-parameters. Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (2022)
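As a rough illustrative aside (this is not the paper's saddlepoint algorithm), the robustness argument can be seen in how the Huber loss bounds the gradient of the error term while the squared loss does not; the threshold delta and the helper names below are assumptions made for the sketch.

    import numpy as np

    def squared_grad(td_error):
        # Gradient of 0.5 * td_error**2 with respect to td_error:
        # it grows linearly, so one outlier transition dominates the update.
        return td_error

    def huber_grad(td_error, delta=1.0):
        # Gradient of the Huber loss: matches the squared loss for
        # |td_error| <= delta, and is clipped to +/- delta beyond that.
        return np.clip(td_error, -delta, delta)

    td_errors = np.array([0.1, -0.5, 2.0, -50.0])  # last entry is an outlier
    print(squared_grad(td_errors))  # [  0.1  -0.5   2.  -50. ]
    print(huber_grad(td_errors))    # [ 0.1 -0.5  1.  -1. ]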
When is Offline Policy Selection Sample Efficient for Reinforcement Learning?
Offline reinforcement learning algorithms often require careful
hyperparameter tuning. Consequently, before deployment, we need to select
amongst a set of candidate policies. As yet, however, there is little
understanding about the fundamental limits of this offline policy selection
(OPS) problem. In this work we aim to provide clarity on when sample efficient
OPS is possible, primarily by connecting OPS to off-policy policy evaluation
(OPE) and Bellman error (BE) estimation. We first show a hardness result, that
in the worst case, OPS is just as hard as OPE, by proving a reduction of OPE to
OPS. As a result, no OPS method can be more sample efficient than OPE in the
worst case. We then propose a BE method for OPS, called Identifiable BE
Selection (IBES), that has a straightforward method for selecting its own
hyperparameters. We highlight that using IBES for OPS generally imposes stronger requirements than OPE methods, but when those requirements are satisfied, it can be more sample efficient.
We conclude with an empirical study comparing OPE and IBES, and by showing the
difficulty of OPS on an offline Atari benchmark dataset.
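As a minimal sketch of the generic BE-based selection idea only (not the paper's IBES procedure; the dataset format, function names, and the use of a plain one-sample TD error are all assumptions), one could rank candidate policies on a fixed offline dataset as follows. Note that the one-sample squared TD error is a biased estimate of the Bellman error in stochastic environments, which is part of why a more careful estimator is needed.

    import numpy as np

    def empirical_td_error(q_fn, policy, transitions, gamma=0.99):
        # Mean squared one-sample TD error of q_fn under `policy` on an
        # offline dataset of (s, a, r, s_next, done) tuples; a naive proxy
        # for the Bellman error, biased when transitions are stochastic.
        errors = []
        for s, a, r, s_next, done in transitions:
            target = r if done else r + gamma * q_fn(s_next, policy(s_next))
            errors.append((target - q_fn(s, a)) ** 2)
        return float(np.mean(errors))

    def select_candidate(candidates, transitions):
        # candidates: list of (q_fn, policy) pairs produced by different
        # hyperparameter settings; return the index with the lowest error.
        scores = [empirical_td_error(q, pi, transitions) for q, pi in candidates]
        return int(np.argmin(scores))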
What You Need to Know about Bar-Code Medication Administration
Medication errors are the most common type of preventable error. Bar-code medication administration (BCMA) technology was designed to reduce medication administration errors, yet poor system design, implementation, and workarounds remain a cause of errors. This paper reviews the literature on BCMA, identifies a gap in the findings, and proposes three evidence-based practices that could be used to improve system implementation and reduce error. The literature review found that bar-code medication administration and system workarounds are well documented and affect patient safety. Based on a critical analysis of 10 studies, we identified gaps in the standardization of BCMA planning, implementation, and sustainability. The themes that emerged from the literature were poor BCMA design and implementation resulting in workarounds. The three evidence-based strategies proposed to address this gap are evidence-based standardization in planning and implementation, the identification and elimination of workarounds, and hard wiring. An evidence-based checklist evaluates compliance with standard procedures. The Lean model of Jidoka is used to ensure the machine is adapted to human workflow. Direct observation provides valuable workflow assessment. An effective BCMA implementation involves careful system design, identification of the workflow issues that cause workarounds, and adaptation of the machine to nursing needs.
Reviews
1977 Tolkien Calendar. Greg and Tim Hildebrandt. Reviewed by Nancy-Lou Patterson.
The Lord of the Rings 1977 Calendar. Illustrations by J. R. R. Tolkien, notes by Christopher Tolkien. Reviewed by Nancy-Lou Patterson.
Adventure, Mystery, and Romance: Formula Stories as Art and Popular Culture. John G. Cawelti. Reviewed by Joe R. Christopher.
Encyclopedia of Mystery and Detection. Chris Steinbrunner and Otto Penzler (eds.). Reviewed by Joe R. Christopher.
The Father Christmas Letters. John Ronald Reuel Tolkien. Reviewed by Martha and Laurence Krieg.
The Middle-earth Songbook. Ruth Berman and Ken Nahigian (eds.). Reviewed by George Colvin.
From Elfland to Poughkeepsie. Ursula K. Le Guin. Reviewed by George Colvin.
Camber of Culdi. Katherine Kurtz. Reviewed by George Colvin.