Continuous State-Space Models for Optimal Sepsis Treatment - a Deep Reinforcement Learning Approach
Sepsis is a leading cause of mortality in intensive care units (ICUs) and
costs hospitals billions annually. Treating a septic patient is highly
challenging, because individual patients respond very differently to medical
interventions and there is no universally agreed-upon treatment for sepsis.
Understanding more about a patient's physiological state at a given time could
hold the key to effective treatment policies. In this work, we propose a new
approach to deduce optimal treatment policies for septic patients by using
continuous state-space models and deep reinforcement learning. Learning
treatment policies over continuous spaces is important, because we retain more
of the patient's physiological information. Our model is able to learn
clinically interpretable treatment policies, similar in important aspects to
the treatment policies of physicians. Evaluating our algorithm on past ICU
patient data, we find that our model could reduce patient mortality in the
hospital by up to 3.6% over observed clinical policies, from a baseline
mortality of 13.7%. The learned treatment policies could be used to aid
intensive care clinicians in medical decision making and improve the likelihood
of patient survival.
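The continuous-state value learning described above can be sketched, in heavily simplified form, as Q-learning with linear function approximation over a continuous patient-state vector and a few discrete dose bins. Everything here (feature count, action bins, synthetic rewards) is illustrative, not the paper's actual model:

```python
import numpy as np

# Minimal sketch: Q-learning with linear function approximation over a
# continuous state vector and a small discrete action set (e.g. binned
# vasopressor doses). All names and numbers are illustrative.

N_FEATURES, N_ACTIONS = 4, 3           # 3 vitals + a bias term; 3 dose bins
rng = np.random.default_rng(0)
W = np.zeros((N_ACTIONS, N_FEATURES))  # one weight row per action

def q_values(state):
    return W @ state

def td_update(state, action, reward, next_state, done, alpha=0.1, gamma=0.99):
    """One temporal-difference update of the linear Q-function."""
    target = reward if done else reward + gamma * q_values(next_state).max()
    td_error = target - q_values(state)[action]
    W[action] += alpha * td_error * state
    return td_error

def sample_state():
    return np.append(rng.normal(size=N_FEATURES - 1), 1.0)  # bias feature

# Train on synthetic one-step episodes in which action 1 is always best.
for _ in range(2000):
    s, a = sample_state(), int(rng.integers(N_ACTIONS))
    td_update(s, a, reward=1.0 if a == 1 else 0.0,
              next_state=sample_state(), done=True)

greedy_action = int(np.argmax(q_values(sample_state())))
```

In the paper a deep network would replace the linear approximator; the TD update logic stays the same in spirit.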
Truly Batch Apprenticeship Learning with Deep Successor Features
We introduce a novel apprenticeship learning algorithm to learn an expert's
underlying reward structure in off-policy model-free \emph{batch} settings.
Unlike existing methods that require a dynamics model or additional data
acquisition for on-policy evaluation, our algorithm requires only the batch
data of observed expert behavior. Such settings are common in real-world
tasks---health care, finance, or industrial processes---where accurate
simulators do not exist or data acquisition is costly. To address challenges in
batch settings, we introduce Deep Successor Feature Networks (DSFN) that
estimate feature expectations in an off-policy setting and a
transition-regularized imitation network that produces a near-expert initial
policy and an efficient feature representation. Our algorithm achieves superior
results in batch settings on both control benchmarks and a vital clinical task
of sepsis management in the Intensive Care Unit.
Comment: 10 pages, 3 figures, under conference review
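The successor-feature machinery that DSFN builds on can be illustrated in a tabular setting: feature expectations satisfy a Bellman-style recursion and can be estimated by TD from sampled transitions alone, with no simulator. The toy three-state chain below is synthetic, not from the paper:

```python
import numpy as np

# Illustrative tabular successor features: psi(s) estimates the expected
# discounted sum of state features phi under a fixed policy, learned by
# TD(0) from sampled transitions (batch-style, no environment model).

n_states, gamma, lr = 3, 0.9, 0.05
phi = np.eye(n_states)             # one-hot state features
P = np.array([[0.0, 1.0, 0.0],     # transition matrix under the fixed policy
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])    # state 2 is absorbing

psi = np.zeros((n_states, n_states))
rng = np.random.default_rng(1)
for _ in range(5000):
    s = int(rng.integers(n_states))
    s_next = int(rng.choice(n_states, p=P[s]))
    psi[s] += lr * (phi[s] + gamma * psi[s_next] - psi[s])

# The TD estimate should approach the closed form psi = (I - gamma P)^-1 phi.
psi_exact = np.linalg.solve(np.eye(n_states) - gamma * P, phi)
max_err = float(np.abs(psi - psi_exact).max())
```

DSFN replaces the table with a deep network so the same recursion can be estimated over continuous clinical states.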
Inverse Reinforcement Learning in Contextual MDPs
We consider the task of Inverse Reinforcement Learning in Contextual Markov
Decision Processes (MDPs). In this setting, contexts, which define the reward
and transition kernel, are sampled from a distribution. In addition, although
the reward is a function of the context, it is not provided to the agent.
Instead, the agent observes demonstrations from an optimal policy. The goal is
to learn the reward mapping, such that the agent will act optimally even when
encountering previously unseen contexts, also known as zero-shot transfer. We
formulate this problem as a non-differentiable convex optimization problem and
propose a novel algorithm to compute its subgradients. Based on this scheme, we
analyze several methods both theoretically, where we compare the sample
complexity and scalability, and empirically. Most importantly, we show both
theoretically and empirically that our algorithms perform zero-shot transfer
(generalize to new and unseen contexts). Specifically, we present empirical
experiments in a dynamic treatment regime, where the goal is to learn a reward
function which explains the behavior of expert physicians based on recorded
data of them treating patients diagnosed with sepsis.
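A minimal sketch of the subgradient approach described above, using a simplified margin-based IRL objective rather than the paper's exact contextual formulation (all feature expectations here are made up):

```python
import numpy as np

# Simplified stand-in for a convex, non-differentiable IRL objective:
# find reward weights w under which the expert's feature expectations mu_E
# outscore every alternative policy by a margin. We descend along a
# subgradient of the active hinge term and project onto the unit ball.

mu_E = np.array([1.0, 0.0])                   # expert feature expectations
mu_alt = np.array([[0.0, 1.0], [0.5, 0.5]])   # alternative policies

w = np.zeros(2)
for _ in range(200):
    margins = mu_alt @ w - mu_E @ w + 1.0     # one hinge margin per policy
    j = int(np.argmax(margins))
    if margins[j] > 0:                        # subgradient of the max term
        w -= 0.1 * (mu_alt[j] - mu_E)
    w /= max(1.0, float(np.linalg.norm(w)))   # projection onto unit ball
```

After the loop, the learned reward weights rank the expert's behavior above both alternatives.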
The Actor Search Tree Critic (ASTC) for Off-Policy POMDP Learning in Medical Decision Making
Off-policy reinforcement learning enables learning a near-optimal policy from
suboptimal experience, thereby opening opportunities for artificial
intelligence applications in healthcare. Previous works have mainly framed
patient-clinician interactions as Markov decision processes, although true
physiological states are not necessarily fully observable from clinical data.
We capture this situation with a partially observable Markov decision process,
in which an agent optimises its actions over a belief represented as a
distribution of patient states inferred from individual history trajectories.
A Gaussian mixture model is fitted to the observed data. Moreover, we account
for the fact that small nuances in pharmaceutical dosage can result in
significantly different effects by modelling a continuous policy through a
Gaussian approximator directly in the policy space, i.e. the actor. To address
the challenge that the infinite number of possible belief states renders exact
value iteration intractable, we evaluate and plan only for the beliefs
actually encountered, using a heuristic search tree that tightly maintains
lower and upper bounds on the true value of a belief. We further resort to
function approximation to update the value-bound estimates, i.e. the critic,
so that the tree search can be improved through more compact bounds at the
fringe nodes that are back-propagated to the root. Both actor and critic
parameters are learned via gradient-based approaches. Our proposed policy,
trained from real intensive care unit data, is capable of dictating
vasopressor and intravenous fluid dosing for sepsis patients in a way that
leads to the best patient outcomes.
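The belief representation described above can be sketched as follows: with a Gaussian mixture fitted to the observations, the belief over latent patient states is the posterior responsibility of each mixture component given the current observation. The mixture parameters below are invented for illustration:

```python
import numpy as np

# Sketch: belief over latent patient states as posterior component
# responsibilities under a fitted Gaussian mixture (parameters invented).

means = np.array([0.0, 4.0])    # component means, e.g. two patient regimes
stds = np.array([1.0, 1.0])
priors = np.array([0.5, 0.5])   # mixture weights

def belief(obs):
    """Posterior P(component | obs) under the fitted mixture."""
    lik = np.exp(-0.5 * ((obs - means) / stds) ** 2) / stds
    post = priors * lik
    return post / post.sum()

b = belief(0.2)   # an observation near the first component
```

The full method then plans over these beliefs with a bounded heuristic search tree; this sketch covers only the belief inference step.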
Representation and Reinforcement Learning for Personalized Glycemic Control in Septic Patients
Glycemic control is essential for critical care. However, it is a challenging
task because there has been no study on personalized optimal strategies for
glycemic control. This work aims to learn personalized optimal glycemic
trajectories for severely ill septic patients by learning data-driven policies
to identify optimal targeted blood glucose levels as a reference for
clinicians. We encoded patient states using a sparse autoencoder and adopted a
reinforcement learning paradigm using policy iteration to learn the optimal
policy from data. We also estimated the expected return following the policy
learned from the recorded glycemic trajectories, which yielded a function
indicating the relationship between real blood glucose values and 90-day
mortality rates. This suggests that the learned optimal policy could reduce the
patients' estimated 90-day mortality rate by 6.3%, from 31% to 24.7%. The
result demonstrates that reinforcement learning with appropriate patient state
encoding can potentially provide optimal glycemic trajectories and allow
clinicians to design a personalized strategy for glycemic control in septic
patients.
Comment: Accepted by the 31st Annual Conference on Neural Information Processing Systems (NIPS 2017) Workshop on Machine Learning for Health (ML4H)
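The policy-learning half of the pipeline above (policy iteration over encoded states) can be sketched on a toy tabular MDP; in the paper the states would come from the sparse autoencoder, and all numbers here are illustrative:

```python
import numpy as np

# Minimal tabular policy iteration. In the paper, states would be the
# autoencoder's encodings of patient data; this 3-state, 2-action MDP is
# a synthetic stand-in.

n_s, n_a, gamma = 3, 2, 0.9
P = np.zeros((n_a, n_s, n_s))            # P[a][s] -> next-state distribution
P[0] = np.eye(n_s)                       # action 0: stay
P[1] = np.roll(np.eye(n_s), 1, axis=1)   # action 1: move to the next state
R = np.array([[0.0, 0.0],
              [0.0, 0.0],
              [1.0, 0.0]])               # staying in state 2 pays reward 1

pi = np.zeros(n_s, dtype=int)
while True:
    # Policy evaluation: solve (I - gamma P_pi) V = R_pi exactly.
    P_pi = np.array([P[pi[s], s] for s in range(n_s)])
    R_pi = R[np.arange(n_s), pi]
    V = np.linalg.solve(np.eye(n_s) - gamma * P_pi, R_pi)
    # Policy improvement: greedy with respect to the one-step lookahead.
    Q = R.T + gamma * np.array([P[a] @ V for a in range(n_a)])  # shape (a, s)
    new_pi = Q.argmax(axis=0)
    if np.array_equal(new_pi, pi):
        break
    pi = new_pi
```

The loop converges to moving toward state 2 and then staying, i.e. the optimal policy for this toy chain.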
Optimizing Sequential Medical Treatments with Auto-Encoding Heuristic Search in POMDPs
Health-related data are noisy and only stochastically reflect the true
physiological states of patients, limiting the information contained in
single-moment observations for sequential clinical decision making. We model
patient-clinician interactions as partially observable Markov decision
processes (POMDPs) and optimize sequential treatment based on belief states
inferred from history sequence. To facilitate inference, we build a variational
generative model and boost state representation with a recurrent neural network
(RNN), incorporating an auxiliary loss from sequence auto-encoding. Meanwhile,
we optimize a continuous policy of drug levels with an actor-critic method
where policy gradients are obtained from a stabilized off-policy estimate of
advantage function, with the value of belief state backed up by parallel
best-first suffix trees. We exploit our methodology in optimizing dosages of
vasopressor and intravenous fluid for sepsis patients using a retrospective
intensive care dataset and evaluate the learned policy with off-policy policy
evaluation (OPPE). The results demonstrate that modelling the problem as a
POMDP yields better performance than as an MDP, and that incorporating
heuristic search improves sample efficiency.
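The off-policy policy evaluation (OPPE) step mentioned above is commonly done with weighted importance sampling; here is a minimal sketch with synthetic action probabilities and rewards, not the paper's exact estimator:

```python
import numpy as np

# Weighted importance sampling (WIS) estimate of a target policy's value
# from trajectories logged under a behaviour policy. All probabilities
# and rewards below are synthetic.

rng = np.random.default_rng(0)

def wis_estimate(trajectories):
    """trajectories: list of [(p_target, p_behaviour, reward), ...] steps."""
    weights, returns = [], []
    for traj in trajectories:
        rho = 1.0
        for p_t, p_b, _ in traj:
            rho *= p_t / p_b                 # cumulative importance ratio
        weights.append(rho)
        returns.append(sum(r for _, _, r in traj))
    weights = np.array(weights)
    return float(weights @ np.array(returns) / weights.sum())

# Behaviour picks action A w.p. 0.5; the target always picks A; A pays 1.
trajs = []
for _ in range(1000):
    took_A = rng.random() < 0.5
    trajs.append([(1.0 if took_A else 0.0, 0.5, 1.0 if took_A else 0.0)])

est = wis_estimate(trajs)   # target's true value is 1.0
```

WIS trades a small bias for much lower variance than ordinary importance sampling, which matters for long ICU trajectories.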
Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey
Owing to recent advancements in Artificial Intelligence, especially deep
learning, many data-driven decision support systems have been implemented to
facilitate medical doctors in delivering personalized care. We focus on deep
reinforcement learning (DRL) models in this paper. DRL models have
demonstrated human-level or even superior performance in computer vision and
game-playing tasks, such as Go and Atari games. However, the adoption of
deep reinforcement learning techniques in clinical decision optimization is
still rare. We present the first survey that summarizes reinforcement learning
algorithms with Deep Neural Networks (DNN) on clinical decision support. We
also discuss some case studies, where different DRL algorithms were applied to
address various clinical challenges. We further compare and contrast the
advantages and limitations of various DRL algorithms and present a preliminary
guide on how to choose the appropriate DRL algorithm for particular clinical
applications.
Dynamic Measurement Scheduling for Adverse Event Forecasting using Deep RL
Current clinical practice to monitor patients' health follows either regular
or heuristic-based lab test (e.g. blood test) scheduling. Such practice not
only gives rise to redundant measurements that accrue cost, but may even lead
to unnecessary patient discomfort. From the computational perspective,
heuristic-based test scheduling might lead to reduced accuracy of clinical
forecasting models. Learning an optimal policy for clinical test scheduling
and measurement collection is likely to lead to both better predictive models
and improved patient outcomes. We address the scheduling problem using deep
reinforcement learning (RL) to achieve high predictive gain and low
measurement cost by scheduling fewer, but strategically timed, tests.
We first show that in simulation our policy outperforms heuristic-based
measurement scheduling, achieving higher predictive gain or lower cost as
measured by accumulated reward. We then learn a scheduling policy for
mortality forecasting on the real-world clinical dataset MIMIC-III; our
learned policy is able to provide useful clinical insights. To our knowledge,
this is the first RL application to the multi-measurement scheduling problem
in the clinical setting.
Comment: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018
arXiv:1811.0721
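The predictive-gain-versus-cost trade-off described above can be captured by a reward of the form gain minus a cost penalty. A hedged sketch with an invented cost weight follows (the paper's actual reward shaping may differ):

```python
# Sketch of the scheduling trade-off: the RL reward balances predictive
# gain against measurement cost. The weight and values are illustrative.

LAMBDA = 0.3   # cost weight (hypothetical hyperparameter)

def scheduling_reward(predictive_gain, n_tests_ordered, cost_per_test=1.0):
    """Higher predictive gain is good; each ordered test is penalized."""
    return predictive_gain - LAMBDA * cost_per_test * n_tests_ordered

# Ordering fewer, well-timed tests can beat ordering everything:
few_tests = scheduling_reward(0.9, 1)    # slightly less gain, one test
many_tests = scheduling_reward(1.0, 3)   # full gain, three tests
```

Tuning the cost weight moves the policy along the accuracy-cost frontier.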
An Empirical Study of Representation Learning for Reinforcement Learning in Healthcare
Reinforcement Learning (RL) has recently been applied to sequential
estimation and prediction problems, identifying and developing hypothetical
treatment strategies for septic patients, with a particular focus on offline
learning with observational data. In practice, successful RL relies on
informative latent states derived from sequential observations to develop
optimal treatment strategies. To date, how best to construct such states in a
healthcare setting is an open question. In this paper, we perform an empirical
study of several information encoding architectures using data from septic
patients in the MIMIC-III dataset to form representations of a patient state.
We evaluate the impact of representation dimension, correlations with
established acuity scores, and the treatment policies derived from them. We
find that sequentially formed state representations facilitate effective policy
learning in batch settings, validating a more thoughtful approach to
representation learning that remains faithful to the sequential and partial
nature of healthcare data.
Comment: To appear in proceedings of the 2020 Machine Learning for Health workshop at NeurIPS
Missingness as Stability: Understanding the Structure of Missingness in Longitudinal EHR data and its Impact on Reinforcement Learning in Healthcare
There is an emerging trend in the reinforcement learning for healthcare
literature. In order to prepare longitudinal, irregularly sampled, clinical
datasets for reinforcement learning algorithms, many researchers will resample
the time series data to short, regular intervals and use
last-observation-carried-forward (LOCF) imputation to fill in these gaps.
Typically, they will not maintain any explicit information about which values
were imputed. In this work, we (1) call attention to this practice and discuss
its potential implications; (2) propose an alternative representation of the
patient state that addresses some of these issues; and (3) demonstrate in a
novel but representative clinical dataset that our alternative representation
yields consistently better results for achieving optimal control, as measured
by off-policy policy evaluation, compared to representations that do not
incorporate missingness information.
Comment: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract
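The two patient-state representations contrasted above can be sketched side by side: plain LOCF forward-filling versus LOCF plus a binary mask recording which values were imputed (a minimal illustration, not the authors' full representation):

```python
import numpy as np

# LOCF imputation with an explicit missingness mask: forward-fill NaNs and
# record which entries were imputed, so downstream RL can distinguish a
# fresh measurement from a carried-forward one.

def locf_with_mask(series):
    """Forward-fill NaNs; return (filled values, 1.0 where imputed)."""
    filled, mask, last = [], [], float("nan")
    for x in series:
        imputed = np.isnan(x)
        last = last if imputed else x
        filled.append(last)
        mask.append(1.0 if imputed else 0.0)
    return np.array(filled), np.array(mask)

vitals = [98.0, float("nan"), float("nan"), 101.0]
values, imputed = locf_with_mask(vitals)
```

Dropping the mask collapses a stable reading and a stale one into the same state, which is exactly the ambiguity the paper warns about.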