Continuous State-Space Models for Optimal Sepsis Treatment - a Deep Reinforcement Learning Approach
Sepsis is a leading cause of mortality in intensive care units (ICUs) and
costs hospitals billions annually. Treating a septic patient is highly
challenging, because individual patients respond very differently to medical
interventions and there is no universally agreed-upon treatment for sepsis.
Understanding more about a patient's physiological state at a given time could
hold the key to effective treatment policies. In this work, we propose a new
approach to deduce optimal treatment policies for septic patients by using
continuous state-space models and deep reinforcement learning. Learning
treatment policies over continuous spaces is important, because we retain more
of the patient's physiological information. Our model is able to learn
clinically interpretable treatment policies, similar in important aspects to
the treatment policies of physicians. Evaluating our algorithm on past ICU
patient data, we find that our model could reduce patient mortality in the
hospital by up to 3.6% over observed clinical policies, from a baseline
mortality of 13.7%. The learned treatment policies could be used to aid
intensive care clinicians in medical decision making and improve the likelihood
of patient survival.
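The continuous-state value learning described above can be sketched, in heavily simplified form, as Q-learning with linear function approximation over a continuous patient-state vector and a few discrete dose bins. Everything here (feature count, action bins, synthetic rewards) is illustrative, not the paper's actual model:

```python
import numpy as np

# Minimal sketch: Q-learning with linear function approximation over a
# continuous state vector and a small discrete action set (e.g. binned
# vasopressor doses). All names and numbers are illustrative.

N_FEATURES, N_ACTIONS = 4, 3           # 3 vitals + a bias term; 3 dose bins
rng = np.random.default_rng(0)
W = np.zeros((N_ACTIONS, N_FEATURES))  # one weight row per action

def q_values(state):
    return W @ state

def td_update(state, action, reward, next_state, done, alpha=0.1, gamma=0.99):
    """One temporal-difference update of the linear Q-function."""
    target = reward if done else reward + gamma * q_values(next_state).max()
    td_error = target - q_values(state)[action]
    W[action] += alpha * td_error * state
    return td_error

def sample_state():
    return np.append(rng.normal(size=N_FEATURES - 1), 1.0)  # bias feature

# Train on synthetic one-step episodes in which action 1 is always best.
for _ in range(2000):
    s, a = sample_state(), int(rng.integers(N_ACTIONS))
    td_update(s, a, reward=1.0 if a == 1 else 0.0,
              next_state=sample_state(), done=True)

greedy_action = int(np.argmax(q_values(sample_state())))
```

In the paper a deep network would replace the linear approximator; the TD update logic stays the same in spirit.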
Truly Batch Apprenticeship Learning with Deep Successor Features
We introduce a novel apprenticeship learning algorithm to learn an expert's
underlying reward structure in off-policy model-free \emph{batch} settings.
Unlike existing methods that require a dynamics model or additional data
acquisition for on-policy evaluation, our algorithm requires only the batch
data of observed expert behavior. Such settings are common in real-world
tasks---health care, finance, or industrial processes---where accurate
simulators do not exist or data acquisition is costly. To address challenges in
batch settings, we introduce Deep Successor Feature Networks (DSFN) that
estimate feature expectations in an off-policy setting and a
transition-regularized imitation network that produces a near-expert initial
policy and an efficient feature representation. Our algorithm achieves superior
results in batch settings on both control benchmarks and a vital clinical task
of sepsis management in the Intensive Care Unit.
Comment: 10 pages, 3 figures, under conference review
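The successor-feature machinery that DSFN builds on can be illustrated in a tabular setting: feature expectations satisfy a Bellman-style recursion and can be estimated by TD from sampled transitions alone, with no simulator. The toy three-state chain below is synthetic, not from the paper:

```python
import numpy as np

# Illustrative tabular successor features: psi(s) estimates the expected
# discounted sum of state features phi under a fixed policy, learned by
# TD(0) from sampled transitions (batch-style, no environment model).

n_states, gamma, lr = 3, 0.9, 0.05
phi = np.eye(n_states)             # one-hot state features
P = np.array([[0.0, 1.0, 0.0],     # transition matrix under the fixed policy
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])    # state 2 is absorbing

psi = np.zeros((n_states, n_states))
rng = np.random.default_rng(1)
for _ in range(5000):
    s = int(rng.integers(n_states))
    s_next = int(rng.choice(n_states, p=P[s]))
    psi[s] += lr * (phi[s] + gamma * psi[s_next] - psi[s])

# The TD estimate should approach the closed form psi = (I - gamma P)^-1 phi.
psi_exact = np.linalg.solve(np.eye(n_states) - gamma * P, phi)
max_err = float(np.abs(psi - psi_exact).max())
```

DSFN replaces the table with a deep network so the same recursion can be estimated over continuous clinical states.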
Inverse Reinforcement Learning in Contextual MDPs
We consider the task of Inverse Reinforcement Learning in Contextual Markov
Decision Processes (MDPs). In this setting, contexts, which define the reward
and transition kernel, are sampled from a distribution. In addition, although
the reward is a function of the context, it is not provided to the agent.
Instead, the agent observes demonstrations from an optimal policy. The goal is
to learn the reward mapping, such that the agent will act optimally even when
encountering previously unseen contexts, also known as zero-shot transfer. We
formulate this problem as a non-differentiable convex optimization problem and
propose a novel algorithm to compute its subgradients. Based on this scheme, we
analyze several methods both theoretically, where we compare the sample
complexity and scalability, and empirically. Most importantly, we show both
theoretically and empirically that our algorithms perform zero-shot transfer
(generalize to new and unseen contexts). Specifically, we present empirical
experiments in a dynamic treatment regime, where the goal is to learn a reward
function which explains the behavior of expert physicians based on recorded
data of them treating patients diagnosed with sepsis.
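A minimal sketch of the subgradient approach described above, using a simplified margin-based IRL objective rather than the paper's exact contextual formulation (all feature expectations here are made up):

```python
import numpy as np

# Simplified stand-in for a convex, non-differentiable IRL objective:
# find reward weights w under which the expert's feature expectations mu_E
# outscore every alternative policy by a margin. We descend along a
# subgradient of the active hinge term and project onto the unit ball.

mu_E = np.array([1.0, 0.0])                   # expert feature expectations
mu_alt = np.array([[0.0, 1.0], [0.5, 0.5]])   # alternative policies

w = np.zeros(2)
for _ in range(200):
    margins = mu_alt @ w - mu_E @ w + 1.0     # one hinge margin per policy
    j = int(np.argmax(margins))
    if margins[j] > 0:                        # subgradient of the max term
        w -= 0.1 * (mu_alt[j] - mu_E)
    w /= max(1.0, float(np.linalg.norm(w)))   # projection onto unit ball
```

After the loop, the learned reward weights rank the expert's behavior above both alternatives.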
The Actor Search Tree Critic (ASTC) for Off-Policy POMDP Learning in Medical Decision Making
Off-policy reinforcement learning enables learning a near-optimal policy from
suboptimal experience, thereby opening opportunities for artificial
intelligence applications in healthcare. Previous works have mainly framed
patient-clinician interactions as Markov decision processes, although true
physiological states are not necessarily fully observable from clinical data.
We capture this situation with a partially observable Markov decision process,
in which an agent optimises its actions over a belief represented as a
distribution of patient states inferred from individual history trajectories.
A Gaussian mixture model is fitted to the observed data. Moreover, we account
for the fact that small nuances in pharmaceutical dosage can result in
significantly different effects by modelling a continuous policy through a
Gaussian approximator directly in the policy space, i.e. the actor. To address
the challenge that the infinite number of possible belief states renders exact
value iteration intractable, we evaluate and plan only for the beliefs
actually encountered, using a heuristic search tree that tightly maintains
lower and upper bounds on the true value of a belief. We further resort to
function approximation to update the value-bound estimates, i.e. the critic,
so that the tree search can be improved through more compact bounds at the
fringe nodes that are back-propagated to the root. Both actor and critic
parameters are learned via gradient-based approaches. Our proposed policy,
trained from real intensive care unit data, is capable of dictating
vasopressor and intravenous fluid dosing for sepsis patients in a way that
leads to the best patient outcomes.
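The belief representation described above can be sketched as follows: with a Gaussian mixture fitted to the observations, the belief over latent patient states is the posterior responsibility of each mixture component given the current observation. The mixture parameters below are invented for illustration:

```python
import numpy as np

# Sketch: belief over latent patient states as posterior component
# responsibilities under a fitted Gaussian mixture (parameters invented).

means = np.array([0.0, 4.0])    # component means, e.g. two patient regimes
stds = np.array([1.0, 1.0])
priors = np.array([0.5, 0.5])   # mixture weights

def belief(obs):
    """Posterior P(component | obs) under the fitted mixture."""
    lik = np.exp(-0.5 * ((obs - means) / stds) ** 2) / stds
    post = priors * lik
    return post / post.sum()

b = belief(0.2)   # an observation near the first component
```

The full method then plans over these beliefs with a bounded heuristic search tree; this sketch covers only the belief inference step.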
Representation and Reinforcement Learning for Personalized Glycemic Control in Septic Patients
Glycemic control is essential for critical care. However, it is a challenging
task because there has been no study on personalized optimal strategies for
glycemic control. This work aims to learn personalized optimal glycemic
trajectories for severely ill septic patients by learning data-driven policies
to identify optimal targeted blood glucose levels as a reference for
clinicians. We encoded patient states using a sparse autoencoder and adopted a
reinforcement learning paradigm using policy iteration to learn the optimal
policy from data. We also estimated the expected return following the policy
learned from the recorded glycemic trajectories, which yielded a function
indicating the relationship between real blood glucose values and 90-day
mortality rates. This suggests that the learned optimal policy could reduce the
patients' estimated 90-day mortality rate by 6.3%, from 31% to 24.7%. The
result demonstrates that reinforcement learning with appropriate patient state
encoding can potentially provide optimal glycemic trajectories and allow
clinicians to design a personalized strategy for glycemic control in septic
patients.
Comment: Accepted by the 31st Annual Conference on Neural Information Processing Systems (NIPS 2017) Workshop on Machine Learning for Health (ML4H)
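The policy-learning half of the pipeline above (policy iteration over encoded states) can be sketched on a toy tabular MDP; in the paper the states would come from the sparse autoencoder, and all numbers here are illustrative:

```python
import numpy as np

# Minimal tabular policy iteration. In the paper, states would be the
# autoencoder's encodings of patient data; this 3-state, 2-action MDP is
# a synthetic stand-in.

n_s, n_a, gamma = 3, 2, 0.9
P = np.zeros((n_a, n_s, n_s))            # P[a][s] -> next-state distribution
P[0] = np.eye(n_s)                       # action 0: stay
P[1] = np.roll(np.eye(n_s), 1, axis=1)   # action 1: move to the next state
R = np.array([[0.0, 0.0],
              [0.0, 0.0],
              [1.0, 0.0]])               # staying in state 2 pays reward 1

pi = np.zeros(n_s, dtype=int)
while True:
    # Policy evaluation: solve (I - gamma P_pi) V = R_pi exactly.
    P_pi = np.array([P[pi[s], s] for s in range(n_s)])
    R_pi = R[np.arange(n_s), pi]
    V = np.linalg.solve(np.eye(n_s) - gamma * P_pi, R_pi)
    # Policy improvement: greedy with respect to the one-step lookahead.
    Q = R.T + gamma * np.array([P[a] @ V for a in range(n_a)])  # shape (a, s)
    new_pi = Q.argmax(axis=0)
    if np.array_equal(new_pi, pi):
        break
    pi = new_pi
```

The loop converges to moving toward state 2 and then staying, i.e. the optimal policy for this toy chain.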
Optimizing Sequential Medical Treatments with Auto-Encoding Heuristic Search in POMDPs
Health-related data are noisy and only stochastically reflect the true
physiological states of patients, limiting the information contained in
single-moment observations for sequential clinical decision making. We model
patient-clinician interactions as partially observable Markov decision
processes (POMDPs) and optimize sequential treatment based on belief states
inferred from history sequence. To facilitate inference, we build a variational
generative model and boost state representation with a recurrent neural network
(RNN), incorporating an auxiliary loss from sequence auto-encoding. Meanwhile,
we optimize a continuous policy of drug levels with an actor-critic method
where policy gradients are obtained from a stabilized off-policy estimate of
advantage function, with the value of belief state backed up by parallel
best-first suffix trees. We exploit our methodology in optimizing dosages of
vasopressor and intravenous fluid for sepsis patients using a retrospective
intensive care dataset and evaluate the learned policy with off-policy policy
evaluation (OPPE). The results demonstrate that modelling the problem as a
POMDP yields better performance than as an MDP, and that incorporating
heuristic search improves sample efficiency.
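The off-policy policy evaluation (OPPE) step mentioned above is commonly done with weighted importance sampling; here is a minimal sketch with synthetic action probabilities and rewards, not the paper's exact estimator:

```python
import numpy as np

# Weighted importance sampling (WIS) estimate of a target policy's value
# from trajectories logged under a behaviour policy. All probabilities
# and rewards below are synthetic.

rng = np.random.default_rng(0)

def wis_estimate(trajectories):
    """trajectories: list of [(p_target, p_behaviour, reward), ...] steps."""
    weights, returns = [], []
    for traj in trajectories:
        rho = 1.0
        for p_t, p_b, _ in traj:
            rho *= p_t / p_b                 # cumulative importance ratio
        weights.append(rho)
        returns.append(sum(r for _, _, r in traj))
    weights = np.array(weights)
    return float(weights @ np.array(returns) / weights.sum())

# Behaviour picks action A w.p. 0.5; the target always picks A; A pays 1.
trajs = []
for _ in range(1000):
    took_A = rng.random() < 0.5
    trajs.append([(1.0 if took_A else 0.0, 0.5, 1.0 if took_A else 0.0)])

est = wis_estimate(trajs)   # target's true value is 1.0
```

WIS trades a small bias for much lower variance than ordinary importance sampling, which matters for long ICU trajectories.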
Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey
Owing to recent advancements in Artificial Intelligence, especially deep
learning, many data-driven decision support systems have been implemented to
facilitate medical doctors in delivering personalized care. We focus on deep
reinforcement learning (DRL) models in this paper. DRL models have
demonstrated human-level or even superior performance in computer vision and
game-playing tasks, such as Go and Atari games. However, the adoption of
deep reinforcement learning techniques in clinical decision optimization is
still rare. We present the first survey that summarizes reinforcement learning
algorithms with Deep Neural Networks (DNN) on clinical decision support. We
also discuss some case studies, where different DRL algorithms were applied to
address various clinical challenges. We further compare and contrast the
advantages and limitations of various DRL algorithms and present a preliminary
guide on how to choose the appropriate DRL algorithm for particular clinical
applications.
Dynamic Measurement Scheduling for Adverse Event Forecasting using Deep RL
Current clinical practice to monitor patients' health follows either regular
or heuristic-based lab test (e.g. blood test) scheduling. Such practice not
only gives rise to redundant measurements that accrue cost, but may even lead
to unnecessary patient discomfort. From the computational perspective,
heuristic-based test scheduling might lead to reduced accuracy of clinical
forecasting models. Learning an optimal policy for clinical test scheduling
and measurement collection is likely to lead to both better predictive models
and improved patient outcomes. We address the scheduling problem using deep
reinforcement learning (RL) to achieve high predictive gain and low
measurement cost by scheduling fewer, but strategically timed, tests.
We first show that in simulation our policy outperforms heuristic-based
measurement scheduling, achieving higher predictive gain or lower cost as
measured by accumulated reward. We then learn a scheduling policy for
mortality forecasting on the real-world clinical dataset MIMIC-III; our
learned policy is able to provide useful clinical insights. To our knowledge,
this is the first RL application to the multi-measurement scheduling problem
in the clinical setting.
Comment: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018
arXiv:1811.0721
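The predictive-gain-versus-cost trade-off described above can be captured by a reward of the form gain minus a cost penalty. A hedged sketch with an invented cost weight follows (the paper's actual reward shaping may differ):

```python
# Sketch of the scheduling trade-off: the RL reward balances predictive
# gain against measurement cost. The weight and values are illustrative.

LAMBDA = 0.3   # cost weight (hypothetical hyperparameter)

def scheduling_reward(predictive_gain, n_tests_ordered, cost_per_test=1.0):
    """Higher predictive gain is good; each ordered test is penalized."""
    return predictive_gain - LAMBDA * cost_per_test * n_tests_ordered

# Ordering fewer, well-timed tests can beat ordering everything:
few_tests = scheduling_reward(0.9, 1)    # slightly less gain, one test
many_tests = scheduling_reward(1.0, 3)   # full gain, three tests
```

Tuning the cost weight moves the policy along the accuracy-cost frontier.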
An Empirical Study of Representation Learning for Reinforcement Learning in Healthcare
Reinforcement Learning (RL) has recently been applied to sequential
estimation and prediction problems, identifying and developing hypothetical
treatment strategies for septic patients, with a particular focus on offline
learning with observational data. In practice, successful RL relies on
informative latent states derived from sequential observations to develop
optimal treatment strategies. To date, how best to construct such states in a
healthcare setting is an open question. In this paper, we perform an empirical
study of several information encoding architectures using data from septic
patients in the MIMIC-III dataset to form representations of a patient state.
We evaluate the impact of representation dimension, correlations with
established acuity scores, and the treatment policies derived from them. We
find that sequentially formed state representations facilitate effective policy
learning in batch settings, validating a more thoughtful approach to
representation learning that remains faithful to the sequential and partial
nature of healthcare data.
Comment: To appear in proceedings of the 2020 Machine Learning for Health workshop at NeurIPS
Missingness as Stability: Understanding the Structure of Missingness in Longitudinal EHR data and its Impact on Reinforcement Learning in Healthcare
There is an emerging trend in the reinforcement learning for healthcare
literature. In order to prepare longitudinal, irregularly sampled, clinical
datasets for reinforcement learning algorithms, many researchers will resample
the time series data to short, regular intervals and use
last-observation-carried-forward (LOCF) imputation to fill in these gaps.
Typically, they will not maintain any explicit information about which values
were imputed. In this work, we (1) call attention to this practice and discuss
its potential implications; (2) propose an alternative representation of the
patient state that addresses some of these issues; and (3) demonstrate in a
novel but representative clinical dataset that our alternative representation
yields consistently better results for achieving optimal control, as measured
by off-policy policy evaluation, compared to representations that do not
incorporate missingness information.
Comment: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract
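The two patient-state representations contrasted above can be sketched side by side: plain LOCF forward-filling versus LOCF plus a binary mask recording which values were imputed (a minimal illustration, not the authors' full representation):

```python
import numpy as np

# LOCF imputation with an explicit missingness mask: forward-fill NaNs and
# record which entries were imputed, so downstream RL can distinguish a
# fresh measurement from a carried-forward one.

def locf_with_mask(series):
    """Forward-fill NaNs; return (filled values, 1.0 where imputed)."""
    filled, mask, last = [], [], float("nan")
    for x in series:
        imputed = np.isnan(x)
        last = last if imputed else x
        filled.append(last)
        mask.append(1.0 if imputed else 0.0)
    return np.array(filled), np.array(mask)

vitals = [98.0, float("nan"), float("nan"), 101.0]
values, imputed = locf_with_mask(vitals)
```

Dropping the mask collapses a stable reading and a stale one into the same state, which is exactly the ambiguity the paper warns about.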