A reinforcement learning design for HIV clinical trials
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science. Johannesburg, 2014.
Determining effective treatment strategies for life-threatening illnesses such as HIV is
a significant problem in clinical research. Currently, HIV treatment involves using
combinations of anti-HIV drugs to inhibit the formation of drug-resistant strains. From
a clinician's perspective, this usually requires careful selection of drugs on the basis of an
individual's immune responses at a particular time. As the number of drugs available for
treatment increases, this task becomes difficult. In a clinical trial setting, the task is even
more challenging since experience using new drugs is limited. For these reasons, this
research examines whether machine learning techniques, and more specifically batch
reinforcement learning, can be used for the purposes of determining the appropriate
treatment for an HIV-infected patient at a particular time. To do so, we consider using
fitted Q-iteration with extremely randomized trees, neural fitted Q-iteration and least
squares policy iteration. The use of batch reinforcement learning means that samples
of patient data are captured prior to learning to avoid imposing risks on a patient.
Because samples are re-used, these methods are data-efficient and particularly suited to
situations where large amounts of data are unavailable. We apply each of these learning
methods to both numerically generated and real data sets. Results from this research
highlight the advantages and disadvantages associated with each learning technique.
Real data testing has revealed that these batch reinforcement learning techniques have
the ability to suggest treatments that are reasonably consistent with those prescribed
by clinicians. The inclusion of additional state variables describing more about an
individual's health could further improve this learning process. Ultimately, the use of
such reinforcement learning methods could be coupled with a clinician's knowledge for
enhanced treatment design.
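The batch fitted Q-iteration loop described above can be sketched in a few lines. The toy one-dimensional MDP, sample sizes, and hyperparameters below are illustrative assumptions, not the dissertation's actual clinical setup; only the algorithmic skeleton (bootstrap targets regressed with extremely randomized trees) follows the method named in the abstract.

```python
# A minimal sketch of fitted Q-iteration with extremely randomized trees.
# The toy MDP (action 1 pushes a 1-D state toward 0, where reward is highest)
# is an assumption made purely for illustration.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
n_samples, n_actions, gamma, n_iters = 500, 2, 0.95, 20

# Toy batch of one-step transitions (s, a, r, s') collected prior to learning.
s = rng.uniform(-1, 1, size=(n_samples, 1))
a = rng.integers(0, n_actions, size=n_samples)
s_next = np.clip(s + np.where(a[:, None] == 1, -0.2 * np.sign(s), 0.1), -1, 1)
r = -np.abs(s_next[:, 0])

X = np.column_stack([s, a])           # regress Q on (state, action) pairs
q_model = None
for _ in range(n_iters):
    if q_model is None:
        targets = r                   # first pass: Q_1 = immediate reward
    else:
        # Bootstrap targets: r + gamma * max_a' Q_k(s', a')
        q_next = np.column_stack([
            q_model.predict(np.column_stack([s_next, np.full(n_samples, act)]))
            for act in range(n_actions)
        ])
        targets = r + gamma * q_next.max(axis=1)
    q_model = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, targets)

# Greedy policy at a query state: pick the action with the highest Q-value.
state = np.array([[0.8]])
q_vals = [q_model.predict(np.column_stack([state, [act]]))[0]
          for act in range(n_actions)]
print("greedy action at s=0.8:", int(np.argmax(q_vals)))
```

Because the batch is fixed and re-used at every iteration, no new interaction with the patient (environment) is needed, which is the data-efficiency property the abstract highlights.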
Causal inference and interpretable machine learning for personalised medicine
In this thesis, we discuss the importance of causal knowledge in healthcare for tailoring treatments to a patient's needs. We propose three different causal models for reasoning about the effects of medical interventions on patients with HIV and sepsis, based on observational data. Both application areas are challenging as a result of patient heterogeneity and the existence of confounding that influences patient outcomes. Our first contribution is a treatment policy mixture model that combines nonparametric, kernel-based learning with model-based reinforcement learning to reason about a series of treatments and their effects. These methods each have their own strengths: nonparametric methods can accurately predict treatment effects where there are overlapping patient instances or where data is abundant; model-based reinforcement learning generalises better in outlier situations by learning a belief state representation of confounding. The overall policy mixture model learns a partition of the space of heterogeneous patients such that we can personalise treatments accordingly. Our second contribution incorporates knowledge from kernel-based reasoning directly into a reinforcement learning model by learning a combined belief state representation. In doing so, we can use the model to simulate counterfactual scenarios to reason about what would happen to a patient if we intervened in a particular way and how their specific outcomes would change. As a result, we may tailor therapies according to patient-specific scenarios.
Our third contribution is a reformulation of the information bottleneck problem for learning an interpretable, low-dimensional representation of confounding for medical decision-making. The approach uses the relevance of information to perform a sufficient reduction of confounding. Based on this reduction, we learn equivalence classes among groups of patients, such that we may transfer knowledge to patients with incomplete covariate information at test time. By conditioning on the sufficient statistic we can accurately infer treatment effects on both a population and subgroup level. Our final contribution is the development of a novel regularisation strategy that can be applied to deep machine learning models to enforce clinical interpretability. We specifically train deep time-series models such that their predictions have high accuracy while being closely modelled by small decision trees that can be audited easily by medical experts. Broadly, our tree-based explanations can be used to provide additional context in scenarios where reasoning about treatment effects may otherwise be difficult. Importantly, each of the models we present is an attempt to bring about more understanding in medical applications to inform better decision-making overall.
Guarantee Regions for Local Explanations
Interpretability methods that utilise local surrogate models (e.g. LIME) are
very good at describing the behaviour of the predictive model at a point of
interest, but they are not guaranteed to extrapolate to the local region
surrounding the point. Moreover, overfitting to the local curvature of the
predictive model and malicious tampering can significantly limit extrapolation.
We propose an anchor-based algorithm for identifying regions in which local
explanations are guaranteed to be correct by explicitly describing those
intervals along which the input features can be trusted. Our method produces an
interpretable feature-aligned box where the prediction of the local surrogate
model is guaranteed to match the predictive model. We demonstrate that our
algorithm can be used to find explanations with larger guarantee regions that
better cover the data manifold compared to existing baselines. We also show how
our method can identify misleading local explanations with significantly poorer
guarantee regions.
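The core idea, a feature-aligned box within which a local surrogate provably matches the model, can be sketched with a sampled agreement check. Everything below (the models, data, perturbation scales, and the Monte Carlo test standing in for the paper's guarantee) is an illustrative assumption, not the paper's actual anchor-based algorithm.

```python
# A hedged sketch: grow an axis-aligned box around a point of interest and keep
# the largest extent within which a local surrogate's labels agree with the
# black-box model on sampled points. This sampled check only approximates the
# formal guarantee described in the abstract.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = RandomForestClassifier(random_state=0).fit(X, y)   # black-box model
x0 = np.array([1.0, 1.0])                                  # point of interest

# LIME-style local surrogate: fit on perturbations around x0, labelled by the model.
Z = x0 + rng.normal(scale=1.0, size=(200, 2))
surrogate = LogisticRegression().fit(Z, model.predict(Z))

def agreement(radius, n=200):
    """Fraction of sampled points in the box x0 +/- radius where labels agree."""
    P = x0 + rng.uniform(-radius, radius, size=(n, 2))
    return np.mean(model.predict(P) == surrogate.predict(P))

# Expand the box while the surrogate still matches the model everywhere sampled.
radius = 0.1
while radius < 3.0 and agreement(radius + 0.1) == 1.0:
    radius += 0.1
print(f"guarantee box: x0 +/- {radius:.1f} per feature")
```

A small resulting box is itself informative: it flags a local explanation whose validity barely extends beyond the query point, which is how the paper identifies misleading explanations.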
Leveraging Factored Action Spaces for Off-Policy Evaluation
Off-policy evaluation (OPE) aims to estimate the benefit of following a
counterfactual sequence of actions, given data collected from executed
sequences. However, existing OPE estimators often exhibit high bias and high
variance in problems involving large, combinatorial action spaces. We
investigate how to mitigate this issue using factored action spaces, i.e.,
expressing each action as a combination of independent sub-actions from smaller
action spaces. This approach facilitates a finer-grained analysis of how
actions differ in their effects. In this work, we propose a new family of
"decomposed" importance sampling (IS) estimators based on factored action
spaces. Given certain assumptions on the underlying problem structure, we prove
that the decomposed IS estimators have less variance than their original
non-decomposed versions, while preserving the property of zero bias. Through
simulations, we empirically verify our theoretical results, probing the
validity of various assumptions. Provided with a technique that can derive the
action space factorisation for a given problem, our work shows that OPE can be
improved "for free" by utilising this inherent problem structure.
Comment: Main paper: 8 pages, 7 figures. Appendix: 30 pages, 17 figures.
Accepted at ICML 2023 Workshop on Counterfactuals in Minds and Machines,
Honolulu, Hawaii, USA. Camera ready version
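The variance reduction can be seen in a one-step toy problem. The bandit below, with two independent binary sub-actions and an additively decomposable reward, is an assumed setup chosen so that both the standard and decomposed importance sampling (IS) estimators are unbiased and directly comparable; it is not the paper's experimental benchmark.

```python
# A hedged sketch of "decomposed" importance sampling with a factored action
# space: each sub-action gets its own importance ratio, applied only to the
# reward component that depends on it, instead of one joint product ratio.
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Behaviour and evaluation policies factorise over the two binary sub-actions.
pb = np.array([0.5, 0.7])   # P_b(sub-action j = 1)
pe = np.array([0.9, 0.2])   # P_e(sub-action j = 1)

a = (rng.uniform(size=(n, 2)) < pb).astype(float)          # logged actions
r_parts = np.column_stack([2.0 * a[:, 0], 1.0 * a[:, 1]])  # r = r1(a1) + r2(a2)

def ratio(j):
    """Per-sub-action importance ratio pi_e / pi_b."""
    return np.where(a[:, j] == 1, pe[j] / pb[j], (1 - pe[j]) / (1 - pb[j]))

w_joint = ratio(0) * ratio(1)                  # standard IS: product of ratios
standard_is = w_joint * r_parts.sum(axis=1)
decomposed_is = ratio(0) * r_parts[:, 0] + ratio(1) * r_parts[:, 1]

true_value = 2.0 * pe[0] + 1.0 * pe[1]         # analytic policy value = 2.0
print(f"true={true_value:.2f}  "
      f"standard={standard_is.mean():.2f} (var {standard_is.var():.2f})  "
      f"decomposed={decomposed_is.mean():.2f} (var {decomposed_is.var():.2f})")
```

Both estimators target the same value, but the decomposed estimator avoids multiplying together ratios for sub-actions that do not affect a given reward component, which is the source of its lower variance when the additive-reward assumption holds.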
Beyond Sparsity: Tree Regularization of Deep Models for Interpretability
The lack of interpretability remains a key barrier to the adoption of deep
models in many applications. In this work, we explicitly regularize deep models
so human users might step through the process behind their predictions in
little time. Specifically, we train deep time-series models so their
class-probability predictions have high accuracy while being closely modeled by
decision trees with few nodes. Using intuitive toy examples as well as medical
tasks for treating sepsis and HIV, we demonstrate that this new tree
regularization yields models that are easier for humans to simulate than
simpler L1 or L2 penalties without sacrificing predictive power.
Comment: To appear in AAAI 2018. Contains 9-page main paper and appendix with
supplementary material
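The quantity tree regularization penalises, the size of a small decision tree that mimics the network, can be computed post hoc as a sketch. The full method makes this penalty differentiable via a surrogate and applies it during training of deep time-series models; the toy classifier and dataset below are illustrative assumptions that only show the proxy being measured.

```python
# A minimal, hedged sketch of the "simulability" proxy behind tree
# regularization: fit a small decision tree to a network's predictions and
# measure its fidelity and average decision-path length.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                    random_state=0).fit(X, y)

# Fit a small tree to mimic the network; its size proxies human simulability.
y_net = net.predict(X)
tree = DecisionTreeClassifier(max_leaf_nodes=8, random_state=0).fit(X, y_net)

# Average path length: mean number of node tests needed to classify a point.
paths = tree.decision_path(X)
avg_path_len = paths.sum(axis=1).mean() - 1        # exclude the leaf itself
fidelity = (tree.predict(X) == y_net).mean()
print(f"surrogate fidelity: {fidelity:.2f}, "
      f"average path length: {avg_path_len:.2f}")
```

In the paper this path length becomes a training-time penalty, so the network is pushed toward decision functions that short trees can reproduce, rather than merely being explained by a tree after the fact.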
Informed MCMC with Bayesian Neural Networks for Facial Image Analysis
Computer vision tasks are difficult because of the large variability in the
data that is induced by changes in light, background, partial occlusion as well
as the varying pose, texture, and shape of objects. Generative approaches to
computer vision allow us to overcome this difficulty by explicitly modeling the
physical image formation process. Using generative object models, the analysis
of an observed image is performed via Bayesian inference of the posterior
distribution. This conceptually simple approach tends to fail in practice
because of several difficulties stemming from sampling the posterior
distribution: high-dimensionality and multi-modality of the posterior
distribution as well as expensive simulation of the rendering process. The main
difficulty of sampling approaches in a computer vision context is choosing the
proposal distribution accurately so that maxima of the posterior are explored
early and the algorithm quickly converges to a valid image interpretation. In
this work, we propose to use a Bayesian Neural Network for estimating an image
dependent proposal distribution. Compared to a standard Gaussian random walk
proposal, this accelerates the sampler in finding regions of the posterior with
high value. In this way, we can significantly reduce the number of samples
needed to perform facial image analysis.
Comment: Accepted to the Bayesian Deep Learning Workshop at NeurIPS 201
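The benefit of an informed proposal can be sketched in one dimension. Here a fixed Gaussian independence proposal centred on a hypothetical network estimate (`mu_hat`, an assumed stand-in for the Bayesian neural network's image-dependent prediction) is compared with a plain random walk on a toy unimodal target; none of the numbers reflect the paper's actual face model.

```python
# A hedged sketch: Metropolis-Hastings with an observation-informed
# independence proposal versus a Gaussian random walk, starting far from the
# posterior mode. The informed chain reaches the high-density region sooner.
import numpy as np

rng = np.random.default_rng(0)
log_post = lambda x: -0.5 * ((x - 3.0) / 0.3) ** 2    # toy target at x = 3.0

def mh(n, x0, propose, log_q=None):
    """Generic MH; log_q is the proposal log-density for independence proposals."""
    x, chain = x0, []
    for _ in range(n):
        x_new = propose(x)
        log_alpha = log_post(x_new) - log_post(x)
        if log_q is not None:                          # independence correction
            log_alpha += log_q(x) - log_q(x_new)
        if np.log(rng.uniform()) < log_alpha:
            x = x_new
        chain.append(x)
    return np.array(chain)

# Random walk from a poor initialisation far from the mode.
rw = mh(500, -5.0, lambda x: x + rng.normal(scale=0.5))

# "Informed" proposal: Gaussian centred on the (hypothetical) network estimate.
mu_hat = 2.8                                          # assumed BNN prediction
informed = mh(500, -5.0, lambda x: rng.normal(mu_hat, 1.0),
              log_q=lambda x: -0.5 * ((x - mu_hat) / 1.0) ** 2)

burn = lambda c: int(np.argmax(np.abs(c - 3.0) < 1.0))  # first step near mode
print(f"steps to reach mode: random walk={burn(rw)}, informed={burn(informed)}")
```

The same mechanism, with the proposal's centre and spread predicted per image by a Bayesian neural network, is what lets the sampler explore posterior maxima early instead of diffusing toward them.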
Decision-Focused Model-based Reinforcement Learning for Reward Transfer
Decision-focused (DF) model-based reinforcement learning has recently been
introduced as a powerful algorithm that can focus on learning the MDP dynamics
that are most relevant for obtaining high returns. While this approach
increases the agent's performance by directly optimizing the reward, it does so
by learning less accurate dynamics from a maximum likelihood perspective. We
demonstrate that when the reward function is defined by preferences over
multiple objectives, the DF model may be sensitive to changes in the objective
preferences. In this work, we develop the robust decision-focused (RDF)
algorithm, which leverages the non-identifiability of DF solutions to learn
models that maximize expected returns while simultaneously learning models that
transfer to changes in the preference over multiple objectives. We demonstrate
the effectiveness of RDF on two synthetic domains and two healthcare
simulators, showing that it significantly improves the robustness of DF model
learning to changes in the reward function without compromising training-time
return.