70 research outputs found
Learning (Predictive) Risk Scores in the Presence of Censoring due to Interventions
A large and diverse set of measurements is regularly collected during a
patient's hospital stay to monitor their health status. Tools for integrating
these measurements into severity scores that accurately track changes in
illness severity can improve clinicians' ability to provide timely
interventions. Existing approaches for creating such scores either 1) rely on
experts to fully specify the severity score, or 2) train a predictive score,
using supervised learning, by regressing against a surrogate marker of severity
such as the presence of downstream adverse events. The first approach does not
extend to diseases where an accurate score cannot be elicited from experts. The
second approach often produces scores that suffer from bias due to
treatment-related censoring (Paxton, 2013). We propose a novel ranking based
framework for disease severity score learning (DSSL). DSSL exploits the
following key observation: while it is challenging for experts to quantify the
disease severity at any given time, it is often easy to compare the disease
severity at two different times. Extending existing ranking algorithms, DSSL
learns a function that maps a vector of a patient's measurements to a scalar
severity score such that the resulting score is temporally smooth and
consistent with the expert's ranking of pairs of disease states. We apply DSSL
to the problem of learning a sepsis severity score using a large, real-world
dataset. The learned scores significantly outperform state-of-the-art clinical
scores in ranking patient states by severity and in early detection of future
adverse events. We also show that the learned disease severity trajectories are
consistent with clinical expectations of disease evolution. Further, using
simulated datasets, we show that DSSL exhibits better generalization
performance under changes in treatment patterns than the above approaches.
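A minimal sketch of the core objective described above, assuming a linear scoring function, a hinge-style pairwise ranking loss, and a squared-difference smoothness penalty; the function names, synthetic data, and regularization weights are illustrative, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import minimize

def dssl_loss(w, X_high, X_low, X_traj, lam_smooth=1.0, lam_reg=0.1):
    """Pairwise ranking loss with a temporal-smoothness penalty.

    X_high[i] / X_low[i] are feature vectors of two patient states the expert
    ranked as more / less severe; X_traj holds consecutive measurements from
    one patient and penalizes abrupt jumps in the learned score.
    """
    margin = (X_high @ w) - (X_low @ w)
    rank_loss = np.maximum(0.0, 1.0 - margin).mean()      # hinge ranking loss
    traj_scores = X_traj @ w
    smooth_loss = np.mean(np.diff(traj_scores) ** 2)       # temporal smoothness
    return rank_loss + lam_smooth * smooth_loss + lam_reg * np.dot(w, w)

# Fit the linear scorer on synthetic stand-in data with a generic optimizer.
rng = np.random.default_rng(0)
d = 8
X_hi, X_lo = rng.normal(size=(200, d)), rng.normal(size=(200, d))
X_tr = rng.normal(size=(50, d))
w_hat = minimize(dssl_loss, np.zeros(d), args=(X_hi, X_lo, X_tr)).x
```

In the paper the ranked pairs come from expert comparisons of disease states at different times; the random arrays above only stand in for those features.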
Tutorial: Safe and Reliable Machine Learning
This document serves as a brief overview of the "Safe and Reliable Machine
Learning" tutorial given at the 2019 ACM Conference on Fairness,
Accountability, and Transparency (FAT* 2019). The talk slides can be found
here: https://bit.ly/2Gfsukp, while a video of the talk is available here:
https://youtu.be/FGLOCkC4KmE, and a complete list of references for the
tutorial here: https://bit.ly/2GdLPme.
A Framework for Individualizing Predictions of Disease Trajectories by Exploiting Multi-Resolution Structure
For many complex diseases, there is a wide variety of ways in which an
individual can manifest the disease. The challenge of personalized medicine is
to develop tools that can accurately predict the trajectory of an individual's
disease, which can in turn enable clinicians to optimize treatments. We
represent an individual's disease trajectory as a continuous-valued
continuous-time function describing the severity of the disease over time. We
propose a hierarchical latent variable model that individualizes predictions of
disease trajectories. This model shares statistical strength across
observations at different resolutions--the population, subpopulation and the
individual level. We describe an algorithm for learning population and
subpopulation parameters offline, and an online procedure for dynamically
learning individual-specific parameters. Finally, we validate our model on the
task of predicting the course of interstitial lung disease, a leading cause of
death among patients with the autoimmune disease scleroderma. We compare our
approach against state-of-the-art methods and demonstrate significant improvements in
predictive accuracy.
Comment: Appeared in Neural Information Processing Systems (NIPS) 2015.
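A highly simplified sketch of the multi-resolution idea, assuming a polynomial time basis, known population and subpopulation coefficients, and an individual-level adjustment reduced to a single scalar offset with a conjugate Gaussian update; the paper's model is richer (spline bases and structured individual components), so treat this only as an illustration of sharing strength across resolutions.

```python
import numpy as np

def basis(t, degree=3):
    """Polynomial basis over time (a stand-in for the splines a full model might use)."""
    return np.vander(np.asarray(t, dtype=float), degree + 1, increasing=True)

def predict_trajectory(t_query, t_obs, y_obs, beta_pop, beta_sub,
                       tau2=1.0, sigma2=0.25):
    """Population + subpopulation mean, plus an individual offset b ~ N(0, tau2)
    updated online from this patient's own observations."""
    mean_obs = basis(t_obs) @ (beta_pop + beta_sub)
    resid = y_obs - mean_obs
    n = len(y_obs)
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)     # conjugate Gaussian update
    post_mean = post_var * resid.sum() / sigma2
    return basis(t_query) @ (beta_pop + beta_sub) + post_mean
```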
Reliable Decision Support using Counterfactual Models
Decision-makers are faced with the challenge of estimating what is likely to
happen when they take an action. For instance, if I choose not to treat this
patient, are they likely to die? Practitioners commonly use supervised learning
algorithms to fit predictive models that help decision-makers reason about
likely future outcomes, but we show that this approach is unreliable, and
sometimes even dangerous. The key issue is that supervised learning algorithms
are highly sensitive to the policy used to choose actions in the training data,
which causes the model to capture relationships that do not generalize. We
propose using a different learning objective that predicts counterfactuals
instead of predicting outcomes under an existing action policy as in supervised
learning. To support decision-making in temporal settings, we introduce the
Counterfactual Gaussian Process (CGP) to predict the counterfactual future
progression of continuous-time trajectories under sequences of future actions.
We demonstrate the benefits of the CGP on two important decision-support tasks:
risk prediction and "what if?" reasoning for individualized treatment planning.
Comment: Published in the proceedings of Neural Information Processing Systems (NIPS) 2017.
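A toy sketch of "what if?" prediction in the spirit described above, assuming a squared-exponential GP posterior for the action-free trajectory and a hand-specified additive, exponentially decaying response for each hypothetical future action; the kernel, response shape, and parameters are assumptions, and the sketch ignores the assumptions the CGP makes explicit to justify counterfactual claims.

```python
import numpy as np

def what_if_predict(t_query, t_obs, y_obs, future_actions,
                    effect=2.0, decay=0.5, length_scale=5.0, noise=0.1):
    """Predict a continuous-time trajectory under a hypothetical sequence of
    future action times (assumed additive treatment responses)."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)
    K = k(t_obs, t_obs) + noise * np.eye(len(t_obs))
    Ks = k(t_query, t_obs)
    mean = Ks @ np.linalg.solve(K, y_obs)          # GP posterior mean (zero prior mean)
    for a_time in future_actions:                  # add the assumed action responses
        dt = t_query - a_time
        mean += np.where(dt >= 0, effect * np.exp(-decay * np.maximum(dt, 0.0)), 0.0)
    return mean
```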
Discretizing Logged Interaction Data Biases Learning for Decision-Making
Time series data that are not measured at regular intervals are commonly
discretized as a preprocessing step. For example, data about customer arrival
times might be simplified by summing the number of arrivals within hourly
intervals, which produces a discrete-time time series that is easier to model.
In this abstract, we show that discretization introduces a bias that affects
models trained for decision-making. We refer to this phenomenon as
discretization bias, and show that we can avoid it by using continuous-time
models instead.
Comment: This is a standalone short paper describing a new type of bias that
can arise when learning from time series data for sequential decision-making
problems.
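A small illustration (not the paper's formal argument) of the information that hourly binning throws away, using the customer-arrival example above: a decision rule that depends on event timing can only act at bin boundaries once the data are discretized.

```python
import numpy as np

rng = np.random.default_rng(1)
# Continuous-time event log: customer arrival times over one day (in hours).
arrivals = np.cumsum(rng.exponential(scale=0.4, size=60))
arrivals = arrivals[arrivals < 24.0]

# Discretized view: hourly counts lose the within-hour timing.
hourly_counts = np.histogram(arrivals, bins=np.arange(25))[0]

# A rule "act as soon as 5 arrivals have occurred" answers differently under
# the two representations: the discretized version can only respond at hour
# boundaries, systematically delaying the action.
act_time_continuous = arrivals[4]
act_time_discrete = np.argmax(np.cumsum(hourly_counts) >= 5) + 1   # end of that hour
print(act_time_continuous, act_time_discrete)
```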
Trading-Off Cost of Deployment Versus Accuracy in Learning Predictive Models
Predictive models are finding an increasing number of applications in many
industries. As a result, a practical means for trading off the cost of
deploying a model versus its effectiveness is needed. Our work is motivated by
risk prediction problems in healthcare. Cost-structures in domains such as
healthcare are quite complex, posing a significant challenge to existing
approaches. We propose a novel framework for designing cost-sensitive
structured regularizers that is suitable for problems with complex cost
dependencies. We draw upon a surprising connection to boolean circuits. In
particular, we represent the problem costs as a multi-layer boolean circuit,
and then use properties of boolean circuits to define an extended feature
vector and a group regularizer that exactly captures the underlying cost
structure. The resulting regularizer may then be combined with a fidelity
function to, for example, perform model prediction. For the challenging
real-world application of risk prediction for sepsis in intensive care units,
the use of our regularizer leads to models that are in harmony with the
underlying cost structure and thus provide an excellent prediction accuracy
versus cost tradeoff.
Comment: Authors contributed equally to this work. To appear in IJCAI 2016,
the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016.
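A sketch of the simplest, single-layer case of a cost-structured group regularizer, assuming hypothetical lab panels whose cost is paid once if any of their features is used; the paper's construction handles multi-layer boolean circuits and more complex cost dependencies, which this sketch does not attempt.

```python
import numpy as np

# Hypothetical cost structure: paying for a panel unlocks several features,
# so each panel acts like an OR gate over the features it produces.
cost_groups = {
    "cbc_panel":   {"cost": 5.0, "features": [0, 1, 2]},
    "chem7_panel": {"cost": 8.0, "features": [3, 4, 5, 6]},
    "lactate":     {"cost": 3.0, "features": [7]},
}

def cost_group_penalty(w, groups=cost_groups):
    """Group-lasso penalty whose group weights mirror the cost structure:
    a panel's cost is incurred once if any of its features has nonzero weight."""
    return sum(g["cost"] * np.linalg.norm(w[g["features"]]) for g in groups.values())

def objective(w, X, y, lam=0.1):
    """Squared-error fidelity term plus the cost-structured regularizer."""
    return np.mean((X @ w - y) ** 2) + lam * cost_group_penalty(w)
```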
Reasoning at the Right Time Granularity
Most real-world dynamic systems are composed of different components that
often evolve at very different rates. In traditional temporal graphical models,
such as dynamic Bayesian networks, time is modeled at a fixed granularity,
generally selected based on the rate at which the fastest component evolves.
Inference must then be performed at this fastest granularity, potentially at
significant computational cost. Continuous Time Bayesian Networks (CTBNs) avoid
time-slicing in the representation by modeling the system as evolving
continuously over time. The expectation-propagation (EP) inference algorithm of
Nodelman et al. (2005) can then vary the inference granularity over time, but
the granularity is uniform across all parts of the system, and must be selected
in advance. In this paper, we provide a new EP algorithm that utilizes a
general cluster graph architecture where clusters contain distributions that
can overlap in both space (set of variables) and time. This architecture allows
different parts of the system to be modeled at very different time
granularities, according to their current rate of evolution. We also provide an
information-theoretic criterion for dynamically re-partitioning the clusters
during inference to tune the level of approximation to the current rate of
evolution. This avoids the need to hand-select the appropriate granularity, and
allows the granularity to adapt as information is transmitted across the
network. We present experiments demonstrating that this approach can result in
significant computational savings.
Comment: Appears in Proceedings of the Twenty-Third Conference on Uncertainty
in Artificial Intelligence (UAI 2007).
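The sketch below is not the paper's EP algorithm; it only illustrates the granularity issue that motivates it, using two independent two-state continuous-time Markov chains whose forward-Euler step sizes are chosen from their own rates rather than from a single global granularity.

```python
import numpy as np

def propagate(p0, q01, q10, t_end, steps_per_transition=10):
    """Propagate the marginal of a two-state CTMC with a step size set by its
    own rates: fast components get many small steps, slow ones get few."""
    Q = np.array([[-q01, q01], [q10, -q10]])
    n_steps = int(np.ceil(t_end * max(q01, q10) * steps_per_transition))
    dt = t_end / n_steps
    p = np.asarray(p0, dtype=float)
    for _ in range(n_steps):
        p = p + (p @ Q) * dt        # forward-Euler step of the master equation
    return p

# A fast component (rates ~10/hr) needs far more steps than a slow one (~0.1/hr)
# over the same horizon; a single global granularity would force both to the
# fast component's step size, which is the overhead adaptive inference avoids.
print(propagate([1.0, 0.0], q01=10.0, q10=10.0, t_end=5.0))
print(propagate([1.0, 0.0], q01=0.1, q10=0.1, t_end=5.0))
```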
Scalable Joint Models for Reliable Uncertainty-Aware Event Prediction
Missing data and noisy observations pose significant challenges for reliably
predicting events from irregularly sampled multivariate time series
(longitudinal) data. Imputation methods, which are typically used for
completing the data prior to event prediction, lack a principled mechanism to
account for the uncertainty due to missingness. Alternatively, state-of-the-art
joint modeling techniques can be used for jointly modeling the longitudinal and
event data and compute event probabilities conditioned on the longitudinal
observations. These approaches, however, make strong parametric assumptions and
do not easily scale to multivariate signals with many observations. Our
proposed approach consists of several key innovations. First, we develop a
flexible and scalable joint model based upon sparse multiple-output Gaussian
processes. Unlike state-of-the-art joint models, the proposed model can explain
highly challenging structure including non-Gaussian noise while scaling to
large data. Second, we derive an optimal policy for predicting events using the
distribution of the event occurrence estimated by the joint model. The derived
policy trades off the cost of a delayed detection versus incorrect assessments
and abstains from making decisions when the estimated event probability does
not satisfy the derived confidence criteria. Experiments on a large dataset
show that the proposed framework significantly outperforms state-of-the-art
techniques in event prediction.
Comment: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence.
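A simplified stand-in for the kind of cost-sensitive, abstaining decision rule described above; the paper derives its policy from the joint model's estimated event distribution, whereas the thresholds, costs, and uncertainty test here are illustrative assumptions.

```python
def event_decision(p_event, p_std=0.0, cost_miss=10.0, cost_false_alarm=1.0,
                   max_uncertainty=0.2):
    """Cost-based alarm rule with abstention. Raise an alarm when the expected
    cost of staying silent exceeds the expected cost of alarming; abstain when
    the estimate of the event probability is itself too uncertain."""
    if p_std > max_uncertainty:
        return "abstain"
    expected_cost_silent = p_event * cost_miss
    expected_cost_alarm = (1.0 - p_event) * cost_false_alarm
    return "alarm" if expected_cost_silent > expected_cost_alarm else "wait"

# Example: event_decision(0.3, p_std=0.05) -> "alarm", since 0.3 * 10 > 0.7 * 1.
```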
Discovering shared and individual latent structure in multiple time series
This paper proposes a nonparametric Bayesian method for exploratory data
analysis and feature construction in continuous time series. Our method focuses
on understanding shared features in a set of time series that exhibit
significant individual variability. Our method builds on the framework of
latent Dirichlet allocation (LDA) and its extension to hierarchical Dirichlet
processes, which allows us to characterize each series as switching between
latent "topics", where each topic is characterized as a distribution over
"words" that specify the series dynamics. However, unlike standard
applications of LDA, we discover the words as we learn the model. We apply this
model to the task of tracking the physiological signals of premature infants;
our model obtains clinically significant insights as well as useful features
for supervised learning tasks.
Comment: Additional supplementary section in tex file.
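A rough two-stage approximation of the idea, assuming "words" are first quantized from windowed dynamics with k-means and topics are then found with standard LDA over word counts; the paper instead learns the words jointly with the hierarchical Dirichlet process topic model, so this is only a sketch of the shared-structure intuition. It assumes each series is longer than the window width.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

def series_to_windows(y, width=10):
    """Slide a non-overlapping window over one series; each window is a candidate 'word'."""
    return np.array([y[i:i + width] for i in range(0, len(y) - width, width)])

def shared_topics(series_list, n_words=20, n_topics=4, width=10):
    """Quantize windowed dynamics into a vocabulary with k-means, then use LDA
    to find topics (distributions over dynamics words) shared across series."""
    windows = np.vstack([series_to_windows(y, width) for y in series_list])
    vocab = KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(windows)
    counts = np.zeros((len(series_list), n_words))
    for i, y in enumerate(series_list):
        labels = vocab.predict(series_to_windows(y, width))
        counts[i] = np.bincount(labels, minlength=n_words)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    return lda.fit_transform(counts)     # per-series mixture over shared topics
```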
Preventing Failures Due to Dataset Shift: Learning Predictive Models That Transport
Classical supervised learning produces unreliable models when training and
target distributions differ, with most existing solutions requiring samples
from the target domain. We propose a proactive approach which learns a
relationship in the training domain that will generalize to the target domain
by incorporating prior knowledge of aspects of the data generating process that
are expected to differ as expressed in a causal selection diagram.
Specifically, we remove variables generated by unstable mechanisms from the
joint factorization to yield the Surgery Estimator---an interventional
distribution that is invariant to the differences across environments. We prove
that the surgery estimator finds stable relationships in strictly more
scenarios than previous approaches which only consider conditional
relationships, and demonstrate this in simulated experiments. We also evaluate
on real world data for which the true causal diagram is unknown, performing
competitively against entirely data-driven approaches.
Comment: In Proceedings of the 22nd International Conference on Artificial
Intelligence and Statistics (AISTATS), 2019. Previously presented at the
NeurIPS 2018 Causal Learning Workshop.
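A toy instance of the intuition, assuming a three-variable graph X -> T -> Y with X -> Y, where only the treatment policy P(T | X) changes across environments: cutting the unstable policy out of the factorization amounts to modeling the stable mechanism P(Y | X, T) and predicting under an explicit T, whereas a predictor that marginalizes over the training policy degrades when that policy shifts. The graph, simulator, and models are illustrative, not the paper's general procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def simulate(n, policy_coef):
    """X -> T -> Y and X -> Y; only the policy P(T | X) differs across environments."""
    X = rng.normal(size=n)
    T = (rng.random(n) < 1 / (1 + np.exp(-policy_coef * X))).astype(float)
    Y = (rng.random(n) < 1 / (1 + np.exp(-(1.5 * X - 2.0 * T)))).astype(float)
    return X, T, Y

X, T, Y = simulate(5000, policy_coef=3.0)        # training environment

# Naive predictor: ignores T, so it implicitly absorbs the training policy P(T | X).
naive = LogisticRegression().fit(X[:, None], Y)

# Surgery-style predictor: model the stable mechanism P(Y | X, T) and predict
# under an explicit choice of T, cutting the unstable policy out of the estimate.
stable = LogisticRegression().fit(np.column_stack([X, T]), Y)

Xt, Tt, Yt = simulate(5000, policy_coef=-3.0)    # target environment, new policy
print("naive accuracy: ", naive.score(Xt[:, None], Yt))
print("stable accuracy:", stable.score(np.column_stack([Xt, Tt]), Yt))
```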