Abstract
Clinical time-series data retrieved from electronic
medical records are widely used to build predictive models of
adverse events to support resource management. Such data are
often sparse and irregularly sampled, which makes them difficult
to use with many common machine learning methods. Missing values
may be interpolated by carrying the last value forward, based
on pre-specified physiological normality ranges, or through linear
regression. Gaussian process (GP) regression is increasingly
used to perform imputation, often together with re-sampling
of time-series at regular intervals. However, the use of GPs can
require extensive, and likely ad hoc, investigation to determine
model structure, such as an appropriate covariance function.
This can be challenging for multivariate real-world clinical data,
in which time-series variables exhibit different dynamics from one
another. In this work, we construct generative models to estimate
missing values in clinical time-series data using a neural latent
variable model, known as a Neural Process (NP). The NP model
employs a conditional prior distribution in the latent space to
learn global uncertainty in the data by modelling variations at
a local level. In contrast to conventional generative modelling,
such as via a GP, this prior is not fixed and is itself learned
during the training process. Thus, an NP model provides the
flexibility to adapt to the dynamics of the available clinical
data. We propose a variant of the NP framework for efficient
modelling of the mutual information between the latent and input
spaces, ensuring meaningful learned priors. Experiments using
the MIMIC-III dataset demonstrate the effectiveness
of the proposed approach compared with conventional data
interpolation methods.
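
As a concrete illustration of the encoder, latent, and decoder structure described above, the following is a minimal sketch of a Neural Process imputer in PyTorch. All layer sizes, module names, and the toy usage at the end are illustrative assumptions; this is not the authors' exact architecture, nor does it include the proposed mutual-information variant.

```python
# Minimal sketch of a Neural Process for imputing sparse time-series,
# assuming a PyTorch implementation; dimensions and names are illustrative.
import torch
import torch.nn as nn

class NeuralProcess(nn.Module):
    def __init__(self, x_dim=1, y_dim=1, r_dim=64, z_dim=32, h_dim=64):
        super().__init__()
        # Encoder: maps each observed (time, value) pair to a representation r_i.
        self.encoder = nn.Sequential(
            nn.Linear(x_dim + y_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, r_dim))
        # Latent head: aggregated representation -> parameters of q(z | context).
        self.to_mu = nn.Linear(r_dim, z_dim)
        self.to_logvar = nn.Linear(r_dim, z_dim)
        # Decoder: (z, target time) -> predictive mean and log-variance of the value.
        self.decoder = nn.Sequential(
            nn.Linear(z_dim + x_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, 2 * y_dim))

    def latent(self, x, y):
        # Permutation-invariant aggregation over the observed context points.
        r = self.encoder(torch.cat([x, y], dim=-1)).mean(dim=1)
        return self.to_mu(r), self.to_logvar(r)

    def forward(self, x_ctx, y_ctx, x_tgt):
        mu, logvar = self.latent(x_ctx, y_ctx)
        # Reparameterised sample from the learned (data-dependent) prior.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        z = z.unsqueeze(1).expand(-1, x_tgt.size(1), -1)
        out = self.decoder(torch.cat([z, x_tgt], dim=-1))
        y_mu, y_logvar = out.chunk(2, dim=-1)
        return y_mu, y_logvar, mu, logvar

# Toy usage: impute a sparsely observed vital-sign series on a regular grid.
np_model = NeuralProcess()
x_ctx = torch.rand(8, 10, 1)    # 8 series, 10 irregular observation times each
y_ctx = torch.randn(8, 10, 1)   # observed (normalised) values
x_tgt = torch.linspace(0, 1, 48).view(1, 48, 1).expand(8, -1, -1)  # regular grid
y_mu, y_logvar, _, _ = np_model(x_ctx, y_ctx, x_tgt)  # predictive mean and uncertainty
```

In training, such a model is typically fitted by maximising a variational lower bound that trades off reconstruction of observed values against a KL term between the target-conditioned and context-conditioned latent distributions; the sketch above shows only the generative pass used for imputation.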