Data pre-processing using neural processes for modelling personalised vital-sign time-series data

Abstract

Clinical time-series data retrieved from electronic medical records are widely used to build predictive models of adverse events to support resource management. Such data are often sparse and irregularly sampled, which makes it challenging to apply many common machine learning methods. Missing values may be imputed by carrying the last value forward, by using pre-specified physiological normality ranges, or through linear regression. Gaussian process (GP) regression is increasingly popular for imputation, and often for re-sampling time-series at regular intervals. However, the use of GPs can require extensive, and likely ad hoc, investigation to determine model structure, such as an appropriate covariance function. This can be challenging for multivariate real-world clinical data, in which time-series variables exhibit different dynamics to one another. In this work, we construct generative models to estimate missing values in clinical time-series data using a neural latent variable model, known as a Neural Process (NP). The NP model employs a conditional prior distribution in the latent space to learn global uncertainty in the data by modelling variations at a local level. In contrast to conventional generative modelling, such as via a GP, this prior is not fixed and is itself learned during the training process. Thus, an NP model provides the flexibility to adapt to the dynamics of the available clinical data. We propose a variant of the NP framework for efficient modelling of the mutual information between the latent and input spaces, ensuring meaningful learned priors. Experiments on the MIMIC-III dataset demonstrate the effectiveness of the proposed approach compared to conventional data interpolation methods.
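
To make the conventional baselines mentioned above concrete, the following is a minimal sketch of two of them: carrying the last value forward and fitting a linear regression, each used to re-sample an irregularly sampled series onto a regular grid. The function names (`locf_resample`, `linear_regression_resample`) and the heart-rate values are hypothetical and chosen only for illustration; they are not taken from the paper or from MIMIC-III, and the sketch does not represent the proposed NP model.

```python
import numpy as np


def locf_resample(t_obs, y_obs, t_grid):
    """Carry the most recent observation forward onto the grid times.

    Grid times that precede the first observation are left as NaN.
    Assumes t_obs is sorted in increasing order.
    """
    t_obs = np.asarray(t_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    t_grid = np.asarray(t_grid, dtype=float)
    # Index of the last observation at or before each grid time (-1 if none).
    idx = np.searchsorted(t_obs, t_grid, side="right") - 1
    out = np.full(t_grid.shape, np.nan)
    has_past = idx >= 0
    out[has_past] = y_obs[idx[has_past]]
    return out


def linear_regression_resample(t_obs, y_obs, t_grid):
    """Fit a least-squares straight line to the observations and
    evaluate it at the grid times."""
    slope, intercept = np.polyfit(np.asarray(t_obs, dtype=float),
                                  np.asarray(y_obs, dtype=float), deg=1)
    return slope * np.asarray(t_grid, dtype=float) + intercept


# Hypothetical heart-rate readings (beats per minute) at irregular times (hours).
t_obs = [0.0, 1.5, 4.0]
y_obs = [80.0, 86.0, 96.0]
t_grid = np.arange(0.0, 5.0, 1.0)  # regular hourly grid: 0, 1, 2, 3, 4

locf = locf_resample(t_obs, y_obs, t_grid)
linreg = linear_regression_resample(t_obs, y_obs, t_grid)
print("LOCF:             ", locf)
print("Linear regression:", linreg)
```

Neither baseline returns an estimate of uncertainty for the imputed values, which is one reason probabilistic approaches such as GPs and NPs are considered for this task.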
