Learning representations of multivariate time series with missing data
This is the author accepted manuscript. The final version is available from Elsevier via the DOI in this record.
Learning compressed representations of multivariate time series (MTS) facilitates data analysis in the presence of noise and redundant information, and for a large number of variates and time steps. However, classical dimensionality reduction approaches are designed for vectorial data and cannot deal explicitly with missing values. In this work, we propose a novel autoencoder architecture based on recurrent neural networks to generate compressed representations of MTS. The proposed model can process inputs of variable length and is specifically designed to handle missing data. Our autoencoder learns fixed-length vectorial representations whose pairwise similarities are aligned to a kernel function that operates in input space and handles missing values. This allows the model to learn good representations even when a significant amount of data is missing. To show the effectiveness of the proposed approach, we evaluate the quality of the learned representations in several classification tasks, including those involving medical data, and we compare it with other methods for dimensionality reduction. Subsequently, we design two frameworks based on the proposed architecture: one for imputing missing data and another for one-class classification. Finally, we analyze under what circumstances an autoencoder with recurrent layers can learn better compressed representations of MTS than feed-forward architectures.
Norwegian Research Counci
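The alignment between code similarities and a kernel described in the abstract above can be illustrated with a minimal sketch. It assumes the alignment is measured as the Frobenius-norm distance between the normalized matrix of code inner products and a normalized, precomputed kernel matrix; the function name `kernel_alignment_loss`, the normalization, and the `eps` guard are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def kernel_alignment_loss(codes, prior_kernel, eps=1e-12):
    """Mismatch between code inner products and a prior kernel matrix.

    codes:        (n_series, code_dim) fixed-length representations.
    prior_kernel: (n_series, n_series) kernel computed in input space
                  (assumed to be given, e.g. by a kernel that tolerates
                  missing values).
    Both matrices are scaled to unit Frobenius norm before comparison
    (an assumption made here so the two are on the same scale).
    """
    codes = np.asarray(codes, dtype=float)
    prior_kernel = np.asarray(prior_kernel, dtype=float)

    # Pairwise similarities of the learned codes.
    code_kernel = codes @ codes.T

    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    code_kernel = code_kernel / (np.linalg.norm(code_kernel) + eps)
    target = prior_kernel / (np.linalg.norm(prior_kernel) + eps)

    return np.linalg.norm(code_kernel - target)
```

In a training loop, a term of this kind would typically be added to the reconstruction loss with a weighting factor, so that the codes both reconstruct the input and respect the kernel's notion of similarity.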
A Kernel to Exploit Informative Missingness in Multivariate Time Series from EHRs
A large fraction of the electronic health records (EHRs) consists of clinical
measurements collected over time, such as lab tests and vital signs, which
provide important information about a patient's health status. These sequences
of clinical measurements are naturally represented as time series,
characterized by multiple variables and large amounts of missing data, which
complicate the analysis. In this work, we propose a novel kernel that can
exploit both the information from the observed values and the information
hidden in the missingness patterns of multivariate time series (MTS)
originating, e.g., from EHRs. The kernel, called TCK, is designed using an
ensemble learning strategy in which the base models are novel mixed-mode
Bayesian mixture models, which can effectively exploit informative missingness
without having to resort to imputation methods. Moreover, the ensemble approach
ensures robustness to hyperparameters, and therefore TCK is particularly
well suited when labels are scarce, a known challenge in medical
applications. Experiments on three real-world clinical datasets demonstrate the
effectiveness of the proposed kernel.
Comment: 2020 International Workshop on Health Intelligence, AAAI-20. arXiv admin note: text overlap with arXiv:1907.0525
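The ensemble construction described in the abstract above can be sketched roughly as follows. The sketch assumes that each base mixture model has already been fitted and provides soft cluster assignments for every time series, and that the kernel is the average inner product of those assignments over the ensemble. Fitting the mixed-mode Bayesian mixture models is not shown, and the helper names `missingness_mask` and `ensemble_kernel` are hypothetical.

```python
import numpy as np


def missingness_mask(mts):
    """Binary indicators: 1 where a value is observed, 0 where it is NaN.

    Indicators like these are one way to expose the missingness pattern
    to a model alongside the observed values, without imputation.
    """
    return (~np.isnan(np.asarray(mts, dtype=float))).astype(int)


def ensemble_kernel(posteriors):
    """Average inner products of soft cluster assignments.

    posteriors: list with one (n_series, n_clusters) array per ensemble
                member; each row is assumed to sum to 1. The number of
                clusters may differ between members.
    """
    n_series = np.asarray(posteriors[0]).shape[0]
    kernel = np.zeros((n_series, n_series))
    for assignments in posteriors:
        assignments = np.asarray(assignments, dtype=float)
        # Two series are similar when they share cluster memberships.
        kernel += assignments @ assignments.T
    return kernel / len(posteriors)
```

Because every member contributes a Gram matrix of the form P Pᵀ, the averaged result is symmetric and positive semidefinite, so it can be passed to standard kernel methods.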