MetaPred: Meta-Learning for Clinical Risk Prediction with Limited Patient Electronic Health Records
In recent years, increasing amounts of health data, such as patient
Electronic Health Records (EHR), have become readily available. This provides
an unprecedented opportunity for knowledge discovery and data mining algorithms
to mine insights from them, which can later help improve the quality of care
delivery. Predictive modeling of clinical risk, including in-hospital
mortality, hospital readmission, chronic disease onset, and condition
exacerbation, from patient EHR is one of the health data analytics problems
attracting the most interest. This is not only because the problem is
important in clinical settings, but also because working with EHR data poses
challenges such as sparsity, irregularity, and temporality. Unlike
applications in other domains such as computer vision and natural language
processing, labeled data samples in medicine (patients) are relatively
limited, which makes effective predictive model learning difficult, especially
for complicated models such as deep learning. In this paper, we propose
MetaPred, a meta-learning framework for clinical risk prediction from
longitudinal patient EHRs. In particular, to predict a target risk with
limited data samples, we train a meta-learner from a set of related risk
prediction tasks that learns how a good predictor is learned. The meta-learned
model can then be used directly for target risk prediction, and the limited
available samples can be used to further fine-tune the model. The
effectiveness of MetaPred is tested on a real patient EHR repository from
Oregon Health & Science University. We demonstrate that with CNN and RNN as
base predictors, MetaPred achieves much better performance for predicting the
target risk with low resources, compared with a predictor trained only on the
limited samples available for that risk.
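The train-a-meta-learner-then-fine-tune scheme described above can be sketched in a first-order MAML style: learn an initialization across related source tasks so that a few gradient steps adapt it to a low-resource target. The following is a toy numpy illustration assuming a linear predictor with squared loss; it is not the MetaPred implementation.

```python
import numpy as np

# First-order MAML-style sketch: learn an initialization w0 across related
# tasks, then fine-tune it on a low-resource target task. Linear model with
# squared loss for clarity; hypothetical, not the MetaPred code.

def grad(w, X, y):
    # Gradient of the mean squared error (1/n) * ||Xw - y||^2 / 2, scaled.
    return X.T @ (X @ w - y) / len(y)

def maml_meta_train(tasks, dim, inner_lr=0.1, outer_lr=0.05, steps=200):
    """Adapt on each source task, then update the shared initialization."""
    w0 = np.zeros(dim)
    for _ in range(steps):
        meta_grad = np.zeros(dim)
        for X, y in tasks:
            w_task = w0 - inner_lr * grad(w0, X, y)  # inner adaptation step
            meta_grad += grad(w_task, X, y)          # first-order outer grad
        w0 -= outer_lr * meta_grad / len(tasks)
    return w0

def fine_tune(w0, X, y, lr=0.1, steps=20):
    """Use the few target-risk samples to refine the meta-learned model."""
    w = w0.copy()
    for _ in range(steps):
        w -= lr * grad(w, X, y)
    return w

# Three related source tasks that share most of their structure.
rng = np.random.default_rng(0)
w_shared = np.array([1.0, 2.0])
tasks = []
for _ in range(3):
    X = rng.standard_normal((20, 2))
    w_true = w_shared + 0.1 * rng.standard_normal(2)
    tasks.append((X, X @ w_true))
w0 = maml_meta_train(tasks, dim=2)

# Target task with only 4 labeled samples.
X_t = rng.standard_normal((4, 2))
y_t = X_t @ (w_shared + 0.1 * rng.standard_normal(2))
w_target = fine_tune(w0, X_t, y_t)
```

The initialization `w0` lands near the structure shared across tasks, so the handful of target samples only has to correct a small residual rather than learn the predictor from scratch.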
Representation Learning with Autoencoders for Electronic Health Records: A Comparative Study
The increasing volume of Electronic Health Records (EHR) in recent years
provides great opportunities for data scientists to collaborate on different
aspects of healthcare research by applying advanced analytics to these EHR
clinical data.
A key requirement however is obtaining meaningful insights from high
dimensional, sparse and complex clinical data. Data science approaches
typically address this challenge by performing feature learning in order to
build more reliable and informative feature representations from clinical data
followed by supervised learning. In this paper, we propose a predictive
modeling approach based on deep-learning-based feature representations and
word embedding techniques. Our method uses different deep architectures
(stacked sparse autoencoders, deep belief networks, adversarial autoencoders,
and variational autoencoders) to learn higher-level feature abstractions that
yield effective and robust features from EHRs, and then builds prediction
models on top of them. Our approach is particularly useful when the
unlabeled data is abundant whereas labeled data is scarce. We investigate the
performance of representation learning through a supervised learning approach.
Our focus is to present a comparative study to evaluate the performance of
different deep architectures through supervised learning and provide insights
in the choice of deep feature representation techniques. Our experiments
demonstrate that for small data sets, the stacked sparse autoencoder shows
superior generalization performance in prediction due to sparsity
regularization, whereas variational autoencoders outperform the competing
approaches for large data sets due to their capability of learning the
representation distribution.
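As one concrete instance of the unsupervised feature-learning step, a sparse autoencoder can be sketched in a few lines: a reconstruction loss plus an L1 penalty on the hidden activations, the sparsity regularization credited above for the small-data advantage. This is a toy numpy version with assumed layer sizes and hyperparameters, not the study's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SparseAutoencoder:
    """One-hidden-layer autoencoder with an L1 sparsity penalty on the
    hidden code. Illustrative only; sizes and penalty weight assumed."""

    def __init__(self, n_in, n_hidden, lam=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = 0.1 * rng.standard_normal((n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = 0.1 * rng.standard_normal((n_hidden, n_in))
        self.b2 = np.zeros(n_in)
        self.lam = lam

    def encode(self, X):
        # The learned representation fed to a downstream classifier.
        return sigmoid(X @ self.W1 + self.b1)

    def loss(self, X):
        H = self.encode(X)
        Xhat = H @ self.W2 + self.b2
        return np.mean((Xhat - X) ** 2) + self.lam * np.mean(np.abs(H))

    def step(self, X, lr=0.2):
        # One full-batch gradient step on reconstruction + sparsity loss.
        n, d = X.shape
        H = self.encode(X)
        Xhat = H @ self.W2 + self.b2
        dXhat = 2.0 * (Xhat - X) / (n * d)
        gW2, gb2 = H.T @ dXhat, dXhat.sum(axis=0)
        dH = dXhat @ self.W2.T + self.lam * np.sign(H) / H.size
        dZ = dH * H * (1 - H)                      # sigmoid derivative
        gW1, gb1 = X.T @ dZ, dZ.sum(axis=0)
        self.W1 -= lr * gW1; self.b1 -= lr * gb1
        self.W2 -= lr * gW2; self.b2 -= lr * gb2

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 8))
ae = SparseAutoencoder(n_in=8, n_hidden=4)
before = ae.loss(X)
for _ in range(200):
    ae.step(X)
after = ae.loss(X)
```

In the workflow the paper describes, `encode(X)` would replace the raw high-dimensional clinical features as input to the supervised model.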
Boosting Deep Learning Risk Prediction with Generative Adversarial Networks for Electronic Health Records
The rapid growth of Electronic Health Records (EHRs), and the accompanying
opportunities in Data-Driven Healthcare (DDH), has attracted widespread
interest and attention. Recent progress in the design and application of deep
learning methods has shown promising results and is driving major changes in
healthcare academia and industry, but most of these methods rely on massive
labeled data. In this work, we propose a general deep learning framework that
can boost risk prediction performance with limited EHR data. Our model uses a
modified generative adversarial network, ehrGAN, which provides plausible
labeled EHR data by mimicking real patient records, to augment the training
dataset in a semi-supervised learning
manner. We use this generative model together with a convolutional neural
network (CNN) based prediction model to improve the onset prediction
performance. Experiments on two real healthcare datasets demonstrate that our
proposed framework produces realistic data samples and achieves significant
improvements on classification tasks with the generated data over several
state-of-the-art baselines.
Comment: To appear in ICDM 2017. This is the full version of the paper, 8
pages.
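The augmentation scheme can be sketched independently of the GAN internals: each training batch mixes real labeled records with generated samples that inherit the labels of the real records they mimic. The generator below is a deliberate stand-in (noisy copies of real records), not ehrGAN; the batch composition is the point being illustrated.

```python
import numpy as np

def make_stub_generator(noise=0.05, seed=0):
    """Stand-in generator: mimics real records by perturbing them.
    In the paper's framework this role is played by ehrGAN."""
    rng = np.random.default_rng(seed)
    def generate(X_real):
        return X_real + noise * rng.standard_normal(X_real.shape)
    return generate

def augmented_batches(X, y, generate, batch_size=8, ratio=0.5, seed=0):
    """Yield training batches in which a `ratio` fraction is synthetic;
    synthetic records inherit labels from the real records they mimic."""
    rng = np.random.default_rng(seed)
    n_fake = int(batch_size * ratio)
    n_real = batch_size - n_fake
    for start in range(0, len(X) - n_real + 1, n_real):
        Xb, yb = X[start:start + n_real], y[start:start + n_real]
        idx = rng.integers(0, len(X), n_fake)
        X_fake = generate(X[idx])          # synthetic records
        y_fake = y[idx]                    # semi-supervised label transfer
        yield np.vstack([Xb, X_fake]), np.concatenate([yb, y_fake])

X = np.arange(40, dtype=float).reshape(20, 2)
y = np.arange(20)
batches = list(augmented_batches(X, y, make_stub_generator()))
```

A CNN prediction model would then be trained on these mixed batches instead of the small real dataset alone.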
Patient Flow Prediction via Discriminative Learning of Mutually-Correcting Processes
Over the past decade the rate of care unit (CU) use in the United States has
been increasing. With an aging population and ever-growing demand for medical
care, effective management of patients' transitions among different care
facilities will prove indispensable for shortening the length of hospital
stays, improving patient outcomes, allocating critical care resources, and
reducing preventable re-admissions. In this paper, we focus on an important
problem of predicting the so-called "patient flow" from longitudinal electronic
health records (EHRs), which has not been explored via existing machine
learning techniques. By treating a sequence of transition events as a point
process, we develop a novel framework for modeling patient flow through various
CUs and jointly predicting patients' destination CUs and occupancy durations. Instead
of learning a generative point process model via maximum likelihood estimation,
we propose a novel discriminative learning algorithm aiming at improving the
prediction of transition events in the case of sparse data. By parameterizing
the proposed model as a mutually-correcting process, we formulate the
estimation problem via generalized linear models, which lends itself to
efficient learning based on alternating direction method of multipliers (ADMM).
Furthermore, we achieve simultaneous feature selection and learning by adding a
group-lasso regularizer to the ADMM algorithm. Additionally, to suppress
the negative influence of data imbalance on model learning, we
synthesize auxiliary training data for the classes with extremely few samples,
and improve the robustness of our learning method accordingly. Testing on
real-world data, we show that our method obtains superior performance in terms
of accuracy of predicting the destination CU transition and duration of each CU
occupancy.
Comment: In IEEE Transactions on Knowledge and Data Engineering (TKDE), 201
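The group-lasso step inside the ADMM iterations has a closed form worth noting: block soft-thresholding, which zeroes whole feature groups at once and thereby performs the simultaneous feature selection mentioned above. A sketch with hypothetical groups:

```python
import numpy as np

def prox_group_lasso(v, groups, lam):
    """Block soft-thresholding: the proximal operator of
    lam * sum_g ||v_g||_2, i.e. the subproblem ADMM solves for the
    group-lasso term. Groups whose norm falls below lam are zeroed,
    deselecting that feature group entirely."""
    out = np.zeros_like(v, dtype=float)
    for g in groups:
        norm = np.linalg.norm(v[g])
        if norm > lam:
            out[g] = (1.0 - lam / norm) * v[g]   # shrink the group
    return out

v = np.array([3.0, 4.0, 0.1, 0.1])
shrunk = prox_group_lasso(v, groups=[[0, 1], [2, 3]], lam=1.0)
```

The first group (norm 5) survives but is shrunk toward zero; the second group (norm about 0.14) is eliminated wholesale, which is exactly how the regularizer selects features at the group level.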
Co-Morbidity Exploration on Wearables Activity Data Using Unsupervised Pre-training and Multi-Task Learning
Physical activity and sleep play a major role in the prevention and
management of many chronic conditions. It is not a trivial task to understand
their impact on chronic conditions. Currently, data from electronic health
records (EHRs), sleep lab studies, and activity/sleep logs are used. The rapid
increase in the popularity of wearable health devices provides a significant
new data source, making it possible to track users' lifestyles in real time
through web interfaces, potentially available both to consumers and to their
healthcare providers. However, at present there is a gap between lifestyle data (e.g.,
sleep, physical activity) and clinical outcomes normally captured in EHRs. This
is a critical barrier for the use of this new source of signal for healthcare
decision making. Applying deep learning to wearables data provides a new
opportunity to overcome this barrier.
To address the problem of the unavailability of clinical data from a major
fraction of subjects and unrepresentative subject populations, we propose a
novel unsupervised (task-agnostic) time-series representation learning
technique called act2vec. act2vec learns useful features by taking into account
the co-occurrence of activity levels along with periodicity of human activity
patterns. The learned representations are then exploited to boost the
performance of disorder-specific supervised learning models. Furthermore, since
many disorders are often related to each other, a phenomenon referred to as
co-morbidity, we use a multi-task learning framework to exploit the shared
structure of disorder-inducing lifestyle choices partially captured in the
wearables data. Empirical evaluation using actigraphy data from 4,124 subjects
shows that our proposed method performs and generalizes substantially better
than conventional symbolic time-series representation methods and
task-specific deep learning models.
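The co-occurrence idea behind act2vec can be sketched in a simple form: discretize activity levels into symbols, count windowed co-occurrences over the actigraphy sequence, and factorize the counts into dense vectors. The SVD-based factorization below is a generic word2vec-style stand-in for illustration, not the act2vec model itself.

```python
import numpy as np

def cooccurrence(symbols, vocab_size, window=2):
    """Count how often discretized activity levels co-occur within a
    sliding window over the activity sequence."""
    C = np.zeros((vocab_size, vocab_size))
    for i, s in enumerate(symbols):
        lo, hi = max(0, i - window), min(len(symbols), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                C[s, symbols[j]] += 1.0
    return C

def embed(C, dim):
    """Turn co-occurrence counts into dense symbol embeddings via a
    truncated SVD of the log-scaled counts."""
    U, S, _ = np.linalg.svd(np.log1p(C))
    return U[:, :dim] * S[:dim]

# Toy sequence where levels 0/1 and 2/3 alternate in separate regimes,
# mimicking periodic activity patterns.
symbols = [0, 1, 0, 1, 0, 1, 2, 3, 2, 3, 2, 3] * 5
C = cooccurrence(symbols, vocab_size=4)
E = embed(C, dim=2)
```

The resulting per-symbol vectors would then initialize or feed the disorder-specific supervised models, which is the boosting step the abstract describes.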
Integrative Analysis of Patient Health Records and Neuroimages via Memory-based Graph Convolutional Network
With the arrival of the big data era, more and more data are becoming readily
available in various real-world applications and those data are usually highly
heterogeneous. Taking computational medicine as an example, we have both
Electronic Health Records (EHR) and medical images for each patient. For
complicated diseases such as Parkinson's and Alzheimer's, both EHR and
neuroimaging information are very important for disease understanding because
they contain complementary aspects of the disease. However, EHR and neuroimage
are completely different. So far, existing research has mainly focused on one
of them. In this paper, we propose a framework, Memory-Based Graph
Convolution Network (MemGCN), to perform integrative analysis with such
multi-modal data. Specifically, GCN is used to extract useful information from
the patients' neuroimages. The information contained in the patient EHRs before
the acquisition of each brain image is captured by a memory network because of
its sequential nature. The information contained in each brain image is
combined with the information read out from the memory network to infer the
disease state at the image acquisition timestamp. To further enhance the
analytical power of MemGCN, we also design a multi-hop strategy that allows
multiple memory reads and updates to be performed at each iteration.
We conduct experiments using the patient data from the Parkinson's Progression
Markers Initiative (PPMI) with the task of classification of Parkinson's
Disease (PD) cases versus controls. We demonstrate that superior
classification performance can be achieved with our proposed framework,
compared with existing approaches involving a single type of data.
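The multi-hop read can be sketched as repeated attention over EHR memory slots, with each hop's read-out folded back into the query before the next hop. The additive query update and shapes below are assumptions for illustration, not the MemGCN equations.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multihop_read(memory, query, hops=3):
    """Attend over memory slots (rows of `memory`), read out a weighted
    sum, and refine the query with it; repeat for several hops so later
    reads are conditioned on what earlier reads retrieved."""
    q = query.copy()
    for _ in range(hops):
        attn = softmax(memory @ q)   # attention over memory slots
        q = q + attn @ memory        # fold the read-out into the query
    return q

memory = np.eye(3)                   # three orthogonal EHR memory slots
q = multihop_read(memory, np.array([1.0, 0.0, 0.0]))
```

Here the memory would hold encoded EHR events preceding a brain image, and the initial query would be the GCN feature of that image; the final `q` is the combined representation used to infer the disease state.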
Time-Guided High-Order Attention Model of Longitudinal Heterogeneous Healthcare Data
Due to potential applications in chronic disease management and personalized
healthcare, EHR data analysis has attracted much attention from both
researchers and practitioners. There are three main challenges in modeling
longitudinal and heterogeneous EHR data: heterogeneity, irregular temporality,
and interpretability. A series of deep learning methods have made remarkable
progress in resolving these challenges. Nevertheless, most existing attention
models capture only first-order temporal dependencies or second-order
multimodal relationships among feature elements. In this paper, we propose a
time-guided high-order attention (TGHOA) model. The proposed method has three
major advantages. (1) It can model longitudinal heterogeneous EHR data by
capturing third-order correlations among different modalities and the
irregular temporal impact of historical events. (2) It can be used to identify
influential medical features, explaining the reasoning process of the
healthcare model. (3) It can be easily extended to cases with more modalities
and flexibly applied in different prediction tasks. We evaluate the proposed
method on two tasks, mortality prediction and disease ranking, using two
real-world EHR datasets. Extensive experimental results show the effectiveness
of the proposed model.
Comment: PRICAI-201
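The time-guidance idea, decoupled from the third-order machinery, can be illustrated as attention whose logits are discounted by how long ago each event occurred; the linear-in-time discount and the day-scale tau below are assumptions for illustration, not the TGHOA formulation.

```python
import numpy as np

def time_guided_attention(scores, delta_t, tau=30.0):
    """Attention weights where each event's content score is discounted
    by its age delta_t (e.g. days since the event), so the irregular
    gaps between clinical visits shape the attention distribution."""
    guided = scores - delta_t / tau        # older events get lower logits
    e = np.exp(guided - guided.max())
    return e / e.sum()

def attend(values, scores, delta_t, tau=30.0):
    """Time-guided weighted summary of historical event representations."""
    w = time_guided_attention(scores, delta_t, tau)
    return w @ values

scores = np.zeros(3)                       # equally relevant content
delta_t = np.array([0.0, 30.0, 60.0])      # days since each event
w = time_guided_attention(scores, delta_t)
```

With equal content scores, the weights decay monotonically with event age, which is how irregular temporality enters the attention even before any multimodal interaction is modeled.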
Deep Representation Learning of Patient Data from Electronic Health Records (EHR): A Systematic Review
Patient representation learning refers to learning a dense mathematical
representation of a patient that encodes meaningful information from Electronic
Health Records (EHRs). This is generally performed using advanced deep learning
methods. This study presents a systematic review of this field and provides
both qualitative and quantitative analyses from a methodological perspective.
We identified studies developing patient representations from EHRs with deep
learning methods from MEDLINE, EMBASE, Scopus, the Association for Computing
Machinery (ACM) Digital Library, and Institute of Electrical and Electronics
Engineers (IEEE) Xplore Digital Library. After screening 363 articles, 49
papers were included for comprehensive data collection. We observed a typical
workflow: feeding in raw data, applying deep learning models, and evaluating
the learned representations through clinical outcome prediction.
Specifically, learning representations from structured EHR
data was dominant (37 out of 49 studies). Recurrent Neural Networks were widely
applied as the deep learning architecture (LSTM: 13 studies, GRU: 11 studies).
Disease prediction was the most common application and evaluation (31 studies).
Benchmark datasets were mostly unavailable (28 studies) due to privacy concerns
of EHR data, and code availability was assured in 20 studies. We show the
importance and feasibility of learning comprehensive representations of patient
EHR data through a systematic review. Advances in patient representation
learning techniques will be essential for powering patient-level EHR analyses.
Future work will continue to leverage the richness and potential of available
EHR data. Knowledge distillation and other advanced learning techniques will
be exploited to further improve patient representation learning.
Generating Multi-label Discrete Patient Records using Generative Adversarial Networks
Access to electronic health record (EHR) data has motivated computational
advances in medical research. However, various concerns, particularly over
privacy, can limit access to and collaborative use of EHR data. Sharing
synthetic EHR data could mitigate risk. In this paper, we propose a new
approach, medical Generative Adversarial Network (medGAN), to generate
realistic synthetic patient records. Based on input real patient records,
medGAN can generate high-dimensional discrete variables (e.g., binary and count
features) via a combination of an autoencoder and generative adversarial
networks. We also propose minibatch averaging to efficiently avoid mode
collapse, and increase the learning efficiency with batch normalization and
shortcut connections. To demonstrate feasibility, we showed that medGAN
generates synthetic patient records that achieve comparable performance to real
data on many experiments including distribution statistics, predictive modeling
tasks and a medical expert review. We also empirically observe a limited
privacy risk in both identity and attribute disclosure using medGAN.
Comment: Accepted at Machine Learning in Health Care (MLHC) 201
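Minibatch averaging itself is nearly a one-liner worth seeing: every sample shown to the discriminator is concatenated with its minibatch mean, so a mode-collapsed generator betrays itself through an unrealistic batch mean. A sketch assuming dense real-valued features:

```python
import numpy as np

def minibatch_average(batch):
    """Append the minibatch mean to every sample before the discriminator
    sees it. If the generator collapses to a few modes, the appended mean
    looks nothing like the mean of real data, so the discriminator wins
    and pushes the generator back toward diversity."""
    mean = batch.mean(axis=0, keepdims=True)            # (1, d)
    return np.hstack([batch, np.repeat(mean, len(batch), axis=0)])

batch = np.arange(12, dtype=float).reshape(4, 3)
out = minibatch_average(batch)
```

Both real and generated minibatches pass through this transform before the discriminator, so the comparison is between their respective batch statistics, not individual samples alone.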
Neural Natural Language Processing for Unstructured Data in Electronic Health Records: a Review
Electronic health records (EHRs), digital collections of patient healthcare
events and observations, are ubiquitous in medicine and critical to healthcare
delivery, operations, and research. Despite this central role, EHRs are
notoriously difficult to process automatically. Well over half of the
information stored within EHRs is in the form of unstructured text (e.g.
provider notes, operation reports) and remains largely untapped for secondary
use. Recently, however, newer neural network and deep learning approaches to
Natural Language Processing (NLP) have made considerable advances,
outperforming traditional statistical and rule-based systems on a variety of
tasks. In this survey paper, we summarize current neural NLP methods for EHR
applications. We focus on a broad scope of tasks, namely, classification and
prediction, word embeddings, extraction, generation, and other topics such as
question answering, phenotyping, knowledge graphs, medical dialogue,
multilinguality, interpretability, etc.
Comment: 33 pages, 11 figures