Patient Similarity Analysis with Longitudinal Health Data
Healthcare professionals have long envisioned using the enormous processing
powers of computers to discover new facts and medical knowledge locked inside
electronic health records. These vast medical archives contain time-resolved
information about medical visits, tests and procedures, as well as outcomes,
which together form individual patient journeys. By assessing the similarities
among these journeys, it is possible to uncover clusters of common disease
trajectories with shared health outcomes. The assignment of patient journeys to
specific clusters may in turn serve as the basis for personalized outcome
prediction and treatment selection. This procedure is a non-trivial
computational problem, as it requires the comparison of patient data with
multi-dimensional and multi-modal features that are captured at different times
and resolutions. In this review, we provide a comprehensive overview of the
tools and methods that are used in patient similarity analysis with
longitudinal data and discuss its potential for improving clinical decision
making.
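As a rough illustration of comparing patient journeys captured at different times and rates, the sketch below uses dynamic time warping (DTW), one common similarity measure from this literature. The journeys and feature values are hypothetical, not from the review.

```python
# Illustrative sketch: DTW distance between two patient journeys, each a
# sequence of per-visit feature vectors (hypothetical lab values here).
# DTW tolerates visits recorded at different rates by allowing one journey
# to "skip" or "stretch" against the other.

def dtw_distance(a, b):
    """DTW distance between two sequences of equal-length feature vectors."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j],      # extra visit in a
                                 cost[i][j - 1],      # extra visit in b
                                 cost[i - 1][j - 1])  # matched visits
    return cost[n][m]

# Two journeys sampled at different rates still compare sensibly.
journey_a = [(1.0, 0.2), (1.5, 0.4), (2.0, 0.9)]
journey_b = [(1.0, 0.2), (1.2, 0.3), (1.5, 0.4), (2.0, 0.9)]
d = dtw_distance(journey_a, journey_b)
```

Pairwise distances computed this way can feed any standard clustering routine to recover groups of similar trajectories.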
Combining Representation Learning with Tensor Factorization for Risk Factor Analysis - an application to Epilepsy and Alzheimer's disease
Existing studies consider Alzheimer's disease (AD) a comorbidity of epilepsy,
but also recognize epilepsy to occur more frequently in patients with AD than
those without. The goal of this paper is to understand the relationship between
epilepsy and AD by studying causal relations among subgroups of epilepsy
patients. We develop an approach combining representation learning with tensor
factorization to provide an in-depth analysis of the risk factors among
epilepsy patients for AD. An epilepsy-AD cohort of ~600,000 patients was
extracted from Cerner Health Facts data (50M patients). Our experimental
results not only suggested a causal relationship between epilepsy and later
onset of AD (p = 1.92e-51), but also identified five epilepsy subgroups with
distinct phenotypic patterns leading to AD. While such findings are
preliminary, the proposed method combining representation learning with tensor
factorization seems to be an effective approach for risk factor analysis.
Clinically Meaningful Comparisons Over Time: An Approach to Measuring Patient Similarity based on Subsequence Alignment
Longitudinal patient data has the potential to improve clinical risk
stratification models for disease. However, chronic diseases that progress
slowly over time are often heterogeneous in their clinical presentation.
Patients may progress through disease stages at varying rates. This leads to
pathophysiological misalignment over time, making it difficult to consistently
compare patients in a clinically meaningful way. Furthermore, patients present
clinically for the first time at different stages of disease. This eliminates
the possibility of simply aligning patients based on their initial
presentation. Finally, patient data may be sampled at different rates due to
differences in schedules or missed visits. To address these challenges, we
propose a robust measure of patient similarity based on subsequence alignment.
Compared to global alignment techniques that do not account for
pathophysiological misalignment, focusing on the most relevant subsequences
allows for an accurate measure of similarity between patients. We demonstrate
the utility of our approach in settings where longitudinal data, while useful,
are limited and lack a clear temporal alignment for comparison. Applied to the
task of stratifying patients for risk of progression to probable Alzheimer's
Disease, our approach outperforms models that use only snapshot data (AUROC of
0.839 vs. 0.812) and models that use global alignment techniques (AUROC of
0.822). Our results support the hypothesis that patients' trajectories are
useful for quantifying inter-patient similarities and that subsequence
matching can help account for heterogeneity and misalignment in
longitudinal data.
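The subsequence idea can be sketched with a Smith-Waterman-style local alignment score, which rewards the best-matching stretch of two records rather than forcing a global alignment. The disease-stage labels and scoring parameters below are hypothetical, not the paper's.

```python
# Sketch: local (subsequence) alignment score between two discretized
# disease-stage sequences. Only the best-matching subsequences contribute,
# so patients who first present at different stages can still be compared.

def local_alignment_score(a, b, match=1.0, mismatch=-1.0, gap=-0.5):
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # the max with 0 is what makes the alignment local
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

# Patients observed from different starting stages: only the shared
# "moderate -> severe" stretch should drive the similarity.
p1 = ["mild", "mild", "moderate", "severe"]
p2 = ["moderate", "severe", "severe"]
score = local_alignment_score(p1, p2)  # 2.0
```

A global alignment would penalize p1's extra "mild" visits; the local score ignores them, which is the misalignment-robustness argued for above.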
From Brain Imaging to Graph Analysis: a study on ADNI's patient cohort
In this paper, we studied the association between changes in structural
brain volumes and the potential development of Alzheimer's disease (AD). Using a
simple abstraction technique, we converted regional cortical and subcortical
volume differences over two time points for each study subject into a graph. We
then obtained substructures of interest using a graph decomposition algorithm
in order to extract pivotal nodes via multi-view feature selection. Intensive
experiments using robust classification frameworks were conducted to evaluate
the performance of using the brain substructures obtained under different
thresholds. The results indicated that compact substructures acquired by
examining the differences between patient groups were sufficient to
discriminate between AD and healthy controls with an area under the receiver
operating characteristic curve of 0.72.
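The abstract does not specify the abstraction technique, but one simple way to turn two-timepoint volume differences into a graph is to connect regions whose relative change exceeds a threshold. Everything below (region names, values, the clique construction, the 5% threshold) is a hypothetical illustration, not the paper's method.

```python
# Hypothetical sketch: build a graph from regional volume differences
# between two visits by linking regions that co-change beyond a threshold.

def volume_diff_graph(vol_t1, vol_t2, threshold=0.05):
    """Nodes = regions with >threshold relative volume change; edges
    connect all pairs of co-changing regions (a clique)."""
    changed = [r for r in vol_t1
               if abs(vol_t2[r] - vol_t1[r]) / vol_t1[r] > threshold]
    edges = [(a, b) for i, a in enumerate(changed) for b in changed[i + 1:]]
    return changed, edges

# Toy volumes (cm^3) at two timepoints.
v1 = {"hippocampus": 4.0, "amygdala": 1.6, "thalamus": 7.0}
v2 = {"hippocampus": 3.6, "amygdala": 1.5, "thalamus": 6.95}
nodes, edges = volume_diff_graph(v1, v2)
```

Graphs built per subject in this spirit could then be decomposed and mined for discriminative substructures, as the abstract describes.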
Integrative Analysis of Patient Health Records and Neuroimages via Memory-based Graph Convolutional Network
With the arrival of the big data era, more and more data are becoming readily
available in various real-world applications and those data are usually highly
heterogeneous. Taking computational medicine as an example, we have both
Electronic Health Records (EHR) and medical images for each patient. For
complicated diseases such as Parkinson's and Alzheimer's, both EHR and
neuroimaging information are very important for disease understanding because
they contain complementary aspects of the disease. However, EHR and
neuroimaging data are completely different in nature, and existing research has
so far focused mainly on one of them. In this paper, we propose a framework,
Memory-Based Graph
Convolution Network (MemGCN), to perform integrative analysis with such
multi-modal data. Specifically, GCN is used to extract useful information from
the patients' neuroimages. The information contained in the patient EHRs before
the acquisition of each brain image is captured by a memory network because of
its sequential nature. The information contained in each brain image is
combined with the information read out from the memory network to infer the
disease state at the image acquisition timestamp. To further enhance the
analytical power of MemGCN, we also designed a multi-hop strategy that allows
multiple reads and updates of the memory at each iteration.
We conduct experiments using the patient data from the Parkinson's Progression
Markers Initiative (PPMI) with the task of classification of Parkinson's
Disease (PD) cases versus controls. We demonstrate that superior classification
performance can be achieved with our proposed framework compared with existing
approaches that use a single type of data.
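The core "read from memory" step can be sketched as softmax attention: EHR events fill the memory slots, and an image-derived query reads out a weighted summary. The keys, values, and query below are hypothetical stand-ins, not MemGCN's learned representations.

```python
import math

def memory_read(query, memory_keys, memory_values):
    """One attention 'hop': weight memory slots by dot-product similarity
    to the query, then read out the weighted sum of their values."""
    scores = [sum(qi * ki for qi, ki in zip(query, key)) for key in memory_keys]
    mx = max(scores)                      # stabilize the softmax
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    read = [sum(wt * val[d] for wt, val in zip(weights, memory_values))
            for d in range(len(memory_values[0]))]
    return read, weights

# Hypothetical: 3 EHR memory slots, a query vector from the image encoder.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vals = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
q = [1.0, 0.0]
read, w = memory_read(q, keys, vals)

# A second "hop" re-queries with the read vector folded into the query,
# which is the flavor of the multi-hop strategy described above.
q2 = [a + b for a, b in zip(q, read)]
```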
Integrating Hypertension Phenotype and Genotype with Hybrid Non-negative Matrix Factorization
Hypertension is a heterogeneous syndrome in need of improved subtyping using
phenotypic and genetic measurements so that patients in different subtypes
share similar pathophysiologic mechanisms and respond more uniformly to
targeted treatments. Existing machine learning approaches often face challenges
in integrating phenotype and genotype information and presenting to clinicians
an interpretable model. We aim to provide informed patient stratification by
introducing Hybrid Non-negative Matrix Factorization (HNMF) on phenotype and
genotype matrices. HNMF simultaneously approximates the phenotypic and genetic
matrices using different appropriate loss functions, and generates patient
subtypes, phenotypic groups and genetic groups. Unlike previous methods, HNMF
approximates phenotypic matrix under Frobenius loss, and genetic matrix under
Kullback-Leibler (KL) loss. We propose an alternating projected gradient method
to solve the approximation problem. Simulation shows HNMF converges fast and
accurately to the true factor matrices. On a real-world clinical dataset, we used
the patient factor matrix as features to predict main cardiac mechanistic
outcomes. We compared HNMF with six different models using phenotype or
genotype features alone, with or without NMF, or using joint NMF with only one
type of loss. HNMF significantly outperforms all comparison models. HNMF also
reveals intuitive phenotype-genotype interactions that characterize cardiac
abnormalities.
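For context on the two losses HNMF mixes, the sketch below shows the classic Lee-Seung multiplicative update for plain NMF under Frobenius loss (the KL-loss rule is analogous). This is a didactic illustration of one building block, not the authors' joint alternating projected gradient solver, and the matrices are toy data.

```python
# Didactic sketch: one multiplicative NMF update for min ||V - W H||_F^2
# with W fixed. Lee-Seung theory guarantees the loss does not increase.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def frobenius_update_H(V, W, H):
    """H <- H * (W^T V) / (W^T W H), elementwise."""
    num = matmul(transpose(W), V)
    den = matmul(matmul(transpose(W), W), H)
    return [[H[i][j] * num[i][j] / (den[i][j] + 1e-12)
             for j in range(len(H[0]))] for i in range(len(H))]

def frobenius_loss(V, W, H):
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))

# Tiny nonnegative example: the loss should not increase after the update.
V = [[1.0, 0.5], [0.4, 0.8]]
W = [[0.6, 0.3], [0.2, 0.7]]
H = [[0.5, 0.5], [0.5, 0.5]]
before = frobenius_loss(V, W, H)
H = frobenius_update_H(V, W, H)
after = frobenius_loss(V, W, H)
```

In HNMF the patient factor is shared across the phenotypic (Frobenius) and genetic (KL) approximations, which is what couples the two matrices.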
Uncovering Longitudinal Healthcare Utilization from Patient-Level Medical Claims Data
The objective of this study is to introduce methodology for studying
longitudinal claims data observed at the patient level, with inference on the
heterogeneity of healthcare utilization behaviors within large healthcare
systems such as Medicaid. The proposed approach is model-based, allowing for
visualization of longitudinal utilization behaviors using simple stochastic
graphical networks. The approach is general, providing a framework for the
study of other chronic conditions wherever longitudinal healthcare utilization
data are available. Our methods are inspired by and applied to patient-level
Medicaid claims for asthma-diagnosed children observed over a period of five
years, with a comparison of two neighboring states, Georgia and North
Carolina.
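A minimal version of a stochastic network over utilization behaviors is a first-order Markov transition matrix estimated from visit sequences. The utilization states and sequences below are hypothetical examples, not the study's model.

```python
from collections import defaultdict

def transition_matrix(sequences):
    """Estimate first-order Markov transition probabilities over
    utilization states (e.g. outpatient -> ED) from visit sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

# Toy patient-level visit sequences.
seqs = [["outpatient", "outpatient", "ED"],
        ["outpatient", "ED", "inpatient"],
        ["outpatient", "outpatient", "outpatient"]]
P = transition_matrix(seqs)
```

Such per-group matrices can be drawn as the "simple stochastic graphical networks" the abstract mentions, with edge weights given by the transition probabilities.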
Multi-layer Trajectory Clustering: A Network Algorithm for Disease Subtyping
Many diseases display heterogeneity in clinical features and their
progression, indicative of the existence of disease subtypes. Extracting
patterns of disease variable progression for subtypes has tremendous
application in medicine, for example, in early prognosis and personalized
medical therapy. This work presents a novel, data-driven, network-based
Trajectory Clustering (TC) algorithm for identifying Parkinson's subtypes based
on disease trajectory. Modeling patient-variable interactions as a bipartite
network, TC first extracts communities of co-expressing disease variables at
different stages of progression. Then, it identifies Parkinson's subtypes by
clustering similar patient trajectories that are characterized by severity of
disease variables through a multi-layer network. Determination of trajectory
similarity accounts for direct overlaps between trajectories as well as
second-order similarities, i.e., common overlap with a third set of
trajectories. This work clusters trajectories across two types of layers: (a)
temporal, and (b) ranges of independent outcome variable (representative of
disease severity), both of which yield four distinct subtypes. The former
subtypes exhibit differences in progression of disease domains (Cognitive,
Mental Health, etc.), whereas the latter subtypes exhibit different degrees of
progression, i.e., some remain mild, whereas others show significant
deterioration after 5 years. The TC approach is validated through statistical
analyses and consistency of the identified subtypes with medical literature.
This generalizable and robust method can easily be extended to other
progressive multi-variate disease datasets, and can effectively assist in
targeted subtype-specific treatment in the field of personalized medicine.
(20 pages, 8 figures)
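The second-order similarity notion (common overlap with a third trajectory) can be sketched as follows; treating trajectories as sets of visited variable-communities is a simplifying assumption, not the paper's multi-layer network construction.

```python
# Sketch: trajectory similarity combining direct overlap (Jaccard) with a
# second-order term -- trajectories are also similar if they both overlap
# the same third trajectories.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def second_order_similarity(t1, t2, all_trajs, alpha=0.5):
    direct = jaccard(t1, t2)
    others = [t for t in all_trajs if t is not t1 and t is not t2]
    if not others:
        return direct
    indirect = sum(min(jaccard(t1, t), jaccard(t2, t))
                   for t in others) / len(others)
    return alpha * direct + (1 - alpha) * indirect

# Toy trajectories as sets of community labels per stage.
t1, t2, t3 = [1, 2, 3], [2, 3, 4], [3, 4, 5]
s = second_order_similarity(t1, t2, [t1, t2, t3])
```

The resulting similarity matrix can then be clustered (e.g. by community detection) to recover trajectory-based subtypes.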
Similarity-based Random Survival Forest
Predicting time-to-event outcomes in large databases can be a challenging but
important task. One example of this is in predicting the time to a clinical
outcome for patients in intensive care units (ICUs), which helps to support
critical medical treatment decisions. In this context, the time to an event of
interest could be, for example, survival time or time to recovery from a
disease/ailment observed within the ICU. The massive health datasets generated
from the uptake of Electronic Health Records (EHRs) are quite heterogeneous as
patients can be quite dissimilar in their relationship between the feature
vector and the outcome, adding more noise than information to prediction. In
this paper, we propose a modified random forest method for survival data that
identifies similar cases in an attempt to improve accuracy for predicting
time-to-event outcomes; this methodology can be applied in various settings,
including with ICU databases. We also introduce an adaptation of our
methodology in the case of dependent censoring. Our proposed method is
demonstrated in the Medical Information Mart for Intensive Care (MIMIC-III)
database, and, in addition, we present properties of our methodology through a
comprehensive simulation study. Introducing similarity to the random survival
forest method indeed provides improved predictive accuracy compared to random
survival forest alone across the various analyses we undertook.
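To make the "similar cases improve time-to-event prediction" idea concrete, here is a minimal nearest-neighbor analogue: estimate a Kaplan-Meier curve from only the k most similar cases. This is a sketch of the intuition, not the authors' modified random forest, and the ICU cases are invented; the ties/censoring handling is simplified.

```python
def km_curve(times_events):
    """Naive Kaplan-Meier estimate from (time, event) pairs;
    event=1 means the event was observed, 0 means censored."""
    pts = sorted(times_events)
    at_risk = len(pts)
    surv, curve = 1.0, []
    for t, e in pts:
        if e:
            surv *= (at_risk - 1) / at_risk
        curve.append((t, surv))
        at_risk -= 1
    return curve

def similar_case_survival(query, cases, k=2):
    """Survival curve from the k training cases most similar to the
    query (Euclidean distance on the feature vector)."""
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(c[0], query)) ** 0.5
    nearest = sorted(cases, key=dist)[:k]
    return km_curve([(t, e) for _, t, e in nearest])

# Hypothetical ICU cases: (features, time, event_observed).
cases = [((70, 1.2), 5.0, 1),
         ((72, 1.1), 7.0, 0),
         ((40, 0.3), 30.0, 1),
         ((45, 0.4), 28.0, 1)]
curve = similar_case_survival((71, 1.15), cases, k=2)
```

Restricting the estimate to similar cases is what filters out the dissimilar patients that, as the abstract notes, add more noise than information.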
Learning-Based Cost Functions for 3D and 4D Multi-Surface Multi-Object Segmentation of Knee MRI: Data from the Osteoarthritis Initiative
A fully automated knee MRI segmentation method to study osteoarthritis (OA)
was developed using a novel hierarchical set of random forests (RF) classifiers
to learn the appearance of cartilage regions and their boundaries. A
neighborhood approximation forest is used first to provide contextual feature
to the second-level RF classifier that also considers local features and
produces location-specific costs for the layered optimal graph image
segmentation of multiple objects and surfaces (LOGISMOS) framework. Double echo
steady state (DESS) MRIs used in this work originated from the Osteoarthritis
Initiative (OAI) study. Trained on 34 MRIs with varying degrees of OA, the
performance of the learning-based method tested on 108 MRIs showed a
significant reduction in segmentation errors (p < 0.05) compared with
the conventional gradient-based and single-stage RF-learned costs. The 3D
LOGISMOS was extended to longitudinal-3D (4D) to simultaneously segment
multiple follow-up visits of the same patient. As such, data from all
time-points of the temporal sequence contribute information to a single optimal
solution that utilizes both spatial 3D and temporal contexts. 4D LOGISMOS
validation on 108 MRIs from baseline and 12 month follow-up scans of 54
patients showed a significant reduction in segmentation errors
(p < 0.01) compared to 3D. Finally, the potential of 4D LOGISMOS was
further explored on the same 54 patients using 5 annual follow-up scans
demonstrating a significant improvement of measuring cartilage thickness
(p < 0.01) compared to the sequential 3D approach. (IEEE Transactions on
Medical Imaging, 11 pages)