
    Training recurrent neural networks robust to incomplete data: application to Alzheimer's disease progression modeling

    Disease progression modeling (DPM) using longitudinal data is a challenging machine learning task. Existing DPM algorithms neglect temporal dependencies among measurements, make parametric assumptions about biomarker trajectories, do not model multiple biomarkers jointly, and need an alignment of subjects' trajectories. In this paper, recurrent neural networks (RNNs) are utilized to address these issues. However, in many cases, longitudinal cohorts contain incomplete data, which hinders the application of standard RNNs and requires a pre-processing step such as imputation of the missing values. Instead, we propose a generalized training rule for the most widely used RNN architecture, long short-term memory (LSTM) networks, that can handle both missing predictor and target values. The proposed LSTM algorithm is applied to model the progression of Alzheimer's disease (AD) using six volumetric magnetic resonance imaging (MRI) biomarkers, i.e., volumes of ventricles, hippocampus, whole brain, fusiform, middle temporal gyrus, and entorhinal cortex, and it is compared to standard LSTM networks with data imputation and a parametric, regression-based DPM method. The results show that the proposed algorithm achieves a significantly lower mean absolute error (MAE) than the alternatives with p < 0.05 using the Wilcoxon signed-rank test in predicting values of almost all of the MRI biomarkers. Moreover, a linear discriminant analysis (LDA) classifier applied to the predicted biomarker values produces a significantly larger AUC of 0.90 vs. at most 0.84 with p < 0.001 using McNemar's test for clinical diagnosis of AD. Inspection of MAE curves as a function of the amount of missing data reveals that the proposed LSTM algorithm achieves the best performance until the proportion of missing values exceeds 74%. Finally, it is illustrated how the method can successfully be applied to data with varying time intervals. Comment: arXiv admin note: substantial text overlap with arXiv:1808.0550
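The abstract above does not spell out the generalized training rule. As a rough illustration of the target side only, the sketch below computes a mean absolute error that skips missing targets and divides by the number of observed ones. The function name `masked_mae` and the use of NaN to mark missing values are assumptions made for this example; this is not the authors' implementation.

```python
import math


def masked_mae(predictions, targets):
    """Mean absolute error over observed targets only.

    Missing targets are encoded as NaN (an assumption for this sketch)
    and contribute neither to the error sum nor to the normalizer.
    """
    total = 0.0
    observed = 0
    for pred, target in zip(predictions, targets):
        if math.isnan(target):
            continue  # skip missing targets entirely
        total += abs(pred - target)
        observed += 1
    # With no observed targets there is nothing to penalize.
    return total / observed if observed else 0.0


# Two of the four targets are missing; only the other two count.
loss = masked_mae([1.0, 2.0, 3.0, 4.0], [1.5, float("nan"), float("nan"), 3.0])
```

Dividing by the count of observed targets, instead of the full sequence length, keeps the loss scale comparable between subjects with many and few missing measurements.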

    Forecasting the Progression of Alzheimer's Disease Using Neural Networks and a Novel Pre-Processing Algorithm

    Alzheimer's disease (AD) is the most common neurodegenerative disease in older people. Despite considerable efforts to find a cure for AD, there is a 99.6% failure rate of clinical trials for AD drugs, likely because AD patients cannot easily be identified at early stages. This project investigated machine learning approaches to predict the clinical state of patients in future years to benefit AD research. Clinical data from 1737 patients was obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database and was processed using the "All-Pairs" technique, a novel methodology created for this project involving the comparison of all possible pairs of temporal data points for each patient. This data was then used to train various machine learning models. Models were evaluated using 7-fold cross-validation on the training dataset and confirmed using data from a separate testing dataset (110 patients). A neural network model was effective (mAUC = 0.866) at predicting the progression of AD on a month-by-month basis, both in patients who were initially cognitively normal and in patients suffering from mild cognitive impairment. Such a model could be used to identify patients who are at early stages of AD and are therefore good candidates for clinical trials for AD therapeutics. Comment: 10 pages; updated acknowledgement
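The abstract describes "All-Pairs" only as a comparison of all possible pairs of temporal data points for each patient. One plausible reading, sketched below, turns every (earlier visit, later visit) pair into a training sample. The tuple layout, the function name `all_pairs`, and the output field names are hypothetical and chosen for illustration; the paper's actual pre-processing may differ.

```python
from itertools import combinations


def all_pairs(visits):
    """Build one sample per (earlier, later) visit pair for a single patient.

    visits: list of (month, features, diagnosis) tuples (assumed layout).
    """
    ordered = sorted(visits, key=lambda visit: visit[0])  # order by visit month
    samples = []
    for earlier, later in combinations(ordered, 2):
        samples.append({
            "features": earlier[1],                  # predictors at the earlier visit
            "months_ahead": later[0] - earlier[0],   # forecasting horizon
            "future_diagnosis": later[2],            # label at the later visit
        })
    return samples


visits = [(0, [1.0], "CN"), (12, [0.9], "MCI"), (6, [0.95], "CN")]
samples = all_pairs(visits)  # 3 visits give 3 pairs
```

A patient with n visits yields n(n-1)/2 samples under this reading, which enlarges the training set and exposes the model to many different forecasting horizons.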

    Machine Learning for Multiclass Classification and Prediction of Alzheimer's Disease

    Alzheimer's disease (AD) is an irreversible neurodegenerative disorder and a common form of dementia. This research aims to develop machine learning algorithms that diagnose and predict the progression of AD from multimodal heterogeneous biomarkers, with a focus on early diagnosis. To meet this goal, several machine learning-based methods with their unique characteristics for feature extraction and automated classification, prediction, and visualization have been developed to discern subtle progression trends and predict the trajectory of disease progression. The methodology envisioned aims to enhance both the multiclass classification accuracy and prediction outcomes by effectively modeling the interplay between the multimodal biomarkers, handling the missing data challenge, and adequately extracting all the relevant features that will be fed into the machine learning framework, all in order to understand the subtle changes that happen in the different stages of the disease. This research will also investigate the notion of multitasking to discover how the two processes of multiclass classification and prediction relate to one another in terms of the features they share and whether they could learn from one another for optimizing multiclass classification and prediction accuracy. This research work also delves into predicting cognitive scores of specific tests over time, using multimodal longitudinal data. The intent is to augment our prospects for analyzing the interplay between the different multimodal features used in the input space and the predicted cognitive scores. Moreover, the power of modality fusion, kernelization, and tensorization has also been investigated to efficiently extract important features hidden in the lower-dimensional feature space without being distracted by those deemed irrelevant.
With the adage that a picture is worth a thousand words, this dissertation introduces a unique color-coded visualization system with a fully integrated machine learning model for the enhanced diagnosis and prognosis of Alzheimer's disease. The incentive here is to show that through visualization, the challenges imposed by both the variability and interrelatedness of the multimodal features could be overcome. Ultimately, this form of visualization via machine learning sheds light on the challenges faced with multiclass classification and adds insight into the decision-making process for a diagnosis and prognosis.

    Machine learning for modeling the progression of Alzheimer disease dementia using clinical data: A systematic literature review

    OBJECTIVE: Alzheimer disease (AD) is the most common cause of dementia, a syndrome characterized by cognitive impairment severe enough to interfere with activities of daily life. We aimed to conduct a systematic literature review (SLR) of studies that applied machine learning (ML) methods to clinical data derived from electronic health records in order to model risk for progression of AD dementia. MATERIALS AND METHODS: We searched for articles published between January 1, 2010, and May 31, 2020, in PubMed, Scopus, ScienceDirect, IEEE Xplore Digital Library, Association for Computing Machinery Digital Library, and arXiv. We used predefined criteria to select relevant articles and summarized them according to key components of ML analysis such as data characteristics, computational algorithms, and research focus. RESULTS: There has been a considerable rise over the past 5 years in the number of research papers using ML-based analysis for AD dementia modeling. We reviewed 64 relevant articles in our SLR. The results suggest that the majority of existing research has focused on predicting progression of AD dementia using publicly available datasets containing both neuroimaging and clinical data (neurobehavioral status exam scores, patient demographics, neuroimaging data, and laboratory test values). DISCUSSION: Identifying individuals at risk for progression of AD dementia could potentially help to personalize disease management to plan future care. Clinical data consisting of both structured data tables and clinical notes can be effectively used in ML-based approaches to model risk for AD dementia progression. Data sharing and reproducibility of results can enhance the impact, adaptation, and generalizability of this research.

    Deep Learning for Multiclass Classification, Predictive Modeling and Segmentation of Disease Prone Regions in Alzheimer’s Disease

    One of the challenges facing accurate diagnosis and prognosis of Alzheimer’s Disease (AD) is identifying the subtle changes that define the early onset of the disease. This dissertation investigates three of the main challenges confronted when such subtle changes are to be identified in the most meaningful way. These are (1) the missing data challenge, (2) longitudinal modeling of disease progression, and (3) the segmentation and volumetric calculation of disease-prone brain areas in medical images. The scarcity of sufficient data, compounded by the missing data challenge in many longitudinal samples, exacerbates the problem as we seek statistical meaningfulness in multiclass classification and regression analysis. Although there are many participants in the AD Neuroimaging Initiative (ADNI) study, many of the observations have many missing features, which often leads to the exclusion of potentially valuable data points that could add significant meaning in many ongoing experiments. Motivated by the necessity of examining all participants, even those with missing tests or imaging modalities, multiple techniques of handling missing data in this domain have been explored. Specific attention was drawn to the Gradient Boosting (GB) algorithm, which has an inherent capability of addressing missing values. Prior to applying state-of-the-art classifiers such as Support Vector Machine (SVM) and Random Forest (RF), the impact of imputing data in common datasets with numerical techniques has also been investigated and compared with the GB algorithm. Furthermore, to discriminate AD subjects from healthy control individuals and individuals with Mild Cognitive Impairment (MCI), longitudinal multimodal heterogeneous data was modeled using recurrent neural networks (RNNs). In the segmentation and volumetric calculation challenge, this dissertation places its focus on one of the most relevant disease-prone areas in many neurological and neurodegenerative diseases, the hippocampus region.
Changes in hippocampus shape and volume are considered significant biomarkers for AD diagnosis and prognosis. Thus, a two-stage model based on integrating the Vision Transformer and Convolutional Neural Network (CNN) is developed to automatically locate, segment, and estimate the hippocampus volume from the brain 3D MRI. The proposed architecture was trained and tested on a dataset containing 195 brain MRIs from the 2019 Medical Segmentation Decathlon Challenge against the manually segmented regions provided therein and was deployed on 326 MRIs from our own data collected through Mount Sinai Medical Center as part of the 1Florida Alzheimer Disease Research Center (ADRC).

    Robust Modeling and Prediction of Disease Progression Using Machine Learning

    This work studies modeling the progression of Alzheimer’s disease using a parametric method robust to outliers and missing data and a nonparametric method robust to missing values and training instabilities. The proposed parametric method linearly maps the individual’s age to a disease progression score (DPS) and jointly fits constrained generalized logistic functions to the longitudinal dynamics of biomarkers as functions of the DPS using M-estimation. The proposed nonparametric method applies a generalized training rule to long short-term memory (LSTM) recurrent neural networks, normalizing the input and loss by the number of available data points, to handle missing input and target values. Moreover, a robust initialization method is developed to address the training instability in LSTM networks based on a scaled random initialization of the network weights, aiming at preserving the variance of the network input and output in the same range. Both proposed methods are evaluated on data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) for robust modeling of volumetric magnetic resonance imaging (MRI) and positron emission tomography (PET) biomarkers, cerebrospinal fluid (CSF) measurements, as well as cognitive tests, and are compared to the state-of-the-art methods. The obtained results show that the proposed parametric model outperforms almost all state-of-the-art parametric methods in predicting biomarker values and classifying clinical status, and it generalizes well when applied to independent data from the National Alzheimer’s Coordinating Center (NACC). Additionally, the proposed generalized training rule for deep neural networks achieves superior results to standard LSTMs using data imputation before training, especially when applied to data with lower rates of missing values.
A comprehensive analysis of the proposed methods in neurodegenerative disease progression modeling reveals that the proposed nonparametric method performs better than the proposed parametric method in predicting biomarker values, while the parametric method works significantly better in clinical status classification.
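The parametric method described in this abstract can be written down schematically. The notation below is hypothetical and is not taken from the thesis: subject i, visit j, biomarker k, age t_{ij}, measurement y_{ijk}, and a robust loss \rho. The logistic term is one common generalized (Richards-type) form, assumed here for illustration, and the constraints mentioned in the abstract are not shown.

```latex
% Linear map from age to the disease progression score (DPS)
\[ s_{ij} = \alpha_i \, t_{ij} + \beta_i \]

% One common generalized logistic form for biomarker k,
% with parameters \theta_k = (a_k, b_k, c_k, d_k, \nu_k)
\[ f_k(s; \theta_k) = d_k + \frac{a_k}{\left(1 + e^{-b_k (s - c_k)}\right)^{1/\nu_k}} \]

% Joint M-estimation over the observed measurements only
\[ \min_{\{\alpha_i, \beta_i\},\, \{\theta_k\}} \;
   \sum_{(i,j,k)\ \text{observed}} \rho\!\left( y_{ijk} - f_k(s_{ij}; \theta_k) \right) \]
```

Summing only over observed measurements is what lets such a fit tolerate missing data, and choosing a \rho that grows more slowly than the squared error limits the influence of outliers.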