13 research outputs found

    Clinical Prediction from Structural Brain MRI Scans: A Large-Scale Empirical Study

    Multivariate pattern analysis (MVPA) methods have become an important tool in neuroimaging, revealing complex associations and yielding powerful prediction models. Despite methodological developments and novel application domains, there has been little effort to compile benchmark results that researchers can reference and compare against. This study takes a significant step in this direction. We employed three classes of state-of-the-art MVPA algorithms and common types of structural measurements from brain Magnetic Resonance Imaging (MRI) scans to predict an array of clinically relevant variables (diagnosis of Alzheimer’s, schizophrenia, autism, and attention deficit and hyperactivity disorder; age, cerebrospinal fluid derived amyloid-β levels and mini-mental state exam score). We analyzed data from over 2,800 subjects, compiled from six publicly available datasets. The employed data and computational tools are freely distributed (https://www.nmr.mgh.harvard.edu/lab/mripredict), making this the largest, most comprehensive, reproducible benchmark image-based prediction experiment to date in structural neuroimaging. Finally, we make several observations regarding the factors that influence prediction performance and point to future research directions. Unsurprisingly, our results suggest that the biological footprint (effect size) has a dramatic influence on prediction performance. Though the choice of image measurement and MVPA algorithm can impact the result, there was no universally optimal selection. Intriguingly, the choice of algorithm seemed to be less critical than the choice of measurement type. Finally, our results showed that cross-validation estimates of performance, while generally optimistic, correlate well with generalization accuracy on a new dataset.
    Funding: BrightFocus Foundation (Alzheimer’s Disease pilot grant, AHAF A2012333); National Institutes of Health (U.S.) (K25 grant, NIBIB 1K25EB013649-01); National Center for Research Resources (U.S.) (U24 RR021382); National Institutes of Health, National Institute for Biomedical Imaging and Bioengineering (R01EB006758); National Institute of Neurological Disorders and Stroke (U.S.) (R01 NS052585-01, 1R21NS072652-01, 1R01NS070963, R01NS083534); National Institutes of Health (U.S.) (Blueprint for Neuroscience Research, 5U01-MH093765).

    Generative-Discriminative Low Rank Decomposition for Medical Imaging Applications

    In this thesis, we propose a method for extracting biomarkers from medical images to support early diagnosis of abnormalities. The surge in demand for biomarkers and the growing availability of medical images in recent years call for accurate, repeatable, and interpretable approaches to extracting meaningful imaging features. However, extracting such information from medical images is challenging because the number of pixels (voxels) in a typical image is on the order of millions, while even a large medical image dataset rarely exceeds a few hundred samples. Nevertheless, depending on the nature of an abnormality, only a parsimonious subset of voxels is typically relevant to the disease; therefore, various notions of sparsity are exploited in this thesis to improve the generalization performance of the prediction task. We propose a novel discriminative dimensionality reduction method that yields good classification performance on various datasets without compromising the clinical interpretability of the results. This is achieved by combining the modelling strength of the generative learning framework with the classification performance of the discriminative learning paradigm. Clinical interpretability can be viewed as an additional evaluation measure and is also helpful in designing methods that account for clinical priors, such as the association of certain brain areas with a particular cognitive task or the connectivity of brain regions via neural fibres. We formulate our method as a large-scale optimization problem that solves a constrained matrix factorization. At this scale, off-the-shelf solvers are computationally prohibitive; therefore, we designed an efficient algorithm based on the proximal method to address the computational bottleneck of the optimization problem.
Our formulation is readily extended to different scenarios, such as cases where a large cohort of subjects has uncertain or no class labels (semi-supervised learning) or where each subject has a battery of imaging channels (multi-channel). We show that by using various notions of sparsity as feasible sets of the optimization problem, we can encode different forms of prior knowledge, ranging from brain parcellation to brain connectivity.
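To illustrate the kind of proximal update this abstract refers to, here is a minimal sketch. It does not reproduce the thesis's constrained matrix factorization; it swaps in a much simpler, standard problem (L1-regularized least squares, solved by iterative soft-thresholding) purely to show how a sparsity-inducing proximal step works. The function names `soft_threshold` and `ista` and all parameter values are assumptions made for this example.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, b, lam, step, n_iter=200):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by proximal gradient descent.

    A is a list of rows, b a list of targets. The step size is assumed to be
    at most 1 / L, where L is the largest eigenvalue of A^T A.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    for _ in range(n_iter):
        # Gradient of the smooth term: A^T (A x - b)
        residual = [sum(A[i][j] * x[j] for j in range(n_cols)) - b[i]
                    for i in range(n_rows)]
        grad = [sum(A[i][j] * residual[i] for i in range(n_rows))
                for j in range(n_cols)]
        # Gradient step followed by the proximal (shrinkage) step
        x = [soft_threshold(x[j] - step * grad[j], step * lam)
             for j in range(n_cols)]
    return x


if __name__ == "__main__":
    # Toy problem with an identity design matrix: the small coefficient
    # is driven exactly to zero, which is the sparsity effect of interest.
    print(ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0, step=1.0))
```

The shrinkage step is what sets irrelevant coefficients exactly to zero; structured notions of sparsity, such as the parcellation or connectivity priors mentioned in the abstract, would replace it with a different proximal operator.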

    Automated detection of depression from brain structural magnetic resonance imaging (sMRI) scans

    An automated sMRI-based depression detection system is developed, with components for acquisition and preprocessing, feature extraction, feature selection, and classification. The core focus of the research is a new feature selection algorithm that quantifies the most relevant brain volumetric features for depression detection at the individual level.

    Machine Learning Methods for Structural Brain MRIs: Applications for Alzheimer’s Disease and Autism Spectrum Disorder

    This thesis deals with the development of novel machine learning applications to automatically detect brain disorders based on magnetic resonance imaging (MRI) data, with a particular focus on Alzheimer’s disease and the autism spectrum disorder. Machine learning approaches are used extensively in neuroimaging studies of brain disorders to investigate abnormalities in various brain regions. However, there are many technical challenges in the analysis of neuroimaging data, for example, high dimensionality, the limited amount of data, and high variance in that data due to many confounding factors. These limitations make the development of appropriate computational approaches more challenging. To deal with these existing challenges, we target multiple machine learning approaches, including supervised and semi-supervised learning, domain adaptation, and dimensionality reduction methods. In the current study, we aim to construct effective biomarkers with sufficient sensitivity and specificity that can help physicians better understand the diseases and make improved diagnoses or treatment choices. The main contributions are 1) development of a novel biomarker for predicting Alzheimer’s disease in mild cognitive impairment patients by integrating structural MRI data and neuropsychological test results and 2) the development of a new computational approach for predicting disease severity in autistic patients in agglomerative data by automatically combining structural information obtained from different brain regions. In addition, we investigate various data-driven feature selection and classification methods for whole brain, voxel-based classification analysis of structural MRI and the use of semi-supervised learning approaches to predict Alzheimer’s disease.
We also analyze the relationship between disease-related structural changes and cognitive states of patients with Alzheimer’s disease. The positive results of this effort provide insights into how to construct better biomarkers based on multisource data analysis of patient and healthy cohorts that may enable early diagnosis of brain disorders, detection of brain abnormalities and understanding effective processing in patient and healthy groups. Further, the methodologies and basic principles presented in this thesis are not only suited to the studied cases, but also are applicable to other similar problems.

    Contributions to the study of Autism Spectrum Brain connectivity

    164 p. Autism Spectrum Disorder (ASD) is a highly prevalent neurodevelopmental condition with a large social and economic impact that affects the entire life of families. There is an intense search for biomarkers that can be assessed as early as possible in order to initiate treatment and prepare the family to deal with the challenges imposed by the condition. Brain imaging biomarkers are of special interest. Specifically, functional connectivity data extracted from resting state functional magnetic resonance imaging (rs-fMRI) should allow the detection of brain connectivity alterations. Machine learning pipelines encompass the estimation of the functional connectivity matrix from brain parcellations, feature extraction, and the building of classification models for ASD prediction. The works reported in the literature are very heterogeneous from the computational and methodological point of view. In this thesis we carry out a comprehensive computational exploration of the impact of the choices involved in building these machine learning pipelines.
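The first two pipeline stages named in this abstract (estimating a functional connectivity matrix from parcellated time series, then extracting features from it) can be sketched as follows. This is one common choice, Pearson correlation between regional time series with the upper triangle used as the feature vector, and is not necessarily the estimator used in the thesis. The function names and the toy data are assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def connectivity_matrix(series):
    """Correlation between every pair of regional time series.

    `series` holds one list of samples per brain region (parcel).
    """
    n = len(series)
    return [[pearson(series[i], series[j]) for j in range(n)]
            for i in range(n)]


def upper_triangle(matrix):
    """Flatten the strictly upper triangle into a feature vector.

    The matrix is symmetric with a unit diagonal, so these entries
    carry all of the pairwise information.
    """
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


if __name__ == "__main__":
    # Three toy "regions" with four time points each (made-up values).
    toy = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
    print(upper_triangle(connectivity_matrix(toy)))
```

With R regions this yields R(R-1)/2 features per subject, which would then be passed to whichever classifier the pipeline uses.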

    Dealing with heterogeneity in the prediction of clinical diagnosis

    Computer assisted diagnosis has emerged as a popular area of research at the intersection of medical imaging and machine learning. Medical data are very heterogeneous in nature and therefore require careful attention when one wants to train prediction models. In this thesis, I explored two sources of heterogeneity, multisite aggregation and clinical label heterogeneity, in an application of magnetic resonance imaging to the diagnosis of Alzheimer’s disease. In the process, I learned about the feasibility of multisite data aggregation and how to leverage that heterogeneity in order to improve generalizability of prediction models. Part one of the document is a general context introduction to Alzheimer’s disease, magnetic resonance imaging, and machine learning challenges in medical imaging. In part two, I present my research through three articles (two published and one in preparation). Finally, part three provides a discussion of my contributions and points to possible future developments. The first article shows that data aggregation across multiple acquisition sites incurs some loss, compared to single site analysis, that tends to diminish as the sample size increases. These results were obtained through semisynthetic Monte-Carlo simulations based on real data. The second article investigates the generalizability of prediction models with various cross-validation schemes. I showed that training and testing on the same batch of sites over-estimates the accuracy of the model, compared to testing on unseen sites. However, I also showed that training on a large number of sites improves the accuracy on unseen sites. The third article, on clinical label heterogeneity, proposes a new framework where we can identify a subgroup of individuals that share a homogeneous signature highly predictive of AD dementia. That signature could also be found in patients with mild symptoms, 90% of whom progressed to dementia within three years. The thesis thus makes new contributions to dealing with heterogeneity in medical diagnostic applications and proposes ways to leverage that heterogeneity to our benefit.
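The contrast drawn in the second article, between testing on sites seen during training and testing on unseen sites, corresponds to a leave-one-site-out split. The sketch below is a generic illustration of that scheme and is not taken from the thesis; the function name `leave_one_site_out` and the site labels are assumptions.

```python
def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices) per distinct site.

    `sites` gives the acquisition site of each subject. Every subject from
    the held-out site goes to the test set, so the model is always
    evaluated on a site it never saw during training.
    """
    for held_out in sorted(set(sites)):
        train = [i for i, s in enumerate(sites) if s != held_out]
        test = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Five subjects scanned at three hypothetical sites.
    subject_sites = ["A", "A", "B", "C", "B"]
    for site, train_idx, test_idx in leave_one_site_out(subject_sites):
        print(site, train_idx, test_idx)
```

An ordinary shuffled k-fold split would instead mix subjects from the same site across training and test sets, which is the setting the abstract reports as over-estimating accuracy.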

    MACHINE LEARNING BASED ANALYSIS AND COMPUTER AIDED CLASSIFICATION OF NEUROPSYCHIATRIC DISORDERS USING NEUROIMAGING

    Machine learning (ML) based analysis of neuroimages in the neuropsychiatric context is advancing the understanding of the neurobiological profiles and pathological bases of neuropsychiatric disorders. Computational analysis of features derived from structural magnetic resonance imaging (sMRI) of the brain is used to quantify morphological or anatomical characteristics of the brain regions that play a role in several distinct brain functions. This helps in understanding the anatomical underpinnings of those disorders that cause brain atrophy. Structural neuroimaging data acquired from patients with schizophrenia (SCZ) or bipolar disorder (BD) and from people experiencing psychosis for the first time are used for the experiments presented in this thesis. The cerebral cortex (i.e., gray matter) is one of the most studied anatomical parts of the brain, typically through the 'cortical-average-thickness' distribution feature. In this regard, and on statistical grounds, the 'cortical-skewness' feature, a novel digital imaging-derived neuroanatomical biomarker that could potentially assist in differentiating healthy control (HC) and patient groups, is proposed and tested in this thesis. The core theme of machine intelligence lies in extracting and learning patterns in input data from experience. Classification is one such task. In a basic setup, ML algorithms are trained on example multivariate features and their associated class labels so that they can build models and perform predictive classification and other tasks. Given the puzzling nature of psychiatric disorders, researchers in the field could benefit from ML based analysis of complex brain patterns. One such task, among many, is computer aided classification (CAC). This is achieved by training the algorithms on these complex brain patterns and their corresponding clinical gold standard labels based on the Diagnostic and Statistical Manual of Mental Disorders (DSM). Indeed, supervised learning methods such as support vector machines (SVM), which follow an inductive learning strategy, are widely used in the literature and have achieved interesting results. Given this, and because the most widely available anatomical features of the cortex, such as thickness and volume values, cannot be considered satisfactory features owing to the heterogeneity of human brain anatomy (differences in age, gender, etc.), a contextual similarity based learning approach is proposed. This approach uses a transductive learning mechanism (i.e., it learns a specific function for the problem at hand) instead of learning a general function to solve a specific problem. On this basis, a semi-supervised graph transduction (label propagation) algorithm formulated using notions from game theory, in which a consistent labeling corresponds to a Nash equilibrium, is adopted to tackle the problem of learning from neuroimages with subtle microscopic differences among clinical groups. However, since such algorithms rely heavily on the graph structure of the extracted features, we extended the classification procedure with a pre-training phase based on a distance metric learning strategy. This phase aims to enhance the contextual similarity of the images by imposing 'must belong in the same class' and 'must not belong in the same class' constraints derived from the available training data, which increases intra-class similarity and decreases inter-class similarity. The proposed classification pipeline is used to search for anatomical biomarkers. To identify potential neuroanatomical markers of a psychiatric disorder, we aimed to develop a feature selection strategy that considers both the widely used cortical thickness and the proposed skewness feature, searching for a combination of features from all cortical regions that maximizes the differentiation among clinical groups. Following the Research Domain Criteria (RDoC) framework, developed by the National Institute of Mental Health (NIMH) to build a biologically valid perspective on mental disorders by integrating multimodal sources, clinical interview scores and neuroimaging data are used with ML methods to tackle the challenging problem of differential classification of BD vs. SCZ. Finally, as deep learning methods are producing remarkable results in several application domains, we adopted this class of methods, especially convolutional neural networks (CNNs) with a 3D approach, to extract volumetric neuroanatomical markers. CAC of first episode psychosis (FEP) is performed by exploiting the complex 3D spatial structure of the brain to identify key regions associated with the pathophysiology of FEP. Individualized predictions are tested on a large dataset of 855 structural scans to identify possible markers of the disease.
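The abstract does not define the 'cortical-skewness' feature precisely. As a rough illustration only, the sketch below assumes it is the standard moment-based sample skewness of the thickness values within each cortical region, reported next to the mean thickness. The function names, region names, and thickness values are all hypothetical.

```python
def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5.

    Returns 0.0 for constant data, where skewness is undefined
    (a convention chosen for this sketch).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0
    return m3 / m2 ** 1.5


def regional_features(thickness_by_region):
    """Map each region to (mean thickness, thickness skewness)."""
    return {
        region: (sum(vals) / len(vals), skewness(vals))
        for region, vals in thickness_by_region.items()
    }


if __name__ == "__main__":
    # Made-up thickness samples (mm) for two hypothetical regions.
    sample = {
        "region_1": [2.4, 2.5, 2.6],       # symmetric distribution
        "region_2": [2.0, 2.0, 2.0, 3.0],  # long right tail
    }
    print(regional_features(sample))
```

Two regions with the same mean thickness can still differ in skewness, which is the extra information such a feature would add to the average-thickness feature.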