130 research outputs found

    Dealing with heterogeneity in the prediction of clinical diagnosis

    Computer-assisted diagnosis has emerged as a popular area of research at the intersection of medical imaging and machine learning. Medical data are very heterogeneous in nature and therefore require careful attention when training prediction models. In this thesis, I explored two sources of heterogeneity, multisite aggregation and clinical label heterogeneity, in an application of magnetic resonance imaging (MRI) to the diagnosis of Alzheimer’s disease (AD). In the process, I learned about the feasibility of multisite data aggregation and how to leverage that heterogeneity to improve the generalizability of prediction models. Part one of the document is a general introduction to Alzheimer’s disease, magnetic resonance imaging, and the challenges of machine learning in medical imaging. In part two, I present my research through three articles (two published and one in preparation). Finally, part three discusses my contributions and points to possible future developments. The first article shows that data aggregation across multiple acquisition sites incurs some loss, compared to single-site analysis, that tends to diminish as the sample size increases. These results were obtained through semisynthetic Monte Carlo simulations based on real data. The second article investigates the generalizability of prediction models with various cross-validation schemes. I showed that training and testing on the same set of sites overestimates the accuracy of the model, compared to testing on unseen sites. However, I also showed that training on a large number of sites improves the accuracy on unseen sites. The third article, on clinical label heterogeneity, proposes a new framework in which we can identify a subgroup of individuals who share a homogeneous signature highly predictive of AD dementia. That signature could also be found in patients with mild symptoms; 90% of the subjects carrying the signature progressed to dementia within three years. The thesis thus makes new contributions to dealing with heterogeneity in medical diagnostic applications and proposes ways to leverage that heterogeneity to our benefit.

    Brain-based classification of youth with anxiety disorders: transdiagnostic examinations within the ENIGMA-Anxiety database using machine learning

    Neuroanatomical findings on youth anxiety disorders are notoriously difficult to replicate, small in effect size, and of limited clinical relevance. These concerns have prompted a paradigm shift towards highly powered (i.e., big data) individual-level inferences, which are data-driven, transdiagnostic, and neurobiologically informed. We therefore built and validated supervised neuroanatomical machine learning (ML) models for individual-level inferences, using the largest neuroimaging database on youth anxiety disorders to date: the ENIGMA-Anxiety Consortium (N=3,343; Age: 10-25 years; Global Sites: 32). Modest, yet robust, brain-based classifications were achieved for specific anxiety disorders (Panic Disorder), but also transdiagnostically for all anxiety disorders when patients were subgrouped according to their sex, medication status, and symptom severity (AUCs 0.59-0.63). Classifications were driven by neuroanatomical features (cortical thickness/surface area, subcortical volumes) in fronto-striato-limbic and temporo-parietal regions. This benchmark study provides estimates of the individual-level classification performance that can realistically be achieved with ML using neuroanatomical data, within a large, heterogeneous, multi-site sample of youth with anxiety disorders.

    Understanding Cognitive Variability in Alzheimer’s Disease

    Alzheimer’s Disease (AD) is highly heterogeneous, both clinically and biologically. This variability is exacerbated by the ways in which the clinical presentation is assessed with cognitive measures, which inhibits clinical trial success and earlier diagnosis of individuals. Marrying the clinical presentation to the pathology of the disease has so far proved troublesome. This thesis looks at how cognitive measures can best capture the clinical presentation of AD and how these measures can be linked to the underlying pathology using machine learning methods. The problem was studied across four analyses and two cohorts. Each study looked at a different aspect of cognitive testing within AD, with the overarching aim of interrogating the cognitive variability across the spectrum of AD. Study 1 showed that a novel discrepancy score differs from memory measures at screening for AD. It also showed that the score tracks with AD severity, in the same way memory recall does. Studies 2 & 3 uncovered broad psychometric variance within amnestic measurement of impairment due to AD. This was done in two different populations across two different constructs of amnestic measurement, story recall and verbal list learning. These tests are frequently used interchangeably; the two studies show they should not be. Finally, Study 4 built models from cognitive measures to predict AD pathology. The performance of these models was moderate, showing that even with novel cognitive measures, further work is needed to link the clinical and amyloid-related biological presentations of AD. Bridging the gap between clinical presentation and pathology of AD using clinical and cognitive markers alone is not possible, even when using a novel discrepancy score. The discrepancy measure shows promise but was limited by the inability of the MMSE to measure verbal ability. Conceptually, a discrepancy score remains a promising avenue of research for screening, but broader language measures, as well as other AD biomarkers, are needed to further test the construct validity of this measure.

    Large-scale inference in the focally damaged human brain

    Clinical outcomes in focal brain injury reflect the interactions between two distinct anatomically distributed patterns: the functional organisation of the brain and the structural distribution of injury. The challenge of understanding the functional architecture of the brain is familiar; that of understanding the lesion architecture is barely acknowledged. Yet, models of the functional consequences of focal injury are critically dependent on our knowledge of both. The studies described in this thesis seek to show how machine learning-enabled high-dimensional multivariate analysis powered by large-scale data can enhance our ability to model the relation between focal brain injury and clinical outcomes across an array of modelling applications. All studies are conducted on the largest internationally available set of MR imaging data of focal brain injury in the context of acute stroke (N=1333) and employ kernel machines as the principal modelling architecture. First, I examine lesion-deficit prediction, quantifying the ceiling on achievable predictive fidelity for high-dimensional and low-dimensional models, demonstrating the former to be substantially higher than the latter. Second, I determine the marginal value of adding unlabelled imaging data to predictive models within a semi-supervised framework, quantifying the benefit of assembling unlabelled collections of clinical imaging. Third, I compare high- and low-dimensional approaches to modelling response to therapy in two contexts: quantifying the effect of treatment at the population level (therapeutic inference) and predicting the optimal treatment in an individual patient (prescriptive inference). I demonstrate the superiority of the high-dimensional approach in both settings.

    Mass spectral imaging of clinical samples using deep learning

    A better interpretation of tumour heterogeneity and variability is vital for the improvement of novel diagnostic techniques and personalized cancer treatments. Tumour tissue heterogeneity is characterized by biochemical heterogeneity, which can be investigated by unsupervised metabolomics. Mass Spectrometry Imaging (MSI) combined with machine learning techniques has generated increasing interest as an analytical and diagnostic tool for the analysis of spatial molecular patterns in tissue samples. Given the high complexity of the data produced by MSI, which can consist of many thousands of spectral peaks, statistical analysis, and in particular machine learning and deep learning, has been investigated as a novel approach to deduce the relationships between the measured molecular patterns and the local structural and biological properties of the tissues. Machine learning has historically been divided into two main categories: supervised and unsupervised learning. In MSI, supervised learning methods may be used to segment tissues into histologically relevant areas, e.g. the classification of tissue regions in H&E (Haematoxylin and Eosin) stained samples. Initial classification by an expert histopathologist through visual inspection enables the development of univariate or multivariate models, based on tissue regions that have significantly up- or down-regulated ions. However, complex data may result in underdetermined models, and alternative methods that can cope with high-dimensional and noisy data are required. Here, we describe, apply, and test a novel diagnostic procedure built using a combination of MSI and deep learning, with the objective of delineating and identifying biochemical differences between cancerous and non-cancerous tissue in metastatic liver cancer and epithelial ovarian cancer. The workflow investigates the robustness of single (1D) to multidimensional (3D) tumour analyses and also highlights possible biomarkers that are not accessible from classical visual analysis of the H&E images. The identification of key molecular markers may provide a deeper understanding of tumour heterogeneity and potential targets for intervention.

    Deep learning of brain asymmetry digital biomarkers to support early diagnosis of cognitive decline and dementia

    Early identification of degenerative processes in the human brain is essential for proper care and treatment. This may involve different instrumental diagnostic methods, including the most popular ones: computed tomography (CT), magnetic resonance imaging (MRI) and positron emission tomography (PET) scans. These technologies provide detailed information about the shape, size, and function of the human brain. Structural and functional cerebral changes can be detected by computational algorithms and used to diagnose dementia and its stages (amnestic early mild cognitive impairment - EMCI, Alzheimer’s Disease - AD). They can also help monitor the progress of the disease. Shifts in the degree of asymmetry between the left and right hemispheres indicate the onset or development of a pathological process in the brain. In this vein, this study proposes a new digital biomarker for the diagnosis of early dementia based on the detection of image asymmetries and a cross-sectional comparison of cognitively normal (NC), EMCI and AD subjects. Features of brain asymmetries extracted from MRI scans in the ADNI and OASIS databases are used to analyze structural brain changes and for machine learning classification of the pathology. The experimental part of the study includes results of supervised machine learning algorithms and transfer learning architectures of convolutional neural networks for distinguishing between cognitively normal subjects and patients with early or progressive dementia. The proposed pipeline offers a low-cost imaging biomarker for the classification of dementia. It may also be helpful for other degenerative brain disorders accompanied by changes in brain asymmetries.

    Explainable deep learning classifiers for disease detection based on structural brain MRI data

    Deep learning, and especially convolutional neural networks (CNNs), have high potential for being implemented in clinical decision support software for tasks such as diagnosis and prediction of disease courses. This thesis studied, in five experimental studies, the application of CNNs to structural MRI data for diagnosing neurological diseases. Specifically, multiple sclerosis and Alzheimer’s disease were used as classification targets due to their high prevalence, data availability and apparent biomarkers in structural MRI data. The classification task is challenging because pathology can be highly individual and difficult for human experts to detect, and because sample sizes are small owing to the high acquisition cost and sensitivity of medical imaging data. A roadblock to adopting CNNs in clinical practice is their lack of interpretability. Therefore, after optimizing the machine learning models for predictive performance (e.g. balanced accuracy), we employed explainability methods to study the reliability and validity of the trained models; these methods produce heatmaps that show the relevance of individual image regions to the model. The deep learning models achieved good predictive performance of over 87% balanced accuracy on all tasks, and the explainability heatmaps showed coherence with known clinical biomarkers for both disorders. Explainability methods were compared quantitatively using brain atlases, and shortcomings regarding their robustness were revealed. Further investigations showed clear benefits of transfer learning and image registration for model performance. Lastly, a new CNN layer type was introduced, which incorporates a prior on the spatial homogeneity of neuro-MRI data. CNNs excel when used on natural images, which possess spatial heterogeneity, and even though MRI data and natural images share computational similarities, the composition and orientation of neuro-MRI is very distinct. The introduced patch-individual filter (PIF) layer breaks the assumption of spatial invariance of CNNs and reduces convergence time on different data sets without reducing predictive performance. The presented work highlights many challenges that CNNs for disease diagnosis face on MRI data, and defines and tests strategies to overcome them.

    Toward diffusion tensor imaging as a biomarker in neurodegenerative diseases: technical considerations to optimize recordings and data processing

    Neuroimaging biomarkers such as diffusion tensor imaging (DTI) have shown high potential to map disease processes in neurodegenerative diseases (NDD). For DTI, the implementation of a standardized scanning and analysis cascade in clinical trials can be further optimized. Over the last few years, various approaches to improve DTI applications to NDD have been developed. This review addresses considerations and limitations of DTI in NDD and discusses suggestions for improving DTI applications to NDD. Based on this technical approach, a set of recommendations is proposed for a standardized DTI scan protocol and an analysis cascade covering DTI data pre- and postprocessing and statistical analysis. In summary, considering the advantages and limitations of DTI in NDD, we suggest improvements toward a standardized framework for a DTI-based protocol to be applied in future imaging studies in NDD, with the goal of establishing DTI as a biomarker in clinical trials in neurodegeneration.