208 research outputs found

    Identifying Multimodal Intermediate Phenotypes between Genetic Risk Factors and Disease Status in Alzheimer’s Disease

    Neuroimaging genetics has attracted growing attention and is thought to be a powerful strategy for examining the influence of genetic variants (e.g., single nucleotide polymorphisms (SNPs)) on the structure or function of the human brain. In recent studies, univariate or multivariate regression analysis methods are typically used to capture the associations between genetic variants and quantitative traits (QTs) such as brain imaging phenotypes. The identified imaging QTs, although associated with certain genetic markers, may not all be disease specific. A useful but underexplored scenario is to discover only those QTs associated with both genetic markers and disease status, revealing the chain from genotype to phenotype to symptom. In addition, multimodal brain imaging phenotypes are extracted from different perspectives, and imaging markers that consistently show up across modalities may provide more insight into the mechanisms of diseases such as Alzheimer’s disease (AD). In this work, we propose a general framework that exploits multimodal brain imaging phenotypes as intermediate traits bridging genetic risk factors and multi-class disease status. We applied the proposed method to explore the relation between the well-known AD risk SNP APOE rs429358 and three baseline brain imaging modalities (structural magnetic resonance imaging (MRI), fluorodeoxyglucose positron emission tomography (FDG-PET) and F-18 florbetapir PET amyloid imaging (AV45)) from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. The empirical results demonstrate that the proposed method not only helps improve the performance of imaging genetic associations, but also discovers robust and consistent regions of interest (ROIs) across modalities to guide the disease-induced interpretation.

    Machine Learning for Multiclass Classification and Prediction of Alzheimer's Disease

    Alzheimer's disease (AD) is an irreversible neurodegenerative disorder and a common form of dementia. This research aims to develop machine learning algorithms that diagnose and predict the progression of AD from multimodal heterogeneous biomarkers, with a focus on early diagnosis. To meet this goal, several machine learning-based methods, each with its own characteristics for feature extraction and automated classification, prediction, and visualization, have been developed to discern subtle progression trends and predict the trajectory of disease progression. The methodology aims to enhance both multiclass classification accuracy and prediction outcomes by effectively modeling the interplay between the multimodal biomarkers, handling the missing data challenge, and adequately extracting all the relevant features that will be fed into the machine learning framework, all in order to understand the subtle changes that happen in the different stages of the disease. This research also investigates the notion of multitasking to discover how the two processes of multiclass classification and prediction relate to one another in terms of the features they share, and whether they could learn from one another to optimize multiclass classification and prediction accuracy. It also delves into predicting cognitive scores of specific tests over time, using multimodal longitudinal data. The intent is to improve our ability to analyze the interplay between the multimodal features used in the input space and the predicted cognitive scores. Moreover, the power of modality fusion, kernelization, and tensorization has been investigated to efficiently extract important features hidden in the lower-dimensional feature space without being distracted by those deemed irrelevant.
With the adage that a picture is worth a thousand words, this dissertation introduces a color-coded visualization system with a fully integrated machine learning model for the enhanced diagnosis and prognosis of Alzheimer's disease. The aim is to show that, through visualization, the challenges imposed by both the variability and the interrelatedness of the multimodal features can be overcome. Ultimately, this form of visualization via machine learning sheds light on the challenges of multiclass classification and adds insight into the decision-making process for diagnosis and prognosis.

    An overview of data integration in neuroscience with focus on Alzheimer's Disease

    This work represents the first attempt to provide an overview of how to approach data integration as the result of a dialogue between neuroscientists and computer scientists. Data integration is fundamental for studying complex multifactorial diseases, such as neurodegenerative diseases. This work aims to warn readers of common pitfalls and critical issues in both the medical and data science fields. In this context, we define a road map for data scientists who are first approaching data integration in the biomedical domain, highlighting the challenges that inevitably emerge when dealing with heterogeneous, large-scale and noisy data and proposing possible solutions. Here, we discuss data collection and statistical analysis, usually seen as parallel and independent processes, as cross-disciplinary activities. Finally, we provide an exemplary application of data integration to Alzheimer's Disease (AD), which is the most common multifactorial form of dementia worldwide. We critically discuss the largest and most widely used datasets in AD, and demonstrate how the emergence of machine learning and deep learning methods has had a significant impact on knowledge of the disease, particularly with a view to early AD diagnosis.

    Predictive analytics applied to Alzheimer’s disease : a data visualisation framework for understanding current research and future challenges

    Dissertation submitted as a partial requirement for obtaining a master’s degree in information management, with a specialisation in Business Intelligence and Knowledge Management. Big Data is nowadays regarded as a tool for improving the healthcare sector in many areas, such as its economic side, by searching for operational efficiency gaps, and personalised treatment, by selecting the best drug for the patient, for instance. Data science can play a key role in identifying diseases at an early stage, or even when there are no signs of them, tracking their progress, quickly identifying the efficacy of treatments and suggesting alternative ones. Therefore, the prevention side of healthcare can be enhanced with the use of state-of-the-art predictive big data analytics and machine learning methods, integrating the available, complex, heterogeneous, yet sparse data from multiple sources, towards better identification of disease and pathology patterns. These methods can be applied to neurodegenerative disorders that are challenging to diagnose; identifying the patterns that trigger those disorders can make it possible to identify more risk factors and biomarkers in every human being. With that, we can improve the effectiveness of medical interventions, helping people to stay healthy and active for a longer period. In this work, a review of the state of the art in predictive big data analytics is carried out, concerning its application to the early diagnosis of Alzheimer’s Disease. It is done by searching and summarising the scientific articles published in reputable online sources, putting together information that is spread out across the World Wide Web, with the goal of enhancing knowledge management and collaboration practices on the topic.
Furthermore, an interactive data visualisation tool to better manage and identify the scientific articles is developed, delivering a holistic visual overview of the developments in the important field of Alzheimer’s Disease diagnosis.

    Combining heterogeneous data sources for neuroimaging based diagnosis: re-weighting and selecting what is important

    Combining neuroimaging and clinical information for diagnosis, for example behavioral tasks and genetic characteristics, is potentially beneficial but presents challenges in finding the best data representation for the different sources of information. Simply combining them usually provides no improvement over using the best source alone. In this paper, we propose a framework based on a recent multiple kernel learning algorithm called EasyMKL and investigate the benefits of this approach for diagnosing two different mental health diseases. We use the well-known Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset for the task of classifying Alzheimer's Disease (AD) patients versus healthy controls, and a second dataset for the task of classifying a heterogeneous group of depressed patients versus healthy controls. We used EasyMKL to combine a very large number of basic kernels alongside a feature selection methodology, pursuing an optimal and sparse solution to facilitate interpretability. Our results show that the proposed approach, called EasyMKLFS, outperforms baselines (e.g. SVM and SimpleMKL), state-of-the-art random forests (RF) and feature selection (FS) methods.
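
As a rough illustration of the kernel-combination idea behind multiple kernel learning, the sketch below builds one Gram matrix per data source and mixes them with non-negative weights. It is not EasyMKL or EasyMKLFS: the weights here are fixed by hand rather than learned, and the function names (`linear_gram`, `combine_grams`) and the toy `imaging` and `clinical` features are assumptions made for the example.

```python
def linear_gram(rows):
    """Linear-kernel Gram matrix: pairwise dot products between samples."""
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]


def combine_grams(grams, weights):
    """Convex combination of Gram matrices (weights are normalised to sum to 1)."""
    if len(grams) != len(weights):
        raise ValueError("need exactly one weight per Gram matrix")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    n = len(grams[0])
    return [
        [sum((w / total) * g[i][j] for g, w in zip(grams, weights)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: two subjects, two sources with different feature counts.
imaging = [[1.0, 0.0], [0.0, 1.0]]
clinical = [[2.0], [1.0]]

# Equal weights stand in for the weights an MKL algorithm would learn.
combined = combine_grams([linear_gram(imaging), linear_gram(clinical)], [1.0, 1.0])
```

The combined matrix can be passed to any kernel classifier that accepts a precomputed kernel. A non-negative combination of valid kernels is itself a valid kernel, which is why each source can keep its own representation.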

    Multimodal manifold-regularized transfer learning for MCI conversion prediction

    As the early stage of Alzheimer's disease (AD), mild cognitive impairment (MCI) has a high chance of converting to AD. Effective prediction of conversion from MCI to AD is of great importance for early diagnosis of AD and for evaluating AD risk pre-symptomatically. Unlike most previous methods, which used only samples from a target domain to train a classifier, in this paper we propose a novel multimodal manifold-regularized transfer learning (M2TL) method that jointly utilizes samples from another domain (e.g., AD vs. normal controls (NC)) as well as unlabeled samples to boost the performance of MCI conversion prediction. Specifically, the proposed M2TL method includes two key components. The first is a kernel-based maximum mean discrepancy criterion, which helps eliminate the potential negative effect induced by the distributional difference between the auxiliary domain (i.e., AD and NC) and the target domain (i.e., MCI converters (MCI-C) and MCI non-converters (MCI-NC)). The second is a semi-supervised multimodal manifold-regularized least squares classification method, in which the target-domain samples, the auxiliary-domain samples, and the unlabeled samples can be jointly used to train the classifier. Furthermore, by integrating a group sparsity constraint into the objective function, the proposed M2TL is able to select the informative samples to build a robust classifier. Experimental results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database validate the effectiveness of the proposed method, which significantly improves the classification accuracy for MCI conversion prediction to 80.1% and also outperforms the state-of-the-art methods.
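
The kernel-based maximum mean discrepancy (MMD) mentioned above measures how far apart two sample distributions are in a kernel feature space. The sketch below is a generic biased estimate of the squared MMD with a Gaussian kernel, not the M2TL objective itself. The kernel choice, the `gamma` value, and the synthetic `auxiliary` and `target_*` samples are assumptions made for the example.

```python
import math
import random


def rbf(u, v, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    return sum(rbf(x, y, gamma) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd2(xs, ys, gamma=0.5):
    """Biased estimate of the squared MMD between the samples xs and ys."""
    return (
        mean_kernel(xs, xs, gamma)
        + mean_kernel(ys, ys, gamma)
        - 2.0 * mean_kernel(xs, ys, gamma)
    )


def sample(rng, n, dim, shift):
    """Draw n synthetic feature vectors from a Gaussian centred at `shift`."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
auxiliary = sample(rng, 60, 3, 0.0)        # stands in for the auxiliary domain
target_similar = sample(rng, 60, 3, 0.0)   # target drawn from the same distribution
target_shifted = sample(rng, 60, 3, 1.5)   # target drawn from a shifted distribution

mmd_similar = mmd2(auxiliary, target_similar)
mmd_shifted = mmd2(auxiliary, target_shifted)
```

In this toy setup `mmd_shifted` comes out larger than `mmd_similar`. Transfer learning methods use a term of this kind as a penalty, so that a model is discouraged from relying on features whose distribution differs between the auxiliary and target domains.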

    Multiple kernel learning with random effects for predicting longitudinal outcomes and data integration

    Predicting disease risk and progression is one of the main goals in many clinical research studies. Cohort studies on the natural history and etiology of chronic diseases span years and data are collected at multiple visits. Although kernel-based statistical learning methods are proven to be powerful for a wide range of disease prediction problems, these methods are only well studied for independent data but not for longitudinal data. It is thus important to develop time-sensitive prediction rules that make use of the longitudinal nature of the data. In this paper, we develop a novel statistical learning method for longitudinal data by introducing subject-specific short-term and long-term latent effects through a designed kernel to account for within-subject correlation of longitudinal measurements. Since the presence of multiple sources of data is increasingly common, we embed our method in a multiple kernel learning framework and propose a regularized multiple kernel statistical learning with random effects to construct effective nonparametric prediction rules. Our method allows easy integration of various heterogeneous data sources and takes advantage of correlation among longitudinal measures to increase prediction power. We use different kernels for each data source taking advantage of the distinctive feature of each data modality, and then optimally combine data across modalities. We apply the developed methods to two large epidemiological studies, one on Huntington's disease and the other on Alzheimer's Disease (Alzheimer's Disease Neuroimaging Initiative, ADNI) where we explore a unique opportunity to combine imaging and genetic data to study prediction of mild cognitive impairment, and show a substantial gain in performance while accounting for the longitudinal aspect of the data

    Associations between polygenic risk scores for four psychiatric illnesses and brain structure using multivariate pattern recognition

    Psychiatric illnesses are complex and polygenic. They are associated with widespread alterations in the brain, which are partly influenced by genetic factors. There have been some attempts to relate polygenic risk scores (PRS), a measure of the overall genetic risk an individual carries for a disorder, to brain structure using univariate methods. However, PRS are likely associated with distributed and covarying effects across the brain. We therefore used multivariate machine learning in this proof-of-principle study to investigate associations between brain structure and PRS for four psychiatric disorders: attention deficit-hyperactivity disorder (ADHD), autism, bipolar disorder and schizophrenia. The sample included 213 individuals comprising patients with depression (69), bipolar disorder (33), and healthy controls (111). The five psychiatric PRSs were calculated based on summary data from the Psychiatric Genomics Consortium. T1-weighted magnetic resonance images were obtained and voxel-based morphometry was implemented in SPM12. Multivariate relevance vector regression was implemented in the Pattern Recognition for Neuroimaging Toolbox (PRoNTo). Across the whole sample, a multivariate pattern of grey matter significantly predicted the PRS for autism (r = 0.20, pFDR = 0.03; MSE = 4.20 × 10⁻⁵, pFDR = 0.02). For the schizophrenia PRS, the MSE was significant (MSE = 1.30 × 10⁻⁵, pFDR = 0.02) although the correlation was not (r = 0.15, pFDR = 0.06). These results lend support to the hypothesis that polygenic liability for autism and schizophrenia is associated with widespread changes in grey matter concentrations. These associations were seen in individuals not affected by these disorders, indicating that they are driven not by the expression of the disease but by the genetic risk captured by the PRSs.

    DEEP-AD: The deep learning model for diagnostic classification and prognostic prediction of alzheimer's disease

    The aim of this dissertation is to aid neuroradiologists in their clinical judgment regarding the early detection of AD by using deep learning (DL). To that end, the system design research methodology is adopted to achieve three goals. The first goal is to investigate the DL models that have performed well at identifying patterns associated with AD, as well as the accuracy attained so far, limitations, and gaps. A systematic literature review (SLR) revealed a shortage of empirical studies on the early identification of AD through DL. In this regard, thirteen empirical studies were identified and examined. We concluded that three-dimensional (3D) DL models have been developed far less often and that their performance is also inadequate to qualify them for clinical trials. The second goal is to provide neuroradiologists with the computer-interpretable information they need to analyze neuroimaging biomarkers. Given this context, the next step in this dissertation is to find the optimum DL model for analyzing neuroimaging biomarkers. This has been achieved in two steps. In the first step, eight state-of-the-art DL models were implemented by training from scratch using end-to-end learning (E2EL) for two binary classification tasks (AD vs. CN and AD vs. stable MCI (sMCI)) and compared using MRI scans from publicly accessible datasets of neuroimaging biomarkers. Comparative analysis is carried out using efficiency-effects graphs, comprehensive indicators, and ranking mechanisms. For the AD vs. sMCI task, the EfficientNet-B0 model obtains the highest value for the comprehensive indicator and has the fewest parameters. DenseNet264 performed better than the others in terms of evaluation metrics, but since it has the most parameters, it costs more to train. For the AD vs. CN task with DenseNet264, we achieved 100% accuracy for training and 99.56% accuracy for testing.
However, the classification accuracy was still only 82.5% for the AD vs. sMCI task. In the second step, a fusion of transfer learning (TL) with E2EL is applied to train EfficientNet-B0 for the AD vs. sMCI task, which achieved 95.29% accuracy for training and 93.10% accuracy for testing. Additionally, we implemented EfficientNet-B0 for the multiclass AD vs. CN vs. sMCI classification task with E2EL, to be used in an ensemble of models, and achieved 85.66% training accuracy and 87.38% testing accuracy. To evaluate the model’s robustness, neuroradiologists must validate the implemented model. As a result, the third goal of this dissertation is to create a tool that neuroradiologists may use at their convenience. To achieve this objective, this dissertation proposes a web-based application (DEEP-AD) created as an ensemble of EfficientNet-B0 and DenseNet264 (based on the contribution of goal 2). The accuracy of the DEEP-AD prototype has undergone repeated evaluation and improvement. First, we validated it on 41 subjects from Spanish MRI datasets (acquired from HT Medica, Madrid, Spain), achieving an accuracy of 82.90%, which was later verified by neuroradiologists. The results of these evaluation studies showed that these goals were accomplished and point to relevant directions for future research in applied DL for the early detection of AD in clinical settings.
Doctoral School, Doctorate in Information and Telecommunications Technologies.