
    An overview of data integration in neuroscience with focus on Alzheimer's Disease

    This work represents the first attempt to provide an overview of how to approach data integration as the outcome of a dialogue between neuroscientists and computer scientists. Data integration is fundamental for studying complex multifactorial diseases, such as neurodegenerative diseases. This work aims to warn readers of common pitfalls and critical issues in both the medical and the data science fields. In this context, we define a road map for data scientists approaching data integration in the biomedical domain for the first time, highlighting the challenges that inevitably emerge when dealing with heterogeneous, large-scale and noisy data and proposing possible solutions. We discuss data collection and statistical analysis, usually seen as parallel and independent processes, as cross-disciplinary activities. Finally, we provide an exemplary application of data integration to Alzheimer's Disease (AD), the most common multifactorial form of dementia worldwide. We critically discuss the largest and most widely used datasets in AD, and demonstrate how the emergence of machine learning and deep learning methods has had a significant impact on knowledge of the disease, particularly from the perspective of early AD diagnosis.

    A novel multi-tissue RNA diagnostic of healthy ageing relates to cognitive health status

    Open Access Article. BACKGROUND: Diagnostics of the human ageing process may help predict future healthcare needs or guide preventative measures for tackling diseases of older age. We take a transcriptomics approach to build the first reproducible multi-tissue RNA expression signature by gene-chip profiling of tissue from sedentary normal subjects who reached 65 years of age in good health. RESULTS: One hundred and fifty probe-sets form an accurate classifier of young versus older muscle tissue, and this healthy ageing RNA classifier performed consistently in independent cohorts of human muscle, skin and brain tissue (n = 594, AUC = 0.83-0.96) and thus represents a biomarker for biological age. Using the Uppsala Longitudinal Study of Adult Men birth-cohort (n = 108), we demonstrate that the RNA classifier is insensitive to confounding lifestyle biomarkers, while a greater gene score at age 70 years is independently associated with better renal function at age 82 years and with longevity. The gene score is 'up-regulated' in healthy human hippocampus with age, and when applied to blood RNA profiles from two large independent age-matched dementia case-control data sets (n = 717), the healthy controls have significantly greater gene scores than those with cognitive impairment. Alone, or when combined with our previously described prototype Alzheimer disease (AD) RNA 'disease signature', the healthy ageing RNA classifier is diagnostic for AD. CONCLUSIONS: We identify a novel and statistically robust multi-tissue RNA signature of human healthy ageing that can act as a diagnostic of future health, using only a peripheral blood sample. This RNA signature has great potential to assist research aimed at finding treatments for and/or management of AD and other ageing-related conditions.
    Funding: European Commission; Alzheimer’s Research UK; John and Lucille van Geest Foundation; National Institute for Health Research (NIHR); European Medical Information Framework (EMIF); Medical Research Council (MRC); Wallenberg Foundation; Karolinska Institutet; Swedish Medical Research Council; Swedish Society for Medical Research (SSMF).
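    The AUC range quoted above can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. A minimal sketch of that Mann-Whitney formulation follows; the scores are hypothetical and this is not the authors' evaluation code.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative.

    Ties count as half a win, as in the Mann-Whitney U statistic.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical gene scores: healthy controls (positives) vs. impaired cases.
controls = [0.8, 0.6, 0.7]
cases = [0.4, 0.65, 0.3]
print(roc_auc(controls, cases))  # 8 of 9 pairs ranked correctly
```

    The quadratic pairwise loop keeps the definition explicit; for large cohorts a rank-sum computation gives the same value faster.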

    Identification of gene pathways implicated in Alzheimer's disease using longitudinal imaging phenotypes with sparse regression

    We present a new method for the detection of gene pathways associated with a multivariate quantitative trait, and use it to identify causal pathways associated with an imaging endophenotype characteristic of longitudinal structural change in the brains of patients with Alzheimer's disease (AD). Our method, known as pathways sparse reduced-rank regression (PsRRR), uses group lasso penalised regression to jointly model the effects of genome-wide single nucleotide polymorphisms (SNPs), grouped into functional pathways using prior knowledge of gene-gene interactions. Pathways are ranked in order of importance using a resampling strategy that exploits finite sample variability. Our application study uses whole genome scans and MR images from 464 subjects in the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. A total of 66,182 SNPs are mapped to 185 gene pathways from the KEGG pathways database. Voxel-wise imaging signatures characteristic of AD are obtained by analysing 3D patterns of structural change at 6, 12 and 24 months relative to baseline. High-ranking, AD endophenotype-associated pathways in our study include those describing chemokine, JAK-STAT and insulin signalling pathways, and tight junction interactions. All of these have been previously implicated in AD biology. In a secondary analysis, we investigate SNPs and genes that may be driving pathway selection, and identify a number of previously validated AD genes including CR1, APOE and TOMM40.
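    PsRRR itself is a reduced-rank model fitted to a multivariate imaging trait; the sketch below is only a simplified, single-response group lasso solved by proximal gradient descent, assuming a squared-error loss and a sqrt(group size) penalty weight. It shows how grouping SNPs into pathways lets a whole pathway drop out of the model at once. The pathway names and toy data are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence


def group_soft_threshold(v: Sequence[float], groups: Dict[str, List[int]],
                         threshold: float) -> List[float]:
    """Proximal operator of threshold * sum_g sqrt(|g|) * ||v_g||_2.

    Each group is shrunk as a block; a group whose norm falls below the
    size-weighted threshold is set exactly to zero.
    """
    out = list(v)
    for idx in groups.values():
        norm = math.sqrt(sum(v[i] ** 2 for i in idx))
        shrink = threshold * math.sqrt(len(idx))
        scale = 1.0 - shrink / norm if norm > shrink else 0.0
        for i in idx:
            out[i] = scale * v[i]
    return out


def group_lasso(X: Sequence[Sequence[float]], y: Sequence[float],
                groups: Dict[str, List[int]], lam: float,
                n_iter: int = 500) -> List[float]:
    """Minimise (1/2n)||y - X b||^2 + lam * sum_g sqrt(|g|) ||b_g|| by proximal gradient."""
    n, p = len(X), len(X[0])
    # ||X||_F^2 / n upper-bounds the gradient's Lipschitz constant, so its inverse is a safe step.
    step = n / sum(x * x for row in X for x in row)
    beta = [0.0] * p
    for _ in range(n_iter):
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        beta = group_soft_threshold(
            [beta[j] - step * grad[j] for j in range(p)], groups, step * lam)
    return beta


# Hypothetical toy data: 4 SNP columns split into two "pathways".
X = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
y = [2.0, 2.0, 0.0, 0.0]
pathways = {"pathway_A": [0, 1], "pathway_B": [2, 3]}
print(group_lasso(X, y, pathways, lam=0.1))
```

    With this identity design, the hypothetical pathway_B carries no signal and both of its coefficients are set exactly to zero, while pathway_A is shrunk from 2.0 towards 1.6.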

    Tissue-specific network-based genome wide study of amygdala imaging phenotypes to identify functional interaction modules

    Motivation: Network-based genome-wide association studies (GWAS) aim to identify functional modules from biological networks that are enriched by top GWAS findings. Although gene functions are relevant to tissue context, most existing methods analyze tissue-free networks without reflecting phenotypic specificity. Results: We propose a novel module identification framework for imaging genetic studies using the tissue-specific functional interaction network. Our method includes three steps: (i) re-prioritize imaging GWAS findings by applying machine learning methods to incorporate network topological information and enhance the connectivity among top genes; (ii) detect densely connected modules based on interactions among top re-prioritized genes; and (iii) identify phenotype-relevant modules enriched by top GWAS findings. We demonstrate our method on the GWAS of [18F]FDG-PET measures in the amygdala region using the imaging genetic data from the Alzheimer's Disease Neuroimaging Initiative, and map the GWAS results onto the amygdala-specific functional interaction network. The proposed network-based GWAS method can effectively detect densely connected modules enriched by top GWAS findings. A tissue-specific functional network can provide precise context to help explore the collective effects of genes with biologically meaningful interactions specific to the studied phenotype.
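    Step (ii) can be illustrated with a deliberately simple stand-in: restrict the network to the top re-prioritized genes, take connected components, and keep those whose edge density passes a threshold. The function name, the thresholds and the gene identifiers are hypothetical, and the paper's own dense-module search may work differently.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def dense_modules(adj: Dict[str, Iterable[str]], top_genes: Iterable[str],
                  min_size: int = 3, min_density: float = 0.5) -> List[List[str]]:
    """Connected components of the top-gene subnetwork that are dense enough."""
    top = set(top_genes)
    # Symmetric subgraph induced by the top genes (self-loops dropped).
    sub: Dict[str, Set[str]] = {g: set() for g in top}
    for g in top:
        for h in adj.get(g, ()):
            if h in top and h != g:
                sub[g].add(h)
                sub[h].add(g)
    seen: Set[str] = set()
    modules: List[List[str]] = []
    for start in sorted(sub):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            g = queue.popleft()
            comp.append(g)
            for h in sub[g]:
                if h not in seen:
                    seen.add(h)
                    queue.append(h)
        k = len(comp)
        if k < min_size:
            continue
        edges = sum(len(sub[g]) for g in comp) // 2
        density = edges / (k * (k - 1) / 2)  # fraction of possible edges present
        if density >= min_density:
            modules.append(sorted(comp))
    return modules


# Hypothetical network; GENE_X is connected to A but is not a top gene.
network = {"A": ["B", "C", "GENE_X"], "B": ["C"], "D": ["E"], "F": []}
print(dense_modules(network, ["A", "B", "C", "D", "E", "F"]))  # [['A', 'B', 'C']]
```

    Connected components are a coarse proxy for "densely connected"; the density filter discards sparse chains, and a real pipeline would typically use a dedicated community or dense-subgraph search.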

    Benchmarking network propagation methods for disease gene identification

    In-silico identification of potential target genes for disease is an essential aspect of drug target discovery. Recent studies suggest that successful targets can be found by leveraging genetic, genomic and protein interaction information. Here, we systematically tested the ability of 12 varied algorithms, based on network propagation, to identify genes that have been targeted by any drug, on gene-disease data from 22 common non-cancerous diseases in OpenTargets. We considered two biological networks and six performance metrics, and compared two types of input gene-disease association scores. The impact of the design factors on performance was quantified through additive explanatory models. Standard cross-validation led to over-optimistic performance estimates due to the presence of protein complexes. To obtain realistic estimates, we introduced two novel protein complex-aware cross-validation schemes. When seeding biological networks with known drug targets, machine learning and diffusion-based methods found around 2-4 true targets within the top 20 suggestions. Seeding the networks with genes associated with disease by genetics decreased performance below 1 true hit on average. The use of a larger network, although noisier, improved overall performance. We conclude that diffusion-based prioritisers and machine learning applied to diffusion-based features are suited for drug discovery in practice and improve over simpler neighbour-voting methods. We also demonstrate the large impact of choosing an adequate validation strategy and the definition of seed disease genes.
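    Random walk with restart is one common diffusion-based prioritiser of the kind benchmarked above. The sketch below is a minimal stdlib version on an adjacency list, assuming an undirected network, a uniform restart distribution over the seed genes and a restart probability of 0.3; these are illustrative choices, not the benchmark's settings, and the toy network is hypothetical.

```python
from typing import Dict, List, Set


def random_walk_with_restart(adj: Dict[str, List[str]], seeds: Set[str],
                             restart: float = 0.3, tol: float = 1e-10,
                             max_iter: int = 1000) -> Dict[str, float]:
    """Steady-state visiting probabilities of a walker that jumps back to the seeds.

    Assumes an undirected network given as a symmetric adjacency list; nodes
    without neighbours would leak probability mass in this simple version.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        new = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if adj[u]:
                # Spread the non-restart mass evenly over u's neighbours.
                share = (1.0 - restart) * p[u] / len(adj[u])
                for v in adj[u]:
                    new[v] += share
        converged = sum(abs(new[n] - p[n]) for n in nodes) < tol
        p = new
        if converged:
            break
    return p


def rank_candidates(scores: Dict[str, float], seeds: Set[str], k: int) -> List[str]:
    """Top-k non-seed genes by diffusion score (ties broken by name)."""
    return sorted((n for n in scores if n not in seeds),
                  key=lambda n: (-scores[n], n))[:k]


# Hypothetical path network A-B-C-D seeded with one known drug target, A.
net = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(net, {"A"})
print(rank_candidates(scores, {"A"}, k=2))  # ['B', 'C']
```

    Scores decay with network distance from the seeds, which is why the choice of seed genes (known targets versus genetically associated genes) matters so much for what ends up in the top suggestions.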

    Identifying and ranking potential driver genes of Alzheimer's disease using multiview evidence aggregation.

    MOTIVATION: Late onset Alzheimer's disease is currently a disease with no known effective treatment options. To better understand the disease, new multi-omic data-sets have recently been generated with the goal of identifying molecular causes of disease. However, most analytic studies using these datasets focus on uni-modal analysis of the data. Here, we propose a data driven approach to integrate multiple data types and analytic outcomes and aggregate evidence supporting the hypothesis that a gene is a genetic driver of the disease. The main algorithmic contributions of our article are: (i) a general machine learning framework to learn the key characteristics of a few known driver genes from multiple feature sets and identify other potential driver genes with similar feature representations, and (ii) a flexible ranking scheme with the ability to integrate external validation in the form of Genome Wide Association Study summary statistics. While we currently focus on demonstrating the effectiveness of the approach using different analytic outcomes from RNA-Seq studies, this method is easily generalizable to other data modalities and analysis types. RESULTS: We demonstrate the utility of our machine learning algorithm on two benchmark multiview datasets, where it significantly outperforms the baseline approaches in predicting missing labels. We then use the algorithm to predict and rank potential drivers of Alzheimer's disease. We show that our ranked genes are significantly enriched for single nucleotide polymorphisms associated with Alzheimer's disease and for pathways that have been previously associated with the disease. AVAILABILITY AND IMPLEMENTATION: Source code and a link to all feature sets are available at https://github.com/Sage-Bionetworks/EvidenceAggregatedDriverRanking
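    The paper learns driver characteristics with a multiview machine learning model; as a much simpler, swapped-in technique, the sketch below aggregates per-view gene scores by weighted mean rank, treating GWAS evidence as one more view (here, hypothetical -log10 p-values). View names, weights and scores are all hypothetical; this is not the code in the linked repository.

```python
from typing import Dict, List, Optional


def ranks_from_scores(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank 1 = highest score; ties broken alphabetically."""
    ordered = sorted(scores, key=lambda g: (-scores[g], g))
    return {g: i + 1 for i, g in enumerate(ordered)}


def aggregate_rankings(views: Dict[str, Dict[str, float]],
                       weights: Optional[Dict[str, float]] = None) -> List[str]:
    """Order genes by weighted mean rank across evidence views."""
    # Only genes scored in every view are ranked.
    genes = set.intersection(*(set(v) for v in views.values()))
    weights = weights or {name: 1.0 for name in views}
    total = sum(weights[name] for name in views)
    per_view = {name: ranks_from_scores({g: s[g] for g in genes})
                for name, s in views.items()}
    mean_rank = {g: sum(weights[name] * per_view[name][g] for name in views) / total
                 for g in genes}
    return sorted(genes, key=lambda g: (mean_rank[g], g))


views = {
    "expression": {"G1": 0.9, "G2": 0.5, "G3": 0.1},
    "network": {"G1": 0.2, "G2": 0.8, "G3": 0.1},
    "gwas": {"G1": 3.0, "G2": 1.0, "G3": 0.5},  # e.g. -log10 p-values
}
print(aggregate_rankings(views))  # ['G1', 'G2', 'G3']
```

    Raising a view's weight shifts the consensus towards it: with the "network" view weighted 5 and the others 1, G2 moves ahead of G1.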

    Recent publications from the Alzheimer's Disease Neuroimaging Initiative: Reviewing progress toward improved AD clinical trials

    INTRODUCTION: The Alzheimer's Disease Neuroimaging Initiative (ADNI) has continued development and standardization of methodologies for biomarkers and has provided an increased depth and breadth of data available to qualified researchers. This review summarizes the more than 400 publications using ADNI data during 2014 and 2015. METHODS: We used standard searches to find publications using ADNI data. RESULTS: (1) Structural and functional changes, including subtle changes to hippocampal shape and texture, atrophy in areas outside of hippocampus, and disruption to functional networks, are detectable in presymptomatic subjects before hippocampal atrophy; (2) In subjects with abnormal β-amyloid deposition (Aβ+), biomarkers become abnormal in the order predicted by the amyloid cascade hypothesis; (3) Cognitive decline is more closely linked to tau than Aβ deposition; (4) Cerebrovascular risk factors may interact with Aβ to increase white-matter (WM) abnormalities which may accelerate Alzheimer's disease (AD) progression in conjunction with tau abnormalities; (5) Different patterns of atrophy are associated with impairment of memory and executive function and may underlie psychiatric symptoms; (6) Structural, functional, and metabolic network connectivities are disrupted as AD progresses. 
Models of prion-like spreading of Aβ pathology along WM tracts predict known patterns of cortical Aβ deposition and declines in glucose metabolism; (7) New AD risk and protective gene loci have been identified using biologically informed approaches; (8) Cognitively normal and mild cognitive impairment (MCI) subjects are heterogeneous and include groups typified not only by "classic" AD pathology but also by normal biomarkers, accelerated decline, and suspected non-Alzheimer's pathology; (9) Selection of subjects at risk of imminent decline on the basis of one or more pathologies improves the power of clinical trials; (10) Sensitivity of cognitive outcome measures to early changes in cognition has been improved and surrogate outcome measures using longitudinal structural magnetic resonance imaging may further reduce clinical trial cost and duration; (11) Advances in machine learning techniques such as neural networks have improved diagnostic and prognostic accuracy, especially in challenges involving MCI subjects; and (12) Network connectivity measures and genetic variants show promise in multimodal classification, and some classifiers using single modalities are rivaling multimodal classifiers. DISCUSSION: Taken together, these studies fundamentally deepen our understanding of AD progression and its underlying genetic basis, which in turn informs and improves clinical trial design.

    Improving Cancer Classification Accuracy Using Gene Pairs

    Recent studies suggest that the deregulation of pathways, rather than individual genes, may be critical in triggering carcinogenesis. Pathway deregulation is often caused by the simultaneous deregulation of more than one gene in the pathway. This suggests that robust gene pair combinations may exploit the underlying bio-molecular reactions that are relevant to pathway deregulation, and thus could provide better biomarkers for cancer than individual genes. To test this hypothesis, we used gene pair combinations, called doublets, instead of the original expression values as input to cancer classification algorithms, and showed that classification accuracy was consistently improved across different datasets and classification algorithms. We validated the proposed approach using nine cancer datasets and five classification algorithms: Prediction Analysis for Microarrays (PAM), C4.5 Decision Trees (DT), Naive Bayesian (NB), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN).
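    The abstract does not define how two genes are combined into a doublet, so the sketch below assumes a doublet is the difference between the two genes' expression values. The resulting n(n-1)/2 features feed a plain nearest-centroid classifier, which is related to PAM but omits its shrinkage. The data and labels are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def doublet_features(sample: Sequence[float]) -> List[float]:
    """Map n expression values to n*(n-1)/2 pairwise differences (assumed doublet form)."""
    return [sample[i] - sample[j] for i, j in combinations(range(len(sample)), 2)]


def fit_centroids(X: List[List[float]], y: List[str]) -> Dict[str, List[float]]:
    """Per-class mean feature vector."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for k, x in enumerate(row):
            acc[k] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], row: List[float]) -> str:
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist2(c: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(sorted(centroids), key=lambda lab: dist2(centroids[lab]))


# Hypothetical 3-gene profiles where the gene0 - gene1 contrast separates classes.
raw = [[5, 1, 3], [6, 2, 3], [1, 5, 3], [2, 6, 3]]
labels = ["tumor", "tumor", "normal", "normal"]
centroids = fit_centroids([doublet_features(s) for s in raw], labels)
print(predict(centroids, doublet_features([7, 2, 4])))  # tumor
```

    Note the quadratic growth: 10 genes already give 45 doublets, so genome-scale data would need a pre-filtering step before pairing.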

    Procedure for improving the accuracy of identifying dental implant failures using data science techniques

    Nowadays, the prediction of dental implant failure is based on clinical and radiological evaluation, so predictions depend heavily on the implantologist's experience. Detecting in time whether a dental implant is going to fail is crucial because of time, cost, trauma to the patient and postoperative problems, among other factors. This paper proposes a procedure that uses multiple feature selection methods and classification algorithms to improve the accuracy of predicting dental implant failures in the province of Misiones, Argentina, validated by human experts. The experiments use two data sets: a set of dental implants compiled for the case study and an artificially generated set. The proposed approach identifies the most relevant features and improves classification accuracy for the target class (dental implant failure), avoiding decisions biased by the application and results of individual methods. The proposed approach achieves an accuracy of 79% on failures, while individual classifiers reach at most 72%.
    Authors: Nancy Beatriz Ganz (Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Conicet - Nordeste, Instituto de Materiales de Misiones; Universidad Nacional de Misiones, Facultad de Ciencias Exactas Químicas y Naturales; Argentina); Alicia Esther Ares (same affiliations); Horacio Daniel Kuna (Universidad Nacional de Misiones, Argentina).
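    The combination step can be sketched, under assumptions, as consensus voting: keep the features chosen by at least a minimum number of selection methods, combine classifier outputs by majority vote, and report the recall of the failure class, which is one reading of "accuracy on failures". This is not the authors' procedure; the feature names, labels and thresholds are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def consensus_features(selections: Sequence[Sequence[str]], min_votes: int) -> List[str]:
    """Features chosen by at least `min_votes` of the feature-selection methods."""
    votes = Counter(f for chosen in selections for f in set(chosen))
    return sorted(f for f, c in votes.items() if c >= min_votes)


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-classifier label lists sample by sample (ties: alphabetical)."""
    combined = []
    for sample_preds in zip(*predictions):
        counts = Counter(sample_preds)
        label, _ = max(sorted(counts.items()), key=lambda kv: kv[1])
        combined.append(label)
    return combined


def class_recall(y_true: Sequence[str], y_pred: Sequence[str], target: str) -> float:
    """Share of true `target` cases that were predicted as `target`."""
    total = sum(1 for t in y_true if t == target)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == target and p == target)
    return hits / total if total else float("nan")


# Hypothetical outputs of three feature selectors and three classifiers.
print(consensus_features([["smoker", "bone_density"], ["bone_density", "age"],
                          ["bone_density", "smoker"]], min_votes=2))
votes = majority_vote([["fail", "ok", "ok"], ["fail", "fail", "ok"], ["ok", "fail", "ok"]])
print(votes, class_recall(["fail", "fail", "ok"], votes, "fail"))
```

    Reporting recall on the failure class, rather than overall accuracy, keeps a majority of successful implants from masking missed failures.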

    Network-Based Genome Wide Study of Hippocampal Imaging Phenotype In Alzheimer's Disease To Identify Functional Interaction Modules

    Identification of functional modules from biological networks is a promising approach to enhance the statistical power of genome-wide association studies (GWAS) and improve biological interpretation for complex diseases. The precise functions of genes are highly relevant to tissue context, while most module identification studies are based on tissue-free biological networks that lack phenotypic specificity. In this study, we propose a module identification method that maps the GWAS results of an imaging phenotype onto the corresponding tissue-specific functional interaction network by applying a machine learning framework. Ridge regression and support vector machine (SVM) models are constructed to re-prioritize GWAS results, followed by exploring hippocampus-relevant modules based on the top predictions obtained using GWAS top findings. We also propose a GWAS top-neighbor-based module identification approach and compare it with the Ridge- and SVM-based approaches. Modules conserving both tissue specificity and GWAS discoveries are identified, showing the promise of the proposed method for providing insight into the mechanisms of complex diseases.
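    Re-prioritization with ridge regression can be sketched with the closed form w = (X^T X + lam I)^-1 X^T y, solved here by Gaussian elimination in pure Python. The sketch assumes each gene is described by hypothetical network features (for example, how many of its neighbours are GWAS top genes) with a 0/1 label marking GWAS top genes, and it omits the intercept for brevity; it is not the authors' model.

```python
from typing import Dict, List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X: Sequence[Sequence[float]], y: Sequence[float], lam: float) -> List[float]:
    """Closed-form ridge weights (X^T X + lam I)^-1 X^T y, no intercept."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def reprioritize(features: Dict[str, Sequence[float]], w: Sequence[float]) -> List[str]:
    """Rank genes by their ridge-predicted score, highest first."""
    score = {g: sum(wi * xi for wi, xi in zip(w, row)) for g, row in features.items()}
    return sorted(score, key=lambda g: (-score[g], g))


# Hypothetical features per gene: [neighbours among GWAS top genes, node degree].
train_X = [[3.0, 4.0], [2.0, 5.0], [0.0, 6.0], [0.0, 2.0]]
train_y = [1.0, 1.0, 0.0, 0.0]  # 1 = GWAS top gene
w = ridge_fit(train_X, train_y, lam=0.1)
print(reprioritize({"GENE_P": [2.0, 3.0], "GENE_Q": [0.0, 3.0]}, w))
```

    The ridge penalty keeps the solve well conditioned when network features are strongly correlated, which is common for degree-like quantities.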