71 research outputs found

    Transfer Learning for Classification of Alzheimer's Disease Based on Genome Wide Data

    Alzheimer's disease (AD) is a brain disorder regarded as degenerative because its symptoms worsen over time. Single nucleotide polymorphisms (SNPs) have been identified as relevant biomarkers for this condition. This study aims to identify SNP biomarkers associated with AD in order to perform a reliable classification of AD. In contrast to existing related works, we utilize deep transfer learning with varying experimental analysis for reliable classification of AD. For this purpose, convolutional neural networks (CNNs) are first trained over the genome-wide association studies (GWAS) dataset requested from the Alzheimer's Disease Neuroimaging Initiative. We then employ deep transfer learning for further training of our CNN (as base model) over a different AD GWAS dataset, to extract the final set of features. The extracted features are then fed into a Support Vector Machine for classification of AD. Detailed experiments are performed using multiple datasets and varying experimental configurations. The statistical outcomes indicate an accuracy of 89%, which is a significant improvement when benchmarked against existing related works.

    EXplainable Artificial Intelligence: enabling AI in neurosciences and beyond

    The adoption of AI models in medicine and neurosciences has the potential to play a significant role not only in bringing scientific advancements but also in clinical decision-making. However, concerns are mounting about the potential biases of AI, which could have far-reaching consequences, particularly in a critical field like biomedicine. It is challenging to achieve usable intelligence because it is fundamental not only to learn from prior data, extract knowledge and guarantee generalization capabilities, but also to disentangle the underlying explanatory factors in order to deeply understand the variables leading to the final decisions. Hence, there has been a call for approaches that open the AI 'black box' to increase trust in, and the reliability of, the decision-making capabilities of AI algorithms. Such approaches are commonly referred to as XAI and are starting to be applied in medical fields, even if not yet fully exploited. With this thesis we aim to contribute to enabling the use of AI in medicine and neurosciences by taking two fundamental steps: (i) practically pervading AI models with XAI and (ii) strongly validating XAI models. The first step was achieved, on the one hand, by focusing on XAI taxonomy and proposing guidelines specific to AI and XAI applications in the neuroscience domain. On the other hand, we addressed concrete issues by proposing XAI solutions to decode the brain modulations in neurodegeneration, relying on the morphological, microstructural and functional changes occurring at different disease stages as well as their connections with the genotype substrate. The second step was achieved by first defining four attributes related to XAI validation, namely stability, consistency, understandability and plausibility.
Each attribute refers to a different aspect of XAI ranging from the assessment of explanations stability across different XAI methods, or highly collinear inputs, to the alignment of the obtained explanations with the state-of-the-art literature. We then proposed different validation techniques aiming at practically fulfilling such requirements. With this thesis, we contributed to the advancement of the research into XAI aiming at increasing awareness and critical use of AI methods opening the way to real-life applications enabling the development of personalized medicine and treatment by taking a data-driven and objective approach to healthcare

    Computational Methods for the Analysis of Genomic Data and Biological Processes

    In recent decades, new technologies have made remarkable progress in helping to understand biological systems. Rapid advances in genomic profiling techniques such as microarrays or high-performance sequencing have brought new opportunities and challenges in the fields of computational biology and bioinformatics. Such genetic sequencing techniques allow large amounts of data to be produced, whose analysis and cross-integration could provide a complete view of organisms. As a result, it is necessary to develop new techniques and algorithms that carry out an analysis of these data with reliability and efficiency. This Special Issue collected the latest advances in the field of computational methods for the analysis of gene expression data, and, in particular, the modeling of biological processes. Here we present eleven works selected to be published in this Special Issue due to their interest, quality, and originality

    Systems Analytics and Integration of Big Omics Data

    A “genotype” is essentially an organism's full hereditary information, which is obtained from its parents. A “phenotype” is an organism's actual observed physical and behavioral properties. These may include traits such as morphology, size, height, eye color, metabolism, etc. One of the pressing challenges in computational and systems biology is genotype-to-phenotype prediction. This is challenging given the amount of data generated by modern Omics technologies. This “Big Data” is so large and complex that traditional data processing applications are not up to the task. Challenges arise in collection, analysis, mining, sharing, transfer, visualization, archiving, and integration of these data. In this Special Issue, there is a focus on the systems-level analysis of Omics data, recent developments in gene ontology annotation, and advances in biological pathways and network biology. The integration of Omics data with clinical and biomedical data using machine learning is explored. This Special Issue covers new methodologies in the context of gene–environment interactions, tissue-specific gene expression, and how external factors or host genetics impact the microbiome.

    2014 Update of the Alzheimer's Disease Neuroimaging Initiative: A review of papers published since its inception

    The Alzheimer's Disease Neuroimaging Initiative (ADNI) is an ongoing, longitudinal, multicenter study designed to develop clinical, imaging, genetic, and biochemical biomarkers for the early detection and tracking of Alzheimer's disease (AD). The initial study, ADNI-1, enrolled 400 subjects with early mild cognitive impairment (MCI), 200 with early AD, and 200 cognitively normal elderly controls. ADNI-1 was extended by a 2-year Grand Opportunities grant in 2009 and by a competitive renewal, ADNI-2, which enrolled an additional 550 participants and will run until 2015. This article reviews all papers published since the inception of the initiative and summarizes the results to the end of 2013. The major accomplishments of ADNI have been as follows: (1) the development of standardized methods for clinical tests, magnetic resonance imaging (MRI), positron emission tomography (PET), and cerebrospinal fluid (CSF) biomarkers in a multicenter setting; (2) elucidation of the patterns and rates of change of imaging and CSF biomarker measurements in control subjects, MCI patients, and AD patients. CSF biomarkers are largely consistent with disease trajectories predicted by β-amyloid cascade (Hardy, J Alzheimer's Dis 2006;9(Suppl 3):151-3) and tau-mediated neurodegeneration hypotheses for AD, whereas brain atrophy and hypometabolism levels show predicted patterns but exhibit differing rates of change depending on region and disease severity; (3) the assessment of alternative methods of diagnostic categorization. Currently, the best classifiers select and combine optimum features from multiple modalities, including MRI, [(18)F]-fluorodeoxyglucose-PET, amyloid PET, CSF biomarkers, and clinical tests; (4) the development of blood biomarkers for AD as potentially noninvasive and low-cost alternatives to CSF biomarkers for AD diagnosis and the assessment of α-syn as an additional biomarker; (5) the development of methods for the early detection of AD. 
CSF biomarkers, β-amyloid 42 and tau, as well as amyloid PET may reflect the earliest steps in AD pathology in mildly symptomatic or even nonsymptomatic subjects and are leading candidates for the detection of AD in its preclinical stages; (6) the improvement of clinical trial efficiency through the identification of subjects most likely to undergo imminent future clinical decline and the use of more sensitive outcome measures to reduce sample sizes. Multimodal methods incorporating APOE status and longitudinal MRI proved most highly predictive of future decline. Refinements of clinical tests used as outcome measures such as clinical dementia rating-sum of boxes further reduced sample sizes; (7) the pioneering of genome-wide association studies that leverage quantitative imaging and biomarker phenotypes, including longitudinal data, to confirm recently identified loci, CR1, CLU, and PICALM and to identify novel AD risk loci; (8) worldwide impact through the establishment of ADNI-like programs in Japan, Australia, Argentina, Taiwan, China, Korea, Europe, and Italy; (9) understanding the biology and pathobiology of normal aging, MCI, and AD through integration of ADNI biomarker and clinical data to stimulate research that will resolve controversies about competing hypotheses on the etiopathogenesis of AD, thereby advancing efforts to find disease-modifying drugs for AD; and (10) the establishment of infrastructure to allow sharing of all raw and processed data without embargo to interested scientific investigators throughout the world

    Design and application of SuRFR: an R package to prioritise candidate functional DNA sequence variants

    Genetic analyses such as linkage and genome wide association studies (GWAS) have been extremely successful at identifying genomic regions that harbour genetic variants contributing to complex disorders. Over 90% of disease-associated variants from GWAS fall within non-coding regions (Maurano et al., 2012). However, pinpointing the causal variants has proven a major bottleneck to genetic research. To address this I have developed SuRFR, an R package for the ranked prioritisation of candidate causal variants by predicted function. SuRFR produces rank orderings of variants based upon functional genomic annotations, including DNase hypersensitivity signal, chromatin state, minor allele frequency, and conservation. The ranks for each annotation are combined into a final prioritisation rank using a weighting system that has been parametrised and tested through ten-fold cross-validation. SuRFR has been tested extensively upon a combination of synthetic and real datasets and has been shown to perform with high sensitivity and specificity. These analyses have provided insight into the extent to which different classes of functional annotation are most useful for the identification of known regulatory variants: the most important factor for identifying a true variant across all classes of regulatory variants is position relative to genes. I have also shown that SuRFR performs at least as well as its nearest competitors whilst benefiting from the advantages that come from being part of the R environment. I have applied SuRFR to several genomics projects, particularly the study of psychiatric illness, including genome sequencing of a large Scottish family with bipolar disorder. This has resulted in the prioritisation of such variants for future study
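
The rank-combination idea in the abstract above can be illustrated with a minimal Python sketch. It is not SuRFR's implementation (SuRFR is an R package, and its weights were parametrised by cross-validation). The function names, the annotation names, the weights, and the convention that a higher score means a better rank are all assumptions made for illustration.

```python
from typing import Dict, List


def rank_descending(scores: List[float]) -> List[float]:
    """Return 1-based ranks (1 = highest score); tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def combine_ranks(annotations: Dict[str, List[float]],
                  weights: Dict[str, float]) -> List[int]:
    """Sum weighted per-annotation ranks and return variant indices, best first."""
    n_variants = len(next(iter(annotations.values())))
    combined = [0.0] * n_variants
    for name, scores in annotations.items():
        ranks = rank_descending(scores)
        for i in range(n_variants):
            combined[i] += weights[name] * ranks[i]
    # A lower combined value means a higher priority.
    return sorted(range(n_variants), key=lambda i: combined[i])


# Hypothetical scores for three variants under two annotation classes.
annotations = {
    "dnase": [0.9, 0.1, 0.5],
    "conservation": [0.2, 0.8, 0.5],
}
# Hypothetical weights; a real tool would fit these, e.g. by cross-validation.
weights = {"dnase": 2.0, "conservation": 1.0}

priority = combine_ranks(annotations, weights)
print(priority)
```

With these made-up inputs, the variant at index 0 is prioritised first because the more heavily weighted annotation ranks it highest.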

    Discovering lesser known molecular players and mechanistic patterns in Alzheimer's disease using an integrative disease modelling approach

    Convergence of exponentially advancing technologies is driving medical research with life-changing discoveries. In contrast, repeated failures of high-profile drugs to battle Alzheimer's disease (AD) have made it one of the least successful therapeutic areas. This failure pattern has provoked researchers to grapple with their beliefs about Alzheimer's aetiology. Thus, the growing realisation that Amyloid-β and tau are not 'the' but rather 'one of the' factors necessitates the reassessment of pre-existing data to add new perspectives. To enable a holistic view of the disease, integrative modelling approaches are emerging as a powerful technique. Combining data at different scales and modes could considerably increase the predictive power of the integrative model by filling biological knowledge gaps. However, the reliability of the derived hypotheses largely depends on the completeness, quality, consistency, and context-specificity of the data. Thus, there is a need for agile methods and approaches that efficiently interrogate and utilise existing public data. This thesis presents the development of novel approaches and methods that address intrinsic issues of data integration and analysis in AD research. It aims to prioritise lesser-known AD candidates using highly curated and precise knowledge derived from integrated data. Here much of the emphasis is put on quality, reliability, and context-specificity. This thesis work showcases the benefit of integrating well-curated and disease-specific heterogeneous data in a semantic web-based framework for mining actionable knowledge. Furthermore, it introduces the challenges encountered while harvesting information from literature and transcriptomic resources. State-of-the-art text-mining methodology is developed to extract miRNAs and their regulatory roles in diseases and genes from the biomedical literature.
To enable meta-analysis of biologically related transcriptomic data, a highly-curated metadata database has been developed, which explicates annotations specific to human and animal models. Finally, to corroborate common mechanistic patterns — embedded with novel candidates — across large-scale AD transcriptomic data, a new approach to generate gene regulatory networks has been developed. The work presented here has demonstrated its capability in identifying testable mechanistic hypotheses containing previously unknown or emerging knowledge from public data in two major publicly funded projects for Alzheimer's, Parkinson's and Epilepsy diseases

    Genetic predictors for epilepsy development, treatment response and dosing

    Antiepileptic drug (AED) treatment is the first line strategy for seizure control in the majority of individuals with epilepsy but remains challenging, not least because of interindividual variability in efficacy, tolerability and dosing. The studies presented in this thesis set out to explore that variability from a genomic perspective in patients with newly diagnosed epilepsy from across the UK. Single nucleotide polymorphisms (SNPs) in genes encoding drug metabolising enzymes (DMEs) may be associated with the dose of carbamazepine (CBZ) required for seizure control. A cohort of 159 individuals who were seizure-free for 12 months on a stable dose of CBZ monotherapy was genotyped for 51 SNPs across six DMEs. Haplotype analysis identified 8 haplotype blocks across the genes. No single SNPs or haplotype blocks were associated with CBZ dose. Thus, it is unlikely that genetic variability in DMEs accounts for the individual differences in CBZ dose requirement. A splice site SNP (rs3812718) in the SCN1A gene was previously shown to influence maximum doses of AEDs. This SNP was genotyped in 817 patients and tested for association with maximum and maintenance doses of several AEDs. An association was identified between rs3812718 and maximum AED dose, with an interaction analysis suggestive of a drug specific effect. These findings suggest that this SCN1A variant contributes to variability in the limit of tolerability to AEDs. Response to AED treatment is multifactorial and likely to be influenced by multiple genes. Five SNPs previously reported to predict treatment outcome in epilepsy were genotyped in 772 patients and the resulting data, together with data from an Australian cohort, incorporated into a predictive algorithm. The algorithm failed to predict treatment outcome in general but was partially successful in identifying responders to CBZ and valproate. These five SNPs may be relevant to the prognosis of epilepsy, particularly when treated with specific AEDs. 
Primary generalised epilepsies (PGEs) are highly heritable and believed to be polygenic in origin. Predictive algorithms were employed to explore genetic influences on seizure (absence vs. myoclonus) and epilepsy (PGE vs. focal) type using 1,840 SNP genotypes available from 436 patients with PGE. Although the algorithms failed to distinguish PGE patients on the basis of genetic variants, they showed improved association over univariate methods of analysis. Such an approach may be suitable for future investigations using large genomic datasets. A recent genome-wide association study identified multiple genetic variants that approached genome-wide significance for association with 12 month remission from seizures. Five of these SNPs were genotyped in an independent cohort of 424 patients and tested for association with remission and time to remission. No significant associations were found, questioning the validity of the original observation or the method of replication. Further work is required to understand this outcome. In conclusion, the genetic bases of epilepsy, AED response and AED dose requirement are multigenic and thus far undetectable using traditional association studies in modestly-sized patient cohorts. Further advances in genomic, bioinformatics and statistical methodologies are required before the genetic contribution to heterogeneity in epilepsy-related phenotypes can be translated into improved clinical care

    A Bioinformatics Study of Protein Conformational Flexibility and Misfolding: a Sequence, Structure and Dynamics Approach

    This PhD thesis, titled "A Bioinformatics Study of Protein Conformational Flexibility and Misfolding: a Sequence, Structure and Dynamics Approach", comprises the results and conclusions of three distinct but related research projects, covering aspects of the phenomenon of protein local conformational instability, its relationship with protein function, evolvability and aggregation, and the effect of genetic variations on protein conformational instability related to Conformational Diseases. These projects include the prediction of putative prion proteins in complete proteomes and the study of prion biology from a genomic perspective, the prediction of conformationally unstable protein regions and the existence of a structural framework for linking conformational instability to folding and function, and the establishment of a rationale for assessing the connection between mutations and disease phenotypes in Conformational Diseases.