546 research outputs found

    Functional Analysis of Human Long Non-coding RNAs and Their Associations with Diseases

    Within this study, we sought to leverage knowledge from well-characterized protein-coding genes to characterize the lesser-known long non-coding RNA (lncRNA) genes, using computational methods to find functional annotations and disease associations. Functional genome annotation is an essential step toward a systems-level view of the human genome. With this knowledge, we can gain a deeper understanding of how humans develop and function, and a better understanding of human disease. LncRNAs are transcripts longer than 200 nucleotides that do not code for proteins. LncRNAs have been found to regulate development, tissue and cell differentiation, and organ formation. Their dysregulation has been linked to several diseases, including autism spectrum disorder (ASD) and cancer. While a great deal of research has been dedicated to protein-coding genes, the relatively recently discovered lncRNA genes have yet to be characterized. LncRNA function is tied closely to when and where they are expressed. Co-expression network analysis offers a means of functional annotation of uncharacterized genes through a guilt-by-association approach. We have constructed two co-expression networks using known disease-associated protein-coding genes and lncRNA genes. Through clustering of the networks, gene set enrichment analysis, and centrality measures, we found enrichment for disease association and functions and identified high-confidence lncRNA disease gene targets. We present a novel approach to the identification of disease state associations by demonstrating that genes associated with the same disease states share patterns that can be discerned from transcriptomes of healthy tissues. Using a machine learning algorithm, we built a model to classify ASD versus non-ASD genes using their expression profiles from healthy developing human brain tissues. Feature selection during the model-building process also identified critical temporospatial points for the determination of ASD genes. We constructed a webserver tool for the prioritization of genes for ASD association. The webserver tool has a database containing prioritization and co-expression information for nearly every gene in the human genome.
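
The guilt-by-association idea in the abstract above can be illustrated with a minimal sketch. It assumes Pearson correlation as the co-expression measure and a fixed threshold for drawing edges; neither choice is stated in the abstract, and the gene names and expression values are hypothetical toy data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def coexpression_edges(profiles, threshold=0.9):
    """Return gene pairs whose absolute correlation meets the threshold."""
    genes = sorted(profiles)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(profiles[g], profiles[h])
            if abs(r) >= threshold:
                edges.append((g, h, r))
    return edges

def guilt_by_association(edges, annotations):
    """Copy labels from annotated genes to their unannotated neighbours."""
    inferred = {}
    for g, h, _ in edges:
        for known, unknown in ((g, h), (h, g)):
            if known in annotations and unknown not in annotations:
                inferred.setdefault(unknown, set()).update(annotations[known])
    return inferred

# Hypothetical toy data: two protein-coding genes and one lncRNA, five samples.
profiles = {
    "PCG_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "LNC_X": [2.1, 4.0, 6.2, 8.1, 9.9],
    "PCG_B": [5.0, 1.0, 4.0, 2.0, 3.0],
}
annotations = {"PCG_A": {"ASD"}, "PCG_B": {"cancer"}}

edges = coexpression_edges(profiles)
inferred = guilt_by_association(edges, annotations)
print(inferred)
```

In this toy case only the lncRNA that tracks PCG_A across samples inherits its annotation. A real analysis would add clustering, enrichment testing, and centrality measures, as the abstract describes.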

    PANDA: prioritization of autism-genes using network-based deep-learning approach

    Autism is a neuropsychiatric disorder characterized by impairments in reciprocal social interaction and communication, and the presence of restricted and repetitive behaviours. Autism is predominantly heritable, but the underlying genetic associations are still largely unknown. Understanding the genetic background of complex diseases, such as autism, plays an essential role in the promising field of precision medicine. The evaluation of candidate genes, however, requires time-consuming and expensive experiments given the large number of possibilities. Thus, computational methods have seen increasing application in predicting gene-disease associations. In this thesis, we propose a bioinformatics framework, Prioritization of Autism-genes using Network-based Deep-learning Approach (PANDA). Our approach aims to identify autism genes across the human genome based on patterns of gene-gene interactions and the topological similarity of genes in the interaction network. PANDA trains a graph deep learning classifier on the human molecular interaction network (HMIN), then predicts and ranks the probability of autism association for every node (gene) in the network. PANDA achieved a classification accuracy of 89%, outperforming three other commonly used machine learning algorithms. Moreover, the gene prioritization ranking produced by PANDA was evaluated and validated using a large-scale independent exome-sequencing study. The top decile (top 10%) of PANDA-ranked genes was found to be significantly enriched for autism association.

    Development of New Bioinformatic Approaches for Human Genetic Studies

    The development of bioinformatics methods for human genetic studies draws on the vast amount of available data to generate valuable new information. Machine learning and statistical coupling analysis can be used in the study of human diseases. These diseases include intellectual disabilities (ID), which affect 1-3% of the population and are caused primarily by genetic factors. Although many cases of ID are caused by mutations in protein-coding genes, the possible involvement of long non-coding RNAs (lncRNAs) in ID, due to their role in gene expression regulation, has also been explored. In this study, we used machine learning to develop a new expression-based model trained using ID genes encoded with the developing brain transcriptome. The model was fine-tuned using the class-balancing approach of synthetic over-sampling of the minority class, resulting in improved performance. We used the model to predict candidate ID-associated lncRNAs. Our model identified several candidates that overlapped with previously reported ID-associated lncRNAs, were enriched for neurodevelopmental functions, and were highly expressed in brain tissues. Machine learning was also used to predict protein stability changes caused by missense mutations, which can lead to disease conditions including ID. We tested Random Forests, Support Vector Machines (SVM), and Naïve Bayes to find the best-performing algorithm for developing a multi-class classifier. We developed an SVM model using relevant physico-chemical features after feature selection. Our work identified new features for predicting the effect of amino acid substitutions on protein stability and produced a well-performing multi-class classifier based solely on sequence information. Statistical approaches were used to analyze the association between mutations and phenotypes. In this study, we used statistical coupling analysis (SCA) to cluster disease-causing mutations and ID phenotypes. Using SCA, we identified groups of co-evolving residues, known as protein sectors, in ID protein families. Within each distinct sector, we identified mutations associated with different phenotypic manifestations of a syndromic ID. Our results suggest that protein sector analysis can be used to associate mutations with phenotypic manifestations in human diseases. The bioinformatic methods developed in this dissertation can be used in human genetic research to understand the role of new genes and proteins in human disease.
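
The class-balancing step mentioned in the abstract above (synthetic over-sampling of the minority class) can be sketched as follows. This is a simplified SMOTE-style illustration, not the dissertation's implementation: the neighbour count `k`, the seed, and the toy feature vectors are all assumptions.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical toy data: 3 minority (disease) genes vs 6 majority genes,
# each described by two expression features.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
majority = [[3.0, 3.0], [3.5, 2.5], [4.0, 3.0], [2.5, 3.5], [3.0, 4.0], [4.0, 4.0]]

new_samples = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_samples
print(len(balanced_minority), len(majority))
```

Because each synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. Over-sampling should be applied to the training split only, so that synthetic points never leak into evaluation data.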

    System-Level Analysis of Alzheimer's Disease Prioritizes Candidate Genes for Neurodegeneration.

    Alzheimer's disease (AD) is a debilitating neurodegenerative disorder. Since the advent of the genome-wide association study (GWAS), we have come to understand much about the genes involved in AD heritability and pathophysiology. Large case-control meta-GWAS studies have increased our ability to prioritize weaker-effect alleles, while the recent development of network-based functional prediction has provided a mechanism by which we can use machine learning to reprioritize GWAS hits in the functional context of relevant brain tissues like the hippocampus and amygdala. In parallel with these developments, groups like the Alzheimer's Disease Neuroimaging Initiative (ADNI) have compiled rich compendia of AD patient data, including genotype and biomarker information as well as derived volume measures for relevant structures like the hippocampus and the amygdala. In this study, we wanted to identify genes involved in AD-related atrophy of these two structures, which are often critically impaired over the course of the disease. To do this, we developed a combined score prioritization method, which uses the cumulative distribution function of a gene's functional and positional scores to prioritize top genes that segregate not only with disease status but also with hippocampal and amygdalar atrophy. Our method identified a mix of genes that had previously been identified in AD GWAS, including APOE, TOMM40, and NECTIN2 (PVRL2), and several others that have not been identified in AD genetic studies but play integral roles in AD-affected functional pathways, including IQSEC1, PFN1, and PAK2. Our findings support the viability of our novel combined score as a method for prioritizing region- and even cell-specific AD risk genes.
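
The abstract above does not give the formula for its combined score, so the following is only one plausible reading: convert each gene's functional and positional score to an empirical cumulative probability, then multiply the two. The product rule, the function names, and the gene labels and values are all hypothetical.

```python
from bisect import bisect_right

def empirical_cdf(values):
    """Return a function mapping a score to the fraction of values <= it."""
    ordered = sorted(values)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def combined_scores(functional, positional):
    """Combine two per-gene scores through their empirical CDFs.
    Assumption: the product of the two cumulative probabilities is used."""
    genes = sorted(functional.keys() & positional.keys())
    f_cdf = empirical_cdf([functional[g] for g in genes])
    p_cdf = empirical_cdf([positional[g] for g in genes])
    return {g: f_cdf(functional[g]) * p_cdf(positional[g]) for g in genes}

# Hypothetical scores: functional = network-based prediction score,
# positional = strength of association near the gene (e.g. -log10 p).
functional = {"GENE_A": 0.9, "GENE_B": 0.5, "GENE_C": 0.1}
positional = {"GENE_A": 5.0, "GENE_B": 4.0, "GENE_C": 1.0}

scores = combined_scores(functional, positional)
ranking = sorted(scores, key=lambda g: -scores[g])
print(ranking)
```

Working on cumulative probabilities rather than raw values puts the two scores on a common 0 to 1 scale, so a gene ranks highly only when it is near the top of both distributions.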

    Mining high-level brain imaging genetic associations

    Indiana University-Purdue University Indianapolis (IUPUI). Imaging genetics is an emerging research field in neurodegenerative diseases. It studies the influence of genetic variants on brain structure and function. Genome-wide association studies (GWAS) of brain imaging have identified a few independent risk loci for individual imaging quantitative traits (iQTs), which, however, display only modest effect sizes and explain limited heritability. This thesis focuses on mining high-level imaging genetic associations and their applications to neurodegenerative diseases. This thesis first presents a novel network-based GWAS framework for identifying functional modules, employing a two-step strategy in a top-down manner. It first integrates a tissue-specific network with the GWAS of the corresponding phenotype in regression models, in addition to classification, to re-prioritize genome-wide associations. Then it detects densely connected and disease-relevant modules based on interactions among the top reprioritizations. The discovered modules show both phenotypic specificity and dense interaction. We applied it to an amygdala imaging genetics analysis in the study of Alzheimer's disease (AD). The proposed framework effectively detects densely interacting modules, and the reprioritizations achieve the highest concordance with AD genes. We then present an extension of the above framework, named GWAS top-neighbor-based (tnGWAS), and compare it with previous approaches. tnGWAS extracts densely connected modules from top GWAS findings, based on the hypothesis that relevant modules consist of top GWAS findings and their close neighbors. It is applied to a hippocampus imaging genetics analysis in AD research and yields the densest interactions among top candidate genes. Experimental results demonstrate that precise context does help explore the collective effects of genes with functional interactions specific to the studied phenotype. In the second part, a novel imaging genetic enrichment analysis (IGEA) paradigm is proposed for discovering complex associations among genetic modules and brain circuits. In addition to genes being grouped into modules, brain regions of interest are also grouped so that they play a role in the analysis. We expand the scope of one-dimensional enrichment analysis into imaging genetics. This framework jointly considers meaningful gene sets (GS) and brain circuits (BC) and examines whether a given GS-BC module is enriched in gene-iQT findings. We conduct a proof-of-concept study and demonstrate its performance by applying it to a brain-wide imaging genetics study of AD.
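
The two-dimensional enrichment question in the abstract above (is a given GS-BC module over-represented among significant gene-iQT findings?) can be sketched with a hypergeometric test. The abstract does not state which statistic IGEA uses, so the test is a stand-in, and the gene and region identifiers below are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when n items are drawn from N, of which K are in the module."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total

def module_enrichment(gene_set, circuit, findings, all_genes, all_regions):
    """Test whether gene-region pairs in a GS-BC module are over-represented
    among the significant gene-iQT findings (a set of (gene, region) pairs)."""
    N = len(all_genes) * len(all_regions)   # every possible gene-region pair
    K = len(gene_set) * len(circuit)        # pairs inside the module
    n = len(findings)                       # significant pairs overall
    k = sum(1 for g, r in findings if g in gene_set and r in circuit)
    return k, hypergeom_upper_tail(N, K, n, k)

# Hypothetical universe: 10 genes x 5 brain regions.
all_genes = [f"G{i}" for i in range(10)]
all_regions = [f"R{j}" for j in range(5)]
gene_set = {"G0", "G1", "G2"}
circuit = {"R0", "R1"}
findings = {("G0", "R0"), ("G1", "R0"), ("G1", "R1"), ("G2", "R1"), ("G7", "R4")}

overlap, p_value = module_enrichment(gene_set, circuit, findings, all_genes, all_regions)
print(overlap, p_value)
```

Treating every gene-region pair as an independent draw is a simplification; correlated genes or neighbouring regions would call for a permutation-based null instead.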

    Systems Approaches For Gene and Drug Discovery in Alzheimer’s Disease

    Alzheimer’s Disease (AD) is a devastating neurodegenerative disorder affecting all tissues and cell types of the brain, leading to emotional dysregulation and cognitive dysfunction. Genome-wide association studies (GWAS) have to date identified forty-two genome-wide significant genes for AD that influence overall disease risk or endophenotypes, including neuroimaging and gene expression profiles. Nevertheless, the currently known AD genes do not account for a significant proportion of the heritability of disease risk, implying the existence of many weak-effect variants in potentially thousands of genes as drivers of AD outcomes. This genetic architecture, composed of many small effects, is partly due to the complexity of molecular interaction networks, where the influence of individual genetic variants is attenuated by overlapping molecular pathways that are tuned by evolution for robustness. Thus, overcoming the limitations of GWAS for dissecting the mechanisms of AD requires methods to identify disease pathways that are enriched for weak genetic effects. Network-based functional prediction (NBFP) methods use machine learning in gene interaction networks to robustly learn pathways containing risk genes, augmenting the raw statistical signal from GWAS with biological prior knowledge encoded in a tissue-specific gene network. NBFP methods have several benefits: they are more robust to statistical noise than raw GWAS statistics, and they enable the nomination of functionally relevant candidate genes that do not themselves carry risk polymorphisms. However, most NBFP methods are currently limited to single tissues, which is not optimal for complex disorders like AD that involve many functionally distinct cell types and brain regions. Moreover, there are now multiple studies of AD endophenotypes, including brain-region-specific gene expression and whole-brain neuroimaging, both paired with genotypes, as well as single-cell gene expression data. Thus, there is an additional need for integrative tools that combine disparate data sources to nominate candidate genes for distinct pathophysiological processes. In this work, we developed new methods to rank candidate genes based on multiple disease-relevant networks and to combine gene rankings arising from multiple sources, including NBFP, imaging GWAS, and gene expression. In our first study, we applied NBFP to systematically rank AD-risk genes in the hippocampus and amygdala and developed a novel combined scoring method to integrate these scores with GWAS associations for low hippocampal and amygdalar volume in patients with AD. Our method nominated a novel set of region-specific candidate genes primarily involved in maintaining the stability of the synapse and regulating excitotoxicity. In our second study, we developed a multi-network-based functional prediction (mNBFP) method that allows multiple source networks. Using three brain-cell-specific networks, our mNBFP approach outperformed single-network approaches in training performance and achieved high concordance with recently published AD-GWAS associations. In our third and final study, we integrated our multi-network NBFP and combined scoring approaches with single-cell gene expression and the Library of Integrated Network-based Cellular Signatures (LINCS) database to identify potential drug repositioning candidates for AD. We identified the protein AP1B1 as having strong potential to target an early-AD gene expression signature, which may yield a novel mechanism for early therapeutic intervention.

    Temporal proteomic profiling of postnatal human cortical development.

    Healthy cortical development depends on precise regulation of transcription and translation. However, the dynamics of how proteins are expressed, function and interact across postnatal human cortical development remain poorly understood. We surveyed the proteomic landscape of 69 dorsolateral prefrontal cortex samples across seven stages of postnatal life and integrated these data with paired transcriptome data. We detected 911 proteins by liquid chromatography-mass spectrometry, and 83 were significantly associated with postnatal age (FDR < 5%). Network analysis identified three modules of co-regulated proteins correlated with age, including two modules with increasing expression involved in gliogenesis and NADH metabolism and one neurogenesis-related module with decreasing expression throughout development. Integration with paired transcriptome data revealed that these age-related protein modules overlapped with RNA modules and displayed collinear developmental trajectories. Importantly, RNA expression profiles that are dynamically regulated throughout cortical development display tighter correlations with their respective translated protein expression compared to those RNA profiles that are not. Moreover, the correspondence between RNA and protein expression significantly decreases as a function of cortical aging, especially for genes involved in myelination and cytoskeleton organization. Finally, we used this data resource to elucidate the functional impact of genetic risk loci for intellectual disability, converging on gliogenesis, myelination and ATP-metabolism modules in the proteome and transcriptome. We share all data in an interactive, searchable companion website. Collectively, our findings reveal dynamic aspects of protein regulation and provide new insights into brain development, maturation, and disease

    Role of network topology based methods in discovering novel gene-phenotype associations

    The cell is governed by complex interactions among various types of biomolecules. Coupled with environmental factors, variations in DNA can cause alterations in normal gene function and lead to a disease condition. Often, such disease phenotypes involve coordinated dysregulation of multiple genes that implicate interconnected pathways. Towards a better understanding and characterization of the mechanisms underlying human diseases, I present GUILD, a network-based disease-gene prioritization framework. GUILD associates genes with diseases using the global topology of the protein-protein interaction network and an initial set of genes known to be implicated in the disease. Furthermore, I investigate the mechanistic relationships between disease genes and explain the robustness emerging from these relationships. I also introduce GUILDify, a user-friendly online tool that prioritizes genes for their association with any user-provided phenotype. Finally, I describe current state-of-the-art systems-biology approaches in which network modeling has helped extend our view of diseases such as cancer.