63 research outputs found

    Reading Between the Genes: Computational Models to Discover Function from Noncoding DNA

    Noncoding DNA, once called "junk," has revealed itself to be full of function. Technology development has allowed researchers to gather genome-scale data pointing towards complex regulatory regions, expression and function of noncoding RNA genes, and conserved elements. Variation in these regions has been tied to variation in biological function and human disease. This PSB session tackles the problem of handling, analyzing, and interpreting data on variation in, and interactions between, noncoding regions through computational biology. We feature an invited speaker on how variation in transcription factor coding sequences affects sequence preference, along with submitted papers that span graph-based methods, integrative analyses, machine learning, and dimension reduction to explore questions of basic biology, cancer, diabetes, and clinical relevance. University of Arizona Health Sciences CB2, the BIO5 Institute; NIH [U01AI122275, HL132532, CA023074, 1UG3OD023171, 1R01AG053589-01A1, 1S10RR029030]. Open access journal. This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries.

    Application of next generation sequencing in genetic and genomic studies

    Genetic variants spread across the human genome play vital roles in determining our traits, affecting development and potentially causing disorders. Most common disorders have complex underlying mechanisms involving genetic or environmental factors and the interaction between them. Over the past decade, genome-wide association studies (GWAS) have identified thousands of common variants that contribute to complex disorders and partially explain their heritability. However, a large portion remains unexplained, and the missing heritability may be caused by several factors, such as rare or low-frequency variants with high effect that are not covered by GWAS and linkage analysis. With the development of next generation sequencing (NGS), it is possible to rapidly detect large numbers of novel rare and low-frequency variants simultaneously at a low cost. This new technology provides vast information for studying the association between genetic variation and complex disorders. Once a susceptibility gene is mapped, model organisms such as zebrafish (Danio rerio) are popular for further investigating the possible function of the disease-associated gene in determining the phenotype. However, the genome annotation of zebrafish is not complete, which affects the characterization of gene functions. Accordingly, high-throughput RNA sequencing can be employed for identifying new transcripts. In our studies, pooled DNA samples were used for whole genome sequencing (WGS) and exome sequencing. In Paper I, we evaluated minor allele frequency (MAF) estimates using three variant detection tools with two sets of pooled exome sequencing data and one set of pooled WGS data. The MAFs from the pooled sequencing data demonstrated high concordance (r = 0.88-0.94) with those from the individual genotyping data. In Paper II, exome sequencing with a pooling strategy was performed on 100 idiopathic scoliosis (IS) patients to map susceptibility genes.
After validating 20 candidate single nucleotide variants (SNVs), we did not find associations between them and IS. However, the previously reported common variant rs11190870 near LBX1 was validated in a large Scandinavian cohort. In Paper III, we analyzed WGS of pooled DNA samples from 19 affected individuals who shared a phenotype-linked haplotype in a dyslexic Finnish family. Two of the individuals also had their whole genomes sequenced individually. The screen for causative variants was narrowed down to a rare SNV, which might affect the binding affinity of LHX2, a regulator of the dyslexia-associated gene ROBO1. In Paper IV, RNA sequencing (RNA-seq) data were analyzed to identify novel transcripts in zebrafish early development using an in-house pipeline. We discovered 152 novel transcribed regions (NTRs), validated more than 10 NTRs, and quantified their expression in early developmental stages. In our studies, we evaluated and applied a pooling approach for identifying disease susceptibility variants using high-throughput DNA sequencing. Based on RNA sequencing data, we provided new information for genome annotation of the model organism zebrafish, which is valuable for studying the function of disease-causative genes. In summary, the whole series of studies demonstrates how NGS can be applied to study the genetic basis of complex disorders and to assist in follow-up functional studies in model organisms.
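
The pooled-versus-individual MAF comparison described for Paper I can be illustrated with a minimal sketch. It is not the thesis's pipeline: the function names (`pooled_maf`, `genotype_maf`, `pearson_r`) are hypothetical, and it assumes the pooled allele frequency is simply the fraction of reads carrying the alternate allele.

```python
from math import sqrt

def pooled_maf(alt_reads, total_reads):
    """Estimate the minor allele frequency in a DNA pool from read counts.

    Assumption: the alternate-read fraction approximates the allele
    frequency in the pool; the value is folded so it never exceeds 0.5.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    freq = alt_reads / total_reads
    return min(freq, 1.0 - freq)

def genotype_maf(genotypes):
    """Compute MAF from individual diploid genotypes.

    genotypes: alternate-allele counts per individual (0, 1, or 2).
    """
    if not genotypes:
        raise ValueError("genotypes must not be empty")
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1.0 - freq)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

Computing `pearson_r` over per-variant `pooled_maf` and `genotype_maf` values gives a concordance figure of the kind reported in the abstract. Folding each estimate independently is a simplification; a real comparison would track the same allele in both data sets.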

    Exploring Genetic Susceptibility to Autism Spectrum Disorders


    ANNOTATING GENETIC RISK VARIANTS TO TARGET GENES USING Hi-C COUPLED MAGMA (H-MAGMA)

    An outstanding goal in modern genomics is to systematically predict the functional outcome of non-coding variation associated with complex traits. To bridge the gap between non-coding variation and its functional impact, we developed Hi-C Coupled Multi-Marker Analysis of GenoMic Annotation (H-MAGMA), a framework that converts SNP associations into gene-level associations based on chromatin interaction profiles to assign variants to their target genes. Applying this approach, we identified key biological pathways implicated in a wide range of brain disorders and showed its utility in complementing other functional genomic resources such as expression quantitative trait loci (eQTL)-based variant annotation. We applied H-MAGMA to five psychiatric and four neurodegenerative disorders and found that it detects risk genes associated with brain disorders. Additionally, we identified excitatory neurons as the critical cell type underlying psychiatric disorders compared to neurodegenerative disorders. Furthermore, we found that genes associated with psychiatric disorders are expressed during early brain development, while those associated with neurodegenerative disorders are expressed in later years. Next, we used H-MAGMA to pinpoint genes associated with cigarette smoking and alcohol use traits, and characterized the biological processes and critical cell types underlying these substance use traits. We found pathways including ethanol metabolic process and alcohol catabolic process to be associated with alcohol use traits, while response to nicotinic and acetylcholinergic pathways were identified for cigarette smoking traits. Moreover, we identified dopaminergic, GABAergic, and serotonergic neurons in the midbrain as relevant cell types that may contribute to substance use etiology.
Lastly, we provide a detailed protocol for generating the H-MAGMA variant-gene annotation file and provide additional annotation files for 28 tissues and cell types, with the hope of contributing a resource for researchers. Doctor of Philosophy.
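
The idea of assigning variants to target genes through chromatin interactions can be sketched as follows. This is an illustrative toy, not the H-MAGMA protocol or its file format: the function `assign_snps_to_genes`, the data layout, and the rule "a SNP maps to a gene if it lies in that gene's promoter or in a bin that interacts with a bin overlapping the promoter" are all assumptions made for the example.

```python
def _contains(region, chrom, pos):
    """True if position (chrom, pos) lies in the half-open region (chrom, start, end)."""
    r_chrom, start, end = region
    return r_chrom == chrom and start <= pos < end

def _overlaps(a, b):
    """True if two half-open regions (chrom, start, end) overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def assign_snps_to_genes(snps, promoters, interactions):
    """Map each SNP to candidate target genes (hypothetical helper).

    snps:         dict of snp_id -> (chrom, pos)
    promoters:    dict of gene -> (chrom, start, end)
    interactions: list of (bin_a, bin_b), each bin a (chrom, start, end)
                  region, representing one chromatin contact
    Returns a dict of snp_id -> sorted list of gene names.
    """
    assignments = {}
    for snp_id, (chrom, pos) in snps.items():
        genes = set()
        # Direct assignment: the SNP sits inside a promoter.
        for gene, region in promoters.items():
            if _contains(region, chrom, pos):
                genes.add(gene)
        # Interaction-based assignment: the SNP sits in one contact bin
        # and a promoter overlaps the partner bin.
        for bin_a, bin_b in interactions:
            for near, far in ((bin_a, bin_b), (bin_b, bin_a)):
                if _contains(near, chrom, pos):
                    for gene, region in promoters.items():
                        if _overlaps(far, region):
                            genes.add(gene)
        assignments[snp_id] = sorted(genes)
    return assignments
```

A SNP with no promoter overlap and no contact maps to an empty list, which in a gene-level analysis would mean the variant contributes to no gene.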

    The Pharmacoepigenomics Informatics Pipeline and H-GREEN Hi-C Compiler: Discovering Pharmacogenomic Variants and Pathways with the Epigenome and Spatial Genome

    Over the last decade, biomedical science has been transformed by the epigenome and spatial genome, but the discipline of pharmacogenomics, the study of the genetic underpinnings of pharmacological phenotypes like drug response and adverse events, has not. Scientists have begun to use omics atlases of increasing depth, and inferences relating to the bidirectional causal relationship between the spatial epigenome and gene expression, as a foundational underpinning for genetics research. The epigenome and spatial genome are increasingly used to discover causative regulatory variants in the significance regions of genome-wide association studies, for the discovery of the biological mechanisms underlying these phenotypes and the design of genetic tests to predict them. Such variants often have more predictive power than coding variants, but in the area of pharmacogenomics, such advances have been radically underapplied. The majority of pharmacogenomics tests are designed manually on the basis of mechanistic work with coding variants in candidate genes, and where genome wide approaches are used, they are typically not interpreted with the epigenome. This work describes a series of analyses of pharmacogenomics association studies with the tools and datasets of the epigenome and spatial genome, undertaken with the intent of discovering causative regulatory variants to enable new genetic tests. It describes the potent regulatory variants discovered thereby to have a putative causative and predictive role in a number of medically important phenotypes, including analgesia and the treatment of depression, bipolar disorder, and traumatic brain injury with opiates, anxiolytics, antidepressants, lithium, and valproate, and in particular the tendency for such variants to cluster into spatially interacting, conceptually unified pathways which offer mechanistic insight into these phenotypes. 
It describes the Pharmacoepigenomics Informatics Pipeline (PIP), an integrative multiple omics variant discovery pipeline designed to make this kind of analysis easier and cheaper to perform, more reproducible, and amenable to the addition of advanced features. It describes the successes of the PIP in rediscovering manually discovered gene networks for lithium response, as well as discovering a previously unknown genetic basis for warfarin response in anticoagulation therapy. It describes the H-GREEN Hi-C compiler, which was designed to analyze spatial genome data and discover the distant target genes of such regulatory variants, and its success in discovering spatial contacts not detectable by preceding methods and using them to build spatial contact networks that unite disparate TADs with phenotypic relationships. It describes a potential feature set of a future pipeline, using the latest epigenome research and the lessons of the previous pipeline. It describes my thinking about how to use the output of a multiple omics variant pipeline to design genetic tests that also incorporate clinical data. And it concludes by describing a long-term vision for a comprehensive pharmacophenomic atlas, to be constructed by applying a variant pipeline and machine learning test design system, such as is described, to thousands of phenotypes in parallel. Scientists struggled to assay genotypes for the better part of a century, and in the last twenty years, succeeded. The struggle to predict phenotypes on the basis of the genotypes we assay remains ongoing. The use of multiple omics variant pipelines and machine learning models with omics atlases, genetic association, and medical records data will be an increasingly significant part of that struggle for the foreseeable future. PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145835/1/ariallyn_1.pd

    Genetic determinants of clinical heterogeneity in sickle cell disease

    Sickle cell disease is a monogenic disease caused by a mutation in the β-globin locus. Although it is monogenic, it shows high clinical heterogeneity. Environmental and genetic factors are thought to play a role in this heterogeneity. High fetal hemoglobin (HbF) levels have been observed to correlate with reduced severity and mortality in patients with sickle cell disease. The goal of my project was to identify genetic modifiers of the clinical severity of sickle cell disease. First, I performed fine-mapping of three regions previously associated with HbF levels. Second, I performed genome-wide association studies with two clinical complications of sickle cell disease as well as with HbF levels. Since no new loci reached array-wide significance for HbF levels, I performed a pathway analysis to identify additional HbF loci of smaller effect size that might implicate shared biological processes. Finally, I analyzed 19 whole exomes from Jamaican sickle cell disease patients with very mild complications. In conclusion, given the sample size of the replication cohorts available, we do not currently have the means to statistically validate the association signals. However, these results provide good candidate genes for functional studies and for future replication. Our results also suggest that β-hydroxybutyrate at endogenous levels could influence HbF levels. Furthermore, we show that fine-mapping the loci associated in genome-wide association studies can identify additional signals and increase the explained heritable variation.

    Convergent downstream candidate mechanisms of independent intergenic polymorphisms between co-classified diseases implicate epistasis among noncoding elements

    Eighty percent of DNA outside protein-coding regions was shown to be biochemically functional by the ENCODE project, enabling studies of their interactions. Studies have since explored how convergent downstream mechanisms arise from independent genetic risks of one complex disease. However, the cross-talk and epistasis between intergenic risks associated with distinct complex diseases have not been comprehensively characterized. Our recent integrative genomic analysis unveiled downstream biological effectors of disease-specific polymorphisms buried in intergenic regions, and we then validated their genetic synergy and antagonism in distinct GWAS. We extend this approach to characterize convergent downstream candidate mechanisms of distinct intergenic SNPs across distinct diseases within the same clinical classification. We construct a multipartite network consisting of 467 diseases organized in 15 classes, 2,358 disease-associated SNPs, 6,301 SNP-associated mRNAs by eQTL, and mRNA annotations to 4,538 Gene Ontology mechanisms. Functional similarity between two SNPs (similar SNP pairs) is imputed using a nested information theoretic distance model for which p-values are assigned by conservative scale-free permutation of network edges without replacement (node degrees constant). At FDR <= 5%, we prioritized 3,870 intergenic SNP pairs, among which 755 are associated with distinct diseases sharing the same disease class, implicating 167 intergenic SNPs, 14 classes, 230 mRNAs, and 134 GO terms. Co-classified SNP pairs were more likely to be prioritized than those of distinct classes, confirming a noncoding genetic underpinning to clinical classification (odds ratio ~3.8; p <= 10^-25). The prioritized pairs were also enriched in regions bound to the same or interacting transcription factors and/or interacting in long-range chromatin interactions suggestive of epistasis (odds ratio ~2,500; p <= 10^-25).
This prioritized network implicates complex epistasis between intergenic polymorphisms of co-classified diseases and offers a roadmap for a novel therapeutic paradigm: repositioning medications that target proteins within downstream mechanisms of intergenic disease-associated SNPs. Supplementary information and software: http://lussiergroup.org/publications/disease class. University of Arizona Health Sciences CB2, the BIO5 Institute; NIH [U01AI122275, HL132532, CA023074, 1UG3OD023171, 1R01AG053589-01A1, 1S10RR029030]. Open access journal.
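
The abstract's null model permutes network edges while holding node degrees constant. One common way to build such a null, shown here as a minimal sketch and not as the authors' procedure, is repeated edge swapping on a bipartite edge list. The function name `degree_preserving_shuffle` and its parameters are hypothetical, and the input is assumed to contain no duplicate edges.

```python
import random

def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomize bipartite edges while keeping every node's degree fixed.

    edges:   list of unique (left_node, right_node) pairs,
             for example (snp, mrna)
    n_swaps: number of swap attempts (rejected attempts count too)
    seed:    seed for the random generator, for reproducibility
    Returns a new edge list with the same degree sequence.
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    if len(edge_list) < 2:
        return edge_list
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        # Proposed swap: (a, b), (c, d) -> (a, d), (c, b).
        new_first, new_second = (a, d), (c, b)
        # Reject swaps that would recreate an existing edge; this also
        # covers the no-op cases a == c and b == d.
        if new_first in edge_set or new_second in edge_set:
            continue
        edge_set.discard((a, b))
        edge_set.discard((c, d))
        edge_set.add(new_first)
        edge_set.add(new_second)
        edge_list[i], edge_list[j] = new_first, new_second
    return edge_list
```

Recomputing a similarity statistic on many shuffled networks, then comparing the observed value against that distribution, yields an empirical p-value; how many swaps and permutations are adequate depends on the network and is left open here.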

    RNA, the Epicenter of Genetic Information

    The origin story and emergence of molecular biology is muddled. The early triumphs in bacterial genetics and the complexity of animal and plant genomes complicate an intricate history. This book documents the many advances, as well as the prejudices and founder fallacies. It highlights the premature relegation of RNA to simply an intermediate between gene and protein, the underestimation of the amount of information required to program the development of multicellular organisms, and the dawning realization that RNA is the cornerstone of cell biology, development, brain function and probably evolution itself. Key personalities, their hubris as well as prescient predictions, are richly illustrated with quotes, archival material, photographs, diagrams and references to bring the people, ideas and discoveries to life, from the conceptual cradles of molecular biology to the current revolution in the understanding of genetic information. Key Features: Documents the confused early history of DNA, RNA and proteins, a transformative history of molecular biology like no other. Integrates the influences of biochemistry and genetics on the landscape of molecular biology. Chronicles the important discoveries, preconceptions and misconceptions that retarded or misdirected progress. Highlights major pioneers and contributors to molecular biology, with a focus on RNA and noncoding DNA. Summarizes the mounting evidence for the central roles of non-protein-coding RNA in cell and developmental biology. Provides a thought-provoking retrospective and forward-looking perspective for advanced students and professional researchers.