    Targeting and function of mammalian microRNAs

    Thesis: Ph. D., Massachusetts Institute of Technology, Department of Biology, 2009. Cataloged from PDF version of thesis. Includes bibliographical references.
In the span of a few short years, animal microRNAs have become recognized as broad regulators of gene expression, largely because of our improved understanding of how animal microRNAs recognize their targets. Crucial to microRNA targeting are the ~7-nt seed sites complementary to nucleotides 2-8 at the 5' end of the microRNA. We show that protein-coding genes preferentially expressed at the same time and place as a highly expressed microRNA have evolved their 3' UTR sequences to specifically avoid seed sites matching that microRNA. In contrast, genes with conserved sites appear to be preferentially expressed in developmental states prior to microRNA expression and are downregulated upon induction of that microRNA. Combined with the result that both conserved and nonconserved seed sites are generally functional, our findings extend the direct and indirect influence of mammalian microRNAs to the majority of protein-coding genes. Although seed sites account for much of the specificity of microRNA regulation, they are not always sufficient for repression, suggesting the contribution of additional specificity determinants. Combining independent computational and experimental approaches, we found five general features associated with site efficacy: AU-rich nucleotide composition near the site, proximity to sites for coexpressed microRNAs, pairing outside the seed region at microRNA nucleotides 13-16, positioning within the 3' UTR at least 15 nt from the stop codon, and positioning away from the center of long UTRs. By incorporating these five features, we are able to explain much of the difference in site efficacy for both exogenously added microRNAs and endogenous microRNA-message interactions.
We further refined the seed site motif involved in microRNA repression by experimentally demonstrating an adenosine preference across from the unpaired first nucleotide of the microRNA and by ranking the relative effectiveness of different classes of seed sites. Although sites lacking perfect seed pairing were generally ineffective, a fraction of these sites were supplemented by detectable compensatory 3' pairing. In addition, by extending our conservation analysis to 11 genomes, we show that the confidence with which conserved target sites can be predicted is a function of the conservation of the seed site itself relative to the conservation of the surrounding sequence. This allows individual conserved sites to be assigned a confidence score reflecting the likelihood that the site is conserved because of selection rather than by chance.
by Kyle Kai-How Farh. Ph.D.
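To make the seed-site vocabulary above concrete, here is a minimal Python sketch that scans a 3' UTR for matches to a miRNA seed (nucleotides 2-8) and labels each match by class. It assumes the site-type names common in the targeting literature (8mer, 7mer-m8, 7mer-A1, 6mer), which the abstract alludes to but does not define; the function names, the 30-nt default flank for "AU-rich composition near the site", and the assumption that the UTR string begins immediately after the stop codon are all hypothetical choices for illustration, not the thesis's pipeline.

```python
# Watson-Crick complements for RNA bases
_COMP = {"A": "U", "U": "A", "G": "C", "C": "G"}
_SITE_LEN = {"8mer": 8, "7mer-m8": 7, "7mer-A1": 7, "6mer": 6}


def revcomp(seq: str) -> str:
    """Reverse complement of an RNA sequence (DNA 'T' is read as 'U')."""
    seq = seq.upper().replace("T", "U")
    return "".join(_COMP[b] for b in reversed(seq))


def find_seed_sites(mirna: str, utr: str) -> list:
    """Return (0-based start in the UTR, site type) for each seed match."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    m8 = revcomp(mirna[1:8])  # UTR bases pairing with miRNA nt 2-8, read 5'->3'
    core = m8[1:]             # 6-nt match to miRNA nt 2-7
    sites = []
    for i in range(len(utr) - len(core) + 1):
        if utr[i:i + len(core)] != core:
            continue
        has_m8 = i > 0 and utr[i - 1] == m8[0]   # pairs with miRNA nt 8
        j = i + len(core)
        has_a1 = j < len(utr) and utr[j] == "A"  # A across from miRNA nt 1
        if has_m8 and has_a1:
            kind = "8mer"
        elif has_m8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((i - 1 if has_m8 else i, kind))
    return sites


def local_au_fraction(utr: str, start: int, length: int, flank: int = 30) -> float:
    """Fraction of A/U in the flanks on both sides of a site (flank size is hypothetical)."""
    utr = utr.upper().replace("T", "U")
    window = utr[max(0, start - flank):start] + utr[start + length:start + length + flank]
    return sum(b in "AU" for b in window) / len(window) if window else 0.0


if __name__ == "__main__":
    mir1 = "UGGAAUGUAAAGAAGUAUGUAU"  # human miR-1, used only as an example sequence
    utr = "AAAAACAUUCCAGGGG"         # toy UTR; index 0 assumed to follow the stop codon
    for start, kind in find_seed_sites(mir1, utr):
        au = local_au_fraction(utr, start, _SITE_LEN[kind], flank=4)
        print(start, kind, f"AU={au:.2f}",
              "far from stop" if start >= 15 else "within 15 nt of stop")
```

The sketch only labels site classes and two of the context features; ranking the classes by efficacy, and combining all five features, is what the thesis reports doing experimentally and is not reproduced here.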

    Expanding the MicroRNA Targeting Code: Functional Sites with Centered Pairing

    Most metazoan microRNA (miRNA) target sites have perfect pairing to the seed region, located near the miRNA 5′ end. Although pairing to the 3′ region sometimes supplements seed matches or compensates for mismatches, pairing to the central region has been known to function only at rare sites that impart Argonaute-catalyzed mRNA cleavage. Here, we present “centered sites,” a class of miRNA target sites that lack both perfect seed pairing and 3′-compensatory pairing and instead have 11–12 contiguous Watson-Crick pairs to the center of the miRNA. Although centered sites can impart mRNA cleavage in vitro (in elevated Mg²⁺), in cells they repress protein output without consequential Argonaute-catalyzed cleavage. Our study also identified extensively paired sites that are cleavage substrates in cultured cells and human brain. This expanded repertoire of cleavage targets and the identification of the centered site type help explain why central regions of many miRNAs are evolutionarily conserved.
National Institutes of Health (U.S.); Damon Runyon Cancer Research Foundation. Fellowship Award
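A centered site is defined above by its contiguous Watson-Crick pairing to the middle of the miRNA. As a rough illustration, the Python sketch below finds the longest contiguous antiparallel Watson-Crick match between a miRNA and a target segment (a longest-common-substring search against the target's reverse complement, ignoring G:U wobble). The abstract does not say exactly which nucleotides count as the "center", so the window of miRNA nt 4-15 and the helper names are assumptions made for this example only.

```python
# Watson-Crick complements for RNA bases (G:U wobble deliberately excluded)
_COMP = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of an RNA sequence (DNA 'T' is read as 'U')."""
    seq = seq.upper().replace("T", "U")
    return "".join(_COMP[b] for b in reversed(seq))


def longest_contiguous_pairing(mirna: str, target: str) -> tuple:
    """Longest run of contiguous Watson-Crick pairs between miRNA and target.

    Returns (length, first_nt, last_nt) in 1-based miRNA coordinates.
    Pairing is antiparallel, so this is the longest common substring of the
    miRNA and the reverse complement of the target (dynamic programming).
    """
    mirna = mirna.upper().replace("T", "U")
    rc = revcomp(target)
    best_len, best_end = 0, 0
    prev = [0] * (len(rc) + 1)
    for i in range(1, len(mirna) + 1):
        cur = [0] * (len(rc) + 1)
        for j in range(1, len(rc) + 1):
            if mirna[i - 1] == rc[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return best_len, best_end - best_len + 1, best_end


def looks_centered(mirna: str, target: str, min_pairs: int = 11, window=(4, 15)) -> bool:
    """Hypothetical filter: >= min_pairs contiguous pairs lying inside a central window."""
    length, first, last = longest_contiguous_pairing(mirna, target)
    return length >= min_pairs and window[0] <= first and last <= window[1]


if __name__ == "__main__":
    mir1 = "UGGAAUGUAAAGAAGUAUGUAU"  # human miR-1, example sequence only
    print(longest_contiguous_pairing(mir1, "GGCUUCUUUACAUUGG"))
```

A real classifier would also need to confirm the absence of perfect seed pairing and of 3′-compensatory pairing, which this sketch does not check.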

    Rare penetrant mutations confer severe risk of common diseases

    [INTRODUCTION] Genome-wide association studies (GWASs) have identified thousands of common genetic variants that are predictive of common disease susceptibility, but these variants individually have mild effects on disease owing to natural selection. By contrast, rare genetic variants can have large effects on common disease risk, but their use in genetic risk prediction has so far been limited by the difficulty of distinguishing pathogenic from benign variants and of estimating the magnitude of their effects.
[RATIONALE] PrimateAI-3D is a three-dimensional convolutional neural network for missense variant–effect prediction, trained with common genetic variants from the population sequencing of 233 primate species. By applying this method to estimate the pathogenicity of rare coding variants in 454,712 UK Biobank individuals, we aimed to improve rare-variant association tests and genetic risk prediction for common diseases and complex traits.
[RESULTS] We performed rare-variant burden tests for 90 well-powered, clinically relevant phenotypes in the UK Biobank exome dataset. Stratifying missense variants with PrimateAI-3D greatly improved gene discovery, revealing 73% more significant gene-phenotype associations (false discovery rate < 0.05) than an analysis without PrimateAI-3D. When benchmarked against prior studies, gene-phenotype pairs identified with our method were better supported by orthogonal genetic evidence from GWASs and from genes implicated in related Mendelian disorders. In addition, among existing variant interpretation algorithms, PrimateAI-3D scores showed the strongest correlation with the quantitative effects of rare variants on continuous clinical phenotypes.
Having validated our method for finding gene-phenotype relationships, we next constructed a rare-variant polygenic risk score (PRS) model by combining the rare-variant genes for each phenotype, weighting variants by their PrimateAI-3D prediction score and by the direction and effect size of each associated gene. For comparison, we constructed common-variant PRS models and evaluated the performance of the two models for genetic risk prediction in a withheld test subset of the cohort. Although common variants better explained overall population variance, rare-variant PRSs had more power at the tails of the distribution to identify individuals at the greatest risk for disease and thus may be more relevant for population genetic screening and risk management. In contrast to common-variant PRS models derived from European populations, which generalize poorly to non-Europeans, rare-variant PRSs were substantially more portable to cohorts and ancestry groups not seen during model training. Moreover, because rare- and common-variant PRS models incorporate orthogonal information from nonoverlapping sets of variants, we combined them into a unified model and observed a further improvement in genetic risk prediction for common diseases. To understand the extent to which rare-variant PRSs can be expected to improve as discovery cohorts grow, we repeated our analyses in down-sampled subsets of the UK Biobank cohort. The number of genes contributing to the rare-variant PRS increased linearly, with no signs of plateauing at half a million exomes. Newly discovered rare-variant genes were strongly enriched at GWAS loci, forming allelic series with effect sizes that were on average ~10-fold larger than those of the respective common GWAS variants.
Among well-powered GWAS loci that could be unambiguously assigned to a single gene, the majority showed subthreshold signal in the rare-variant burden test, indicating that rare penetrant variants exist at a large fraction of GWAS loci and can be incorporated into the rare-variant PRS with further advances in cohort size and variant effect prediction.
[CONCLUSION] Understanding the impact of rare variants in common diseases is of prime interest for both precision medicine and the discovery of drug targets. By leveraging advances in variant effect prediction, we have demonstrated major improvements in rare-variant burden testing and genetic risk prediction. Notably, nearly all individuals carried at least one rare penetrant variant for the phenotypes we examined, demonstrating the utility of personal genome sequencing for otherwise healthy individuals in the general population.
T.M.B. is supported by funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 864203), PID2021-126004NB-100 (MICIIN/FEDER, UE) and Secretaria d’Universitats i Recerca, and CERCA Programme del Departament d’Economia i Coneixement de la Generalitat de Catalunya (GRC 2021 SGR 00177).
Peer reviewed
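The rare-variant PRS is described above as weighting each variant by its pathogenicity score and by the direction and effect size of its associated gene. A minimal Python sketch of one plausible reading of that description follows; the function name, the gene labels, the assumed 0-1 score scale and all numbers are hypothetical, and the study's actual model may normalize or aggregate differently.

```python
from typing import Dict, Iterable, Tuple


def rare_variant_prs(
    carried: Iterable[Tuple[str, float]],
    gene_effects: Dict[str, float],
) -> float:
    """Hypothetical rare-variant PRS for one individual.

    carried: (gene, pathogenicity score) for each rare variant the individual
        carries, e.g. a PrimateAI-3D-like score on an assumed 0-1 scale.
    gene_effects: signed per-gene effect size from the burden test; the sign
        encodes direction (risk-increasing vs. protective).
    Variants in genes with no associated effect contribute nothing.
    """
    score = 0.0
    for gene, pathogenicity in carried:
        score += pathogenicity * gene_effects.get(gene, 0.0)
    return score


# Illustrative numbers only, not from the study
effects = {"GENE_A": 0.5, "GENE_B": -0.2}
person = [("GENE_A", 0.9), ("GENE_B", 0.5), ("GENE_C", 1.0)]
print(round(rare_variant_prs(person, effects), 3))  # 0.9*0.5 + 0.5*(-0.2)
```

Keeping the gene effect signed lets protective genes lower the score, which is how the "direction" part of the weighting is represented in this sketch.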

    Migraine, inflammatory bowel disease and celiac disease: A Mendelian randomization study

    Objective: To assess whether migraine is genetically and/or causally associated with inflammatory bowel disease (IBD) or celiac disease.
Background: Migraine has been linked to IBD and celiac disease in observational studies, but whether this link is explained by a shared genetic basis or is causal has not been established. A causal association would be clinically relevant, as treating one of these conditions might mitigate the symptoms of a causally linked condition.
Methods: Linkage disequilibrium score regression and two-sample bidirectional Mendelian randomization analyses were performed using summary statistics from cohort-based genome-wide association studies of migraine (59,674 cases; 316,078 controls), IBD (25,042 cases; 34,915 controls) and celiac disease (11,812 or 4533 cases; 11,837 or 10,750 controls). Migraine with and without aura were analyzed separately, as were the two IBD subtypes, Crohn's disease and ulcerative colitis. Positive control analyses and conventional Mendelian randomization sensitivity analyses were performed.
Results: Migraine was not genetically correlated with IBD or celiac disease. When all participants with migraine were analyzed jointly, there was no evidence of IBD (odds ratio [OR] 1.00, 95% confidence interval [CI] 0.99–1.02, p = 0.703) or celiac disease (OR 1.00, 95% CI 0.99–1.02, p = 0.912) causing migraine, or of migraine causing IBD (OR 1.08, 95% CI 0.96–1.22, p = 0.181) or celiac disease (OR 1.08, 95% CI 0.79–1.48, p = 0.614). There was some indication of a causal association between celiac disease and migraine with aura (OR 1.04, 95% CI 1.00–1.08, p = 0.045), between celiac disease and migraine without aura (OR 0.95, 95% CI 0.92–0.99, p = 0.006), and between migraine without aura and ulcerative colitis (OR 1.15, 95% CI 1.02–1.29, p = 0.025). However, these results were not significant after correction for multiple testing.
Conclusions: We found no evidence of a shared genetic basis or of a causal association between migraine and either IBD or celiac disease, although we obtained some indications of causal associations with migraine subtypes.
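The abstract reports odds ratios from two-sample Mendelian randomization but does not name the estimator. The inverse-variance-weighted (IVW) estimator is a conventional primary choice, so the Python sketch below shows how a fixed-effect IVW estimate and an approximate 95% CI on the odds-ratio scale can be computed from per-SNP summary statistics. That the study used exactly this estimator is an assumption, and the toy numbers are made up.

```python
import math
from typing import Sequence, Tuple


def ivw_mr(beta_x: Sequence[float], beta_y: Sequence[float],
           se_y: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance-weighted MR estimate from per-SNP summary stats.

    beta_x: SNP effects on the exposure; beta_y and se_y: SNP effects on the
    outcome (log-odds for a binary outcome) and their standard errors.
    Returns (causal estimate on the outcome's scale, its standard error).
    """
    if not (len(beta_x) == len(beta_y) == len(se_y)) or len(beta_x) == 0:
        raise ValueError("need equal-length, non-empty inputs")
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    return num / den, math.sqrt(1.0 / den)


def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a log-odds estimate into (OR, lower, upper) with an approximate 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Toy instruments (made-up numbers): outcome effects are exactly 0.1 x exposure effects
beta, se = ivw_mr([0.1, 0.2, 0.3], [0.01, 0.02, 0.03], [0.01, 0.02, 0.01])
print(odds_ratio_ci(beta, se))
```

A bidirectional analysis, as described in the Methods, would repeat the calculation with the roles of exposure and outcome swapped, using instruments selected for the new exposure.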

    Partitioning Heritability of Regulatory and Cell-Type-Specific Variants across 11 Common Diseases

    Regulatory and coding variants are known to be enriched for associations identified by genome-wide association studies (GWASs) of complex disease, but their contributions to trait heritability are currently unknown. We applied variance-component methods to imputed genotype data for 11 common diseases to partition the heritability explained by genotyped SNPs (h²g) across functional categories (while accounting for shared variance due to linkage disequilibrium). Extensive simulations showed that, in contrast to current estimates from GWAS summary statistics, the variance-component approach partitions heritability accurately under a wide range of complex-disease architectures. Across the 11 diseases, DNaseI hypersensitivity sites (DHSs) from 217 cell types spanned 16% of imputed SNPs (and 24% of genotyped SNPs) but explained an average of 79% (SE = 8%) of h²g from imputed SNPs (5.1× enrichment; p = 3.7 × 10⁻¹⁷) and 38% (SE = 4%) of h²g from genotyped SNPs (1.6× enrichment; p = 1.0 × 10⁻⁴). Further enrichment was observed at enhancer DHSs and cell-type-specific DHSs. In contrast, coding variants, which span 1% of the genome, explained <10% of h²g despite having the highest enrichment. We replicated these findings but found no significant contribution from rare coding variants in independent schizophrenia cohorts genotyped on GWAS and exome chips. Our results highlight the value of analyzing components of heritability to unravel the functional architecture of common disease.
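Enrichment here is the share of h²g a functional category explains divided by the share of SNPs it contains. A quick Python check with the rounded figures quoted above (the helper name is ours, for illustration):

```python
def h2g_enrichment(frac_h2g: float, frac_snps: float) -> float:
    """Enrichment = proportion of h2g explained / proportion of SNPs in the category."""
    if not 0 < frac_snps <= 1:
        raise ValueError("frac_snps must be in (0, 1]")
    return frac_h2g / frac_snps


# Rounded DHS figures quoted in the abstract
imputed = h2g_enrichment(0.79, 0.16)    # abstract reports 5.1x
genotyped = h2g_enrichment(0.38, 0.24)  # abstract reports 1.6x
print(f"imputed {imputed:.2f}x, genotyped {genotyped:.2f}x")
```

The rounded inputs give about 4.9× rather than the reported 5.1× for imputed SNPs; the gap presumably reflects rounding of the published percentages and averaging across diseases, so this is a consistency check rather than a reproduction of the analysis.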

    Genome-wide coancestry reveals details of ancient and recent male-driven reticulation in baboons

    [INTRODUCTION] As a widespread but comparatively young clade of six parapatric species, the baboons (Papio sp.) exemplify a frequently observed pattern of mammalian diversity. In particular, they provide analogs for the population structure of the multibranched prehuman lineage that occupied a similar geographic range before the hegemony of “modern” humans, Homo sapiens. Despite phenotypic and genetic differences, interspecies hybridization has been described between baboons at several locations, and population relationships based on mitochondrial DNA (mtDNA) do not correspond with relationships based on phenotype. These previous studies captured the broad outlines of baboon population genetic structure and evolutionary history but necessarily used data that were limited in genomic and geographical coverage and therefore could not adequately document inter- and intrapopulation variation. In this study, we analyzed whole-genome sequences of 225 baboons representing all six species and 19 geographic sites, with 18 local populations represented by multiple individuals.
[RATIONALE] Recent studies have identified several mammalian species groups in which genetically distinct lineages have hybridized to generate complex reticulate phylogenies. Baboons provide a valuable context for studying processes generating such population and phylogenetic complexity because extant parapatric species form hybrid zones in several regions of Africa, allowing for direct observation of ongoing introgression. Furthermore, prior studies of nuclear and mtDNA and phenotypic diversity have demonstrated gene flow among differentiated lineages but were unable to develop the detailed picture of process and history that is now possible using whole-genome sequences and modern computational methods.
To address these questions, we designed a study that would provide a more fine-grained picture of recent and ancient genetic reticulation by comparing phenotypes and autosomal, X and Y chromosomal, and mtDNA sequences, along with polymorphic insertions of repetitive elements, across multiple baboon populations.
[RESULTS] Using deep whole-genome sequence data from 225 baboons representing multiple populations, we identified several previously unknown geographic sites of gene flow between genetically distinct populations. We report that yellow baboons (P. cynocephalus) from western Tanzania are the first nonhuman primate found to have received genetic input from three distinct lineages. We compared the ancestry shared among individuals, estimated separately from the X chromosome and autosomes, to distinguish shared ancestry due to ancestral population relationships from coancestry as a result of recent male-biased immigration and gene flow. This reveals directionality and sex bias of recent gene flow in several locations. Analyses of population differences within species quantified different degrees of interspecies introgression among populations with an essentially identical phenotype.
[CONCLUSION] The population genetic structure and history of introgression among baboon lineages are even more complex than predicted from observed phenotypic diversity and prior studies of limited genetic data. Single populations can carry genetic contributions from more than two ancestral sources. Populations that appear homogeneous on the basis of observable phenotype can display different levels of interspecies introgression. The evolutionary dynamics and current structure of baboon population diversity indicate that other mammals displaying differentiated and geographically separate species may also have more-complex histories than anticipated. This may also be true for the morphologically defined hominin taxa from the past 4 million years.
This work was funded by “la Caixa” Foundation (ID 100010434), fellowship code LCF/BQ/PR19/11700002 (M.K.); the Vienna Science and Technology Fund (WWTF) (10.47379/VRG20001) (M.K.); German Research Foundation grants FI707/9-1, KN1097/3-1/3-1, KN1097/4-1, ZI548/5-1, and RO3055/2-1 (J.F., S.K., D.Z., and C.R.); Novo Nordisk Foundation grant 0058553 (E.F.S. and K.M.); R01 GM59290 (M.A.B.); and internal funding from Baylor College of Medicine (J.R.). T.M.B. is supported by funding from the European Research Council under the European Union's Horizon 2020 research and innovation programme (grant 864203), PID2021-126004NB-100 (MICIIN/FEDER, UE) and Secretaria d'Universitats i Recerca and CERCA Programme del Departament d'Economia i Coneixement de la Generalitat de Catalunya (GRC 2021 SGR 00177).
Peer reviewed
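Comparing X-chromosome and autosomal coancestry works as a signal of sex bias because males carry one X and females two. The Python sketch below illustrates this with a textbook idealization (one admixture pulse, random mating, discrete generations, no further gene flow), which is our simplification and not the model used in the study; the function name is hypothetical. Under male-only immigration, the X carries two thirds as much migrant ancestry as the autosomes; under female-only immigration, four thirds.

```python
from typing import List, Tuple


def pulse_admixture(male_frac: float, female_frac: float,
                    generations: int = 20) -> List[Tuple[float, float]]:
    """Migrant ancestry on autosomes and on the X after a single admixture pulse.

    male_frac / female_frac: share of founding breeding males / females that
    are migrants. Assumes random mating, discrete generations and no further
    gene flow. Returns one (autosomal, X) pair per generation after the pulse.
    """
    auto_m, auto_f = male_frac, female_frac  # autosomal migrant ancestry by sex
    x_m, x_f = male_frac, female_frac        # X-linked migrant ancestry by sex
    history = []
    for _ in range(generations):
        # Autosomes: every offspring inherits half from each parent.
        auto = (auto_m + auto_f) / 2
        auto_m = auto_f = auto
        # X: daughters get one X from each parent; sons get their only X from the mother.
        x_f, x_m = (x_m + x_f) / 2, x_f
        # Females carry two X copies and males one, so weight them 2:1.
        history.append((auto, (2 * x_f + x_m) / 3))
    return history


# Male-only gene flow: 30% of founding males are immigrants.
auto, x = pulse_admixture(0.3, 0.0)[-1]
print(f"autosomal {auto:.3f}, X {x:.3f}, X/A ratio {x / auto:.3f}")
```

Real populations with ongoing, recurrent migration will deviate from this single-pulse picture, but the direction of the X/autosome contrast is the same.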

    Identification of constrained sequence elements across 239 primate genomes

    Noncoding DNA is central to our understanding of human gene regulation and complex diseases [1,2], and measuring evolutionary sequence constraint can establish the functional relevance of putative regulatory elements in the human genome [3–9]. Identifying the genomic elements that have become constrained specifically in primates has been hampered by the faster evolution of noncoding DNA compared with protein-coding DNA [10], the relatively short timescales separating primate species [11] and the previously limited availability of whole-genome sequences [12]. Here we construct a whole-genome alignment of 239 species, representing nearly half of all extant species in the primate order. Using this resource, we identified human regulatory elements that are under selective constraint across primates and other mammals at a 5% false discovery rate. We detected 111,318 DNase I hypersensitivity sites and 267,410 transcription factor binding sites that are constrained specifically in primates but not across other placental mammals, and we validated their cis-regulatory effects on gene expression. These regulatory elements are enriched for human genetic variants that affect gene expression and complex traits and diseases. Our results highlight the important role of recent evolution in regulatory sequence elements differentiating primates, including humans, from other placental mammals.
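The constrained elements above are called at a 5% false discovery rate, but the abstract does not name the multiple-testing procedure. As one standard way to control FDR, here is a Python sketch of the Benjamini–Hochberg step-up procedure; whether the authors used BH or another approach is not stated here, and the p-values are made up.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Find the largest rank k with p_(k) <= k * alpha / m; all smaller ranks are rejected too.
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    return sorted(order[:k_max])


# Made-up p-values for ten candidate elements
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p))  # [0, 1]
```

Because the procedure is step-up, a p-value that misses its own threshold is still rejected if a larger-ranked p-value passes, which is why the loop keeps the largest qualifying rank rather than stopping at the first failure.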

    The landscape of tolerated genetic variation in humans and primates

    Phylogenomic analyses provide insights into primate evolution

    Comparative analysis of primate genomes within a phylogenetic context is essential for understanding the evolution of human genetic architecture and primate diversity. We present such a study of 50 primate species spanning 38 genera and 14 families, including 27 genomes first reported here, many from previously underrepresented groups: the New World monkeys and the Strepsirrhini. Our analyses reveal heterogeneous rates of genomic rearrangement and gene evolution across primate lineages. Thousands of genes under positive selection in different lineages play roles in the nervous, skeletal, and digestive systems and may have contributed to primate innovations and adaptations. Our study reveals that many key genomic innovations occurred in the Simiiformes ancestral node and may have had an impact on the adaptive radiation of the Simiiformes and on human evolution.