    Extreme purifying selection against point mutations in the human genome

    Large-scale genome sequencing has enabled the measurement of strong purifying selection in protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring such selection in noncoding as well as coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of "ultraselection" by the fractional depletion of rare single-nucleotide variants, after controlling for variation in mutation rates. Applying ExtRaINSIGHT to 71,702 whole-genome sequences from gnomAD v3, we find abundant ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. By contrast, we find much less ultraselection in other noncoding RNAs and transcription factor binding sites, and only modest levels in ultraconserved elements. We estimate that ~0.4–0.7% of the human genome is ultraselected, implying ~0.26–0.51 strongly deleterious mutations per generation. Overall, our study sheds new light on the genome-wide distribution of fitness effects by combining deep sequencing data and classical theory from population genetics.

    Deconvolution of Expression for Nascent RNA sequencing data (DENR) highlights pre-RNA isoform diversity in human cells.

    MOTIVATION: Quantification of isoform abundance has been extensively studied at the mature-RNA level using RNA-seq but not at the level of precursor RNAs using nascent RNA sequencing. RESULTS: We address this problem with a new computational method called Deconvolution of Expression for Nascent RNA sequencing data (DENR), which models nascent RNA sequencing read counts as a mixture of user-provided isoforms. The baseline algorithm is enhanced by machine-learning predictions of active transcription start sites and an adjustment for the typical "shape profile" of read counts along a transcription unit. We show that DENR outperforms simple read-count-based methods for estimating gene and isoform abundances, and that transcription of multiple pre-RNA isoforms per gene is widespread, with frequent differences between cell types. In addition, we provide evidence that a majority of human isoform diversity derives from primary transcription rather than from post-transcriptional processes. AVAILABILITY: DENR and nascentRNASim are freely available at https://github.com/CshlSiepelLab/DENR (version v1.0.0) and https://github.com/CshlSiepelLab/nascentRNASim (version v0.3.0). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
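
DENR's core idea, treating binned nascent-RNA read counts as a non-negative mixture of isoform coverage patterns, can be illustrated with a generic non-negative least-squares (NNLS) fit. The sketch below is a simplification under stated assumptions, not DENR's implementation: each isoform is reduced to a hypothetical flat 0/1 mask over genomic bins, the fit uses cyclic coordinate descent, and DENR's transcription-start-site predictions and shape-profile adjustment are omitted. The function name `deconvolve` is illustrative.

```python
def deconvolve(counts, isoform_masks, n_iter=500):
    """Estimate non-negative isoform abundances from binned read counts.

    counts: observed read counts, one per genomic bin.
    isoform_masks: one 0/1 list per isoform marking the bins it spans
        (a hypothetical, flat coverage model for each isoform).
    Minimizes ||counts - sum_k a_k * mask_k||^2 subject to a_k >= 0 by
    cyclic coordinate descent (a generic NNLS solver, not DENR's model).
    """
    k = len(isoform_masks)
    abundances = [0.0] * k
    # Squared norm of each isoform's column, used in the coordinate update.
    norms = [sum(x * x for x in mask) for mask in isoform_masks]
    for _ in range(n_iter):
        for j in range(k):
            if norms[j] == 0:
                continue  # isoform covers no bins; leave it at zero
            # Project the residual that excludes isoform j onto its mask.
            dot = 0.0
            for b, count in enumerate(counts):
                others = sum(abundances[i] * isoform_masks[i][b]
                             for i in range(k) if i != j)
                dot += isoform_masks[j][b] * (count - others)
            # Exact minimizer along coordinate j, clipped at zero.
            abundances[j] = max(0.0, dot / norms[j])
    return abundances
```

For example, with a long isoform spanning six bins and a shorter, alternative-TSS isoform spanning the last three, counts of [2, 2, 2, 7, 7, 7] are split into abundances of about 2 and 5. The clipping step is what keeps an isoform at zero when the unconstrained fit would make it negative.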

    Nascent RNA sequencing reveals a dynamic global transcriptional response at genes and enhancers to the natural medicinal compound celastrol

    Most studies of responses to transcriptional stimuli measure changes in cellular mRNA concentrations. By sequencing nascent RNA instead, it is possible to detect changes in transcription in minutes rather than hours, and thereby distinguish primary from secondary responses to regulatory signals. Here, we describe the use of PRO-seq to characterize the immediate transcriptional response in human cells to celastrol, a compound derived from traditional Chinese medicine that has potent anti-inflammatory, tumor-inhibitory and obesity-controlling effects. Our analysis of PRO-seq data for K562 cells reveals dramatic transcriptional effects soon after celastrol treatment at a broad collection of both coding and noncoding transcription units. This transcriptional response occurred in two major waves: one within 10 minutes and a second 40–60 minutes after treatment. Transcriptional activity was generally repressed by celastrol, but one distinct group of genes, enriched for roles in the heat shock response, displayed strong activation. Using a regression approach, we identified key transcription factors (TFs) that appear to drive these transcriptional responses, including members of the E2F and RFX families. We also found sequence-based evidence that particular TFs drive the activation of enhancers. We observed increased polymerase pausing at both genes and enhancers, suggesting that pause release may be widely inhibited during the celastrol response. Our study demonstrates that a careful analysis of PRO-seq time course data can disentangle key aspects of a complex transcriptional response, and it provides new insights into the activity of a powerful pharmacological agent.
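
Changes in polymerase pausing are often summarized with a pausing index: the ratio of read density in a promoter-proximal window to read density in the gene body. The sketch below assumes per-base PRO-seq coverage for a single plus-strand gene, ordered 5' to 3' from the transcription start site, and uses hypothetical window sizes (first 250 bp as the pause region, gene body from +500 bp). Published analyses choose these windows in different ways, and this is not necessarily the definition used in the study; `pausing_index` is an illustrative name.

```python
def pausing_index(coverage, pause_window=250, body_offset=500):
    """Ratio of mean read density near the TSS to mean density in the gene body.

    coverage: per-base read counts for one gene, ordered 5'->3' with the
        TSS at position 0 (plus strand assumed).
    pause_window: bases [0, pause_window) form the promoter-proximal
        region (an assumed window size).
    body_offset: the gene body is taken as [body_offset, end) (assumed).
    """
    pause = coverage[:pause_window]
    body = coverage[body_offset:]
    if not pause or not body:
        raise ValueError("gene too short for the chosen windows")
    pause_density = sum(pause) / len(pause)
    body_density = sum(body) / len(body)
    if body_density == 0:
        # No gene-body signal: index is unbounded (or undefined if both are 0).
        return float("inf") if pause_density > 0 else float("nan")
    return pause_density / body_density
```

For instance, coverage of 8 reads per base over the first 250 bp and 2 per base beyond +500 bp gives an index of 4.0. Comparing indices computed the same way for matched untreated and treated samples gives a per-gene measure of whether pausing increased.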

    Characterizing RNA stability genome-wide through combined analysis of PRO-seq and RNA-seq data

    The rate at which RNA molecules decay is a key determinant of cellular RNA concentrations, yet current approaches for measuring RNA half-lives are generally labor-intensive, limited in sensitivity, and/or disruptive to normal cellular processes. Here we introduce a simple method for estimating relative RNA half-lives that is based on two standard and widely available high-throughput assays: Precision Run-On sequencing (PRO-seq) and RNA sequencing (RNA-seq). Our method treats PRO-seq as a measure of transcription rate and RNA-seq as a measure of RNA concentration, and estimates the rate of RNA decay required for a steady-state equilibrium. We show that this approach can be used to assay relative RNA half-lives genome-wide, with good accuracy and sensitivity for both coding and noncoding transcription units. Using a structural equation model (SEM), we test several features of transcription units, nearby DNA sequences, and nearby epigenomic marks for associations with RNA stability after controlling for their effects on transcription. We find that RNA splicing-related features are positively correlated with RNA stability, whereas features related to miRNA binding, DNA methylation, and G+C-richness are negatively correlated with RNA stability. Furthermore, we find that a measure based on U1-binding and polyadenylation sites distinguishes between unstable noncoding and stable coding transcripts but is not predictive of relative stability within the mRNA or lincRNA classes. We also identify several histone modifications that are associated with RNA stability. Together, our estimation method and systematic analysis shed light on the pervasive impacts of RNA stability on cellular RNA concentrations.
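
The steady-state reasoning in this abstract can be written out under the assumption of first-order (mass-action) RNA decay. Here α is the synthesis rate (approximated by PRO-seq), C the RNA concentration (approximated by RNA-seq), and β the decay rate; these symbols are introduced for illustration and are not taken from the paper.

```latex
\begin{aligned}
\frac{dC}{dt} &= \alpha - \beta C, \\
0 &= \alpha - \beta C \quad\Longrightarrow\quad \beta = \frac{\alpha}{C}, \\
t_{1/2} &= \frac{\ln 2}{\beta} = \ln 2 \cdot \frac{C}{\alpha}.
\end{aligned}
```

Because each assay reports α and C only up to an unknown, sample-wide scale factor, t_{1/2} is determined only up to a constant, which is consistent with the method estimating relative rather than absolute half-lives.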

    Characterizing RNA stability genome-wide through combined analysis of PRO-seq and RNA-seq data.

    BACKGROUND: The concentrations of distinct types of RNA in cells result from a dynamic equilibrium between RNA synthesis and decay. Despite the critical importance of RNA decay rates, current approaches for measuring them are generally labor-intensive, limited in sensitivity, and/or disruptive to normal cellular processes. Here, we introduce a simple method for estimating relative RNA half-lives that is based on two standard and widely available high-throughput assays: Precision Run-On sequencing (PRO-seq) and RNA sequencing (RNA-seq). RESULTS: Our method treats PRO-seq as a measure of transcription rate and RNA-seq as a measure of RNA concentration, and estimates the rate of RNA decay required for a steady-state equilibrium. We show that this approach can be used to assay relative RNA half-lives genome-wide, with good accuracy and sensitivity for both coding and noncoding transcription units. Using a structural equation model (SEM), we test several features of transcription units, nearby DNA sequences, and nearby epigenomic marks for associations with RNA stability after controlling for their effects on transcription. We find that RNA splicing-related features are positively correlated with RNA stability, whereas features related to miRNA binding and DNA methylation are negatively correlated with RNA stability. Furthermore, we find that a measure based on U1 binding and polyadenylation sites distinguishes between unstable noncoding and stable coding transcripts but is not predictive of relative stability within the mRNA or lincRNA classes. We also identify several histone modifications that are associated with RNA stability. CONCLUSION: We introduce an approach for estimating the relative half-lives of individual RNAs. Together, our estimation method and systematic analysis shed light on the pervasive impacts of RNA stability on cellular RNA concentrations.
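
A minimal sketch of the estimator idea: at steady state, synthesis balances decay (α = βC), so half-life is proportional to concentration divided by synthesis, i.e. to the RNA-seq/PRO-seq ratio. The sketch assumes per-unit read counts that are already comparable across units (no length or library-size normalization is shown), uses an arbitrarily chosen pseudocount of 1, and median-centers on the log2 scale to remove the unknown scale factor. It is not the authors' implementation and does not cover the SEM analysis; `relative_half_lives` is an illustrative name.

```python
import math
from statistics import median


def relative_half_lives(rnaseq, proseq, pseudocount=1.0):
    """Median-centered log2 relative half-lives per transcription unit.

    rnaseq, proseq: dicts mapping a transcription-unit ID to read counts.
    At steady state the decay rate equals synthesis / concentration, so the
    half-life is proportional to concentration / synthesis, i.e. to the
    RNA-seq / PRO-seq ratio. Only relative values are identifiable, so the
    log2 ratios are centered on their median.
    """
    log_ratios = {}
    for unit, conc in rnaseq.items():
        # The pseudocount (an arbitrary choice here) avoids log(0) and x/0.
        ratio = (conc + pseudocount) / (proseq[unit] + pseudocount)
        log_ratios[unit] = math.log2(ratio)
    center = median(log_ratios.values())
    return {unit: value - center for unit, value in log_ratios.items()}
```

For example, with hypothetical RNA-seq counts {A: 799, B: 99, C: 399} and PRO-seq counts {A: 99, B: 99, C: 399}, unit A comes out 3 log2 units (8-fold) more stable than the median unit, while B and C sit at 0.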

    A community-maintained standard library of population genetic models

    The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open-source project that provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented Python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.

    Open access journal. This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries.

    Statistical Models For The Function And Evolution Of Cis-Regulatory Elements In Mammals

    Precise gene regulation is essential for a wide variety of transient, developmental, and homeostatic processes. The majority of gene regulation is mediated by cis-regulatory elements (CREs), both distal (enhancers) and proximal (promoters & enhancers). Developments in biochemical assays, gene editing techniques, and sequencing technology have enabled genome-wide profiling of regulatory elements across a wide variety of in vivo conditions. In this tripartite work, I present separate statistical frameworks for analyzing how these repertoires of regulatory elements work at both physiological and evolutionary timescales.

    The first part describes the use of PRO-seq to characterize rapid changes in the transcriptional landscape of human cells in response to celastrol, a compound that has potent anti-inflammatory, tumor-inhibitory, and obesity-controlling effects. By exploiting the ability of PRO-seq to detect nascent RNAs, I characterize the transcriptional response at both genes and enhancers, and leverage statistical models to detect the transcription factors that orchestrate it. I implicate several transcription factors in early transcriptional changes, including members of the E2F and RFX families. PRO-seq also allows us to detect an increase in promoter-proximal pausing near transcription start sites, suggesting that inhibition of pause release may be a mechanism for repressing gene expression during the celastrol response. This work demonstrates that a thorough analysis of PRO-seq time-course data can provide novel insight into multiple aspects of a complex transcriptional response.

    The second part develops a statistical model for determining whether the constituent enhancers of a "super-enhancer" exhibit synergy, and thus addresses the question "Is a super-enhancer greater than the sum of its parts?" In this work I reconcile two studies with seemingly opposing theses by finding that we cannot confidently reject synergy-free models for super-enhancers. Furthermore, I demonstrate that thoughtful consideration of null models for synergy in gene regulation is critical for furthering our understanding of ensembles of regulatory elements.

    In the final section, I develop evolutionary models for cis-regulatory function as quantified by genome-wide biochemical assays. I apply a noise-aware phylogenetic model to analyze the evolution of H3K27Ac and H3K4me3 histone marks as proxies of enhancer and promoter function. I estimate relative turnover rates for a variety of functional element categories and show that gene expression and sequence constraint correlate with turnover rate. I also propose that dosage sensitivity of target genes can explain the discrepancy between the sequence and histone-mark turnover rates of associated CREs.

    This work illustrates the important role statistical models play in understanding gene regulation at all levels and suggests a potential path towards unified models of gene regulation and evolution.
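
The idea of estimating turnover rates for histone marks can be illustrated with a standard two-state continuous-time Markov model in which a mark is either absent (0) or present (1) at an element, compared between two species. This is a simplified sketch under stated assumptions, not the noise-aware phylogenetic model described above: it ignores measurement noise, uses a single pair of species separated by a known path length t, holds the stationary frequency of the mark (pi1) fixed, and estimates the turnover rate (gain + loss) by grid search. All function names are illustrative.

```python
import math


def transition(gain, loss, t):
    """P[x][y] over branch length t for a two-state chain (0 = absent, 1 = present)."""
    total = gain + loss
    decay = math.exp(-total * t)
    p01 = gain / total * (1.0 - decay)  # mark gained along the branch
    p10 = loss / total * (1.0 - decay)  # mark lost along the branch
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def pair_log_likelihood(pair_counts, rate, pi1, t):
    """Log-likelihood of presence/absence pairs between two species.

    pair_counts maps (state_A, state_B) to a count; rate = gain + loss is
    the turnover rate; pi1 is the stationary frequency of 'present'; t is
    the path length between the species. Any two-state chain is
    reversible, so P(x, y) = pi[x] * P[x][y](t).
    """
    pi = [1.0 - pi1, pi1]
    P = transition(rate * pi1, rate * (1.0 - pi1), t)
    return sum(n * math.log(pi[x] * P[x][y])
               for (x, y), n in pair_counts.items() if n > 0)


def estimate_turnover(pair_counts, pi1, t, grid=None):
    """Grid-search maximum-likelihood turnover rate, with pi1 held fixed."""
    if grid is None:
        grid = [0.01 * i for i in range(1, 1001)]  # rates 0.01 .. 10.0
    return max(grid, key=lambda r: pair_log_likelihood(pair_counts, r, pi1, t))
```

Fitting such a model separately to different element categories yields rates that can be compared across categories. With expected pair counts generated from a turnover rate of 2.0 (pi1 = 0.3, t = 0.5), the grid search recovers a rate of about 2.0.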

    Extreme purifying selection against point mutations in the human genome

    Genome sequencing of tens of thousands of humans has enabled the measurement of large selective effects for mutations to protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring similar selective effects in noncoding as well as in coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of strong purifying selection, or "ultraselection" (λs), as the fractional depletion of rare single-nucleotide variants in target genomic sites relative to matched sites that are putatively free from selection, after controlling for local variation and neighbor-dependence in mutation rate. We show using simulations that λs is closely related to the average site-specific selection coefficient against heterozygous point mutations, as predicted at mutation-selection balance. Applying ExtRaINSIGHT to 71,702 whole-genome sequences from gnomAD v3, we find strong evidence of ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. By contrast, we find weak evidence in other noncoding RNAs and transcription factor binding sites, and only modest evidence in ultraconserved elements and human accelerated regions. We estimate that ~0.3–0.5% of the human genome is ultraselected, implying ~0.3–0.4 lethal or nearly lethal de novo mutations per potential human zygote. Overall, our study sheds new light on the genome-wide distribution of fitness effects for new point mutations by combining deep new sequencing data sets and classical theory from population genetics.
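
The depletion idea behind λs can be illustrated with a naive point estimate. This is a simplified sketch, not ExtRaINSIGHT itself, which fits a probabilistic model. It assumes per-site relative mutation rates are already available from some mutation-rate model (hypothetical inputs here), that the chance of observing a rare variant at a site is proportional to its mutation rate (reasonable only when that chance is small), and that ultraselected sites carry no rare variants while the remaining target sites behave like the matched neutral sites. `fractional_depletion` is an illustrative name.

```python
def fractional_depletion(target_rare, target_mu, neutral_rare, neutral_mu):
    """Naive estimate of the fraction of target sites under strong selection.

    target_rare / neutral_rare: 0/1 flags, one per site, marking whether a
        rare SNV was observed there.
    target_mu / neutral_mu: relative per-site mutation rates (assumed to
        come from a separate mutation-rate model).
    If a fraction lam of target sites is ultraselected (rare variants
    purged) and the rest behave like neutral sites, the expected rare
    variant count at the target is
        (1 - lam) * sum(target_mu) * (rare variants per unit rate at neutral sites).
    Solving for lam gives the estimate returned here.
    """
    neutral_yield = sum(neutral_rare) / sum(neutral_mu)
    expected = neutral_yield * sum(target_mu)  # count expected with no selection
    return 1.0 - sum(target_rare) / expected
```

For example, with 200 rare variants at 1,000 neutral sites and 60 at 500 target sites (all with equal mutation rates), the expected count at the target is 100, so the estimated fraction is 1 - 60/100 = 0.4. Weighting by mutation rate is what lets target sites in mutation-prone contexts be compared fairly with neutral sites.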

    Is a super-enhancer greater than the sum of its parts?

    The recent back-to-back articles by Hay et al. [1] and Shin et al. [2] both addressed the important question of how the constituent enhancers of a so-called 'super-enhancer' combine to activate the expression of a target gene. Super-enhancers are collections of closely spaced genomic regions that exhibit hallmarks of enhancers, such as binding by the Mediator complex and acetylation of histone H3 at lysine 27 (H3K27ac) [3, 4, 5]. As the authors of these articles noted [1, 2], there is continuing controversy over whether super-enhancers genuinely represent a new paradigm in transcriptional regulation or whether they may essentially just be clusters of conventional enhancers that together produce a strong transcriptional response [6].
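
Whether a super-enhancer counts as "greater than the sum of its parts" depends on the no-synergy baseline. The sketch below compares two common null models for a pair of constituent enhancers, given hypothetical expression measurements from single and double deletions: an additive model (each enhancer adds a fixed amount of expression) and a multiplicative, or log-additive, model (each enhancer scales expression by a fixed factor). It is an illustration only, not the analysis performed in the articles discussed here; `interaction_scores` is an illustrative name.

```python
import math


def interaction_scores(e_both, e_without_1, e_without_2, e_without_both):
    """Interaction between two enhancers under two 'no synergy' null models.

    Inputs are target-gene expression levels with both enhancers intact,
    with enhancer 1 deleted, with enhancer 2 deleted, and with both
    deleted (all must be positive for the multiplicative score).
    A positive score means the pair does more together than the null
    model predicts.
    """
    # Additive null: E = base + a1 + a2, so the intact level is predicted
    # from the single deletions minus the shared baseline.
    additive_pred = e_without_1 + e_without_2 - e_without_both
    # Multiplicative null: E = base * m1 * m2, so the intact level is
    # predicted as (base * m2) * (base * m1) / base.
    multiplicative_pred = e_without_1 * e_without_2 / e_without_both
    return {
        "additive": e_both - additive_pred,
        "multiplicative": math.log(e_both / multiplicative_pred),
    }
```

With expression normalized to 1.0 for the intact locus, 0.6 after either single deletion, and 0.36 after the double deletion, the additive score is about +0.16 (apparent synergy) while the multiplicative score is 0 (no interaction). The same measurements can therefore look synergistic or not depending on which null model is assumed.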