
    The Genetic and Mechanistic Basis for Variation in Gene Regulation

    It is now well established that noncoding regulatory variants play a central role in the genetics of common diseases and in evolution. However, until recently, we knew little about the mechanisms by which most regulatory variants act. For instance, what types of functional elements in DNA, RNA, or proteins are most often affected by regulatory variants? Which stages of gene regulation are typically altered? How can we predict which variants are most likely to affect regulation in a given cell type? Recent studies, in many cases using quantitative trait locus (QTL) mapping in cell lines or tissue samples, have provided considerable insight into the properties of genetic loci with regulatory roles. Such studies have uncovered novel biochemical regulatory interactions and led to the identification of previously unrecognized regulatory mechanisms. We have learned that genetic variation is often directly associated with variation in regulatory activities (that is, we can map regulatory QTLs, not just expression QTLs [eQTLs]), and we have taken the first steps towards understanding the causal order of regulatory events (for example, the role of pioneer transcription factors). Yet, in most cases, we still do not know how to interpret overlapping combinations of regulatory interactions, and we are still far from being able to predict how variation in regulatory mechanisms propagates through a chain of interactions to ultimately change gene expression profiles.
    Funding: National Institutes of Health (U.S.) (grant NIH HG006123); National Institutes of Health (U.S.) (NIH GM007197); National Institutes of Health (U.S.) (grant NIH MH084703); Howard Hughes Medical Institute; Jane Coffin Childs Memorial Fund for Medical Research (postdoctoral fellowship)
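At its core, mapping a QTL means testing whether a molecular phenotype (expression level, chromatin accessibility, and so on) changes with genotype dosage at a variant. The sketch below is a minimal, assumption-laden illustration of that single-variant test as an ordinary least-squares regression; the function names `qtl_test` and `qtl_scan` are hypothetical, and real studies also add covariates, handle multiple testing, and usually restrict the tested variants to a window around each gene.

```python
from math import copysign, sqrt


def qtl_test(dosage, phenotype):
    """Least-squares regression of a phenotype on genotype dosage.

    dosage: alternative-allele counts (0, 1 or 2) per individual.
    phenotype: a molecular trait (e.g. expression level) per individual.
    Returns (slope, t_statistic) for the genotype effect.
    """
    n = len(dosage)
    if n != len(phenotype) or n < 3:
        raise ValueError("need matching inputs with at least 3 individuals")
    mean_x = sum(dosage) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares gives the standard error of the slope.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, phenotype))
    se = sqrt(rss / (n - 2) / sxx)
    if se == 0:
        # Perfect fit: the t statistic is unbounded in the slope's direction.
        t = copysign(float("inf"), slope) if slope != 0 else 0.0
    else:
        t = slope / se
    return slope, t


def qtl_scan(genotypes, phenotype):
    """Test every variant (name -> dosages) against one phenotype."""
    return {name: qtl_test(d, phenotype) for name, d in genotypes.items()}
```

For example, `qtl_scan({"snp1": [0, 0, 1, 1, 2, 2], "snp2": [0, 1, 2, 0, 1, 2]}, [1.0, 1.2, 2.1, 1.9, 3.0, 3.2])` gives a much larger t statistic for `snp1`, whose dosage tracks the phenotype closely, than for `snp2`.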

    RNA-seq: impact of RNA degradation on transcript quantification

    Background: The use of low-quality RNA samples in whole-genome gene expression profiling remains controversial. It is unclear whether transcript degradation in low-quality RNA samples occurs uniformly, in which case its effects can be corrected via data normalization, or whether different transcripts are degraded at different rates, potentially biasing measurements of expression levels. This concern has made the use of low-quality RNA samples in whole-genome expression profiling problematic. Yet low-quality samples (for example, samples collected in the course of fieldwork) are at times the only means of addressing specific questions.
    Results: We sought to quantify the impact of variation in RNA quality on estimates of gene expression levels based on RNA-seq data. To do so, we collected expression data from tissue samples that were allowed to decay for varying amounts of time before RNA extraction. The RNA samples we collected spanned the entire range of RNA Integrity Number (RIN) values (a metric commonly used to assess RNA quality). We observed widespread effects of RNA quality on measurements of gene expression levels, as well as a slight but significant loss of library complexity in more degraded samples.
    Conclusions: Standard normalizations failed to account for the effects of degradation, but we found that explicitly controlling for RIN in a linear model framework corrects for the majority of these effects. We conclude that when RIN and the effect of interest are not associated, this approach can help recover biologically meaningful signals from degraded RNA samples.
    Funding: American Heart Association (Predoctoral Fellowship)
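One way to read "explicitly controlling for the effects of RIN using a linear model framework" is to fit, for each gene, expression as a linear function of RIN and keep the part of the signal that RIN does not explain. The sketch below does exactly that and nothing more; it is an illustration under that assumption, not the study's model, which may include other covariates. The function name `regress_out_rin` is hypothetical. Note the caveat from the abstract: if RIN is associated with the effect of interest, this correction removes part of the real signal too.

```python
def regress_out_rin(expression, rin):
    """Remove a per-gene linear RIN trend from expression values.

    expression: dict mapping gene -> log-scale expression values, one per sample.
    rin: RIN value for each sample, in the same order.
    Returns dict gene -> corrected values: the residual of the gene's
    expression ~ RIN fit, plus the gene's mean so the scale is preserved.
    """
    n = len(rin)
    mean_rin = sum(rin) / n
    ss_rin = sum((r - mean_rin) ** 2 for r in rin)
    if ss_rin == 0:
        raise ValueError("RIN is constant; there is no trend to remove")
    corrected = {}
    for gene, values in expression.items():
        if len(values) != n:
            raise ValueError(f"{gene}: expected {n} values, got {len(values)}")
        mean_v = sum(values) / n
        # Least-squares slope of this gene's expression on RIN.
        beta = sum((r - mean_rin) * (v - mean_v) for r, v in zip(rin, values)) / ss_rin
        corrected[gene] = [v - beta * (r - mean_rin) for r, v in zip(rin, values)]
    return corrected
```

A gene whose expression is a pure linear function of RIN comes out flat at its mean; any remaining between-sample variation is, by construction, uncorrelated with RIN.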

    Numerous recursive sites contribute to accuracy of splicing of long introns in flies [preprint]

    Recursive splicing, a process by which a single intron is removed from pre-mRNA transcripts in multiple distinct segments, has been observed in a small subset of Drosophila melanogaster introns. However, detecting recursive splicing requires observing splicing intermediates, which are inherently unstable, making the process difficult to study. Here we developed new computational approaches to identify recursively spliced introns and applied them, in combination with existing methods, to nascent RNA sequencing data from Drosophila S2 cells. These approaches identified hundreds of novel recursive splice sites, expanding the catalog of recursively spliced fly introns fourfold. Recursive sites occur in most very long (> 40 kb) fly introns, including many in genes involved in morphogenesis and development, and tend to lie near the midpoints of introns. Suggesting a possible function for recursive splicing, we observe that fly introns with recursive sites are spliced more accurately than comparably sized non-recursive introns.
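To make "removed in multiple distinct segments" and "near the midpoints of introns" concrete, the sketch below splits an intron at given recursive sites and reports each site's fractional position. It also includes a crude motif scan for the juxtaposed 3' and 5' splice-site dinucleotides (AG|GT) that characterize recursive sites; this scan is an assumption for illustration only and is far weaker than the read-based detection the abstract describes. All function names are hypothetical, and coordinates are assumed to be 0-based, half-open, and on the intron's own strand.

```python
def candidate_recursive_sites(intron_seq):
    """Positions (0-based, within intron_seq) lying between an AG and a GT.

    A recursive site juxtaposes a 3' splice site (ending in AG) and a 5'
    splice site (starting with GT); this motif scan is only a crude filter.
    """
    return [i + 2 for i in range(len(intron_seq) - 3) if intron_seq[i:i + 4] == "AGGT"]


def recursive_segments(intron_start, intron_end, sites):
    """Segments removed sequentially if splicing proceeds through every site.

    Coordinates are half-open [start, end); each site marks the boundary
    between two consecutive segments.
    """
    ordered = sorted(sites)
    if any(not intron_start < s < intron_end for s in ordered):
        raise ValueError("recursive sites must lie strictly inside the intron")
    bounds = [intron_start, *ordered, intron_end]
    return list(zip(bounds[:-1], bounds[1:]))


def relative_positions(intron_start, intron_end, sites):
    """Each site's fractional position along the intron (0 = 5' end, 1 = 3' end)."""
    length = intron_end - intron_start
    return [(s - intron_start) / length for s in sorted(sites)]
```

For a 60-kb intron spanning `[1000, 61000)` with one recursive site at 31000, `recursive_segments` returns two 30-kb segments and `relative_positions` returns `[0.5]`, i.e. a site exactly at the midpoint.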

    Bayesian nonparametric discovery of isoforms and individual specific quantification

    Most human protein-coding genes can be transcribed into multiple distinct mRNA isoforms. These alternative splicing patterns promote molecular diversity, and dysregulation of isoform expression plays an important role in disease etiology. However, isoforms are difficult to characterize from short-read RNA-seq data because they share identical subsequences and occur at different frequencies across tissues and samples. Here, we develop BIISQ, a Bayesian nonparametric model for isoform discovery and individual-specific quantification from short-read RNA-seq data. BIISQ does not require isoform reference sequences; instead, it estimates an isoform catalog shared across samples. We use stochastic variational inference for efficient posterior estimation and demonstrate superior precision and recall on simulated data compared with state-of-the-art isoform reconstruction methods. BIISQ shows the largest gains for low-abundance isoforms, with 36% more isoforms correctly inferred at low coverage than a multi-sample method and 170% more than single-sample methods. We estimate isoforms in the GEUVADIS RNA-seq data and validate inferred isoforms by associating genetic variants with isoform ratios.
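"Bayesian nonparametric" means the number of isoforms is not fixed in advance. A standard building block for such models is the stick-breaking construction of Dirichlet-process mixture weights, sketched below in its truncated form. This is a generic illustration, not BIISQ's model: whether BIISQ uses this exact construction, truncation, or concentration parameter is an assumption, and the function name is hypothetical. In an isoform setting, each weight could be read as the prior share of one candidate isoform in a shared catalog.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking (GEM) weights for a Dirichlet-process mixture.

    alpha: concentration; smaller values put more mass on the first few
    components. truncation: maximum number of components kept.
    rng: a random.Random instance, so draws are reproducible.
    """
    if alpha <= 0 or truncation < 1:
        raise ValueError("need alpha > 0 and truncation >= 1")
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, alpha)  # share of the remaining stick
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # last component takes what is left, so weights sum to 1
    return weights
```

Truncating at a generous number of components is a common practical choice with variational inference: components the data do not support simply receive negligible weight.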

    Controls of nucleosome positioning in the human genome

    Nucleosomes are important for gene regulation because their arrangement on the genome can control which proteins bind to DNA. Currently, few human nucleosomes are thought to be consistently positioned across cells; however, this has been difficult to assess due to the limited resolution of existing data. We performed paired-end sequencing of micrococcal nuclease-digested chromatin (MNase-seq) from seven lymphoblastoid cell lines and mapped over 3.6 billion MNase-seq fragments to the human genome to create the highest-resolution map of nucleosome occupancy to date in a human cell type. In contrast to previous results, we find that most nucleosomes have more consistent positioning than expected by chance and that a substantial fraction (8.7%) of nucleosomes have moderate to strong positioning. In aggregate, nucleosome sequences have 10-bp periodic patterns in dinucleotide frequency and DNase I sensitivity; and, across cells, nucleosomes frequently have translational offsets that are multiples of 10 bp. We estimate that almost half of the genome contains regularly spaced arrays of nucleosomes, which are enriched in active chromatin domains. Single nucleotide polymorphisms that reduce DNase I sensitivity can disrupt the phasing of nucleosome arrays, which indicates that such arrays often result from positioning against a barrier formed by other proteins. However, nucleosome arrays can also be created by DNA sequence alone. The most striking example is an array of over 400 nucleosomes on chromosome 12 that is created by tandem repetition of sequences with strong positioning properties. In summary, a large fraction of nucleosomes are consistently positioned: in some regions because they adopt favored sequence positions, and in other regions because they are forced into specific arrangements by chromatin remodeling or DNA-binding proteins.
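A 10-bp periodic pattern in dinucleotide frequency can be detected by building a per-position occurrence profile across aligned nucleosome sequences and looking for the dominant frequency in its discrete Fourier transform. The sketch below assumes the sequences are already aligned (e.g. on inferred nucleosome dyads) and uses AA/TT/TA as the tracked dinucleotides; both choices, and the function names, are illustrative assumptions rather than the study's exact analysis.

```python
import cmath


def dinucleotide_profile(sequences, dinucleotides=("AA", "TT", "TA")):
    """Fraction of aligned sequences carrying a chosen dinucleotide at each offset."""
    length = min(len(s) for s in sequences)
    return [
        sum(s[i:i + 2] in dinucleotides for s in sequences) / len(sequences)
        for i in range(length - 1)
    ]


def dominant_period(profile):
    """Period (in bp) with the largest discrete Fourier power, excluding the mean."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [x - mean for x in profile]
    best_k, best_power = 1, -1.0
    # Frequencies k = 1 .. n//2 correspond to periods n/k down to about 2 bp.
    for k in range(1, n // 2 + 1):
        coef = sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(centered))
        power = abs(coef) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k
```

Because the transform only evaluates integer frequencies, the recovered period is approximate: for a synthetic 150-bp sequence with an AA every 10 bp, the 149-point profile peaks at k = 15, giving a period of about 9.93 bp. A naive O(n²) transform is fine at nucleosome scale; a real pipeline would use an FFT.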

    Widespread occurrence of hybrid internal-terminal exons in human transcriptomes

    Now published in Science Advances, doi: 10.1126/sciadv.abk1752.
    Alternative RNA processing is a major mechanism for diversifying the human transcriptome. Messenger RNA isoform differences are predominantly driven by alternative first exons, cassette internal exons and alternative last exons. Despite the importance of classifying exons to understand isoform structure, there is a lack of tools to examine isoform-specific exon usage in RNA-sequencing data. We recently observed that alternative transcription start sites often arise near annotated internal exons, creating “hybrid” exons that can be used as both first and internal exons. To investigate the creation of hybrid exons, we built the HIT (Hybrid-Internal-Terminal) exon pipeline, which systematically classifies exons according to their isoform-specific usage. Using a combination of junction-read coverage and probabilistic modeling, the HIT index identified thousands of hybrid first-internal and internal-last exons that were previously misclassified. Hybrid exons are enriched in long genes with at least ten internal exons, and they have longer flanking introns and strong splice sites. The usage of hybrid exons varies considerably across human tissues, but they are predominantly used in brain, testis and colon cells. Notably, genes involved in RNA splicing have the highest fraction of intra-tissue hybrid exons. Further, we found more than 100,000 inter-tissue hybrid exons that changed from internal to terminal exons across tissues. By developing the first method that can classify exons according to their isoform contexts, our findings demonstrate the existence of hybrid exons, expand the repertoire of tissue-specific terminal exons and uncover unexpected complexities of the human transcriptome.
    Accepted manuscript
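The intuition behind classifying exons from junction reads is that a first exon is only ever spliced to a downstream exon, a last exon is only ever spliced from an upstream exon, and an internal exon has both. The sketch below assumes the index is the normalized difference between upstream and downstream junction reads, ranging from -1 (first-exon-like) to +1 (last-exon-like), and bins it with two cutoffs. The function names and cutoff values are hypothetical, and the sketch omits the probabilistic modeling the HIT pipeline uses to decide which deviations are significant.

```python
def hit_index(upstream_reads, downstream_reads):
    """Normalized junction-read difference for one exon.

    upstream_reads: reads splicing into the exon from an upstream exon.
    downstream_reads: reads splicing out of the exon to a downstream exon.
    Returns -1 (downstream only, first-exon-like) to +1 (upstream only,
    last-exon-like), or None when no junction reads cover the exon.
    """
    total = upstream_reads + downstream_reads
    if total == 0:
        return None
    return (upstream_reads - downstream_reads) / total


def classify_exon(upstream_reads, downstream_reads, terminal_cutoff=0.8, hybrid_cutoff=0.3):
    """Bin an exon by its index; both cutoffs are illustrative, not published values."""
    h = hit_index(upstream_reads, downstream_reads)
    if h is None:
        return "unclassified"
    if h <= -terminal_cutoff:
        return "first"
    if h <= -hybrid_cutoff:
        return "hybrid first-internal"
    if h < hybrid_cutoff:
        return "internal"
    if h < terminal_cutoff:
        return "hybrid internal-last"
    return "last"
```

For instance, an exon with 10 upstream and 40 downstream junction reads gets an index of -0.6 and is called "hybrid first-internal": it is mostly used as a first exon but is sometimes spliced into from upstream.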