738 research outputs found

    High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression

    Many eukaryotic genes possess multiple alternative promoters with distinct expression specificities. Therefore, comprehensively annotating promoters and deciphering their individual regulatory dynamics is critical for gene expression profiling applications, and for our understanding of regulatory complexity. We introduce RAMPAGE, a novel promoter activity profiling approach that combines extremely specific 5'-complete cDNA sequencing with an integrated data analysis workflow to address the limitations of current techniques. RAMPAGE features a streamlined protocol for fast and easy generation of highly multiplexed sequencing libraries, offers very high transcription start site specificity, generates accurate and reproducible promoter expression measurements, and yields extensive transcript connectivity information through paired-end cDNA sequencing. We used RAMPAGE in a genome-wide study of promoter activity throughout 36 stages of the life cycle of Drosophila melanogaster, and describe here a comprehensive dataset that represents the first available developmental timecourse of promoter usage. We found that over 40% of developmentally expressed genes have at least 2 promoters, and that alternative promoters generally implement distinct regulatory programs. Transposable elements, long proposed to play a central role in the evolution of their host genomes through their ability to regulate gene expression, contribute at least 1,300 promoters shaping the developmental transcriptome of D. melanogaster. Hundreds of these promoters drive the expression of annotated genes, and transposons often impart their own expression specificity upon the genes they regulate. These observations provide support for the theory that transposons may drive regulatory innovation through the distribution of stereotyped cis-regulatory modules throughout their host genomes.

    High Sensitivity TSS Prediction: Estimates of Locations Where TSS Cannot Occur

    Although transcription in mammalian genomes can initiate from various genomic positions (e.g., 3′UTR, coding exons, etc.), most locations on genomes are not prone to transcription initiation. It is of practical and theoretical interest to be able to estimate such collections of non-TSS locations (NTLs). The identification of large portions of NTLs can contribute to better focusing the search for TSS locations and thus contribute to promoter and gene finding. It can help in the assessment of 5′ completeness of expressed sequences, and contribute to more successful experimental designs as well as more accurate gene annotation. Using comprehensive collections of Cap Analysis of Gene Expression (CAGE) and other transcript data from mouse and human genomes, we developed a methodology that allows us, by performing computational TSS prediction with very high sensitivity, to annotate, with high accuracy and in a strand-specific manner, locations of mammalian genomes that are highly unlikely to harbor transcription start sites (TSSs). The properties of the immediate genomic neighborhood of 98,682 accurately determined mouse and 113,814 human TSSs are used to determine features that distinguish genomic transcription initiation locations from those that are not likely to initiate transcription. In our algorithm we utilize various constraining properties of features identified in the upstream and downstream regions around TSSs, as well as statistical analyses of these surrounding regions.

    Mapping the strand-specific transcriptome of fission yeast

    Pervasive genome-wide transcription is widespread in eukaryotic cells, but key features of the transcriptome have yet to be fully characterized. A new study using antibody-based detection of RNA-DNA duplexes on tiling arrays now reveals a complex, strand-specific transcriptional world in fission yeast.

    The effect of genetic variation on promoter usage and enhancer activity.

    The identification of genetic variants affecting gene expression, namely expression quantitative trait loci (eQTLs), has contributed to the understanding of mechanisms underlying human traits and diseases. The majority of these variants map in non-coding regulatory regions of the genome and their identification remains challenging. Here, we use natural genetic variation and CAGE transcriptomes from 154 EBV-transformed lymphoblastoid cell lines, derived from unrelated individuals, to map 5376 and 110 regulatory variants associated with promoter usage (puQTLs) and enhancer activity (eaQTLs), respectively. We characterize five categories of genes associated with puQTLs, distinguishing single from multi-promoter genes. Among multi-promoter genes, we find puQTL effects either specific to a single promoter or to multiple promoters with variable effect orientations. Regulatory variants associated with opposite effects on different mRNA isoforms suggest compensatory mechanisms occurring between alternative promoters. Our analyses identify differential promoter usage and modulation of enhancer activity as molecular mechanisms underlying eQTLs related to regulatory elements

    Multiplicity of 5' Cap Structures Present on Short RNAs

    Most RNA molecules are co- or post-transcriptionally modified to alter their chemical and functional properties to assist in their ultimate biological function. Among these modifications, the addition of a 5' cap structure has been found to regulate turnover and localization. Here we report a study of the cap structure of human short (<200 nt) RNAs (sRNAs), using sequencing of cDNA libraries prepared by enzymatic pretreatment of the sRNAs with cap sensitive-specificity, thin layer chromatographic (TLC) analyses of isolated cap structures and mass spectrometric analyses for validation of TLC analyses. Processed versions of snoRNA and tRNA sequences of less than 50 nt were observed in capped sRNA libraries, indicating additional processing and recapping of these annotated sRNA biotypes. We report for the first time 2,7 dimethylguanosine in human sRNA cap structures and, surprisingly, we find multiple type 0 cap structures (mGpppC, 7mGpppG, GpppG, GpppA, and 7mGpppA) in RNA length fractions shorter than 50 nt. Finally, we find additional uncharacterized cap structures that await determination through the creation of the reference compounds needed for TLC analyses. These studies suggest the existence of novel biochemical pathways leading to the processing of primary and sRNAs and the modification of their RNA 5' ends with a spectrum of chemical modifications.

    Suppression of artifacts and barcode bias in high-throughput transcriptome analyses utilizing template switching

    Template switching (TS) is an inherent mechanism of reverse transcriptase, which has been exploited in several transcriptome analysis methods, such as CAGE, RNA-Seq and short RNA sequencing. TS is an attractive option, given the simplicity of the protocol, which does not require an adaptor mediated step and thus minimizes sample loss. As such, it has been used in several studies that deal with limited amounts of RNA, such as in single cell studies. Additionally, TS has also been used to introduce DNA barcodes or indexes into different samples, cells or molecules. This labeling allows one to pool several samples into one sequencing flow cell, which increases the data throughput of sequencing and takes advantage of the increasing throughput of current sequencers. Here, we report TS artifacts that form owing to a process called strand invasion. Due to the way in which barcodes/indexes are introduced by TS, strand invasion becomes more problematic by introducing unsystematic biases. We describe a strategy that eliminates these artifacts in silico and propose an experimental solution that suppresses biases from TS.

    Opening the black box of outer space: the case of Jason-3

    If you look at a rendering of planet Earth from a bird's eye view, you will see satellites orbiting the planet like electrons, each one a testament to humanity's expansion beyond Earth's atmosphere. It begs the question: what is this new humanized landscape? The dominant voice that has attempted to answer this question is the realist one, which has led the charge of academic inquiry into outer space since the fateful launch of the Sputnik in 1957. Though enlightening in some respects, the realist perspective oftentimes obscures the heterogeneous complexity of the actors, actions, limits and possibilities that have constructed this very humanized outer space. This paper looks at the humanization of outer space through the lens of JASON-3, an internationally collaborative satellite designed primarily to measure the topography of the Earth's oceans. A vast number of actors collaborated to enact the network that created JASON-3, including bureaucratic agencies, academics, private contractors, political bodies, other satellites, the sun and even gravity. This paper will focus on these actors and the work that they did to form the network, showing a glimpse of the entangled connections that eventually produced JASON-3. Through telling this story, I argue: (1) outer space is more complex than state-level relations and (2) critical geography -- with its insight into relational spaces and deconstructing power structures -- has a unique place to fill in outer space literature

    Dual-initiation promoters with intertwined canonical and TCT/TOP transcription start sites diversify transcript processing

    Variations in transcription start site (TSS) selection reflect diversity of preinitiation complexes and can impact on post-transcriptional RNA fates. Most metazoan polymerase II-transcribed genes carry canonical initiation with pyrimidine/purine (YR) dinucleotide, while translation machinery-associated genes carry polypyrimidine initiator (5'-TOP or TCT). By addressing the developmental regulation of TSS selection in zebrafish we uncovered a class of dual-initiation promoters in thousands of genes, including snoRNA host genes. 5'-TOP/TCT initiation is intertwined with canonical initiation and used divergently in hundreds of dual-initiation promoters during maternal to zygotic transition. Dual-initiation in snoRNA host genes selectively generates host and snoRNA with often different spatio-temporal expression. Dual-initiation promoters are pervasive in human and fruit fly, reflecting evolutionary conservation. We propose that dual-initiation on shared promoters represents a composite promoter architecture, which can function both coordinately and divergently to diversify RNAs

    Automated Workflow for Preparation of cDNA for Cap Analysis of Gene Expression on a Single Molecule Sequencer

    Background: Cap analysis of gene expression (CAGE) is a 5′ sequence tag technology to globally determine transcriptional starting sites in the genome and their expression levels and has most recently been adapted to the HeliScope single molecule sequencer. Despite significant simplifications in the CAGE protocol, it has until now been a labour intensive protocol. Methodology: In this study we set out to adapt the protocol to a robotic workflow, which would increase throughput and reduce handling. The automated CAGE cDNA preparation system we present here can prepare 96 ‘HeliScope ready’ CAGE cDNA libraries in 8 days, as opposed to 6 weeks by a manual operator. We compare the results obtained using the same RNA in manual libraries and across multiple automation batches to assess reproducibility. Conclusions: We show that the sequencing was highly reproducible and comparable to manual libraries with an 8-fold increase in productivity. The automated CAGE cDNA preparation system can prepare 96 CAGE sequencing samples simultaneously. Finally we discuss how the system could be used for CAGE on Illumina/SOLiD platforms, RNA-seq and fulllengt

    Large-scale clustering of CAGE tag expression data

    Background: Recent analyses have suggested that many genes possess multiple transcription start sites (TSSs) that are differentially utilized in different tissues and cell lines. We have identified a huge number of TSSs mapped onto the mouse genome using the cap analysis of gene expression (CAGE) method. The standard hierarchical clustering algorithm, which gives us easily understandable graphical tree images, has difficulties in processing such huge amounts of TSS data and a better method to calculate and display the results is needed. Results: We use a combination of hierarchical and non-hierarchical clustering to cluster expression profiles of TSSs based on a large amount of CAGE data to profit from the best of both methods. We processed the genome-wide expression data, including 159,075 TSSs derived from 127 RNA samples of various organs of mouse, and succeeded in categorizing them into 70-100 clusters. The clusters exhibited intriguing biological features: a cluster supergroup with a ubiquitous expression profile, tissue-specific patterns, a distinct distribution of non-coding RNA and functional TSS groups. Conclusion: Our approach succeeded in greatly reducing the calculation cost, and is an appropriate solution for analyzing large-scale TSS usage data