1,455 research outputs found

    Genome-wide mapping of RNA Pol-II promoter usage in mouse tissues by ChIP-seq

    Alternative promoters that are differentially used in various cellular contexts and tissue types add to the transcriptional complexity of the mammalian genome. Identifying alternative promoters and annotating their activity in different tissues is one of the major challenges in understanding the transcriptional regulation of mammalian genes and their isoforms. To determine the use of alternative promoters in different tissues, we performed ChIP-seq experiments using an antibody against RNA Pol-II in five adult mouse tissues (brain, liver, lung, spleen and kidney). Our analysis identified 38 639 Pol-II promoters, including 12 270 novel promoters, for both protein-coding and non-coding mouse genes. Of these, 6384 promoters are tissue specific and are CpG poor, and we find that only 34% of the novel promoters are located in CpG-rich regions, suggesting that novel promoters are mostly tissue specific. By identifying the Pol-II-bound promoter(s) of each annotated gene in a given tissue, we found that 37% of protein-coding genes use alternative promoters across the five mouse tissues. The promoter annotations and ChIP-seq data presented here will aid ongoing efforts to characterize gene regulatory regions in mammalian genomes.
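
    The abstract splits promoters into CpG-rich and CpG-poor without stating the criterion. The sketch below shows one common convention, assuming CpG-island style thresholds (GC fraction above 0.5 and observed/expected CpG ratio above 0.6); the function names and default thresholds are illustrative assumptions, not the study's pipeline.

```python
# Sketch: classify a promoter sequence as CpG-rich or CpG-poor.
# ASSUMPTION: CpG-island style thresholds (GC fraction > 0.5 and
# observed/expected CpG ratio > 0.6); the study's own criterion is not given.


def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c, g = s.count("C"), s.count("G")
    # "CG" occurrences cannot overlap, so str.count gives the exact CpG count.
    cpg = s.count("CG")
    gc_fraction = (c + g) / n
    expected = c * g / n  # expected CpG count if C and G were placed independently
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def is_cpg_rich(seq: str, min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """True if both statistics exceed their (hypothetical default) thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_obs_exp
```

    For example, is_cpg_rich("CG" * 100) is True and is_cpg_rich("AT" * 100) is False. The original CpG-island definition also requires a minimum length of 200 bp, which this sketch does not enforce.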

    Annotation of gene promoters by integrative data-mining of ChIP-seq Pol-II enrichment data

    BACKGROUND: The use of alternative gene promoters that drive widespread cell-type, tissue-type or developmental gene regulation is a common phenomenon in mammalian genomes. Chromatin immunoprecipitation coupled with DNA microarrays (ChIP-chip) or massively parallel sequencing (ChIP-seq) enables genome-wide identification of active promoters in different cellular conditions using antibodies against Pol-II. However, these methods produce enrichment not only near gene promoters but also inside genes and in other genomic regions, owing to the non-specificity of the antibodies used in ChIP. Further, their use is limited by high cost and strong dependence on cell type and context. METHODS: We trained and tested several state-of-the-art ensemble and meta-classification methods to distinguish Pol-II-enriched promoter sequences from Pol-II-enriched non-promoter sequences, each 500 bp long. The classification models were trained and tested on a benchmark dataset using 39 feature variables based on chromatin modification signatures and various DNA sequence features. The best-performing model was applied to seven published Pol-II ChIP-seq datasets to provide a genome-wide annotation of mouse gene promoters. RESULTS: We present a novel algorithm based on supervised learning that discriminates promoter-associated Pol-II enrichment from enrichment elsewhere in the genome in ChIP-chip/seq profiles. We assembled a dataset of 11,773 promoter and 46,167 non-promoter sequences, each 500 bp long, generated from RNA Pol-II ChIP-seq data of five tissues (brain, kidney, liver, lung and spleen). We compared the classification models to build the best predictor and found that Bagging and Random Forest approaches give the best accuracy. We applied the algorithm to seven different published ChIP-seq datasets to provide a comprehensive set of promoter annotations for both protein-coding and non-coding genes in the mouse genome. The resulting annotations contain 13,413 protein-coding and 4,747 non-coding genes with a single promoter, 9,929 protein-coding and 1,858 non-coding genes with two or more alternative promoters, and a substantial number of unassigned novel promoters. CONCLUSION: Our algorithm can successfully predict promoters from genome-wide profiles of Pol-II-bound regions. In addition, it performs significantly better than existing promoter prediction methods and can be applied to genome-wide prediction of Pol-II promoters.
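
    The METHODS paragraph relies on bagging (bootstrap aggregation) over 500 bp windows described by 39 features. The sketch below illustrates the bagging idea only: it fits depth-one decision stumps on bootstrap resamples and takes a majority vote. The stumps stand in for the full decision trees a Bagging or Random Forest model would use, and the feature vectors are hypothetical, not the paper's 39-feature set.

```python
import random
from collections import Counter

# Sketch of bagging (bootstrap aggregation) for promoter vs non-promoter windows.
# ASSUMPTIONS: decision stumps replace full decision trees, and feature vectors
# are hypothetical; this is not the paper's model or its 39-feature set.

Stump = tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: list[float]) -> int:
    feature, threshold, polarity = stump
    # polarity +1: predict promoter (1) when x >= threshold; -1: when x <= threshold
    return 1 if polarity * (x[feature] - threshold) >= 0 else 0


def fit_stump(data: list[tuple[list[float], int]]) -> Stump:
    """Exhaustively choose the stump with the fewest training errors."""
    best_errors, best = None, None
    for feature in range(len(data[0][0])):
        for threshold in sorted({x[feature] for x, _ in data}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                errors = sum(stump_predict(stump, x) != y for x, y in data)
                if best_errors is None or errors < best_errors:
                    best_errors, best = errors, stump
    return best


def fit_bagging(data, n_estimators: int = 25, seed: int = 0) -> list[Stump]:
    """Fit one stump per bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_estimators)]


def predict_bagging(models: list[Stump], x: list[float]) -> int:
    """Majority vote across the ensemble (odd n_estimators avoids ties)."""
    votes = Counter(stump_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

    Each training item is a (feature vector, label) pair with label 1 for promoter and 0 for non-promoter. Random Forest additionally draws a random subset of features at each split, which this sketch omits.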

    Selective Activation of Alternative MYC Core Promoters by Wnt-Responsive Enhancers.

    In metazoans, transcription of most genes is driven by multiple alternative promoters. Although precise regulation of alternative promoters is important for proper gene expression, the mechanisms that mediate their differential use remain unclear. Here, we investigate how the two alternative promoters (P1, P2) that drive MYC expression are regulated. We find that P1 and P2 can be differentially regulated across cell types and that their selective usage is largely mediated by distal regulatory sequences. Moreover, using reporter assays and endogenous Wnt induction, we show that in colon carcinoma cells Wnt-responsive enhancers preferentially upregulate transcription from the P1 promoter. In addition, multiple enhancer deletions using CRISPR/Cas9 corroborate the regulatory specificity for P1. Finally, we show that the preferential activation of the P1 promoter by Wnt-responsive enhancers is influenced by the distinct core promoter elements present in the MYC promoters. Taken together, our results provide new insight into how enhancers can specifically target alternative promoters and suggest that forming these selective interactions could allow more precise combinatorial regulation of transcription initiation.

    Genome-wide definition of promoter and enhancer usage during neural induction of human embryonic stem cells

    Genome-wide mapping of transcriptional regulatory elements is an essential tool for understanding the molecular events orchestrating self-renewal, commitment and differentiation of stem cells. We combined high-throughput identification of transcription start sites with genome-wide profiling of histone modifications to map active promoters and enhancers in embryonic stem cells (ESCs) induced to form neuroepithelial-like stem cells (NESCs). Our analysis showed that most promoters are active in both cell types, whereas approximately half of the enhancers are cell specific; these enhancers account for most of the epigenetic changes occurring during neural induction and most likely modulate the promoters to generate cell-specific gene expression programs. Interestingly, most of the promoters activated or up-regulated during neural induction carry a "bivalent" histone modification signature in ESCs, suggesting that developmentally regulated promoters are already poised for transcription in ESCs, which are apparently pre-committed to neuroectodermal differentiation. Overall, our study provides a collection of differentially used enhancers, promoters, transcription start sites, and protein-coding and non-coding RNAs in human ESCs and ESC-derived NESCs, together with a broad, genome-wide description of promoter and enhancer usage and of the gene expression programs characterizing the transition from a pluripotent to a neural-restricted cell fate.

    MPromDb update 2010: an integrated resource for annotation and visualization of mammalian gene promoters and ChIP-seq experimental data

    MPromDb (Mammalian Promoter Database) is a curated database that annotates gene promoters identified from ChIP-seq results, with the goal of providing an integrated resource for mammalian transcriptional regulation and epigenetics. We analyzed 507 million uniquely aligned RNAP-II ChIP-seq reads from 26 data sets covering six human cell types and 10 distinct mouse cell types/tissues. The updated MPromDb version contains computationally predicted (novel) and known active RNAP-II promoters (42 893 human and 48 366 mouse promoters) derived from data sets freely available in the NCBI GEO database. We found that 36% and 40% of protein-coding genes have alternative promoters in the human and mouse genomes, respectively, and that ∼40% of promoters are tissue/cell specific. The identified RNAP-II promoters were annotated using various known and novel gene models. Additionally, for novel promoters we examined other evidence: GenBank mRNAs, spliced ESTs, CAGE promoter tags and mRNA-seq reads. Users can search the database by gene ID/symbol or by specific tissue/cell type, and filter results by any combination of tissue/cell specificity, Known/Novel, CpG/NonCpG, and protein-coding/non-coding gene promoters. We have also integrated the GBrowse genome browser with MPromDb to visualize ChIP-seq profiles and display the annotations. The current release of MPromDb can be accessed at http://bioinformatics.wistar.upenn.edu/MPromDb/
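
    MPromDb lets users combine filters on tissue/cell specificity, Known/Novel, CpG/NonCpG and coding status, and it reports the fraction of protein-coding genes with alternative promoters. The sketch below shows how such a conjunctive filter and that fraction could be computed over a flat list of promoter records. The Promoter record layout and the function names are hypothetical and do not reflect MPromDb's actual schema or interface.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL record layout for illustration; not MPromDb's actual schema.


@dataclass(frozen=True)
class Promoter:
    promoter_id: str
    gene_id: str
    tissue: str           # tissue/cell type in which the promoter is active
    known: bool           # Known (True) vs Novel (False)
    cpg: bool             # CpG (True) vs NonCpG (False)
    protein_coding: bool  # protein-coding (True) vs non-coding (False) gene


def filter_promoters(records, tissue: Optional[str] = None,
                     known: Optional[bool] = None, cpg: Optional[bool] = None,
                     protein_coding: Optional[bool] = None) -> list[Promoter]:
    """Keep records matching every criterion that is not None (logical AND)."""
    return [
        p for p in records
        if (tissue is None or p.tissue == tissue)
        and (known is None or p.known == known)
        and (cpg is None or p.cpg == cpg)
        and (protein_coding is None or p.protein_coding == protein_coding)
    ]


def alternative_promoter_fraction(records) -> float:
    """Fraction of genes that have two or more distinct promoters."""
    promoters_by_gene = defaultdict(set)
    for p in records:
        promoters_by_gene[p.gene_id].add(p.promoter_id)
    if not promoters_by_gene:
        return 0.0
    multi = sum(1 for ids in promoters_by_gene.values() if len(ids) >= 2)
    return multi / len(promoters_by_gene)
```

    Grouping by gene with a set of promoter IDs means a promoter active in several tissues (one record per tissue) is still counted once per gene.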

    Systematic clustering of transcription start site landscapes

    Genome-wide, high-throughput methods for transcription start site (TSS) detection have shown that most promoters have an array of neighboring TSSs, some used more than others, forming a distribution of initiation propensities. TSS distributions (TSSDs) vary widely between promoters, and earlier studies have shown that TSSDs have biological implications for both regulation and function. However, no systematic study has explored how many types of TSSDs, and by extension core promoters, exist, or which biological features distinguish them. In this study, we developed a new non-parametric dissimilarity measure and clustering approach to explore the similarities and stabilities of clusters of TSSDs. Previous studies have used arbitrary thresholds to arrive at two general classes: broad and sharp. We demonstrated that, in addition to the broad/sharp dichotomy, a further category of promoters exists. Unlike typical TATA-driven sharp TSSDs, where the TSS position can vary by a few nucleotides, in this category virtually all TSSs originate from the same genomic position. These promoters lack the epigenetic signatures of typical mRNA promoters, and a substantial subset of them map upstream of ribosomal protein pseudogenes. We present evidence that these are likely mapping errors, caused by the high similarity of ribosomal gene promoters combined with the known G-addition bias of CAGE libraries, which have confounded earlier analyses. Thus, previous two-class separations of promoters based on TSS distributions are well motivated, but ultra-sharp TSS distributions will confound downstream analyses if not removed. This work was supported by a grant from the Novo Nordisk Foundation, http://www.novonordiskfonden.dk/. The European Research Council (http://erc.europa.eu/) has provided financial support to Dr. Sandelin under the EU 7th Framework Programme (FP7/2007-2013)/ERC grant agreement 204135.
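
    The abstract does not define the authors' non-parametric dissimilarity measure. As a stand-in, the sketch below compares two TSSDs with the one-dimensional earth mover's (Wasserstein-1) distance after centering each on its dominant TSS, and flags ultra-sharp TSSDs in which nearly all tags start at one position. This is a swapped-in technique, not the published method, and the 0.9 cutoff in is_ultra_sharp is an arbitrary illustrative assumption.

```python
# Sketch: compare TSS distributions (TSSDs) and flag ultra-sharp ones.
# ASSUMPTION: the 1-D earth mover's distance stands in for the authors'
# dissimilarity measure (not described in the abstract); 0.9 is illustrative.

TSSD = dict[int, int]  # genomic position -> number of CAGE tags starting there


def center_on_mode(tssd: TSSD) -> TSSD:
    """Shift positions so the dominant TSS sits at 0 (ties: leftmost position)."""
    mode = max(tssd, key=lambda pos: (tssd[pos], -pos))
    return {pos - mode: count for pos, count in tssd.items()}


def emd_1d(a: TSSD, b: TSSD) -> float:
    """Wasserstein-1 distance between two tag distributions on integer positions.

    Both inputs must be non-empty with positive total counts.
    """
    total_a, total_b = sum(a.values()), sum(b.values())
    positions = sorted(set(a) | set(b))
    distance, cdf_gap = 0.0, 0.0
    # The CDFs are step functions, so integrate |F_a - F_b| between positions.
    for left, right in zip(positions, positions[1:]):
        cdf_gap += a.get(left, 0) / total_a - b.get(left, 0) / total_b
        distance += abs(cdf_gap) * (right - left)
    return distance


def is_ultra_sharp(tssd: TSSD, min_fraction: float = 0.9) -> bool:
    """True if nearly all tags start at a single position."""
    return max(tssd.values()) / sum(tssd.values()) >= min_fraction
```

    Centering first makes the distance compare shape rather than genomic location: two single-position TSSDs at different coordinates are at distance 0 after centering, while a symmetric broad TSSD is farther from a single-position one the wider it spreads.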

    tRNA biology in the omics era: Stress signalling dynamics and cancer progression.

    Recent years have seen a burst in the number of studies investigating tRNA biology. With the transition from a gene-centred to a genome-centred perspective, tRNAs and other RNA polymerase III transcripts have emerged as active regulators of normal cell physiology and disease. Novel strategies that remove some of the hurdles to quantitative tRNA profiling have revealed that differential exploitation of the tRNA pool critically affects the cell's ability to maintain protein homeostasis under normal and stress conditions. Furthermore, growing evidence indicates that the adaptation of tRNA synthesis to cellular dynamics can influence translation and mRNA stability to drive carcinogenesis and other pathological disorders. This review explores the contributions of genomics, transcriptomics and epitranscriptomics to the discovery of emerging tRNA functions, and offers insight into some of the technical challenges that still limit our understanding of the RNA polymerase III transcriptional machinery.

    Identifying genome-wide transcription units from histone modifications using EPIGENE

    With the successful completion of the Human Genome Project and the rapid development of sequencing technologies, transcriptome annotation across multiple human cell types and tissues is now available. Accurate transcriptome annotation is critical for understanding the functional and regulatory roles of genomic regions. Current methods for identifying genome-wide active transcription units (TUs) use RNA sequencing (RNA-seq). However, this approach requires large quantities of mRNA, which makes highly unstable regulatory RNAs (such as microRNA precursors) difficult to identify. Because inherently unstable TUs are hard to detect, the transcriptome landscape across all cells and tissues remains incomplete. Chromatin-based approaches can alleviate this problem, owing to the well-established correlation between transcription and histone modification. Here, I present EPIGENE, a novel chromatin segmentation method for identifying genome-wide active TUs using transcription-associated histone modifications. Unlike existing chromatin segmentation approaches, EPIGENE uses a constrained, semi-supervised multivariate Hidden Markov Model (HMM) that models the observed combination of histone modifications as a product of independent Bernoulli random variables to identify the chromatin state sequence underlying an active TU. Using EPIGENE, I successfully predicted genome-wide TUs across multiple human cell lines. EPIGENE-predicted TUs were enriched for RNA Polymerase II (Pol II) at the transcription start site (TSS) and in the gene body, indicating that they are indeed transcribed. Comprehensive validation using existing annotations revealed that 93% of EPIGENE TUs can be explained by existing gene annotations and that 5% of EPIGENE TUs in HepG2 can be explained by microRNA annotations. EPIGENE predicted TUs more precisely than existing chromatin segmentation and RNA-seq-based approaches across multiple human cell lines. Using EPIGENE, I also identified 232 novel TUs in K562 and 43 novel cell-specific TUs in K562, HepG2 and IMR90, all of which were supported by Pol II ChIP-seq and nascent RNA-seq evidence.
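
    The emission model named in the abstract can be written as P(x | state k) = ∏_m p_km^(x_m) · (1 − p_km)^(1 − x_m), where x_m is 1 if histone mark m is present in a bin and p_km is the probability of that mark in state k. The sketch below evaluates this product in log space and decodes a state path with the standard Viterbi algorithm, encoding "constraints" as forbidden (zero-probability) transitions. The three states, the two marks and every probability are hypothetical hand-set values; EPIGENE's actual state topology, semi-supervised training and parameters are not reproduced.

```python
import math

# Sketch: multivariate Bernoulli emissions + Viterbi decoding of chromatin states.
# ASSUMPTIONS: states, marks and probabilities are hypothetical hand-set values;
# EPIGENE's real state topology and semi-supervised training are not reproduced.


def log_bernoulli_emission(obs: list[int], probs: list[float]) -> float:
    """log P(obs | state) = sum over marks of log p^x (1 - p)^(1 - x).

    Every probability must lie strictly between 0 and 1.
    """
    return sum(math.log(p) if x else math.log1p(-p) for x, p in zip(obs, probs))


def safe_log(p: float) -> float:
    """Natural log with log(0) = -inf, used to forbid transitions."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start, trans, emission_probs) -> list[str]:
    """Most likely state path for a sequence of binary histone-mark vectors."""
    scores = {s: safe_log(start[s]) + log_bernoulli_emission(observations[0], emission_probs[s])
              for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor of state s under the previous step's scores.
            prev = max(states, key=lambda r: scores[r] + safe_log(trans[r][s]))
            new_scores[s] = (scores[prev] + safe_log(trans[prev][s])
                             + log_bernoulli_emission(obs, emission_probs[s]))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical example with marks [H3K4me3, H3K36me3]:
STATES = ["bg", "tss", "body"]
START = {"bg": 1.0, "tss": 0.0, "body": 0.0}
# Constrained topology: bg -> tss -> body -> bg; other transitions are forbidden.
TRANS = {
    "bg": {"bg": 0.9, "tss": 0.1, "body": 0.0},
    "tss": {"bg": 0.0, "tss": 0.5, "body": 0.5},
    "body": {"bg": 0.1, "tss": 0.0, "body": 0.9},
}
EMISSIONS = {"bg": [0.05, 0.05], "tss": [0.9, 0.2], "body": [0.1, 0.95]}
OBS = [[0, 0], [1, 0], [0, 1], [0, 1], [0, 0]]
print(viterbi(OBS, STATES, START, TRANS, EMISSIONS))  # ['bg', 'tss', 'body', 'body', 'bg']
```

    Working in log space avoids numerical underflow on long sequences, and a forbidden transition simply contributes -inf, so no path can use it.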