23 research outputs found

    Dynamic regulation of the transcription initiation landscape at single nucleotide resolution during vertebrate embryogenesis

    Spatiotemporal control of gene expression is central to animal development. Core promoters represent a previously unanticipated regulatory level by interacting with cis-regulatory elements and transcription initiation in different physiological and developmental contexts. Here, we provide a first and comprehensive description of the core promoter repertoire and its dynamic use during the development of a vertebrate embryo. By using cap analysis of gene expression (CAGE), we mapped transcription initiation events at single nucleotide resolution across 12 stages of zebrafish development. These CAGE-based transcriptome maps reveal genome-wide rules of core promoter usage, structure, and dynamics, key to understanding the control of gene regulation during vertebrate ontogeny. They revealed the existence of multiple classes of pervasive intra- and intergenic post-transcriptionally processed RNA products and their developmental dynamics. Among these RNAs, we report splice donor site-associated intronic RNA (sRNA) to be specific to genes of the splicing machinery. For the identification of conserved features, we compared the zebrafish data sets to the first CAGE promoter map of Tetraodon and the existing human CAGE data. We show that a number of features, such as promoter type, newly discovered promoter properties such as a specialized purine-rich initiator motif, as well as sRNAs and the genes in which they are detected, are conserved in mammalian and Tetraodon CAGE-defined promoter maps. The zebrafish developmental promoterome represents a powerful resource for studying developmental gene regulation and revealing promoter features shared across vertebrates.

    The Constrained Maximal Expression Level Owing to Haploidy Shapes Gene Content on the Mammalian X Chromosome.

    X chromosomes are unusual in many regards, not least of which is their nonrandom gene content. The causes of this bias are commonly discussed in the context of sexual antagonism and the avoidance of activity in the male germline. Here, we examine the notion that, at least in some taxa, functionally biased gene content may more profoundly be shaped by limits imposed on gene expression owing to haploid expression of the X chromosome. Notably, if the X, as in primates, is transcribed at rates comparable to the ancestral rate (per promoter) prior to the X chromosome formation, then the X is not a tolerable environment for genes with very high maximal net levels of expression, owing to transcriptional traffic jams. We test this hypothesis using The Encyclopedia of DNA Elements (ENCODE) and data from the Functional Annotation of the Mammalian Genome (FANTOM5) project. As predicted, the maximal expression of human X-linked genes is much lower than that of genes on autosomes: on average, maximal expression is three times lower on the X chromosome than on autosomes. Similarly, autosome-to-X retroposition events are associated with lower maximal expression of retrogenes on the X than seen for X-to-autosome retrogenes on autosomes. Also as expected, X-linked genes have a lesser degree of increase in gene expression than autosomal ones (compared to the human/chimpanzee common ancestor) if highly expressed, but not if lowly expressed. The traffic jam model also explains the known lower breadth of expression for genes on the X (and the Z of birds), as genes with broad expression are, on average, those with high maximal expression. As then further predicted, highly expressed tissue-specific genes are also rare on the X and broadly expressed genes on the X tend to be lowly expressed, both indicating that the trend is shaped by the maximal expression level, not the breadth of expression per se.
    Importantly, a limit to the maximal expression level explains biased tissue of expression profiles of X-linked genes. Tissues whose tissue-specific genes are very highly expressed (e.g., secretory tissues, tissues abundant in structural proteins) are also tissues in which gene expression is relatively rare on the X chromosome. These trends cannot be fully accounted for in terms of alternative models of biased expression. In conclusion, the notion that it is hard for genes on the Therian X to be highly expressed, owing to transcriptional traffic jams, provides a simple yet robustly supported rationale of many peculiar features of X's gene content, gene expression, and evolution.

    Overlapping transcription initiation codes and promoter interpretation in vertebrate development and differentiation

    A core promoter is a minimal region sufficient to direct the accurate initiation of transcription. Various core promoter elements have been discovered that recruit and position transcriptional machinery, which then initiates transcription at individual transcription start sites (TSS); however, no universal promoter code has been established. The methods and results presented in this thesis focus on innovative analysis of precise transcription initiation data to reveal sequence and chromatin features underlying core promoters and their dynamic usage in development and differentiation. Cap analysis of gene expression (CAGE) provides a single base-pair resolution map of TSSs and their relative usage, and it is a powerful tool for studying promoter structure and function. It has led to the discovery of major promoter classes that differ in transcription initiation patterns: “sharp” promoters in which the majority of transcription starts at one clearly dominant TSS, and “broad” promoters with multiple equally used TSS positions distributed along a wider region. By applying CAGE to a developmental time-course of zebrafish (Danio rerio) we created a first comprehensive map of transcription initiation during vertebrate embryogenesis and revealed widespread dynamics in promoter usage at all levels, from alternative promoters to individual TSSs. We found that thousands of promoters are utilized differently by the oocyte and the embryo, uncovering two independent codes that drive dynamic changes in TSS usage and promoter shape. Maternal TSS selection is guided by an A/T-rich W-box motif positioned at a fixed spacing from the TSS producing a sharp promoter architecture, whereas zygotic selection is restricted by the position of the first downstream nucleosome and produces broad promoter architecture with the dominant TSS aligned to inter- and intranucleosomal sequence positioning signals. 
    The two grammars co-exist in close proximity or in physical overlap at promoters genome-wide. We further showed that a tight association between dominant TSS in broad promoters and nucleosome positioning exists in human and mouse transcription. Alignment of the intranucleosomal dinucleotide frequency patterns downstream of the TSS revealed that a well-positioned +1 nucleosome is a key determinant of TSS preference in broad promoters. Its presence in both zebrafish and mammals suggests the evolutionary conservation of the underlying nucleosome-associated TSS selection mechanism. Precise TSS localisation is crucial for promoter-centred analyses of any genome-wide data. To facilitate the reuse of high-resolution and context-specific TSSs derived from a growing resource of CAGE data, we developed CAGEr, an R/Bioconductor software package for promoterome mining. CAGEr provides easy access to the majority of published CAGE datasets and presents a comprehensive workflow for processing, visualisation and analysis of precise promoter data, and allows its integration with other genome data types. Taken together, the work presented in this thesis reveals unexpected dynamics in core promoter usage at TSS level and demonstrates that promoter type is not an inherent property of the genomic locus, but is rather dependent on the regulatory context. The existence of overlapping transcription initiation codes has important implications for future analyses of promoter content and function.

    FANTOM3and4CAGE: an R data package with CAGE data from FANTOM3 and FANTOM4 projects


    CAGEr: Precise TSS data retrieval and high-resolution promoterome mining for integrative analyses

    Cap analysis of gene expression (CAGE) is a high-throughput method for transcriptome analysis that provides a single base-pair resolution map of transcription start sites (TSS) and their relative usage. Despite their high resolution and functional significance, published CAGE data are still underused in promoter analysis due to the absence of tools that enable its efficient manipulation and integration with other genome data types. Here we present CAGEr, an R implementation of novel methods for the analysis of differential TSS usage and promoter dynamics, integrated with CAGE data processing and promoterome mining into a first comprehensive CAGE toolbox on a common analysis platform. Crucially, we provide collections of TSSs derived from most published CAGE datasets, as well as direct access to FANTOM5 resource of TSSs for numerous human and mouse cell/tissue types from within R, greatly increasing the accessibility of precise context-specific TSS data for integrative analyses. The CAGEr package is freely available from Bioconductor at http://www.bioconductor.org/packages/release/bioc/html/CAGEr.html

    Genome-wide DNA methylation profiling of non-small cell lung carcinomas

    Background: Non-small cell lung carcinoma (NSCLC) is a complex malignancy that owing to its heterogeneity and poor prognosis poses many challenges to diagnosis, prognosis and patient treatment. DNA methylation is an important mechanism of epigenetic regulation involved in normal development and cancer. It is a very stable and specific modification and therefore in principle a very suitable marker for epigenetic phenotyping of tumors. Here we present a genome-wide DNA methylation analysis of NSCLC samples and paired lung tissues, where we combine MethylCap and next generation sequencing (MethylCap-seq) to provide comprehensive DNA methylation maps of the tumor and paired lung samples. The MethylCap-seq data were validated by bisulfite sequencing and methyl-specific polymerase chain reaction of selected regions. Results: Analysis of the MethylCap-seq data revealed a strong positive correlation between replicate experiments and between paired tumor/lung samples. We identified 57 differentially methylated regions (DMRs) present in all NSCLC tumors analyzed by MethylCap-seq. While hypomethylated DMRs did not correlate to any particular functional category of genes, the hypermethylated DMRs were strongly associated with genes encoding transcriptional regulators. Furthermore, subtelomeric regions and satellite repeats were hypomethylated in the NSCLC samples. We also identified DMRs that were specific to two of the major subtypes of NSCLC, adenocarcinomas and squamous cell carcinomas. Conclusions: Collectively, we provide a resource containing genome-wide DNA methylation maps of NSCLC and their paired lung tissues, and comprehensive lists of known and novel DMRs and associated genes in NSCLC.