89 research outputs found

    High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression

    Many eukaryotic genes possess multiple alternative promoters with distinct expression specificities. Therefore, comprehensively annotating promoters and deciphering their individual regulatory dynamics is critical for gene expression profiling applications, and for our understanding of regulatory complexity. We introduce RAMPAGE, a novel promoter activity profiling approach that combines extremely specific 5'-complete cDNA sequencing with an integrated data analysis workflow to address the limitations of current techniques. RAMPAGE features a streamlined protocol for fast and easy generation of highly multiplexed sequencing libraries, offers very high transcription start site specificity, generates accurate and reproducible promoter expression measurements, and yields extensive transcript connectivity information through paired-end cDNA sequencing. We used RAMPAGE in a genome-wide study of promoter activity throughout 36 stages of the life cycle of Drosophila melanogaster, and describe here a comprehensive dataset that represents the first available developmental timecourse of promoter usage. We found that over 40% of developmentally expressed genes have at least two promoters, and that alternative promoters generally implement distinct regulatory programs. Transposable elements, long proposed to play a central role in the evolution of their host genomes through their ability to regulate gene expression, contribute at least 1,300 promoters shaping the developmental transcriptome of D. melanogaster. Hundreds of these promoters drive the expression of annotated genes, and transposons often impart their own expression specificity upon the genes they regulate. These observations provide support for the theory that transposons may drive regulatory innovation through the distribution of stereotyped cis-regulatory modules throughout their host genomes.

    SAMStat: monitoring biases in next generation sequencing data

    Motivation: The sequence alignment/map format (SAM) is a commonly used format to store the alignments between millions of short reads and a reference genome. Often certain positions within the reads are inherently more likely to contain errors due to the protocols used to prepare the samples. Such biases can have adverse effects on both mapping rate and accuracy. To understand the relationship between potential protocol biases and poor mapping, we wrote SAMStat, a simple C program that plots nucleotide overrepresentation and other statistics in mapped and unmapped reads in a concise HTML page. Collecting such statistics also makes it easy to highlight problems in the data processing and enables non-experts to track data quality over time.
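
The kind of per-position statistic described in this abstract can be illustrated with a short Python sketch. It is not SAMStat's implementation (SAMStat is a C program). The function names and the toy records in the usage note are hypothetical. The sketch relies only on SAM conventions: FLAG is the second column, SEQ is the tenth, bit 0x4 marks an unmapped read, and bit 0x10 marks a read stored as its reverse complement.

```python
from collections import Counter, defaultdict

# Translation table used to restore the original orientation of reverse-strand reads.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def position_base_counts(sam_lines):
    """Tally nucleotides per read position, separately for mapped and unmapped reads."""
    counts = {"mapped": defaultdict(Counter), "unmapped": defaultdict(Counter)}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, seq = int(fields[1]), fields[9].upper()
        if seq == "*":  # sequence not stored in this record
            continue
        if flag & 0x10:  # stored as reverse complement: flip back to read orientation
            seq = seq.translate(_COMPLEMENT)[::-1]
        group = "unmapped" if flag & 0x4 else "mapped"
        for pos, base in enumerate(seq):
            counts[group][pos][base] += 1
    return counts


def base_fractions(position_counts):
    """Convert per-position counts into per-position nucleotide fractions."""
    fractions = {}
    for pos, counter in position_counts.items():
        total = sum(counter.values())
        fractions[pos] = {base: n / total for base, n in counter.items()}
    return fractions
```

Calling `position_base_counts` on an open SAM file and comparing `base_fractions(counts["mapped"])` with `base_fractions(counts["unmapped"])` gives a rough view of whether particular read positions are enriched for particular bases in the reads that failed to map.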

    H3S28P Antibody Staining of Okinawan Oikopleura dioica Suggests the Presence of Three Chromosomes [version 2; peer review: 2 approved]

    Oikopleura dioica is a ubiquitous marine zooplankton of biological interest owing to features that include dioecious reproduction, a short life cycle, conserved chordate body plan, and a compact genome. It is an important tunicate model for evolutionary and developmental research, as well as investigations into marine ecosystems. The genome of north Atlantic O. dioica comprises three chromosomes. However, comparisons with the genomes of O. dioica sampled from mainland and southern Japan revealed extensive sequence differences. Moreover, historical studies have reported widely varying chromosome counts. We recently initiated a project to study the genomes of O. dioica individuals collected from the coastline of the Ryukyu (Okinawa) Islands in southern Japan. Given the potentially large extent of genomic diversity, we employed karyological techniques to count individual animals’ chromosomes in situ using centromere-specific antibodies directed against H3S28P, a prophase-metaphase cell cycle-specific marker of histone H3. Epifluorescence and confocal images were obtained of embryos and oocytes stained with two commercial anti-H3S28P antibodies (Abcam ab10543 and Thermo Fisher 07-145). The data lead us to conclude that diploid cells from Okinawan O. dioica contain three pairs of chromosomes, in line with the north Atlantic populations. The finding facilitates the telomere-to-telomere assembly of Okinawan O. dioica genome sequences and gives insight into the genomic diversity of O. dioica from different geographical locations. The data deposited in the EBI BioImage Archive provide representative images of the antibodies’ staining properties for use in epifluorescence and confocal fluorescence microscopy.

    Multiplicity of 5' Cap Structures Present on Short RNAs

    Most RNA molecules are co- or post-transcriptionally modified to alter their chemical and functional properties to assist in their ultimate biological function. Among these modifications, the addition of a 5' cap structure has been found to regulate turnover and localization. Here we report a study of the cap structure of human short (<200 nt) RNAs (sRNAs), using sequencing of cDNA libraries prepared by enzymatic pretreatment of the sRNAs with cap-sensitive specificity, thin layer chromatographic (TLC) analyses of isolated cap structures, and mass spectrometric analyses for validation of the TLC analyses. Processed versions of snoRNA and tRNA sequences of less than 50 nt were observed in capped sRNA libraries, indicating additional processing and recapping of these annotated sRNA biotypes. We report for the first time 2,7-dimethylguanosine in human sRNA cap structures and, surprisingly, we find multiple type 0 cap structures (mGpppC, 7mGpppG, GpppG, GpppA, and 7mGpppA) in RNA length fractions shorter than 50 nt. Finally, we find additional uncharacterized cap structures that await determination through the creation of the reference compounds needed for TLC analyses. These studies suggest the existence of novel biochemical pathways leading to the processing of primary and sRNAs and the modification of their RNA 5' ends with a spectrum of chemical modifications.

    Wide-Scale Analysis of Human Functional Transcription Factor Binding Reveals a Strong Bias towards the Transcription Start Site

    We introduce a novel method to screen the promoters of a set of genes with shared biological function against a precompiled library of motifs, and find those motifs which are statistically over-represented in the gene set. The gene sets were obtained from the functional Gene Ontology (GO) classification; for each set and motif we optimized the sequence similarity score threshold, independently for every location window (measured with respect to the TSS), taking into account the location-dependent nucleotide heterogeneity along the promoters of the target genes. We performed a high-throughput analysis, searching the promoters (from 200 bp downstream to 1000 bp upstream of the TSS) of more than 8000 human and 23,000 mouse genes, for 134 functional Gene Ontology classes and for 412 known DNA motifs. When combined with binding site and location conservation between human and mouse, the method identifies with high probability functional binding sites that regulate groups of biologically related genes. We found many location-sensitive functional binding events and showed that they clustered close to the TSS. Our method and findings were put to several experimental tests. By allowing a "flexible" threshold and combining our functional class and location-specific search method with conservation between human and mouse, we are able to reliably identify functional TF binding sites. This is an essential step towards constructing regulatory networks and elucidating the design principles that govern transcriptional regulation of expression. The promoter region proximal to the TSS appears to be of central importance for regulation of transcription in human and mouse, just as it is in bacteria and yeast. Comment: 31 pages, including Supplementary Information and figure
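
As a rough illustration of what "statistically over-represented in the gene set" can mean, the Python sketch below computes a hypergeometric tail probability for motif hits in a gene set versus a background. This is a standard, generic test substituted here for illustration. It is not the scoring scheme of the paper, which additionally optimizes the similarity threshold per location window. The function name and all numbers in the usage note are hypothetical.

```python
from math import comb


def motif_enrichment_pvalue(hits_in_set, set_size, hits_in_background, background_size):
    """P(X >= hits_in_set) under a hypergeometric null.

    The null model draws `set_size` genes without replacement from
    `background_size` genes, of which `hits_in_background` carry the motif.
    """
    lower = max(hits_in_set, 0)
    upper = min(hits_in_background, set_size)
    total = comb(background_size, set_size)
    # Sum the upper tail of the hypergeometric probability mass function.
    tail = sum(
        comb(hits_in_background, i)
        * comb(background_size - hits_in_background, set_size - i)
        for i in range(lower, upper + 1)
    )
    return tail / total
```

For example, `motif_enrichment_pvalue(12, 40, 300, 8000)` would give the probability of seeing 12 or more motif-bearing promoters in a 40-gene set if 300 of 8000 background promoters carry the motif. In a screen over many gene sets and motifs, such p-values would still need multiple-testing correction.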

    Update of the FANTOM web resource: from mammalian transcriptional landscape to its dynamic regulation

    The international Functional Annotation Of the Mammalian Genomes 4 (FANTOM4) research collaboration set out to better understand the transcriptional network that regulates macrophage differentiation and to uncover novel components of the transcriptome employing a series of high-throughput experiments. The primary and unique technique is cap analysis of gene expression (CAGE), sequencing mRNA 5′-ends with a second-generation sequencer to quantify promoter activities even in the absence of gene annotation. Additional genome-wide experiments complement the setup, including short RNA sequencing, microarray gene expression profiling on large-scale perturbation experiments, and ChIP–chip for epigenetic marks and transcription factors. All the experiments are performed in a differentiation time course of the THP-1 human leukemic cell line. Furthermore, we performed a large-scale mammalian two-hybrid (M2H) assay between transcription factors and monitored their expression profile across human and mouse tissues with qRT-PCR to address combinatorial effects of regulation by transcription factors. These interdependent data have been analyzed individually and in combination with each other and are published in related but distinct papers. We provide all data together with systematic annotation in an integrated view as a resource for the scientific community (http://fantom.gsc.riken.jp/4/). Additionally, we assembled a rich set of derived analysis results including published predicted and validated regulatory interactions. Here we introduce the resource and its update after the initial release.

    Protocol Dependence of Sequencing-Based Gene Expression Measurements

    RNA-Seq provides unparalleled levels of information about the transcriptome, including precise expression levels over a wide dynamic range. It is essential to understand how technical variation impacts the quality and interpretability of results, how potential errors could be introduced by the protocol, how the source of RNA affects transcript detection, and how all of these variations can impact the conclusions drawn. Multiple human RNA samples were used to assess RNA fragmentation, RNA fractionation, cDNA synthesis, and single versus multiple tag counting. Though protocols employing polyA RNA selection generate the highest number of non-ribosomal reads and the most precise measurements for coding transcripts, such protocols were found to detect only a fraction of the non-ribosomal RNA in human cells. PolyA RNA excludes thousands of annotated and even more unannotated transcripts, resulting in an incomplete view of the transcriptome. Ribosomal-depleted RNA provides a more cost-effective method for generating complete transcriptome coverage. Expression measurements using single tag counting provided advantages for assessing gene expression and for detecting short RNAs relative to multi-read protocols. Detection of short RNAs was also hampered by RNA fragmentation. Thus, this work will help researchers choose from among a range of options when analyzing gene expression, each with its own advantages and disadvantages.

    Highly Parallel Genome-Wide Expression Analysis of Single Mammalian Cells

    We have developed a high-throughput amplification method for generating robust gene expression profiles using single cell or low RNA inputs. The method uses tagged priming and template-switching, resulting in the incorporation of universal PCR priming sites at both ends of the synthesized cDNA for global PCR amplification. Coupled with a whole-genome gene expression microarray platform, we routinely obtain expression correlation values of R² ~0.76-0.80 between individual cells and R² ~0.69 between 50 pg total RNA replicates. Expression profiles generated from single cells or 50 pg total RNA correlate well with those generated with higher input (1 ng total RNA) (R² ~0.80). Also, the assay is sufficiently sensitive to detect, in a single cell, approximately 63% of the number of genes detected with 1 ng input, with approximately 97% of the genes detected in the single-cell input also detected in the higher input. In summary, our method facilitates whole-genome gene expression profiling in contexts where starting material is extremely limiting, particularly in areas such as the study of progenitor cells in early development and tumor stem cell biology.

    Digital Gene Expression Profiling by 5′-End Sequencing of cDNAs during Reprogramming in the Moss Physcomitrella patens

    Stem cells self-renew and repeatedly produce differentiated cells during development and growth. The differentiated cells can be converted into stem cells in some metazoans and land plants with appropriate treatments. After leaves of the moss Physcomitrella patens are excised, leaf cells reenter the cell cycle and commence tip growth, which is characteristic of stem cells called chloronema apical cells. To understand the underlying molecular mechanisms, a digital gene expression profiling method using mRNA 5′-end tags (5′-DGE) was established. The 5′-DGE method produced reproducible data with a dynamic range of four orders of magnitude that correlated well with qRT-PCR measurements. After the excision of leaves, the expression levels of 11% of the transcripts changed significantly within 6 h. Genes involved in stress responses and proteolysis were induced and those involved in metabolism, including photosynthesis, were reduced. The later processes of reprogramming involved photosynthesis recovery and higher macromolecule biosynthesis, including that of RNA and proteins. Auxin and cytokinin signaling pathways, which are activated during stem cell formation via callus in flowering plants, are also activated during reprogramming in P. patens, although no exogenous phytohormone is applied in the moss system, suggesting that an intrinsic phytohormone regulatory system may be used in the moss.