203 research outputs found

    DBTSS: DataBase of Transcriptional Start Sites progress report in 2012

    To support transcriptional regulation studies, we have constructed DBTSS (DataBase of Transcriptional Start Sites), which contains exact positions of transcriptional start sites (TSSs), determined with our own technique named TSS-seq, in the genomes of various species. In its latest version, DBTSS covers the data of the majority of human adult and embryonic tissues: it now contains 418 million TSS tag sequences from 28 tissues/cell cultures. Moreover, we integrated a series of our own transcriptomic data, such as the RNA-seq data of subcellular-fractionated RNAs as well as the ChIP-seq data of histone modifications and the binding of RNA polymerase II/several transcription factors in cultured cell lines into our original TSS information. We also included several external epigenomic data, such as the chromatin map of the ENCODE project. We further associated our TSS information with public or original single-nucleotide variation (SNV) data, in order to identify SNVs in the regulatory regions. These data can be browsed in our new viewer, which supports versatile search conditions of users. We believe that our new DBTSS will be an invaluable resource for interpreting the differential uses of TSSs and for identifying human genetic variations that are associated with disordered transcriptional regulation. DBTSS can be accessed at http://dbtss.hgc.jp

    DBTSS: database of transcription start sites, progress report 2008

    DBTSS is a database of transcriptional start sites, based on our unique collection of precise, experimentally determined 5′-end sequences of full-length cDNAs. Since its first release in 2002, several major updates have been made. In this update, we expanded the human transcriptional start site dataset by 19 million uniquely mapped, and RefSeq-associated, 5′-end sequences, which were generated by a newly introduced Solexa sequencer. Moreover, in order to provide means for interpreting those massive TSS data, we implemented two new analytical tools: one for connecting expression information with predicted transcription factor binding sites; the other for examining evolutionary conservation or species-specificity of promoters and transcripts, which can be browsed by our own comparative genome viewer. With the expanded dataset and the enhanced functionalities, DBTSS provides a unique platform that enables in-depth transcriptome analyses. DBTSS is accessible at http://dbtss.hgc.jp/

    Conserved temporal ordering of promoter activation implicates common mechanisms governing the immediate early response across cell types and stimuli

    Conserved temporal precedence between IEGs (light blue nodes) and other protein-coding genes (green nodes) is shown by directed edges. Genes annotated with the GO term 'response to endoplasmic reticulum stress' (GO:003497) have a red rectangle around the gene name; red squares indicate genes with CAGE clusters enriched for XBP1 transcription factor binding sites

    NONCODE v2.0: decoding the non-coding

    The NONCODE database is an integrated knowledge database designed for the analysis of non-coding RNAs (ncRNAs). Since NONCODE was first released 3 years ago, the number of known ncRNAs has grown rapidly, and there is growing recognition that ncRNAs play important regulatory roles in most organisms. In the updated version of NONCODE (NONCODE v2.0), the number of collected ncRNAs has reached 206 226, including a wide range of microRNAs, Piwi-interacting RNAs and mRNA-like ncRNAs. The improvements brought to the database include not only new and updated ncRNA data sets, but also an incorporation of BLAST alignment search service and access through our custom UCSC Genome Browser. NONCODE can be found under http://www.noncode.org or http://noncode.bioinfo.org.cn

    Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries

    Background: The joint analysis of several categorical variables is a common task in many areas of biology, and is becoming central to systems biology investigations whose goal is to identify potentially complex interaction among variables belonging to a network. Interactions of arbitrary complexity are traditionally modeled in statistics by log-linear models. It is challenging to extend these to the high dimensional and potentially sparse data arising in computational biology. An important example, which provides the motivation for this article, is the analysis of so-called full-length cDNA libraries of alternatively spliced genes, where we investigate relationships among the presence of various exons in transcript species. Results: We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables, to study the interaction of two or more factors. Maximum Likelihood estimation of log-linear model coefficients might not be appropriate because of the presence of zeros in the table's cells, and new methods are required. We propose a computationally efficient ℓ1-penalization approach extending the Lasso algorithm to this context, and compare it to other procedures in a simulation study. We then illustrate these algorithms on contingency tables arising from full-length cDNA libraries. Conclusion: We propose regularization methods that can be used successfully to detect complex interaction patterns among categorical variables in a broad range of biological problems involving categorical variables.
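
The idea of an ℓ1-penalized log-linear model for a sparse contingency table can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a Poisson log-linear model with main effects and pairwise interactions only, fits it by plain proximal gradient descent with backtracking, and uses an invented 2×2×2 table containing zero cells. The function names, the penalty values and the counts are all hypothetical.

```python
import itertools
import numpy as np


def design_matrix(n_factors):
    """Design matrix for a 2^n table: intercept, main effects, pairwise interactions."""
    cells = np.array(list(itertools.product([0, 1], repeat=n_factors)), dtype=float)
    columns, names = [np.ones(len(cells))], ["intercept"]
    for i in range(n_factors):
        columns.append(cells[:, i])
        names.append(f"x{i}")
    for i, j in itertools.combinations(range(n_factors), 2):
        columns.append(cells[:, i] * cells[:, j])
        names.append(f"x{i}:x{j}")
    return np.column_stack(columns), names


def neg_loglik(beta, X, counts):
    """Poisson negative log-likelihood (up to a constant), scaled by the total count."""
    eta = X @ beta
    return (np.exp(eta).sum() - counts @ eta) / counts.sum()


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink towards zero, clipping at zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_l1_loglinear(counts, X, lam, n_iter=2000):
    """Proximal gradient descent; the intercept (column 0) is left unpenalized."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(counts.mean())  # start from the uniform table
    step = 1.0
    for _ in range(n_iter):
        grad = X.T @ (np.exp(X @ beta) - counts) / counts.sum()
        f_old = neg_loglik(beta, X, counts)
        while True:  # backtracking: halve the step until the quadratic bound holds
            cand = beta - step * grad
            cand[1:] = soft_threshold(cand[1:], step * lam)
            diff = cand - beta
            bound = f_old + grad @ diff + diff @ diff / (2 * step) + 1e-12
            if neg_loglik(cand, X, counts) <= bound:
                break
            step *= 0.5
        beta = cand
    return beta


# Invented toy table (cells ordered as itertools.product over x0, x1, x2):
# x0 and x1 tend to co-occur, and two cells are empty.
counts = np.array([30, 28, 2, 0, 0, 3, 25, 27], dtype=float)
X, names = design_matrix(3)

beta_small = fit_l1_loglinear(counts, X, lam=0.01)  # light penalty
beta_large = fit_l1_loglinear(counts, X, lam=10.0)  # heavy penalty

for name, b_s, b_l in zip(names, beta_small, beta_large):
    print(f"{name:10s} lam=0.01: {b_s: .3f}   lam=10: {b_l: .3f}")
```

With the light penalty the x0:x1 interaction stays clearly positive despite the empty cells, whereas the heavy penalty sets every penalized coefficient to zero; scanning intermediate penalty values is one simple way to rank interactions by how long they survive.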

    Comprehensive characterisation of transcriptional activity during influenza A virus infection reveals biases in cap-snatching of host RNA sequences.

    Macrophages in the lung detect and respond to influenza A virus (IAV), determining the nature of the immune response. Using terminal-depth cap analysis of gene expression (CAGE), we quantified transcriptional activity of both host and pathogen over a 24-h time course of IAV infection in primary human monocyte-derived macrophages (MDMs). This method allowed us to observe heterogeneous host sequences incorporated into IAV mRNA, "snatched" 5' RNA caps, and corresponding RNA sequences from host RNAs. In order to determine whether cap-snatching is random or exhibits a bias, we systematically compared host sequences incorporated into viral mRNA ("snatched") against a complete survey of all background host RNA in the same cells, at the same time. Using a computational strategy designed to eliminate sources of bias due to read length, sequencing depth, and multimapping, we were able to quantify overrepresentation of host RNA features among the sequences that were snatched by IAV. We demonstrate biased snatching of numerous host RNAs, particularly small nuclear RNAs (snRNAs), and avoidance of host transcripts encoding host ribosomal proteins, which are required by IAV for replication. We then used a systems approach to describe the transcriptional landscape of the host response to IAV, observing many new features, including a failure of IAV-treated MDMs to induce feedback inhibitors of inflammation, seen in response to other treatments. IMPORTANCE: Infection with influenza A virus (IAV) is responsible for an estimated 500,000 deaths and up to 5 million cases of severe respiratory illness each year. In this study, we looked at human primary immune cells (macrophages) infected with IAV. Our method allows us to look at both the host and the virus in parallel. We used these data to explore a process known as "cap-snatching," where IAV snatches a short nucleotide sequence from capped host RNA. This process was believed to be random. We demonstrate biased snatching of numerous host RNAs, including those associated with snRNA transcription, and avoidance of host transcripts encoding host ribosomal proteins, which are required by IAV for replication. We then describe the transcriptional landscape of the host response to IAV, observing new features, including a failure of IAV-treated MDMs to induce feedback inhibitors of inflammation, seen in response to other treatments.

    The Functional RNA Database 3.0: databases to support mining and annotation of functional RNAs

    We developed a pair of databases that support two important tasks: annotation of anonymous RNA transcripts and discovery of novel non-coding RNAs. The database combo is called the Functional RNA Database and consists of two databases: a rewrite of the original version of the Functional RNA Database (fRNAdb) and the latest version of the UCSC GenomeBrowser for Functional RNA. The former is a sequence database equipped with a powerful search function and hosts a large collection of known/predicted non-coding RNA sequences acquired from existing databases as well as novel/predicted sequences reported by researchers of the Functional RNA Project. The latter is a UCSC Genome Browser mirror with large additional custom tracks specifically associated with non-coding elements. It also includes several functional enhancements such as a presentation of a common secondary structure prediction at any given genomic window ⩽500 bp. Our GenomeBrowser supports user authentication and user-specific tracks. The current version of the fRNAdb is a complete rewrite of the former version, hosting a larger number of sequences and with a much friendlier interface. The current version of UCSC GenomeBrowser for Functional RNA features a larger number of tracks and richer features than the former version. The databases are available at http://www.ncrna.org/

    The UniTrap resource: tools for the biologist enabling optimized use of gene trap clones

    We have developed a comprehensive resource devoted to biologists wanting to optimize the use of gene trap clones in their experiments. We have processed 300 602 such clones from both public and private projects to generate 28 199 ‘UniTraps’, i.e. distinct collections of unambiguous insertions at the same subgenic region of annotated genes. The UniTrap resource contains data relative to 9583 trapped genes, which represent 42.3% of the mouse gene content. Among the trapped genes, 7 728 have a counterpart in humans, and 677 are known to be involved in the pathogenesis of human diseases. The aim of this analysis is to provide the wet lab researchers with a comprehensive database and curated tools for (i) identifying and comparing the clones carrying a trap into the genes of interest, (ii) evaluating the severity of the mutation to the protein function in each independent trapping event and (iii) supplying complete information to perform PCR, RT-PCR and restriction experiments to verify the clone and identify the exact point of vector insertion. To share this unique resource with the scientific community, we have designed and implemented a web interface that is freely accessible at http://unitrap.cbm.fvg.it/

    Characterization of Transcription Start Sites of Putative Non-coding RNAs by Multifaceted Use of Massively Paralleled Sequencer

    On the basis of integrated transcriptome analysis, we show that not all transcriptional start site clusters (TSCs) in the intergenic regions (iTSCs) have the same properties; thus, it is possible to discriminate the iTSCs that are likely to have biological relevance from the other noise-level iTSCs. We used a total of 251 933 381 short-read sequence tags generated from various types of transcriptome analyses in order to characterize 6039 iTSCs, which have significant expression levels. We analyzed and found that 23% of these iTSCs were located in the proximal regions of the RefSeq genes. These RefSeq-linked iTSCs showed similar expression patterns with the neighboring RefSeq genes, had widely fluctuating transcription start sites and lacked ordered nucleosome positioning. These iTSCs seemed not to form independent transcriptional units, simply representing the by-products of the neighboring RefSeq genes, in spite of their significant expression levels. Similar features were also observed for the TSCs located in the antisense regions of the RefSeq genes. Furthermore, for the remaining iTSCs that were not associated with any RefSeq genes, we demonstrate that integrative interpretation of the transcriptome data provides essential information to specify their biological functions in the hypoxic responses of the cells

    Chemical synthesis of a very long oligoribonucleotide with 2-cyanoethoxymethyl (CEM) as the 2′-O-protecting group: structural identification and biological activity of a synthetic 110mer precursor-microRNA candidate

    A long RNA oligomer, a 110mer with the sequence of a precursor-microRNA candidate, has been chemically synthesized in a single synthesizer run by means of standard automated phosphoramidite chemistry. The synthetic method involved the use of 2-cyanoethoxymethyl (CEM), a 2′-hydroxyl protecting group recently developed in our laboratory. We improved the methodology, introducing better coupling and capping conditions. The overall isolated yield of highly pure 110mer was 5.5%. Such a yield on a 1-μmol scale corresponds to 1 mg of product and emphasizes the practicality of the CEM method for synthesizing oligomers of more than 100 nt in sufficient quantity for biological research. We confirmed the identity of the 110mer by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry, as well as HPLC, electrophoretic methods, and RNase-digestion experiments. The 110mer also showed sense-selective specific gene-silencing activity. As far as we know, this is the longest chemically synthesized RNA oligomer reported to date. Furthermore, the identity of the 110mer was confirmed by both physicochemical and biological methods