Search CORE

268 research outputs found

Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

Author: FANTOM Consortium
Greco Dario
Kere Juha
Nguyen Quan Hoang
Publication date: 01/12/2021

Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology that combines cap trapping and long-read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and add a further layer of complexity to STR polymorphism. Peer reviewed
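The abstract does not describe how STRs were annotated. As a rough illustration of what counts as an STR, here is a minimal Python sketch that scans a sequence for perfect (uninterrupted) tandem repeats, assuming motif units of 1 to 6 bp and at least three copies. These thresholds and the function names `find_strs` and `is_primitive` are hypothetical; dedicated tools such as Tandem Repeats Finder also tolerate mismatches and indels, which this sketch does not.

```python
from typing import List, Tuple


def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a repeat of a shorter motif ("AT" yes, "AA" no)."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_strs(seq: str, min_unit: int = 1, max_unit: int = 6,
              min_copies: int = 3) -> List[Tuple[int, int, str, int]]:
    """Scan for perfect tandem repeats; returns (start, end, unit, copies), 0-based, end-exclusive."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for k in range(min_unit, max_unit + 1):
        i = 0
        while i + k <= n:
            unit = seq[i:i + k]
            j = i + k
            # extend while the next k bases repeat the same unit exactly
            while j + k <= n and seq[j:j + k] == unit:
                j += k
            copies = (j - i) // k
            if copies >= min_copies and set(unit) <= set("ACGT") and is_primitive(unit):
                hits.append((i, j, unit, copies))
                i = j  # skip past this repeat (and its phase-shifted versions)
            else:
                i += 1
    return sorted(hits)


print(find_strs("GGCACACACAGG"))  # [(2, 10, 'CA', 4)]
```

The `is_primitive` check stops a mononucleotide run such as `AAAAAA` from also being reported as an `AA` or `AAA` repeat.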

Helsingin yliopiston digitaalinen arkisto

DBTSS: database of transcription start sites, progress report 2008

Author: Bentley
H. Wakaguri
K. Nakai
Lamb
Matys
Prabhakar
R. Yamashita
S. Sugano
Suzuki
The FANTOM Consortium
Y. Suzuki
Yamashita
Publication venue: Oxford University Press
Publication date: 01/01/2007

DBTSS is a database of transcriptional start sites, based on our unique collection of precise, experimentally determined 5′-end sequences of full-length cDNAs. Since its first release in 2002, several major updates have been made. In this update, we expanded the human transcriptional start site dataset by 19 million uniquely mapped, and RefSeq-associated, 5′-end sequences, which were generated by a newly introduced Solexa sequencer. Moreover, in order to provide means for interpreting those massive TSS data, we implemented two new analytical tools: one for connecting expression information with predicted transcription factor binding sites; the other for examining evolutionary conservation or species-specificity of promoters and transcripts, which can be browsed by our own comparative genome viewer. With the expanded dataset and the enhanced functionalities, DBTSS provides a unique platform that enables in-depth transcriptome analyses. DBTSS is accessible at http://dbtss.hgc.jp/

CiteSeerX

Crossref

PubMed Central

DBTSS: DataBase of Transcriptional Start Sites progress report in 2012

Author: Barski
Ernst
K. Nakai
Mardis
Mills
R. Yamashita
S. Sugano
Sudmant
Suzuki
The FANTOM Consortium
Y. Suzuki
Publication venue: Oxford University Press

To support transcriptional regulation studies, we have constructed DBTSS (DataBase of Transcriptional Start Sites), which contains exact positions of transcriptional start sites (TSSs), determined with our own technique named TSS-seq, in the genomes of various species. In its latest version, DBTSS covers the data of the majority of human adult and embryonic tissues: it now contains 418 million TSS tag sequences from 28 tissues/cell cultures. Moreover, we integrated a series of our own transcriptomic data, such as the RNA-seq data of subcellular-fractionated RNAs as well as the ChIP-seq data of histone modifications and the binding of RNA polymerase II/several transcription factors in cultured cell lines, into our original TSS information. We also included several external epigenomic data sets, such as the chromatin map of the ENCODE project. We further associated our TSS information with public or original single-nucleotide variation (SNV) data, in order to identify SNVs in the regulatory regions. These data can be browsed in our new viewer, which supports versatile user-defined search conditions. We believe that our new DBTSS will be an invaluable resource for interpreting the differential uses of TSSs and for identifying human genetic variations that are associated with disordered transcriptional regulation. DBTSS can be accessed at http://dbtss.hgc.jp
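The abstract does not say how an SNV is judged to lie in a regulatory region. One simple, assumed definition is proximity to a TSS; the sketch below (hypothetical function `snvs_near_tss`, an arbitrary ±500 bp window, made-up toy coordinates) counts the TSSs within that window of each SNV using binary search over sorted positions.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple


def snvs_near_tss(tss: Dict[str, List[int]], snvs: List[Tuple[str, int]],
                  window: int = 500) -> List[Tuple[str, int, int]]:
    """Return (chrom, pos, n_tss) for each SNV with at least one TSS within +/- window bp."""
    # sort TSS coordinates once per chromosome so each SNV lookup is two binary searches
    index = {chrom: sorted(pos) for chrom, pos in tss.items()}
    hits = []
    for chrom, pos in snvs:
        starts = index.get(chrom, [])
        lo = bisect_left(starts, pos - window)   # first TSS >= pos - window
        hi = bisect_right(starts, pos + window)  # first TSS > pos + window
        if hi > lo:
            hits.append((chrom, pos, hi - lo))
    return hits


# toy coordinates, not real DBTSS data
tss = {"chr1": [1000, 5000, 1200]}
snvs = [("chr1", 1100), ("chr1", 3000), ("chr2", 1000)]
print(snvs_near_tss(tss, snvs))  # [('chr1', 1100, 2)]
```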

Crossref

PubMed Central

Conserved temporal ordering of promoter activation implicates common mechanisms governing the immediate early response across cell types and stimuli

Author: Aitken James
Arner Erik
Carninci Piero
Daub Carsten
FANTOM consortium The
Forrest Alistair R. R.
Hayashizaki Yoshihide
Itoh Masayoshi
Kawaji Hideya
Lassmann Timo
Semple Colin
Vacca Annalaura
Publication venue: 'The Royal Society'
Publication date: 16/07/2018

Conserved temporal precedence between IEGs (light blue nodes) and other protein-coding genes (green nodes) is shown by directed edges. Genes annotated with the GO term 'response to endoplasmic reticulum stress' (GO:003497) have a red rectangle around the gene name; red squares indicate genes with CAGE clusters enriched for XBP1 transcription factor binding sites
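The caption does not state the criterion behind "conserved temporal precedence". As a hedged sketch, assuming each time course yields one peak time per gene and that an edge a → b requires gene a to peak strictly before gene b in every time course measuring both, the directed edges could be derived as follows (the function name and toy data are hypothetical):

```python
from itertools import permutations
from typing import Dict, List, Tuple


def conserved_precedence(peak_times: Dict[str, Dict[str, float]]) -> List[Tuple[str, str]]:
    """peak_times[series][gene] is the time of peak promoter activity in one time course.

    Returns directed edges (a, b) where gene a peaks strictly before gene b in every
    time course that measured both genes.
    """
    genes = sorted({g for series in peak_times.values() for g in series})
    edges = []
    for a, b in permutations(genes, 2):
        shared = [s for s in peak_times.values() if a in s and b in s]
        if shared and all(s[a] < s[b] for s in shared):
            edges.append((a, b))
    return edges


# toy peak times (hours) for two hypothetical time courses
toy = {
    "course_1": {"ieg_A": 0.5, "ieg_B": 0.5, "gene_C": 2.0},
    "course_2": {"ieg_A": 0.25, "ieg_B": 1.0, "gene_C": 3.0},
}
print(conserved_precedence(toy))  # [('ieg_A', 'gene_C'), ('ieg_B', 'gene_C')]
```

Ties (here `ieg_A` and `ieg_B` in `course_1`) produce no edge, so only orderings that hold in every series survive.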

Crossref

Edinburgh Research Explorer

FigShare

DDBJ launches a new archive database with analytical tools for next-generation sequence data

Author: Cochrane
Eli Kaminuma
FANTOM Consortium
Fumoto
Hongoh
Ikeo
Jun Mashima
Kawabata
Kosuge
Kousaku Okubo
Osamu Ogasawara
Parkinson
Sato
Sugawara
Takashi Gojobori
Toshihisa Takagi
Winer
Yasukazu Nakamura
Yuichi Kodama
Publication venue: Oxford University Press

The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) collected and released 1 701 110 entries (1 116 138 614 bases) between July 2008 and June 2009. Highlighted data releases from DDBJ in this period included the complete genome sequence of an endosymbiont within protist cells in the termite gut and Cap Analysis of Gene Expression (CAGE) tags for human and mouse deposited by the Functional Annotation of the Mammalian cDNA consortium. In this period, we started a new user announcement service using Really Simple Syndication (RSS) to deliver a daily list of data released from DDBJ. Comprehensive visualization of DDBJ release data was attempted using a word cloud program. Moreover, a new archive for sequencing data from next-generation sequencers, the ‘DDBJ Read Archive’ (DRA), was launched. Concurrently, for read data registered in DRA, a semi-automatic annotation tool called the ‘DDBJ Read Annotation Pipeline’ was released as a preliminary step. The pipeline consists of two parts: basic analysis (reference genome mapping and de novo assembly) and high-level analysis (structural and functional annotation). These new services will aid users’ research and provide easier access to DDBJ databases.

Crossref

PubMed Central

CCL2 enhances pluripotency of human induced pluripotent stem cells by activating hypoxia related genes

Author: FANTOM Consortium
Forrest Alistair R. R.
Hasegawa Yuki
Hayashizaki Yoshihide
Sajantila Antti
Suzuki Harukazu
Takahashi Naoko
Tang Dave
Publication date: 01/01/2014

A. Sajantila is a member of the FANTOM Consortium working group (261 members in total). Peer reviewed

Crossref

Cold Spring Harbor Laboratory Institutional Repository

edoc

PubMed Central

Edinburgh Research Explorer

Leiden University Scholarly Publications

Helsingin yliopiston digitaalinen arkisto

White Rose Research Online

University of Melbourne Institutional Repository

Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries

Author: A Mironov
BS Everitt
C Southan
Corinne Dahinden
D Brett
F Liang
Giovanni Parmigiani
International Human Genome Sequencing Consortium
M Yuan
M Zavolan
Mark C Emerick
MR Regan
Peter Bühlmann
R Christensen
R Tibshirani
S Rosset
SL Lauritzen
T Imanishi
The FANTOM Consortium
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The joint analysis of several categorical variables is a common task in many areas of biology and is becoming central to systems biology investigations whose goal is to identify potentially complex interactions among variables belonging to a network. Interactions of arbitrary complexity are traditionally modeled in statistics by log-linear models. It is challenging to extend these to the high-dimensional and potentially sparse data arising in computational biology. An important example, which provides the motivation for this article, is the analysis of so-called full-length cDNA libraries of alternatively spliced genes, where we investigate relationships among the presence of various exons in transcript species.
Results: We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables, to study the interaction of two or more factors. Maximum likelihood estimation of log-linear model coefficients might not be appropriate because of the presence of zeros in the table's cells, and new methods are required. We propose a computationally efficient ℓ1-penalization approach extending the Lasso algorithm to this context, and compare it to other procedures in a simulation study. We then illustrate these algorithms on contingency tables arising from full-length cDNA libraries.
Conclusion: We propose regularization methods that can be used successfully to detect complex interaction patterns in a broad range of biological problems involving categorical variables.
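The abstract names the approach (an ℓ1 penalty on log-linear model coefficients, extending the Lasso) but not its solver. A minimal sketch, assuming 0/1-coded binary factors (for example, exon present or absent), a Poisson likelihood for the cell counts, and plain proximal-gradient (ISTA) updates with backtracking rather than whatever algorithm the authors actually used; all function names are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Tuple

Cell = Tuple[int, ...]


def log_linear_terms(p: int, max_order: int) -> List[Tuple[int, ...]]:
    """() is the intercept, (i,) a main effect, (i, j) a two-way interaction, and so on."""
    return [s for k in range(max_order + 1) for s in itertools.combinations(range(p), k)]


def soft_threshold(v: float, thr: float) -> float:
    return math.copysign(max(abs(v) - thr, 0.0), v)


def fit_l1_loglinear(counts: Dict[Cell, int], p: int, max_order: int,
                     lam: float, iters: int = 2000) -> Dict[Tuple[int, ...], float]:
    """L1-penalized Poisson log-linear fit by proximal gradient (ISTA) with backtracking.

    Factors are 0/1 coded; a term's feature is 1 only if every factor in it equals 1.
    The intercept is never penalized. Cells missing from `counts` are observed zeros.
    """
    terms = log_linear_terms(p, max_order)
    cells = list(itertools.product((0, 1), repeat=p))
    X = [[1.0 if all(c[i] for i in s) else 0.0 for s in terms] for c in cells]
    y = [counts.get(c, 0) for c in cells]

    def nll(b: List[float]) -> float:
        # Poisson negative log-likelihood up to a constant: sum(mu - y * log mu)
        total = 0.0
        for x, yi in zip(X, y):
            eta = sum(xj * bj for xj, bj in zip(x, b))
            if eta > 700:  # math.exp would overflow; reject this trial step
                return math.inf
            total += math.exp(eta) - yi * eta
        return total

    def grad(b: List[float]) -> List[float]:
        mu = [math.exp(sum(xj * bj for xj, bj in zip(x, b))) for x in X]
        return [sum((m - yi) * x[j] for m, yi, x in zip(mu, y, X)) for j in range(len(b))]

    beta = [0.0] * len(terms)
    beta[0] = math.log(max(sum(y), 1) / len(cells))  # start from the uniform table
    t = 1.0
    for _ in range(iters):
        g, f0 = grad(beta), nll(beta)
        while True:
            # gradient step, then soft-threshold every coefficient except the intercept
            z = [bj - t * gj for bj, gj in zip(beta, g)]
            z = [z[0]] + [soft_threshold(v, t * lam) for v in z[1:]]
            d = [zj - bj for zj, bj in zip(z, beta)]
            bound = f0 + sum(gj * dj for gj, dj in zip(g, d)) + sum(dj * dj for dj in d) / (2 * t)
            if nll(z) <= bound:
                break
            t *= 0.5  # backtracking: shrink the step until the quadratic bound holds
        beta = z
    return dict(zip(terms, beta))
```

A large `lam` shrinks every main effect and interaction to zero and leaves the uniform, intercept-only table; smaller values let terms become nonzero, which is how the penalty doubles as model selection. With a zero cell, for example a 2 × 2 table whose (1, 1) count is 0, the unpenalized maximum-likelihood interaction diverges toward minus infinity, whereas the penalized estimate stays finite.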

Repository for Publications and Research Data

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

NONCODE v2.0: decoding the non-coding

Author: Aravin
B. Bai
Benson
C. Liu
G. Skogerbo
Girard
H. Zhao
Huang
J. Wang
Mattick
R. Chen
Rivas
S. He
T. Liu
The FANTOM Consortium
Y. Zhao
Zemann
Publication venue: Oxford University Press

The NONCODE database is an integrated knowledge database designed for the analysis of non-coding RNAs (ncRNAs). Since NONCODE was first released 3 years ago, the number of known ncRNAs has grown rapidly, and there is growing recognition that ncRNAs play important regulatory roles in most organisms. In the updated version of NONCODE (NONCODE v2.0), the number of collected ncRNAs has reached 206 226, including a wide range of microRNAs, Piwi-interacting RNAs and mRNA-like ncRNAs. The improvements brought to the database include not only new and updated ncRNA data sets, but also an incorporation of BLAST alignment search service and access through our custom UCSC Genome Browser. NONCODE can be found under http://www.noncode.org or http://noncode.bioinfo.org.cn

Crossref

PubMed Central

Comprehensive characterisation of transcriptional activity during influenza A virus infection reveals biases in cap-snatching of host RNA sequences.

Author: Baillie Kenneth
Bertin Nicolas
Carninci Piero
Clohisey Sara
Digard Paul
FANTOM consortium The
Forrest Alistair A.
Hayashizaki Yoshihide
Hendry Ross W.
Hume David
Parkinson Nicholas
Summers Kim M
Tomoiu Andru
Wang Bo
Wise Helen
Publication venue: 'American Society for Microbiology'
Publication date: 11/03/2020

Macrophages in the lung detect and respond to influenza A virus (IAV), determining the nature of the immune response. Using terminal-depth cap analysis of gene expression (CAGE), we quantified transcriptional activity of both host and pathogen over a 24-h time course of IAV infection in primary human monocyte-derived macrophages (MDMs). This method allowed us to observe heterogeneous host sequences incorporated into IAV mRNA, that is, "snatched" 5' RNA caps and the corresponding RNA sequences from host RNAs. To determine whether cap-snatching is random or biased, we systematically compared host sequences incorporated into viral mRNA ("snatched") against a complete survey of all background host RNA in the same cells, at the same time. Using a computational strategy designed to eliminate sources of bias due to read length, sequencing depth, and multimapping, we were able to quantify overrepresentation of host RNA features among the sequences that were snatched by IAV. We demonstrate biased snatching of numerous host RNAs, particularly small nuclear RNAs (snRNAs), and avoidance of host transcripts encoding host ribosomal proteins, which are required by IAV for replication. We then used a systems approach to describe the transcriptional landscape of the host response to IAV, observing many new features, including a failure of IAV-treated MDMs to induce feedback inhibitors of inflammation that are induced in response to other treatments.
IMPORTANCE: Infection with influenza A virus (IAV) is responsible for an estimated 500,000 deaths and up to 5 million cases of severe respiratory illness each year. In this study, we looked at human primary immune cells (macrophages) infected with IAV. Our method allows us to look at both the host and the virus in parallel. We used these data to explore a process known as "cap-snatching," in which IAV snatches a short nucleotide sequence from capped host RNA. This process was believed to be random. We demonstrate biased snatching of numerous host RNAs, including those associated with snRNA transcription, and avoidance of host transcripts encoding host ribosomal proteins, which are required by IAV for replication. We then describe the transcriptional landscape of the host response to IAV, observing new features, including a failure of IAV-treated MDMs to induce feedback inhibitors of inflammation that are induced in response to other treatments.
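The study's own strategy additionally corrects for read length, sequencing depth, and multimapping, and is not reproduced here. As a simpler, hedged illustration of testing whether one host RNA feature is overrepresented among snatched sequences relative to background, the sketch below computes a log2 fold enrichment (with an assumed 0.5 pseudocount) and a one-sided Fisher exact test via the hypergeometric upper tail; the function name and its inputs are hypothetical.

```python
from math import comb, log2
from typing import Tuple


def enrichment(snatched_hits: int, snatched_total: int,
               bg_hits: int, bg_total: int) -> Tuple[float, float]:
    """Return (log2 fold enrichment, one-sided p-value for over-representation).

    snatched_hits / snatched_total: snatched sequences assigned to the feature / all snatched.
    bg_hits / bg_total: background host reads assigned to the feature / all background reads.
    """
    # pseudocount of 0.5 keeps the ratio finite when a count is zero
    frac_snatched = (snatched_hits + 0.5) / (snatched_total + 1)
    frac_bg = (bg_hits + 0.5) / (bg_total + 1)
    log2_fc = log2(frac_snatched / frac_bg)

    # Fisher exact test, one-sided: pool all reads (N), K of which hit the feature, and
    # ask how likely >= snatched_hits feature reads are in a random draw of snatched_total.
    N = snatched_total + bg_total
    K = snatched_hits + bg_hits
    n = snatched_total
    upper_tail = sum(comb(K, k) * comb(N - K, n - k) for k in range(snatched_hits, min(K, n) + 1))
    return log2_fc, upper_tail / comb(N, n)


print(enrichment(10, 10, 0, 10))
```

To test for depletion instead (as with the reported avoidance of ribosomal-protein transcripts), sum the lower tail over k from 0 to `snatched_hits`; `math.comb` returns 0 for impossible draws, so no extra bounds check is needed.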

Crossref

Edinburgh Research Explorer

University of Queensland eSpace

The Functional RNA Database 3.0: databases to support mining and annotation of functional RNAs

Author: A. Yoshizawa
Altschul
Czech
E. Hattori
G. Terai
Griffiths-Jones
H. Okida
Inagaki
K. Asai
K. Yamada
Kawamura
Landgraf
Lestrade
Okamura
Sasaki
T. Komori
T. Mituyama
The FANTOM Consortium
Y. Ono
Publication venue: Oxford University Press

We developed a pair of databases that support two important tasks: annotation of anonymous RNA transcripts and discovery of novel non-coding RNAs. The combined resource is called the Functional RNA Database and consists of two databases: a rewrite of the original version of the Functional RNA Database (fRNAdb) and the latest version of the UCSC GenomeBrowser for Functional RNA. The former is a sequence database equipped with a powerful search function and hosts a large collection of known/predicted non-coding RNA sequences acquired from existing databases as well as novel/predicted sequences reported by researchers of the Functional RNA Project. The latter is a UCSC Genome Browser mirror with large additional custom tracks specifically associated with non-coding elements. It also includes several functional enhancements, such as the display of a common secondary structure prediction for any genomic window of up to 500 bp. Our GenomeBrowser supports user authentication and user-specific tracks. The current version of the fRNAdb is a complete rewrite of the former version, hosting a larger number of sequences and with a much friendlier interface. The current version of UCSC GenomeBrowser for Functional RNA features a larger number of tracks and richer features than the former version. The databases are available at http://www.ncrna.org/

Crossref

PubMed Central