nocoRNAc: Characterization of non-coding RNAs in prokaryotes

Author: A Busch
A Hüttenhofer
A Muffler
A Rodríguez-García
A Sittka
A Zhang
AC Darling
Alexander Herbig
AR Gruber
AV Uzilov
B Tjaden
B Voss
B Xiao
C Barrandon
C Pichon
CJ Benham
CM Sharma
D D'Alia
D Gautheret
DD Sledjeski
E Rivas
EP Nawrocki
F Battke
F Repoila
G Storz
H Wang
HH Tseng
I Irnov
J Bode
J Livny
J Pánek
J Schlüter
J Vogel
JP Swiercz
JS Pedersen
K Nieselt
Kay Nieselt
LF Abu-Qatouseh
M Albrecht
M Giangrossi
N Yachie
P Saetrom
R Development Core Team
R Gentleman
S Altuvia
S Brantl
S Washietl
SD Bentley
SR Eddy
T Geissmann
TM Lowe
TT Tran
X Wang
Z Polonskaya
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract Background Interest in non-coding RNAs (ncRNAs) has risen steadily over the past few years because of the wide spectrum of biological processes in which they are involved. This has led to the discovery of numerous ncRNA genes across many species. However, for most organisms the non-coding transcriptome remains largely unexplored. Various experimental techniques for the identification of ncRNA transcripts are available, but as these methods are costly and time-consuming, there is a need for computational methods that allow the detection of functional RNAs in complete genomes in order to suggest elements for further experiments. Several programs for the genome-wide prediction of functional RNAs have been developed, but most of them predict a genomic locus with no indication of whether the element is transcribed. Results We present nocoRNAc, a program for the genome-wide prediction of ncRNA transcripts in bacteria. nocoRNAc incorporates various procedures for the detection of transcriptional features, which are then integrated with functional ncRNA loci to determine the transcript coordinates. We applied RNAz and nocoRNAc to the genome of Streptomyces coelicolor and detected more than 800 putative ncRNA transcripts, most of them located antisense to protein-coding regions. Using a custom-designed microarray, we profiled the expression of about 400 of these elements and found more than 300 to be transcribed, 38 of which are predicted novel ncRNA genes in intergenic regions. The expression patterns of many ncRNAs are as complex as those of the protein-coding genes; in particular, many antisense ncRNAs show a high expression correlation with their protein-coding partner. Conclusions We have developed nocoRNAc, a framework that facilitates the automated characterization of functional ncRNAs. nocoRNAc increases the confidence of predicted ncRNA loci, especially if they contain transcribed ncRNAs. nocoRNAc is not restricted to intergenic regions, but is applicable to the prediction of ncRNA transcripts in whole microbial genomes. The software, as well as a user guide and example data, is available at http://www.zbit.uni-tuebingen.de/pas/nocornac.htm.

Springer - Publisher Connector

MPG.PuRe

Robust and accurate prediction of noncoding RNAs from aligned sequences

Author: Saito Yutaka
Sakakibara Yasubumi
Sato Kengo
Publication venue: BioMed Central
Publication date: 15/10/2010

Recommended from our members

Detection of RNA structures in porcine EST data and related mammals

Author: Gilchrist Michael J
Gorodkin Jan
Hofacker Ivo L
Seemann Stefan E
Stadler Peter F
Publication date: 16/06/2011

RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.

Abstract Background Non-coding RNAs (ncRNAs) are involved in a wide spectrum of regulatory functions. In recent years, there have been increasing reports of polyadenylated ncRNAs and mRNA-like ncRNAs in eukaryotes. To investigate this further, we examined the large data set in the Sino-Danish PigEST resource (http://pigest.ku.dk), which also contains expression information distributed over 97 non-normalized cDNA libraries. Results We constructed a pipeline, EST2ncRNA, to search for known and novel ncRNAs. The pipeline utilises sequence similarity to ncRNA databases (blast), structure similarity to Rfam (RaveNnA), and multiple alignments to predict conserved novel putative RNA structures (RNAz). EST2ncRNA was fed with 48,000 contigs and 73,000 singletons available from the PigEST resource. Using the pipeline, we identified known RNA structures in 137 contigs and single reads (conreads), and predicted high-confidence RNA structures in non-protein-coding regions of an additional 1,262 conreads. Of these, structures in 270 conreads overlap with existing predictions in human. To sum up, the PigEST resource comprises trans-acting elements (ncRNAs) in 715 contigs and 340 singletons as well as cis-acting elements (inside UTRs) in 311 contigs and 51 singletons, of which 18 conreads contain predictions of both trans- and cis-acting elements. The predicted RNAz candidates were compared with the PigEST expression information, and we identified 114 contigs with an RNAz prediction and expression in at least ten of the non-normalised cDNA libraries. We conclude that the contigs with RNAz and known predictions are in general expressed at a much lower level than protein-coding transcripts. In addition, we observe that our ncRNA candidates constitute about one to two percent of the genes expressed in the cDNA libraries. Intriguingly, the cDNA libraries from developmental (brain) tissues contain the highest amount of ncRNA candidates, about two percent. These observations are related to existing knowledge and hypotheses about the role of ncRNAs in higher organisms. Furthermore, about 80% of porcine coding transcripts (of 18,600 identified) as well as less than one-third of ORF-free transcripts are conserved at least in the closely related bovine genome. Approximately one percent of the coding and 10% of the remaining matches are unique between the PigEST data and the cow genome. Based on the pig-cow alignments, we searched for similarities to 16 other organisms using alignments available from UCSC, which resulted, for instance, in an 87% coverage by the human genome. Conclusion Besides recovering several of the already annotated functional RNA structures, we predicted a large number of high-confidence conserved secondary structures in polyadenylated porcine transcripts. Our observations of relatively low expression levels of predicted ncRNA candidates, together with the observations of a higher relative amount in cDNA libraries from developmental stages, are in agreement with the current paradigm of ncRNA roles in higher organisms and support the idea of polyadenylated ncRNAs.

Apollo (Cambridge)

Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change

Author: Keegan Joshua M
Mathews David H
Uzilov Andrew V
Publication venue: BioMed Central
Publication date: 01/03/2006

BACKGROUND: Non-coding RNAs (ncRNAs) have a multitude of roles in the cell, many of which remain to be discovered. However, it is difficult to detect novel ncRNAs in biochemical screens. To advance biological knowledge, computational methods that can accurately detect ncRNAs in sequenced genomes are therefore desirable. The increasing number of genomic sequences provides a rich dataset for computational comparative sequence analysis and detection of novel ncRNAs. RESULTS: Here, Dynalign, a program for predicting secondary structures common to two RNA sequences on the basis of minimizing folding free energy change, is utilized as a computational ncRNA detection tool. The Dynalign-computed optimal total free energy change, which scores the structural alignment and the free energy change of folding into a common structure for two RNA sequences, is shown to be an effective measure for distinguishing ncRNA from randomized sequences. To classify an input sequence pair as an ncRNA, its total free energy change can either be compared with the total free energy changes of a set of control sequence pairs, or be used in combination with sequence length and nucleotide frequencies as input to a classification support vector machine. The latter method is much faster, but slightly less sensitive at a given specificity. Additionally, the classification support vector machine method is shown to be sensitive and specific on genomic ncRNA screens of two different Escherichia coli and Salmonella typhi genome alignments, in which many ncRNAs are known. The Dynalign computational experiments are also compared with two other ncRNA detection programs, RNAz and QRNA. CONCLUSION: The Dynalign-based support vector machine method is more sensitive for known ncRNAs in the test genomic screens than RNAz and QRNA. Additionally, both Dynalign-based methods are more sensitive than RNAz and QRNA at low sequence pair identities. Dynalign can be used as a comparable or more accurate tool than RNAz or QRNA in genomic screens, especially for low-identity regions. Dynalign provides a method for discovering ncRNAs in sequenced genomes that other methods may not identify. Significant improvements in Dynalign runtime have also been achieved.
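
The first classification route described in the abstract, comparing a pair's total free energy change against a set of control pairs, can be illustrated with a z-score. This is only a sketch: the abstract does not state which statistic is used, so the z-score, the cutoff of -3.0, the function names, and all numeric values below are assumptions for illustration. The code does not call Dynalign; it expects free energy values that have already been computed.

```python
from statistics import mean, stdev


def energy_z_score(observed_dg, control_dgs):
    """Z-score of an observed free energy change relative to control pairs.

    Assumption: the comparison with controls is done via a z-score; the
    abstract only says the value is compared with a set of controls.
    """
    if len(control_dgs) < 2:
        raise ValueError("need at least two control values")
    mu = mean(control_dgs)
    sigma = stdev(control_dgs)  # sample standard deviation
    if sigma == 0:
        raise ValueError("control values have zero variance")
    return (observed_dg - mu) / sigma


def is_ncrna_candidate(observed_dg, control_dgs, z_cutoff=-3.0):
    """Flag a pair whose free energy change is far below the controls.

    The cutoff is a hypothetical value chosen for this example.
    """
    return energy_z_score(observed_dg, control_dgs) <= z_cutoff


# Hypothetical free energy changes (kcal/mol) for shuffled control pairs.
controls = [-10.0, -12.0, -8.0, -11.0, -9.0]
print(is_ncrna_candidate(-20.0, controls))  # much lower than the controls
print(is_ncrna_candidate(-10.5, controls))  # within the control range
```

A more negative z-score means the real pair folds into a common structure that is more stable than expected from the controls, which is the signal the abstract describes.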

Dinucleotide controlled null models for comparative RNA gene prediction

Author: A Coventry
A Rambaut
A Siepel
AM Pedersen
AV Uzilov
C del Val
C Lanave
C Weile
C Workman
D Karolchik
D Metzler
D Rose
DM Robinson
DR Forsdyke
E Rivas
E Torarinsson
G Lunter
I Miklós
IL Hofacker
J Felsenstein
J Jensen
J Thorne
K Missal
L Duret
M Blanchette
M Hasegawa
M Schöniger
O Gascuel
OF Christensen
P Clote
PF Arndt
R Backofen
R Fleißner
S Griffiths-Jones
S Guindon
S Tavaré
S Washietl
SF Altschul
Stefan Washietl
T Babak
T Gesell
T Mourier
T Sandmann
Tanja Gesell
Y Van de Peer
Z Yao
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract Background Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is a need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. Results We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance-based approach, a tree is estimated under this model, which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm, giving a new variant of a thermodynamic structure-based RNA gene finding program that is not biased by the dinucleotide content. Conclusion SISSIz implements an efficient algorithm to randomize multiple alignments while preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine-learning-based programs, or as a standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered. Availability SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz.
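
To make the quantity being preserved concrete, the sketch below computes the average dinucleotide content of a multiple alignment. It is not the SISSIz algorithm and does not simulate anything; it only measures the statistic that a dinucleotide-controlled null model would aim to match. The function names, the choice to drop gap characters before counting, and the per-row averaging are assumptions made for this example.

```python
from collections import Counter


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides in one sequence.

    Assumption: gap characters ('-') are removed before counting.
    """
    seq = seq.upper().replace("-", "")
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def average_dinucleotide_content(alignment):
    """Average the per-row dinucleotide frequencies over an alignment."""
    totals = Counter()
    rows_used = 0
    for row in alignment:
        counts = dinucleotide_counts(row)
        n = sum(counts.values())
        if n == 0:
            continue  # row too short to contain a dinucleotide
        rows_used += 1
        for dinucleotide, count in counts.items():
            totals[dinucleotide] += count / n
    if rows_used == 0:
        return {}
    return {dn: value / rows_used for dn, value in totals.items()}


# Toy alignment with one gapped row (hypothetical data).
alignment = ["ACGT", "AC-T"]
for dn, freq in sorted(average_dinucleotide_content(alignment).items()):
    print(dn, round(freq, 3))
```

Comparing this profile between a native alignment and its randomized controls is one simple way to check whether a shuffling procedure has altered the dinucleotide content, which is the bias the abstract attributes to stacking-energy-based folding models.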

Springer - Publisher Connector

Characterization of the human ESC transcriptome by hybrid sequencing

Author: Afshar Pegah Tootoonchi
Au Kin Fai
Durruthy-Durruthy Jens
Lee Lawrence
Reijo-Pera Renee A.
Schadt Eric E.
Sebastiano Vittorio
Underwood Jason G.
van Bakel Harm
Williams Brian A.
Wong Wing Hung
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 10/12/2013

Although transcriptional and posttranscriptional events are detected in RNA-Seq data from second-generation sequencing, full-length mRNA isoforms are not captured. On the other hand, third-generation sequencing, which yields much longer reads, has current limitations of lower raw accuracy and throughput. Here, we combine second-generation sequencing and third-generation sequencing with a custom-designed method for isoform identification and quantification to generate a high-confidence isoform dataset for human embryonic stem cells (hESCs). We report 8,084 RefSeq-annotated isoforms detected as full-length and an additional 5,459 isoforms predicted through statistical inference. Over one-third of these are novel isoforms, including 273 RNAs from gene loci that have not previously been identified. Further characterization of the novel loci indicates that a subset is expressed in pluripotent cells but not in diverse fetal and adult tissues; moreover, their reduced expression perturbs the network of pluripotency-associated genes. Results suggest that gene identification, even in well-characterized human cell lines and tissues, is likely far from complete.

Caltech Authors

From Structure Prediction to Genomic Screens for Novel Non-Coding RNAs

Author: A Ben-Hur
AF Bompfünewerer
AM Khalil
AO Harmanci
AR Gruber
AV Uzilov
AX Wang
B Knudsen
B Lewis
BW Matthews
C Warden
C Workman
D Guarnieri
D Mathews
D Sankoff
DH Mathews
DH Turner
DK Chiu
E Bonnet
E Nudler
E Rivas
E Torarinsson
EP Nawrocki
ES Andersen
F Sleutels
J Daub
H Jia
I Holmes
IL Hofacker
Ivo L. Hofacker
J Felsenstein
J Gorodkin
Jan Gorodkin
JC Ellis
JG Underwood
JH Havgaard
JM Watts
JP McCutcheon
JS Mattick
JS Pedersen
JW Brown
K Doshi
K Okamura
K Reiche
KC Wang
KE Deigan
KM Weeks
L Redrup
M Georges
M Guttman
M Kertesz
M Lindow
M Xie
MB Gerstein
MC Tsai
Michael Levitt
MW Hentze
N Lau
P Anandam
P Clote
P Gardner
P Larsson
P Menzel
P Schattner
PG Hawkins
PN Seibel
PP Gardner
R Nussinov
RA Gupta
RD Dowell
RJ Klein
RM Kuhn
RR Gutell
S Eddy
S Griffiths-Jones
S Siebert
S Washietl
S Will
SE Seemann
SF Altschul
SR Eddy
T Gesell
T Hung
T Lowe
T Nagano
TF Consortium
TJ Macke
UA Ørom
V Kim
V Tripathi
W Deng
W Filipowicz
W Fontana
Y Park
Y Sakakibara
Z Weinberg
Z Yao
Publication venue: Public Library of Science
Publication date: 01/01/2011

Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure consists of regular Watson-Crick and GU wobble base pairs. This and the growing number of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure-preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structures on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.
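
As a small illustration of single-sequence folding with Watson-Crick and GU wobble pairs, the sketch below uses base-pair maximization by dynamic programming (a Nussinov-style recursion). This is a deliberately simplified stand-in: energy-directed folding as mentioned in the abstract minimizes free energy rather than counting pairs, and nothing here is taken from any specific tool. The function name and the minimum hairpin loop size of 3 are assumptions for the example.

```python
# Allowed pairs: Watson-Crick (A-U, G-C) plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence.

    Simplification: counts pairs instead of minimizing free energy.
    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(max_base_pairs("GGGAAACCC"))  # a simple hairpin with a three-base loop
```

The recursion considers, for each end position, either leaving it unpaired or pairing it with an earlier position and solving the two enclosed subproblems independently, which is the same decomposition that energy-based folding algorithms build on.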