Search CORE

6,049 research outputs found

RNA secondary structure prediction from multi-aligned sequences

It has been well accepted that the RNA secondary structures of most functional non-coding RNAs (ncRNAs) are closely related to their functions and are conserved during evolution. Hence, prediction of conserved secondary structures from evolutionarily related sequences is one important task in RNA bioinformatics; the methods are useful not only to further functional analyses of ncRNAs but also to improve the accuracy of secondary structure predictions and to find novel functional RNAs from the genome. In this review, I focus on common secondary structure prediction from a given aligned RNA sequence, in which one secondary structure whose length is equal to that of the input alignment is predicted. I systematically review and classify existing tools and algorithms for the problem, by utilizing the information employed in the tools and by adopting a unified viewpoint based on maximum expected gain (MEG) estimators. I believe that this classification will allow a deeper understanding of each tool and provide users with useful information for selecting tools for common secondary structure predictions.Comment: A preprint of an invited review manuscript that will be published in a chapter of the book `Methods in Molecular Biology'. Note that this version of the manuscript may differ from the published versio

arXiv.org e-Print Archive

CiteSeerX

Crossref

PLIT: An alignment-free computational tool for identification of long non-coding RNAs in plant transcriptomic datasets

Author: Deshpande Sumukh
England Matthew
Shuttleworth James
Taramonli Sandy
Yang Jianhua
Publication venue: 'Elsevier BV'
Publication date: 01/02/2019
Field of study

Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs which play a significant role in several biological processes. RNA-seq based transcriptome sequencing has been extensively used for identification of lncRNAs. However, accurate identification of lncRNAs in RNA-seq datasets is crucial for exploring their characteristic functions in the genome as most coding potential computation (CPC) tools fail to accurately identify them in transcriptomic data. Well-known CPC tools such as CPC2, lncScore, CPAT are primarily designed for prediction of lncRNAs based on the GENCODE, NONCODE and CANTATAdb databases. The prediction accuracy of these tools often drops when tested on transcriptomic datasets. This leads to higher false positive results and inaccuracy in the function annotation process. In this study, we present a novel tool, PLIT, for the identification of lncRNAs in plants RNA-seq datasets. PLIT implements a feature selection method based on L1 regularization and iterative Random Forests (iRF) classification for selection of optimal features. Based on sequence and codon-bias features, it classifies the RNA-seq derived FASTA sequences into coding or long non-coding transcripts. Using L1 regularization, 31 optimal features were obtained based on lncRNA and protein-coding transcripts from 8 plant species. The performance of the tool was evaluated on 7 plant RNA-seq datasets using 10-fold cross-validation. The analysis exhibited superior accuracy when evaluated against currently available state-of-the-art CPC tools

arXiv.org e-Print Archive

Online Research @ Cardiff

Coventry University Pure Portal

PhyloCSF: a comparative genomics method to distinguish protein-coding and non-coding regions

Author: Arvestad
Blanchette
Brent
Butler
Clark
Goldman
Guttman
Guttman
Holmes
I. Jungreis
Kellis
Lin
M. F. Lin
M. Kellis
Ota
Ozsolak
Stark
Whelan
Yang
Publication venue
Publication date: 17/08/2010
Field of study

As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein-coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multi-species nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. We show that PhyloCSF's classification performance in 12-species _Drosophila_ genome alignments exceeds all other methods we compared in a previous study, and we provide a software implementation for use by the community. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues, and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE

Evolutionary Modeling and Prediction of Non-Coding RNAs in Drosophila

Author: A Siepel
A Siepel
A Stark
A Varadarajan
AG Clark
Andrew V. Uzilov
B Knudsen
B Paten
CN Dewey
D Rose
D St Johnston
DP Bartel
DS Parker
E Boyle
E Lcuyer
E Nawrocki
E Rivas
E Rivas
E Torarinsson
G McGuire
Ian Holmes
IL Hofacker
J Brennecke
J Pedersen
J Ruby
JL Thorne
JP Bachellerie
JR Manak
JS Pedersen
JS Pedersen
JS Pedersen
KS Pollard
Lars Barquist
M Crosby
M Mandal
M Pheasant
M Sprinzl
Mitchell E. Skinner
N Bray
N Goldman
PD Rijk
PS Klosterman
RD Dowell
RD Dowell
RK Bradley
Robert Belshaw
Robert K. Bradley
S Griffiths-Jones
S Washietl
T Babak
T Elgavish
T Gesell
TM Lowe
V Ambros
WJ Bruno
YR Bendana
Yuri R. Bendaña
Z Wang
Z Yang
Publication venue: Public Library of Science
Publication date: 01/01/2009
Field of study

We performed benchmarks of phylogenetic grammar-based ncRNA gene prediction, experimenting with eight different models of structural evolution and two different programs for genome alignment. We evaluated our models using alignments of twelve Drosophila genomes. We find that ncRNA prediction performance can vary greatly between different gene predictors and subfamilies of ncRNA gene. Our estimates for false positive rates are based on simulations which preserve local islands of conservation; using these simulations, we predict a higher rate of false positives than previous computational ncRNA screens have reported. Using one of the tested prediction grammars, we provide an updated set of ncRNA predictions for D. melanogaster and compare them to previously-published predictions and experimental data. Many of our predictions show correlations with protein-coding genes. We found significant depletion of intergenic predictions near the 3′ end of coding regions and furthermore depletion of predictions in the first intron of protein-coding genes. Some of our predictions are colocated with larger putative unannotated genes: for example, 17 of our predictions showing homology to the RFAM family snoR28 appear in a tandem array on the X chromosome; the 4.5 Kbp spanned by the predicted tandem array is contained within a FlyBase-annotated cDNA

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Computational analysis of noncoding RNAs

Author: Akutsu
Alkan
Amaral
Amaral
Ambros
Anders
Andronescu
Andronescu
Andronescu
Aravin
Au
Bartel
Bartel
Bateman
Bentwich
Berezikov
Bernhart
Bon
Bon
Bonnet
Breaker
Busch
Cabili
Chen
Chen
Chen
Clark
Coventry
Deigan
del Val
Ding
Dinger
Dirks
Do
Do
Dowell
ENCODE Project Consortium
Findei
Flamm
Freyhult
Frhlich
Friedländer
Frith
Galperin
Gardner
Gardner
Gautheret
Grad
Griffith
Grishok
Gruber
Guttman
Guttman
Harmanci
Havgaard
He
Hendrix
Hertel
Hertel
Hofacker
Hofacker
Hofacker
Jiang
Katz
Kertesz
Knudsen
Kozomara
Kruger
Lagesen
Lai
Laing
Laslett
Lau
Li
Lim
Lin
Lorenz
Lowe
Lowe
Lu
Lu
Lucks
Lyngso
Macke
Markham
Mathelier
Mathews
Mathews
Mathews
Mattick
McCaskill
Menzel
Mituyama
Mohl
Mortazavi
Mourier
Mückstein
Nagel
Nam
Nawrocki
Noller
Nussinov
Ohler
Pang
Parisien
Pasquinelli
Pedersen
Pervouchine
Pfeffer
Reeder
Regalia
Ren
Reuter
Rivas
Rivas
Rivas
Rivas
Roberts
Robertson
Robinson
Ruan
Ruby
Salari
Sankoff
Sato
Schattner
Schnall-Levin
Schnall-Levin
Seemann
Seemann
Seetin
Seitz
Sethupathy
Shi
Sperschneider
Stark
Stark
Tabaska
Tang
Torarinsson
Trapnell
Trapnell
Uemura
Underwood
van Bakel
Wang
Wang
Washietl
Washietl
Washietl
Washietl
Washietl
Washietl
Washietl
Weeks
Weinberg
Weinberg
Will
Will
Will
Wolfinger
Wu
Wuchty
Xayaphoummine
Xia
Xie
Xue
Yao
Zerbino
zu Siederdissen
Zuker
Zuker
Publication venue: 'Wiley'
Publication date: 01/11/2012
Field of study

Noncoding RNAs have emerged as important key players in the cell. Understanding their surprisingly diverse range of functions is challenging for experimental and computational biology. Here, we review computational methods to analyze noncoding RNAs. The topics covered include basic and advanced techniques to predict RNA structures, annotation of noncoding RNAs in genomic data, mining RNA-seq data for novel transcripts and prediction of transcript structures, computational aspects of microRNAs, and database resources.Austrian Science Fund (Schrodinger Fellowship J2966-B12)German Research Foundation (grant WI 3628/1-1 to SW)National Institutes of Health (U.S.) (NIH award 1RC1CA147187

DSpace@MIT

Crossref

PubMed Central

Dinucleotide controlled null models for comparative RNA gene prediction

Author: A Coventry
A Rambaut
A Siepel
AM Pedersen
AV Uzilov
C del Val
C Lanave
C Weile
C Workman
D Karolchik
D Metzler
D Rose
DM Robinson
DR Forsdyke
E Rivas
E Torarinsson
G Lunter
I Miklós
IL Hofacker
J Felsenstein
J Jensen
J Thorne
J Thorne
K Missal
K Missal
L Duret
M Blanchette
M Hasegawa
M Schöniger
M Schöniger
O Gascuel
OF Christensen
P Clote
PF Arndt
R Backofen
R Fleißner
S Griffiths-Jones
S Griffiths-Jones
S Guindon
S Tavaré
S Washietl
S Washietl
S Washietl
S Washietl
S Washietl
SF Altschul
Stefan Washietl
T Babak
T Gesell
T Mourier
T Sandmann
Tanja Gesell
YVan de Peer
Z Yao
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Abstract Background Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak <it>et al</it>. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. Results We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance based approach, a tree is estimated under this model which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm giving a new variant of a thermodynamic structure based RNA gene finding program that is not biased by the dinucleotide content. Conclusion SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered. Availability SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: <url>http://sourceforge.net/projects/sissiz</url>.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

PicXAA-R: Efficient structural alignment of multiple RNA sequences using a greedy approach

Author: A Wilm
A Wilm
AO Harmanci
AS Schwartz
B Paten
Byung-Jun Yoon
C Do
C Notredame
CB Do
CB Do
CB Do
D Dalli
D Sankoff
DH Mathews
DH Mathews
FF Costa
G Storz
H Kiryu
H Kiryu
I Holmes
IL Hofacker
IL Hofacker
IL Hofacker
J Gorodkin
JH Havgaard
JH Havgaard
JS McCaskill
K Katoh
M Anwar
M Bauer
M Hamada
M Hamada
R Durbin
RD Dowell
RK Bradley
RK Bradley
S Griffiths-Jones
S Lindgreen
S Moretti
S Siebert
S Wang
S Washietl
S Will
Sayed Mohammad Ebrahim Sahraeian
SM Sahraeian
SR Eddy
U Roshan
X Xu
Y Tabei
ZJ Lu
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Abstract Background Accurate and efficient structural alignment of non-coding RNAs (ncRNAs) has grasped more and more attentions as recent studies unveiled the significance of ncRNAs in living organisms. While the Sankoff style structural alignment algorithms cannot efficiently serve for multiple sequences, mostly progressive schemes are used to reduce the complexity. However, this idea tends to propagate the early stage errors throughout the entire process, thereby degrading the quality of the final alignment. For multiple protein sequence alignment, we have recently proposed PicXAA which constructs an accurate alignment in a non-progressive fashion. Results Here, we propose PicXAA-R as an extension to PicXAA for greedy structural alignment of ncRNAs. PicXAA-R efficiently grasps both folding information within each sequence and local similarities between sequences. It uses a set of probabilistic consistency transformations to improve the posterior base-pairing and base alignment probabilities using the information of all sequences in the alignment. Using a graph-based scheme, we greedily build up the structural alignment from sequence regions with high base-pairing and base alignment probabilities. Conclusions Several experiments on datasets with different characteristics confirm that PicXAA-R is one of the fastest algorithms for structural alignment of multiple RNAs and it consistently yields accurate alignment results, especially for datasets with locally similar sequences. PicXAA-R source code is freely available at: <url>http://www.ece.tamu.edu/~bjyoon/picxaa/</url>.</p

Crossref

Directory of Open Access Journals

PubMed Central

Texas A&M Repository