
366 research outputs found

RNA secondary structure prediction from multi-aligned sequences

It has been well accepted that the RNA secondary structures of most functional non-coding RNAs (ncRNAs) are closely related to their functions and are conserved during evolution. Hence, prediction of conserved secondary structures from evolutionarily related sequences is an important task in RNA bioinformatics; the methods are useful not only to further functional analyses of ncRNAs but also to improve the accuracy of secondary structure predictions and to find novel functional RNAs from the genome. In this review, I focus on common secondary structure prediction from a given aligned RNA sequence, in which one secondary structure whose length is equal to that of the input alignment is predicted. I systematically review and classify existing tools and algorithms for the problem, by utilizing the information employed in the tools and by adopting a unified viewpoint based on maximum expected gain (MEG) estimators. I believe that this classification will allow a deeper understanding of each tool and provide users with useful information for selecting tools for common secondary structure predictions.

Comment: A preprint of an invited review manuscript that will be published in a chapter of the book `Methods in Molecular Biology'. Note that this version of the manuscript may differ from the published version.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Improved Measurements of RNA Structure Conservation with Generalized Centroid Estimators

Author: Okada Yohei
Saito Yutaka
Sakakibara Yasubumi
Sato Kengo
Publication venue: Frontiers Research Foundation
Publication date: 01/01/2011

Identification of non-protein-coding RNAs (ncRNAs) in genomes is a crucial task for not only molecular cell biology but also bioinformatics. Secondary structures of ncRNAs are employed as a key feature of ncRNA analysis since biological functions of ncRNAs are deeply related to their secondary structures. Although the minimum free energy (MFE) structure of an RNA sequence is regarded as the most stable structure, MFE alone is not an appropriate measure for identifying ncRNAs since the free energy is heavily biased by the nucleotide composition. Therefore, instead of MFE itself, several alternative measures for identifying ncRNAs have been proposed, such as the structure conservation index (SCI) and the base pair distance (BPD), both of which employ MFE structures. However, these measurements are not suitable for identifying ncRNAs in some cases, including genome-wide searches, and incur a high false discovery rate. In this study, we propose improved measurements based on SCI and BPD, applying generalized centroid estimators to incorporate robustness against low quality multiple alignments. Our experiments show that our proposed methods achieve higher accuracy than the original SCI and BPD for not only human-curated structural alignments but also low quality alignments produced by CLUSTAL W. Furthermore, the centroid-based SCI on CLUSTAL W alignments is more accurate than or comparable with the original SCI on structural alignments generated with RAF, a high quality structural aligner that requires twice the computational time on average. We conclude that our methods are more suitable than the original SCI and BPD for genome-wide alignments, which are of low quality from the point of view of secondary structures.
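The two baseline measures named in this abstract can be illustrated with a minimal Python sketch. It assumes the commonly used definitions: SCI as the consensus MFE of the alignment divided by the mean MFE of the individual sequences, and base pair distance as the number of base pairs that occur in exactly one of two structures. The function names and all input values are hypothetical and do not come from the paper or from any package; in practice the energies and structures would be produced by a folding program.

```python
from statistics import mean


def parse_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def base_pair_distance(structure_a, structure_b):
    """Count base pairs present in exactly one of the two structures."""
    return len(parse_pairs(structure_a) ^ parse_pairs(structure_b))


def structure_conservation_index(consensus_mfe, single_mfes):
    """Assumed definition: consensus MFE divided by the mean single-sequence MFE."""
    average = mean(single_mfes)
    if average == 0:
        raise ValueError("mean single-sequence MFE is zero")
    return consensus_mfe / average


# Made-up inputs, for illustration only (energies in kcal/mol).
distance = base_pair_distance("((..))", "(....)")
sci = structure_conservation_index(-10.0, [-12.0, -8.0])
```

Under this definition, an SCI near 1 means the consensus structure is about as stable as the individual folds, while a value near 0 means no common structure was found.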

Crossref

Directory of Open Access Journals

PubMed Central

Frontiers - Publisher Connector

Prediction of RNA secondary structure by maximizing pseudo-expected accuracy

Author: B Knudsen
C Do
D Mathews
H Kiryu
I Hofacker
I Holmes
IL Hofacker
JS McCaskill
K Sato
Kengo Sato
Kiyoshi Asai
L Carvalho
L Kall
M Andronescu
M Andronescu
M Hamada
M Hamada
M Hamada
M Hamada
M Parisien
M Zuker
M Zuker
MC Frith
Michiaki Hamada
N Michal
P Baldi
PP Gardner
R Durbin
RK Bradley
RK Bradley
S Bernhart
S Engelen
S Griffiths-Jones
S Gross
S Seemann
SJ Schroeder
Y Ding
Y Ding
Y Ding
ZJ Lu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Recent studies have revealed the importance of considering the entire distribution of possible secondary structures in RNA secondary structure predictions; therefore, a new type of estimator has been proposed, including the maximum expected accuracy (MEA) estimator. The MEA-based estimators have been designed to maximize the expected accuracy of the base-pairs and have achieved the highest level of accuracy. Those methods, however, do not give the single best prediction of the structure, but employ parameters to control the trade-off between the sensitivity and the positive predictive value (PPV). It is unclear what parameter value we should use, and even the well-trained default parameter value does not, in general, give the best result in popular accuracy measures for each RNA sequence. Results: Instead of using the expected values of the popular accuracy measures for RNA secondary structure prediction, which are difficult to calculate, the pseudo-expected accuracy, which can easily be computed from base-pairing probabilities, is introduced. It is shown that the pseudo-expected accuracy is a good approximation in terms of sensitivity, PPV, MCC, or F-score. The pseudo-expected accuracy can be approximately maximized for each RNA sequence by stochastic sampling. It is also shown that well-balanced secondary structures between sensitivity and PPV can be predicted with a small computational overhead by combining the pseudo-expected accuracy of MCC or F-score with the γ-centroid estimator. Conclusions: This study gives not only a method for predicting the secondary structure that balances between sensitivity and PPV, but also a general method for approximately maximizing the (pseudo-)expected accuracy with respect to various evaluation measures including MCC and F-score.
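A rough Python sketch of how a pseudo-expected F-score could be computed from base-pairing probabilities follows. It assumes that the pseudo-expected counts are obtained by plugging probabilities into the usual confusion-matrix terms: true positives as the sum of p_ij over predicted pairs, false positives as the sum of (1 - p_ij) over predicted pairs, and false negatives as the sum of p_ij over pairs that were not predicted. This reading of the abstract is an assumption, and the function names and probability values are hypothetical.

```python
def pseudo_expected_counts(predicted_pairs, bpp):
    """Assumed pseudo-expected TP, FP, FN for a predicted set of base pairs.

    bpp maps (i, j) with i < j to a base-pairing probability; pairs that are
    missing from bpp are treated as having probability 0.
    """
    tp = sum(bpp.get(pair, 0.0) for pair in predicted_pairs)
    fp = sum(1.0 - bpp.get(pair, 0.0) for pair in predicted_pairs)
    fn = sum(prob for pair, prob in bpp.items() if pair not in predicted_pairs)
    return tp, fp, fn


def pseudo_expected_f(predicted_pairs, bpp):
    """F-score evaluated on the pseudo-expected counts."""
    tp, fp, fn = pseudo_expected_counts(predicted_pairs, bpp)
    denominator = 2.0 * tp + fp + fn
    return 2.0 * tp / denominator if denominator > 0 else 0.0


def best_candidate(candidates, bpp):
    """Pick the candidate structure with the highest pseudo-expected F-score."""
    return max(candidates, key=lambda pairs: pseudo_expected_f(pairs, bpp))


# Made-up base-pairing probabilities and two candidate structures.
bpp = {(0, 5): 0.9, (1, 4): 0.6, (2, 8): 0.1}
candidate_a = frozenset({(0, 5)})
candidate_b = frozenset({(0, 5), (1, 4)})
chosen = best_candidate([candidate_a, candidate_b], bpp)
```

In the setting the abstract describes, the candidates would come from stochastic sampling or from γ-centroid predictions at several values of γ; here they are simply listed by hand.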

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

RNAG: a new Gibbs sampler for predicting RNA secondary structure for unaligned sequences

Author: Bernhart
Bindewald
Carvalho
Cary
Charles E. Lawrence
Chenna
Ding
Ding
Do
Do
Do
Donglai Wei
Eddy
Gardner
Geman
Giegerich
Griffiths-Jones
Gutell
Hamada
Hamada
Hofacker
Hofacker
Ji
Kiryu
Kiryu
Knudsen
Lauren V. Alpert
Lindgreen
Liu
Mathews
Mathews
Meyer
Nawrocki
Nawrocki
Newberg
Sakakibara
Sankoff
Seemann
Siebert
Steffen
Tabaska
Torarinsson
Webb
Webb-Robertson
Will
Xing
Yao
Zuker
Publication venue: Oxford University Press
Publication date: 01/01/2011

Motivation: RNA secondary structure plays an important role in the function of many RNAs, and structural features are often key to their interaction with other cellular components. Thus, there has been considerable interest in the prediction of secondary structures for RNA families. In this article, we present a new global structural alignment algorithm, RNAG, to predict consensus secondary structures for unaligned sequences. It uses a blocked Gibbs sampling algorithm, which has a theoretical advantage in convergence time. This algorithm iteratively samples from the conditional probability distributions P(Structure | Alignment) and P(Alignment | Structure). Not surprisingly, there is considerable uncertainty in the high-dimensional space of this difficult problem, which has so far received limited attention in this field. We show how the samples drawn from this algorithm can be used to more fully characterize the posterior space and to assess the uncertainty of predictions.
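The alternation between the two conditional distributions can be illustrated with a toy two-block Gibbs sampler in Python. This is not the RNAG implementation: the joint weight table, the labels "s1", "s2", "a1", "a2", and the function names are all hypothetical, and the real conditionals range over full structures and alignments rather than a four-entry table.

```python
import random
from collections import Counter


def sample_conditional(joint, fixed_axis, fixed_value, rng):
    """Draw the free variable given the other one.

    joint maps (structure, alignment) to an unnormalized positive weight.
    fixed_axis is "alignment" or "structure".
    """
    if fixed_axis == "alignment":
        options = [(s, w) for (s, a), w in joint.items() if a == fixed_value]
    else:
        options = [(a, w) for (s, a), w in joint.items() if s == fixed_value]
    values, weights = zip(*options)
    return rng.choices(values, weights=weights, k=1)[0]


def gibbs(joint, start, n_iter, seed=0):
    """Alternate draws from P(Structure | Alignment) and P(Alignment | Structure)."""
    rng = random.Random(seed)
    structure, alignment = start
    samples = []
    for _ in range(n_iter):
        structure = sample_conditional(joint, "alignment", alignment, rng)
        alignment = sample_conditional(joint, "structure", structure, rng)
        samples.append((structure, alignment))
    return samples


# Toy joint weights over two structures and two alignments.
joint = {("s1", "a1"): 8.0, ("s1", "a2"): 1.0,
         ("s2", "a1"): 1.0, ("s2", "a2"): 2.0}
samples = gibbs(joint, start=("s2", "a2"), n_iter=5000)
frequencies = Counter(samples)
```

The sample frequencies approximate the joint distribution, which is how sampled output can be used to describe the spread of the posterior and not only a single best prediction.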

CiteSeerX

Crossref

PubMed Central

CentroidFold: a web server for RNA secondary structure prediction

Author: Ding
DING
Do
Dowell
Hofacker
K. Asai
K. Sato
Knudsen
M. Hamada
McCaskill
T. Mituyama
Zuker
Publication venue: Oxford University Press

The CentroidFold web server (http://www.ncrna.org/centroidfold/) is a web application for RNA secondary structure prediction powered by one of the most accurate prediction engines. The server accepts two kinds of sequence data: a single RNA sequence and a multiple alignment of RNA sequences. It responds with a prediction result shown in a popular base-pair notation and as a graph representation. A PDF version of the graph representation is also available. For a multiple alignment, the server predicts a common secondary structure. Usage of the server is quite simple. You can paste a single RNA sequence (FASTA or plain sequence text) or a multiple alignment (CLUSTAL-W format) into the textarea, then click on the ‘execute CentroidFold’ button. The server quickly responds with a prediction result. The major advantage of this server is that it employs our original CentroidFold software as its prediction engine, which scores the best accuracy in our benchmark results. Our web server is freely available with no login requirement.

Crossref

PubMed Central

Predictions of RNA secondary structure by combining homologous sequence information

Author: Andronescu
Andronescu
Bernhart
Bradley
Carvalho
Ding
Do
Do
Do
Do
Dowell
Durbin
Fariselli
Griffiths-Jones
Hamada
Hamada
Hisanori Kiryu
Hofacker
Hofacker
Holmes
Kengo Sato
Kiryu
Kiryu
Kiyoshi Asai
Lunter
McCaskill
Michiaki Hamada
Miyazawa
Nussinov
Parisien
Paten
Roshan
Sankoff
Seemann
Tabei
Toutai Mituyama
Wong
Publication venue: Oxford University Press

Motivation: Secondary structure prediction of RNA sequences is an important problem. There has been progress in this area, but the accuracy of prediction from an RNA sequence is still limited. In many cases, however, homologous RNA sequences are available with the target RNA sequence whose secondary structure is to be predicted.

Crossref

PubMed Central

Improving the accuracy of predicting secondary structure for aligned RNA sequences

Author: Andronescu
Bernhart
Bernhart
Bradley
Bradley
Carvalho
Clyde
Ding
Do
Do
Durbin
Frith
Gardner
Griffiths-Jones
Gross
Hamada
Hamada
Hamada
Hofacker
Hofacker
Holmes
Jochl
Kall
Kato
Katoh
Kengo Sato
Kiryu
Kiyoshi Asai
Knudsen
Lu
McCaskill
Michal
Michiaki Hamada
Newberg
Nussinov
Okada
Sahraeian
Sato
Schroeder
Seemann
Stocsits
Tabei
Thompson
Thurner
Washietl
Washietl
Webb-Robertson
Zuker
Publication venue: Oxford University Press

Considerable attention has been focused on predicting the secondary structure for aligned RNA sequences since it is useful not only for improving the limited accuracy of conventional secondary structure prediction but also for finding non-coding RNAs in genomic sequences. Although there exist many algorithms for predicting secondary structure for aligned RNA sequences, further improvement of the accuracy is still awaited. In this article, toward improving the accuracy, a theoretical classification of state-of-the-art algorithms for predicting secondary structure for aligned RNA sequences is presented. The classification is based on the viewpoint of maximum expected accuracy (MEA), which has been successfully applied to various problems in bioinformatics. The classification reveals several disadvantages of the current algorithms, and we propose an improvement of a previously introduced algorithm (CentroidAlifold). Finally, computational experiments strongly support the theoretical classification and indicate that the improved CentroidAlifold substantially outperforms other algorithms.

Crossref

PubMed Central

Bayesian Centroid Estimation for Motif Discovery

Author: A Dempster
A Neuwald
B Webb-Robertson
C Lawrence
C Lawrence
C Murrea
D GuhaThakurta
E Xing
F Roth
G Pavesi
G Sandve
G Stormo
G Thijs
J Besag
J Gower
J Hu
J Liu
K MacIsaac
L Carvalho
L Newberg
Luis Carvalho
M Barbieri
M Régnier
M Tompa
MA Lones
Matteo G. A. Paris
S Geman
T Bailey
W Thompson
Y Ding
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 06/04/2012

Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We present a Bayesian model that is an extended version of the model adopted by the Gibbs motif sampler, and propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the maximum a posteriori estimator.

Comment: 24 pages, 9 figures
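The last point, that a centroid estimate can differ from the maximum a posteriori (MAP) estimate, can be shown on a toy posterior in Python. The sketch assumes a plain Hamming loss over binary indicators, which is simpler than the refined loss function the abstract refers to, and it ignores any constraints on valid configurations. The posterior values and function names are hypothetical.

```python
def map_estimate(posterior):
    """Return the single most probable configuration."""
    return max(posterior, key=posterior.get)


def marginals(posterior):
    """Posterior probability that each binary indicator equals 1."""
    length = len(next(iter(posterior)))
    return [sum(prob for config, prob in posterior.items() if config[k])
            for k in range(length)]


def centroid_estimate(posterior):
    """Under Hamming loss, keep every indicator whose marginal exceeds 0.5."""
    return tuple(1 if m > 0.5 else 0 for m in marginals(posterior))


# Toy posterior over three binary indicators (for example, candidate sites).
posterior = {
    (1, 0, 0): 0.4,
    (0, 1, 1): 0.3,
    (0, 1, 0): 0.3,
}
map_config = map_estimate(posterior)
centroid_config = centroid_estimate(posterior)
```

Here the MAP configuration is (1, 0, 0), but the second indicator has marginal probability 0.6 and the first only 0.4, so the centroid is (0, 1, 0).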

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Faster computation of exact RNA shape probabilities

Author: Andronescu
Berezikov
Bernhart
Brejová
Carvalho
Chan
Clote
Ding
Do
Doshi
Durbin
Giegerich
Giegerich
Griffiths-Jones
Hamada
Havgaard
Hofacker
Janssen
Lorenz
Lu
Mandal
Mathews
Mathews
McCaskill
Meyer
Nebel
Reeder
Reeder
Robert Giegerich
Stefan Janssen
Steffen
Voß
Waldminghaus
Wuchty
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: Abstract shape analysis allows efficient computation of a representative sample of low-energy foldings of an RNA molecule. More comprehensive information is obtained by computing shape probabilities, accumulating the Boltzmann probabilities of all structures within each abstract shape. Such information is superior to free energies because it is independent of sequence length and base composition. However, up to this point, computation of shape probabilities evaluates all shapes simultaneously and comes with a computation cost which is exponential in the length of the sequence.
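The accumulation step described in this abstract can be sketched in Python as follows. This is the naive version that enumerates every structure explicitly, not the faster method the paper proposes. The shape labels, the energies, and the function name are made up for illustration; the gas constant is the standard value in kcal/(mol K).

```python
import math
from collections import defaultdict

GAS_CONSTANT = 1.98717e-3  # kcal / (mol * K)


def shape_probabilities(structures, temperature_k=310.15):
    """Sum Boltzmann weights per abstract shape and normalize.

    structures is an iterable of (shape_label, free_energy_kcal_per_mol).
    """
    rt = GAS_CONSTANT * temperature_k
    weights = defaultdict(float)
    for shape, energy in structures:
        weights[shape] += math.exp(-energy / rt)
    partition = sum(weights.values())
    if partition == 0:
        raise ValueError("no structures supplied")
    return {shape: weight / partition for shape, weight in weights.items()}


# Hypothetical structures: (abstract shape, free energy in kcal/mol).
structures = [
    ("[]", -5.0),
    ("[]", -4.2),
    ("[][]", -4.8),
    ("[[][]]", -3.0),
]
probabilities = shape_probabilities(structures)
```

Because every structure has to be listed, this approach scales with the size of the folding space, which is the exponential cost the abstract mentions.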

Crossref

PubMed Central

Publications at Bielefeld University