Abstract

Background

Recent studies have revealed the importance of considering the entire distribution of possible secondary structures in RNA secondary structure prediction; accordingly, new types of estimators have been proposed, including the maximum expected accuracy (MEA) estimator. MEA-based estimators are designed to maximize the expected accuracy of the base pairs and have achieved the highest level of accuracy. These methods, however, do not give a single best prediction of the structure; instead, they employ parameters that control the trade-off between sensitivity and positive predictive value (PPV). It is unclear which parameter value should be used, and even a well-trained default value does not, in general, give the best result for each individual RNA sequence under the popular accuracy measures.

Results

The expected values of the popular accuracy measures for RNA secondary structure prediction are difficult to calculate. In their place, the pseudo-expected accuracy, which can easily be computed from base-pairing probabilities, is introduced. It is shown that the pseudo-expected accuracy is a good approximation in terms of sensitivity, PPV, MCC, or F-score. The pseudo-expected accuracy can be approximately maximized for each RNA sequence by stochastic sampling. It is also shown that secondary structures that are well balanced between sensitivity and PPV can be predicted with a small computational overhead by combining the pseudo-expected accuracy of MCC or F-score with the γ-centroid estimator.

Conclusions

This study gives not only a method for predicting a secondary structure that balances sensitivity and PPV, but also a general method for approximately maximizing the (pseudo-)expected accuracy with respect to various evaluation measures, including MCC and F-score.
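The abstract does not spell out how the pseudo-expected accuracy is computed, so the following is only a minimal sketch of one plausible reading: the counts TP, FP, FN and TN of a candidate structure are replaced by their expectations under the base-pairing probabilities and then plugged into the usual F-score or MCC formula. The function names, the dictionary layout of `bpp`, and the idea of choosing among a handful of candidate structures (for example, γ-centroid predictions at several γ values) are assumptions made for illustration, not the authors' implementation.

```python
import math


def pseudo_expected_counts(bpp, pairs, n):
    """Expected TP, FP, FN, TN of a candidate structure.

    bpp   : dict mapping (i, j), i < j, to a base-pairing probability
            (pairs absent from the dict are treated as probability 0)
    pairs : set of (i, j) base pairs in the candidate structure
    n     : sequence length
    All names here are hypothetical and chosen for this sketch.
    """
    all_pairs = n * (n - 1) // 2                    # number of positions (i, j) with i < j
    tp = sum(bpp.get(pair, 0.0) for pair in pairs)  # predicted pairs, weighted by probability
    fp = len(pairs) - tp                            # predicted pairs, weighted by 1 - probability
    fn = sum(bpp.values()) - tp                     # probability mass on unpredicted pairs
    tn = all_pairs - tp - fp - fn                   # everything else
    return tp, fp, fn, tn


def pseudo_f_score(tp, fp, fn, tn):
    """F-score evaluated on expected counts (tn is unused but kept for a uniform signature)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def pseudo_mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient evaluated on expected counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def select_best(bpp, candidates, n, measure=pseudo_f_score):
    """Return the candidate structure with the highest pseudo-expected accuracy."""
    return max(candidates, key=lambda s: measure(*pseudo_expected_counts(bpp, s, n)))


# Toy input: a length-4 sequence with two possible base pairs (made-up numbers).
toy_bpp = {(0, 3): 0.9, (1, 2): 0.6}
toy_candidates = [set(), {(0, 3)}, {(0, 3), (1, 2)}]
best = select_best(toy_bpp, toy_candidates, 4)
```

In this sketch the selection step costs only one pass over each candidate, which is consistent with the abstract's remark that the combination with the γ-centroid estimator adds a small computational overhead; how the candidates are generated is left outside the example.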



Prediction of RNA secondary structure by maximizing pseudo-expected accuracy

Springer - Publisher Connector


Sato Kengo

Hamada Michiaki

Asai Kiyoshi

Directory of Open Access Journals

BMC Bioinformatics


doi:10.1186/1471-2105-11-586

Cite this article as: Hamada et al.: Prediction of RNA secondary structure by maximizing pseudo-expected accuracy.

