BACKGROUND: We are interested in the problem of predicting secondary structure for small sets of homologous RNAs, by incorporating limited comparative sequence information into an RNA folding model. The Sankoff algorithm for simultaneous RNA folding and alignment is a basis for approaches to this problem. There are two open problems in applying a Sankoff algorithm: development of a good unified scoring system for alignment and folding and development of practical heuristics for dealing with the computational complexity of the algorithm.

RESULTS: We use probabilistic models (pair stochastic context-free grammars, pairSCFGs) as a unifying framework for scoring pairwise alignment and folding. A constrained version of the pairSCFG structural alignment algorithm was developed which assumes knowledge of a few confidently aligned positions (pins). These pins are selected based on the posterior probabilities of a probabilistic pairwise sequence alignment.

CONCLUSION: Pairwise RNA structural alignment improves on structure prediction accuracy relative to single sequence folding. Constraining on alignment is a straightforward method of reducing the runtime and memory requirements of the algorithm. Five practical implementations of the pairwise Sankoff algorithm – this work (Consan), David Mathews' Dynalign, Ian Holmes' Stemloc, Ivo Hofacker's PMcomp, and Jan Gorodkin's FOLDALIGN – have comparable overall performance with different strengths and weaknesses.
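The pin-selection idea in the abstract can be illustrated with a minimal sketch. It is not the Consan implementation: the function name `select_pins`, the 0.95 threshold, the optional `max_pins` cap, and the greedy ordering are all assumptions made for illustration. The posterior matrix is assumed to have been computed already by some probabilistic pairwise sequence alignment.

```python
from typing import List, Optional, Tuple


def select_pins(
    posterior: List[List[float]],
    threshold: float = 0.95,          # hypothetical cutoff, not taken from the paper
    max_pins: Optional[int] = None,   # hypothetical cap on the number of pins
) -> List[Tuple[int, int]]:
    """Pick confidently aligned positions (pins) from a posterior matrix.

    posterior[i][j] is assumed to be the posterior probability that
    position i of the first sequence is aligned to position j of the second.
    """
    # Collect every cell whose posterior reaches the threshold.
    candidates = [
        (p, i, j)
        for i, row in enumerate(posterior)
        for j, p in enumerate(row)
        if p >= threshold
    ]
    # Consider the most confident candidates first.
    candidates.sort(reverse=True)

    pins: List[Tuple[int, int]] = []
    for _, i, j in candidates:
        if max_pins is not None and len(pins) >= max_pins:
            break
        # Keep a candidate only if it is collinear with every accepted pin,
        # i.e. it lies strictly before or strictly after it in both sequences.
        # This is a safeguard that matters mainly for thresholds of 0.5 or lower.
        if all((i < a and j < b) or (i > a and j > b) for a, b in pins):
            pins.append((i, j))
    return sorted(pins)


if __name__ == "__main__":
    toy_posterior = [
        [0.99, 0.00, 0.01],
        [0.00, 0.50, 0.50],
        [0.00, 0.02, 0.98],
    ]
    print(select_pins(toy_posterior))
```

Each returned pair `(i, j)` marks a position pair that a constrained structural alignment would be required to align, which is what shrinks the search space relative to the unconstrained algorithm.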

English

Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints

Dowell Robin D

Eddy Sean R

Directory of Open Access Journals

BMC Bioinformatics

Springer - Publisher Connector

Digital Commons@Becker

http://doaj.org/search?source=%7B%22query%22%3A%7B%22bool%22%3A%7B%22must%22%3A%5B%7B%22term%22%3A%7B%22id%22%3A%22ba1ee406800c49d49cada7791c65687c%22%7D%7D%5D%7D%7D%7D
