BACKGROUND: Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm. It is therefore desirable to constrain such algorithms, by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered.

RESULTS: We demonstrate how flexible classes of constraint can be imposed, greatly reducing the computational costs while maintaining a high quality of structural homology prediction. Any score-attributed context-free grammar (e.g. energy-based scoring schemes, or conditionally normalized Pair SCFGs) is amenable to this treatment. It is now possible to combine independent structural and alignment constraints of unprecedented general flexibility in Pair SCFG alignment algorithms. We outline several applications to the bioinformatics of RNA sequence and structure, including Waterman-Eggert N-best alignments and progressive multiple alignment. We evaluate the performance of the algorithm on test examples from the RFAM database.

CONCLUSION: A program, Stemloc, that implements these algorithms for efficient RNA sequence alignment and structure prediction is available under the GNU General Public License.

Holmes, Ian

English

PubMed

eScholarship - University of California

Accelerated probabilistic inference of RNA structure evolution

Springer - Publisher Connector

Directory of Open Access Journals

BMC Bioinformatics
