Search CORE

Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction

Author: Dowell Robin D
Eddy Sean R
Publication venue: BioMed Central
Publication date: 01/01/2004
Field of study

BACKGROUND: RNA secondary structure prediction methods based on probabilistic modeling can be developed using stochastic context-free grammars (SCFGs). Such methods can readily combine different sources of information that can be expressed probabilistically, such as an evolutionary model of comparative RNA sequence analysis and a biophysical model of structure plausibility. However, the number of free parameters in an integrated model for consensus RNA structure prediction can become untenable if the underlying SCFG design is too complex. Thus a key question is, what small, simple SCFG designs perform best for RNA secondary structure prediction? RESULTS: Nine different small SCFGs were implemented to explore the tradeoffs between model complexity and prediction accuracy. Each model was tested for single sequence structure prediction accuracy on a benchmark set of RNA secondary structures. CONCLUSIONS: Four SCFG designs had prediction accuracies near the performance of current energy minimization programs. One of these designs, introduced by Knudsen and Hein in their PFOLD algorithm, has only 21 free parameters and is significantly simpler than the others

Digital Commons@Becker

Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints

Author: AV Uzilov
B Gulko
B Knudsen
B Knudsen
B Morgenstern
D Sankoff
DH Mathews
DH Mathews
DH Mathews
DKY Chiu
DS Fields
E Rivas
G Storz
I Holmes
I Holmes
I Holmes
IL Hofacker
IL Hofacker
IL Hofacker
J Gorodkin
J Gorodkin
J Gorodkin
J Reeder
J Wuyts
J Wuyts
JE Hopcroft
JE Tabaska
JH Havgaard
M Zuker
M Zuker
MS Waterman
NR Pace
O Perriquet
PP Gardner
R Durbin
R Giegerich
R Green
R Lück
R Nussinov
RD Dowell
RD Dowell
Robin D Dowell
RR Gutell
RR Gutell
RR Gutell
S Batzoglou
S Griffiths-Jones
Sean R Eddy
SR Eddy
SV Muse
V Juan
VR Akmaev
Publication venue: BioMed Central
Publication date: 01/01/2006
Field of study

BACKGROUND: We are interested in the problem of predicting secondary structure for small sets of homologous RNAs, by incorporating limited comparative sequence information into an RNA folding model. The Sankoff algorithm for simultaneous RNA folding and alignment is a basis for approaches to this problem. There are two open problems in applying a Sankoff algorithm: development of a good unified scoring system for alignment and folding and development of practical heuristics for dealing with the computational complexity of the algorithm. RESULTS: We use probabilistic models (pair stochastic context-free grammars, pairSCFGs) as a unifying framework for scoring pairwise alignment and folding. A constrained version of the pairSCFG structural alignment algorithm was developed which assumes knowledge of a few confidently aligned positions (pins). These pins are selected based on the posterior probabilities of a probabilistic pairwise sequence alignment. CONCLUSION: Pairwise RNA structural alignment improves on structure prediction accuracy relative to single sequence folding. Constraining on alignment is a straightforward method of reducing the runtime and memory requirements of the algorithm. Five practical implementations of the pairwise Sankoff algorithm – this work (Consan), David Mathews' Dynalign, Ian Holmes' Stemloc, Ivo Hofacker's PMcomp, and Jan Gorodkin's FOLDALIGN – have comparable overall performance with different strengths and weaknesses

Springer - Publisher Connector

Springer - Publisher Connector

Digital Commons@Becker

A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure

Author: Eddy Sean R
Publication venue: BioMed Central
Publication date: 01/01/2002
Field of study

BACKGROUND: Covariance models (CMs) are probabilistic models of RNA secondary structure, analogous to profile hidden Markov models of linear sequence. The dynamic programming algorithm for aligning a CM to an RNA sequence of length N is O(N(3)) in memory. This is only practical for small RNAs. RESULTS: I describe a divide and conquer variant of the alignment algorithm that is analogous to memory-efficient Myers/Miller dynamic programming algorithms for linear sequence alignment. The new algorithm has an O(N(2) log N) memory complexity, at the expense of a small constant factor in time. CONCLUSIONS: Optimal ribosomal RNA structural alignments that previously required up to 150 GB of memory now require less than 270 MB

CiteSeerX

Public Library of Science (PLOS)

Evolutionary Modeling and Prediction of Non-Coding RNAs in Drosophila

Author: A Siepel
A Siepel
A Stark
A Varadarajan
AG Clark
Andrew V. Uzilov
B Knudsen
B Paten
CN Dewey
D Rose
D St Johnston
DP Bartel
DS Parker
E Boyle
E Lcuyer
E Nawrocki
E Rivas
E Rivas
E Torarinsson
G McGuire
Ian Holmes
IL Hofacker
J Brennecke
J Pedersen
J Ruby
JL Thorne
JP Bachellerie
JR Manak
JS Pedersen
JS Pedersen
JS Pedersen
KS Pollard
Lars Barquist
M Crosby
M Mandal
M Pheasant
M Sprinzl
Mitchell E. Skinner
N Bray
N Goldman
PD Rijk
PS Klosterman
RD Dowell
RD Dowell
RK Bradley
Robert Belshaw
Robert K. Bradley
S Griffiths-Jones
S Washietl
T Babak
T Elgavish
T Gesell
TM Lowe
V Ambros
WJ Bruno
YR Bendana
Yuri R. Bendaña
Z Wang
Z Yang
Publication venue: Public Library of Science
Publication date: 01/01/2009
Field of study

We performed benchmarks of phylogenetic grammar-based ncRNA gene prediction, experimenting with eight different models of structural evolution and two different programs for genome alignment. We evaluated our models using alignments of twelve Drosophila genomes. We find that ncRNA prediction performance can vary greatly between different gene predictors and subfamilies of ncRNA gene. Our estimates for false positive rates are based on simulations which preserve local islands of conservation; using these simulations, we predict a higher rate of false positives than previous computational ncRNA screens have reported. Using one of the tested prediction grammars, we provide an updated set of ncRNA predictions for D. melanogaster and compare them to previously-published predictions and experimental data. Many of our predictions show correlations with protein-coding genes. We found significant depletion of intergenic predictions near the 3′ end of coding regions and furthermore depletion of predictions in the first intron of protein-coding genes. Some of our predictions are colocated with larger putative unannotated genes: for example, 17 of our predictions showing homology to the RFAM family snoR28 appear in a tandem array on the X chromosome; the 4.5 Kbp spanned by the predicted tandem array is contained within a FlyBase-annotated cDNA

CiteSeerX

Washington University St. Louis: Open Scholarship

Structural RNA Homology Search and Alignment Using Covariance Models

Author: Nawrocki Eric
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2009
Field of study

Functional RNA elements do not encode proteins, but rather function directly as RNAs. Many different types of RNAs play important roles in a wide range of cellular processes, including protein synthesis, gene regulation, protein transport, splicing, and more. Because important sequence and structural features tend to be evolutionarily conserved, one way to learn about functional RNAs is through comparative sequence analysis - by collecting and aligning examples of homologous RNAs and comparing them. Covariance models: CMs) are powerful computational tools for homology search and alignment that score both the conserved sequence and secondary structure of an RNA family. However, due to the high computational complexity of their search and alignment algorithms, searches against large databases and alignment of large RNAs like small subunit ribosomal RNA: SSU rRNA) are prohibitively slow. Large-scale alignment of SSU rRNA is of particular utility for environmental survey studies of microbial diversity which often use the rRNA as a phylogenetic marker of microorganisms. In this work, we improve CM methods by making them faster and more sensitive to remote homology. To accelerate searches, we introduce a query-dependent banding: QDB) technique that makes scoring sequences more efficient by restricting the possible lengths of structural elements based on their probability given the model. We combine QDB with a complementary filtering method that quickly prunes away database subsequences deemed unlikely to receive high CM scores based on sequence conservation alone. To increase search sensitivity, we apply two model parameterization strategies from protein homology search tools to CMs. As judged by our benchmark, these combined approaches yield about a 250-fold speedup and significant increase in search sensitivity compared with previous implementations. To accelerate alignment, we apply a method that uses a fast sequence-based alignment of a target sequence to determine constraints for the more expensive CM sequence- and structure-based alignment. This technique reduces the time required to align one SSU rRNA sequence from about 15 minutes to 1 second with a negligible effect on alignment accuracy. Collectively, these improvements make CMs more powerful and practical tools for RNA homology search and alignment

ProQuest OAI Repository

A note on retrodiction and machine evolution

Author: Caetano-Anollés Gustavo
Publication venue
Publication date: 25/03/2023
Field of study

Biomolecular communication demands that interactions between parts of a molecular system act as scaffolds for message transmission. It also requires an evolving and organized system of signs - a communicative agency - for creating and transmitting meaning. Here I explore the need to dissect biomolecular communication with retrodiction approaches that make claims about the past given information that is available in the present. While the passage of time restricts the explanatory power of retrodiction, the use of molecular structure in biology offsets information erosion. This allows description of the gradual evolutionary rise of structural and functional innovations in RNA and proteins. The resulting chronologies can also describe the gradual rise of molecular machines of increasing complexity and computation capabilities. For example, the accretion of rRNA substructures and ribosomal proteins can be traced in time and placed within a geological timescale. Phylogenetic, algorithmic and theoretical-inspired accretion models can be reconciled into a congruent evolutionary model. Remarkably, the time of origin of enzymes, functional RNA, non-ribosomal peptide synthetase (NRPS) complexes, and ribosomes suggest they gradually climbed Chomsky's hierarchy of formal grammars, supporting the gradual complexification of machines and communication in molecular biology. Future retrodiction approaches and in-depth exploration of theoretical models of computation will need to confirm such evolutionary progression.Comment: 7 pages, 1 figur

arXiv.org e-Print Archive

Modeling RNA tertiary structure motifs by graph-grammars

Author: Hamel Sylvie
Major François
St-Onge Karine
Thibault Philippe
Publication venue: Oxford University Press
Publication date: 21/02/2007
Field of study

A new approach, graph-grammars, to encode RNA tertiary structure patterns is introduced and exemplified with the classical sarcin–ricin motif. The sarcin–ricin motif is found in the stem of the crucial ribosomal loop E (also referred to as the sarcin–ricin loop), which is sensitive to the α-sarcin and ricin toxins. Here, we generate a graph-grammar for the sarcin-ricin motif and apply it to derive putative sequences that would fold in this motif. The biological relevance of the derived sequences is confirmed by a comparison with those found in known sarcin–ricin sites in an alignment of over 800 bacterial 23S ribosomal RNAs. The comparison raised alternative alignments in few sarcin–ricin sites, which were assessed using tertiary structure predictions and 3D modeling. The sarcin–ricin motif graph-grammar was built with indivisible nucleotide interaction cycles that were recently observed in structured RNAs. A comparison of the sequences and 3D structures of each cycle that constitute the sarcin–ricin motif gave us additional insights about RNA sequence–structure relationships. In particular, this analysis revealed the sequence space of an RNA motif depends on a structural context that goes beyond the single base pairing and base-stacking interactions