58 research outputs found

    Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction

    Get PDF
    BACKGROUND: RNA secondary structure prediction methods based on probabilistic modeling can be developed using stochastic context-free grammars (SCFGs). Such methods can readily combine different sources of information that can be expressed probabilistically, such as an evolutionary model of comparative RNA sequence analysis and a biophysical model of structure plausibility. However, the number of free parameters in an integrated model for consensus RNA structure prediction can become untenable if the underlying SCFG design is too complex. Thus a key question is, what small, simple SCFG designs perform best for RNA secondary structure prediction? RESULTS: Nine different small SCFGs were implemented to explore the tradeoffs between model complexity and prediction accuracy. Each model was tested for single sequence structure prediction accuracy on a benchmark set of RNA secondary structures. CONCLUSIONS: Four SCFG designs had prediction accuracies near the performance of current energy minimization programs. One of these designs, introduced by Knudsen and Hein in their PFOLD algorithm, has only 21 free parameters and is significantly simpler than the others

    CentroidFold: a web server for RNA secondary structure prediction

    Get PDF
    The CentroidFold web server (http://www.ncrna.org/centroidfold/) is a web application for RNA secondary structure prediction powered by one of the most accurate prediction engine. The server accepts two kinds of sequence data: a single RNA sequence and a multiple alignment of RNA sequences. It responses with a prediction result shown as a popular base-pair notation and a graph representation. PDF version of the graph representation is also available. For a multiple alignment sequence, the server predicts a common secondary structure. Usage of the server is quite simple. You can paste a single RNA sequence (FASTA or plain sequence text) or a multiple alignment (CLUSTAL-W format) into the textarea then click on the ‘execute CentroidFold’ button. The server quickly responses with a prediction result. The major advantage of this server is that it employs our original CentroidFold software as its prediction engine which scores the best accuracy in our benchmark results. Our web server is freely available with no login requirement

    Convergence Thresholds of Newton's Method for Monotone Polynomial Equations

    Get PDF
    Monotone systems of polynomial equations (MSPEs) are systems of fixed-point equations X1=f1(X1,...,Xn),X_1 = f_1(X_1, ..., X_n), ...,Xn=fn(X1,...,Xn)..., X_n = f_n(X_1, ..., X_n) where each fif_i is a polynomial with positive real coefficients. The question of computing the least non-negative solution of a given MSPE X=f(X)\vec X = \vec f(\vec X) arises naturally in the analysis of stochastic models such as stochastic context-free grammars, probabilistic pushdown automata, and back-button processes. Etessami and Yannakakis have recently adapted Newton's iterative method to MSPEs. In a previous paper we have proved the existence of a threshold kfk_{\vec f} for strongly connected MSPEs, such that after kfk_{\vec f} iterations of Newton's method each new iteration computes at least 1 new bit of the solution. However, the proof was purely existential. In this paper we give an upper bound for kfk_{\vec f} as a function of the minimal component of the least fixed-point μf\mu\vec f of f(X)\vec f(\vec X). Using this result we show that kfk_{\vec f} is at most single exponential resp. linear for strongly connected MSPEs derived from probabilistic pushdown automata resp. from back-button processes. Further, we prove the existence of a threshold for arbitrary MSPEs after which each new iteration computes at least 1/w2h1/w2^h new bits of the solution, where ww and hh are the width and height of the DAG of strongly connected components.Comment: version 2 deposited February 29, after the end of the STACS conference. Two minor mistakes correcte

    RNA secondary structure prediction from multi-aligned sequences

    Full text link
    It has been well accepted that the RNA secondary structures of most functional non-coding RNAs (ncRNAs) are closely related to their functions and are conserved during evolution. Hence, prediction of conserved secondary structures from evolutionarily related sequences is one important task in RNA bioinformatics; the methods are useful not only to further functional analyses of ncRNAs but also to improve the accuracy of secondary structure predictions and to find novel functional RNAs from the genome. In this review, I focus on common secondary structure prediction from a given aligned RNA sequence, in which one secondary structure whose length is equal to that of the input alignment is predicted. I systematically review and classify existing tools and algorithms for the problem, by utilizing the information employed in the tools and by adopting a unified viewpoint based on maximum expected gain (MEG) estimators. I believe that this classification will allow a deeper understanding of each tool and provide users with useful information for selecting tools for common secondary structure predictions.Comment: A preprint of an invited review manuscript that will be published in a chapter of the book `Methods in Molecular Biology'. Note that this version of the manuscript may differ from the published versio

    An Approach to Analyze the Ambiguity in RNA Structure

    Get PDF

    Discriminatory power of RNA family models

    Get PDF
    Motivation: RNA family models group nucleotide sequences that share a common biological function. These models can be used to find new sequences belonging to the same family. To succeed in this task, a model needs to exhibit high sensitivity as well as high specificity. As model construction is guided by a manual process, a number of problems can occur, such as the introduction of more than one model for the same family or poorly constructed models. We explore the Rfam database to discover such problems

    RNAstructure: software for RNA secondary structure prediction and analysis

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>To understand an RNA sequence's mechanism of action, the structure must be known. Furthermore, target RNA structure is an important consideration in the design of small interfering RNAs and antisense DNA oligonucleotides. RNA secondary structure prediction, using thermodynamics, can be used to develop hypotheses about the structure of an RNA sequence.</p> <p>Results</p> <p>RNAstructure is a software package for RNA secondary structure prediction and analysis. It uses thermodynamics and utilizes the most recent set of nearest neighbor parameters from the Turner group. It includes methods for secondary structure prediction (using several algorithms), prediction of base pair probabilities, bimolecular structure prediction, and prediction of a structure common to two sequences. This contribution describes new extensions to the package, including a library of C++ classes for incorporation into other programs, a user-friendly graphical user interface written in JAVA, and new Unix-style text interfaces. The original graphical user interface for Microsoft Windows is still maintained.</p> <p>Conclusion</p> <p>The extensions to RNAstructure serve to make RNA secondary structure prediction user-friendly. The package is available for download from the Mathews lab homepage at <url>http://rna.urmc.rochester.edu/RNAstructure.html</url>.</p

    Topology of RNA-RNA interaction structures

    Get PDF
    The topological filtration of interacting RNA complexes is studied and the role is analyzed of certain diagrams called irreducible shadows, which form suitable building blocks for more general structures. We prove that for two interacting RNAs, called interaction structures, there exist for fixed genus only finitely many irreducible shadows. This implies that for fixed genus there are only finitely many classes of interaction structures. In particular the simplest case of genus zero already provides the formalism for certain types of structures that occur in nature and are not covered by other filtrations. This case of genus zero interaction structures is already of practical interest, is studied here in detail and found to be expressed by a multiple context-free grammar extending the usual one for RNA secondary structures. We show that in O(n6)O(n^6) time and O(n4)O(n^4) space complexity, this grammar for genus zero interaction structures provides not only minimum free energy solutions but also the complete partition function and base pairing probabilities.Comment: 40 pages 15 figure

    PicXAA-R: Efficient structural alignment of multiple RNA sequences using a greedy approach

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Accurate and efficient structural alignment of non-coding RNAs (ncRNAs) has grasped more and more attentions as recent studies unveiled the significance of ncRNAs in living organisms. While the Sankoff style structural alignment algorithms cannot efficiently serve for multiple sequences, mostly progressive schemes are used to reduce the complexity. However, this idea tends to propagate the early stage errors throughout the entire process, thereby degrading the quality of the final alignment. For multiple protein sequence alignment, we have recently proposed PicXAA which constructs an accurate alignment in a non-progressive fashion.</p> <p>Results</p> <p>Here, we propose PicXAA-R as an extension to PicXAA for greedy structural alignment of ncRNAs. PicXAA-R efficiently grasps both folding information within each sequence and local similarities between sequences. It uses a set of probabilistic consistency transformations to improve the posterior base-pairing and base alignment probabilities using the information of all sequences in the alignment. Using a graph-based scheme, we greedily build up the structural alignment from sequence regions with high base-pairing and base alignment probabilities.</p> <p>Conclusions</p> <p>Several experiments on datasets with different characteristics confirm that PicXAA-R is one of the fastest algorithms for structural alignment of multiple RNAs and it consistently yields accurate alignment results, especially for datasets with locally similar sequences. PicXAA-R source code is freely available at: <url>http://www.ece.tamu.edu/~bjyoon/picxaa/</url>.</p

    Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments

    Get PDF
    Computational methods for determining the secondary structure of RNA sequences from given alignments are currently either based on thermodynamic folding, compensatory base pair substitutions or both. However, there is currently no approach that combines both sources of information in a single optimization problem. Here, we present a model that formally integrates both the energy-based and evolution-based approaches to predict the folding of multiple aligned RNA sequences. We have implemented an extended version of Pfold that identifies base pairs that have high probabilities of being conserved and of being energetically favorable. The consensus structure is predicted using a maximum expected accuracy scoring scheme to smoothen the effect of incorrectly predicted base pairs. Parameter tuning revealed that the probability of base pairing has a higher impact on the RNA structure prediction than the corresponding probability of being single stranded. Furthermore, we found that structurally conserved RNA motifs are mostly supported by folding energies. Other problems (e.g. RNA-folding kinetics) may also benefit from employing the principles of the model we introduce. Our implementation, PETfold, was tested on a set of 46 well-curated Rfam families and its performance compared favorably to that of Pfold and RNAalifold
    corecore