12 research outputs found

    A simple, practical and complete O-time Algorithm for RNA folding using the Four-Russians Speedup

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The problem of computationally predicting the secondary structure (or folding) of RNA molecules was first introduced more than thirty years ago and yet continues to be an area of active research and development. The basic <it>RNA-folding problem </it>of finding a maximum cardinality, non-crossing, matching of complimentary nucleotides in an RNA sequence of length <it>n</it>, has an <it>O</it>(<it>n</it><sup>3</sup>)-time dynamic programming solution that is widely applied. It is known that an <it>o</it>(<it>n</it><sup>3</sup>) worst-case time solution is possible, but the published and suggested methods are complex and have not been established to be practical. Significant practical improvements to the original dynamic programming method have been introduced, but they retain the <it>O</it>(<it>n</it><sup>3</sup>) worst-case time bound when <it>n </it>is the only problem-parameter used in the bound. Surprisingly, the most widely-used, general technique to achieve a worst-case (and often practical) speed up of dynamic programming, the <it>Four-Russians </it>technique, has not been previously applied to the RNA-folding problem. This is perhaps due to technical issues in adapting the technique to RNA-folding.</p> <p>Results</p> <p>In this paper, we give a simple, complete, and practical Four-Russians algorithm for the basic RNA-folding problem, achieving a worst-case time-bound of <it>O</it>(<it>n</it><sup>3</sup>/log(<it>n</it>)).</p> <p>Conclusions</p> <p>We show that this time-bound can also be obtained for richer nucleotide matching scoring-schemes, and that the method achieves consistent speed-ups in practice. The contribution is both theoretical and practical, since the basic RNA-folding problem is often solved multiple times in the inner-loop of more complex algorithms, and for long RNA molecules in the study of RNA virus genomes.</p

    Sparsification of RNA structure prediction including pseudoknots

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Although many RNA molecules contain pseudoknots, computational prediction of pseudoknotted RNA structure is still in its infancy due to high running time and space consumption implied by the dynamic programming formulations of the problem.</p> <p>Results</p> <p>In this paper, we introduce sparsification to significantly speedup the dynamic programming approaches for pseudoknotted RNA structure prediction, which also lower the space requirements. Although sparsification has been applied to a number of RNA-related structure prediction problems in the past few years, we provide the first application of sparsification to pseudoknotted RNA structure prediction specifically and to handling gapped fragments more generally - which has a much more complex recursive structure than other problems to which sparsification has been applied. We analyse how to sparsify four pseudoknot structure prediction algorithms, among those the most general method available (the Rivas-Eddy algorithm) and the fastest one (Reeder-Giegerich algorithm). In all algorithms the number of "candidate" substructures to be considered is reduced.</p> <p>Conclusions</p> <p>Our experimental results on the sparsified Reeder-Giegerich algorithm suggest a linear speedup over the unsparsified implementation.</p

    A max-margin model for efficient simultaneous alignment and folding of RNA sequences

    Get PDF
    Motivation: The need for accurate and efficient tools for computational RNA structure analysis has become increasingly apparent over the last several years: RNA folding algorithms underlie numerous applications in bioinformatics, ranging from microarray probe selection to de novo non-coding RNA gene prediction

    Efficient alignment of RNA secondary structures using sparse dynamic programming

    Get PDF
    BACKGROUND: Current advances of the next-generation sequencing technology have revealed a large number of un-annotated RNA transcripts. Comparative study of the RNA structurome is an important approach to assess their biological functionalities. Due to the large sizes and abundance of the RNA transcripts, an efficient and accurate RNA structure-structure alignment algorithm is in urgent need to facilitate the comparative study. Despite the importance of the RNA secondary structure alignment problem, there are no computational tools available that provide high computational efficiency and accuracy. In this case, designing and implementing such an efficient and accurate RNA secondary structure alignment algorithm is highly desirable. RESULTS: In this work, through incorporating the sparse dynamic programming technique, we implemented an algorithm that has an O(n(3)) expected time complexity, where n is the average number of base pairs in the RNA structures. This complexity, which can be shown assuming the polymer-zeta property, is confirmed by our experiments. The resulting new RNA secondary structure alignment tool is called ERA. Benchmark results indicate that ERA can significantly speedup RNA structure-structure alignments compared to other state-of-the-art RNA alignment tools, while maintaining high alignment accuracy. CONCLUSIONS: Using the sparse dynamic programming technique, we are able to develop a new RNA secondary structure alignment tool that is both efficient and accurate. We anticipate that the new alignment algorithm ERA will significantly promote comparative RNA structure studies. The program, ERA, is freely available at http://genome.ucf.edu/ERA

    Practicality and time complexity of a sparsified RNA folding algorithm

    Get PDF
    Commonly used RNA folding programs compute the minimum free energy structure of a sequence under the pseudoknot exclusion constraint. They are based on Zuker's algorithm which runs in time O(n^3). Recently, it has been claimed that RNA folding can be achieved in average time O(n^2) using a sparsification technique. A proof of quadratic time complexity was based on the assumption that computational RNA folding obeys the "polymer-zeta property". Several variants of sparse RNA folding algorithms were later developed. Here, we present our own version, which is readily applicable to existing RNA folding programs, as it is extremely simple and does not require any new data structure. We applied it to the widely used Vienna RNAfold program, to create sibRNAfold, the first public sparsified version of a standard RNA folding program. To gain a better understanding of the time complexity of sparsified RNA folding in general, we carried out a thorough run time analysis with synthetic random sequences, both in the context of energy minimization and base pairing maximization. Contrary to previous claims, the asymptotic time complexity of a sparsified RNA folding algorithm using standard energy parameters remains O(n^3) under a wide variety of conditions. Consistent with our run-time analysis, we found that RNA folding does not obey the "polymer-zeta property" as claimed previously. Yet, a basic version of a sparsified RNA folding algorithm provides 15- to 50-fold speed gain. Surprisingly, the same sparsification technique has a different effect when applied to base pairing optimization. There, its asymptotic running time complexity appears to be either quadratic or cubic depending on the base composition. The code used in this work is available at: http://sibRNAfold.sourceforge.net/

    Pareto optimization in algebraic dynamic programming

    Get PDF
    Saule C, Giegerich R. Pareto optimization in algebraic dynamic programming. Algorithms for Molecular Biology. 2015;10(1): 22.Pareto optimization combines independent objectives by computing the Pareto front of its search space, defined as the set of all solutions for which no other candidate solution scores better under all objectives. This gives, in a precise sense, better information than an artificial amalgamation of different scores into a single objective, but is more costly to compute. Pareto optimization naturally occurs with genetic algorithms, albeit in a heuristic fashion. Non-heuristic Pareto optimization so far has been used only with a few applications in bioinformatics. We study exact Pareto optimization for two objectives in a dynamic programming framework. We define a binary Pareto product operator ∗Par on arbitrary scoring schemes. Independent of a particular algorithm, we prove that for two scoring schemes A and B used in dynamic programming, the scoring scheme A∗ParB correctly performs Pareto optimization over the same search space. We study different implementations of the Pareto operator with respect to their asymptotic and empirical efficiency. Without artificial amalgamation of objectives, and with no heuristics involved, Pareto optimization is faster than computing the same number of answers separately for each objective. For RNA structure prediction under the minimum free energy versus the maximum expected accuracy model, we show that the empirical size of the Pareto front remains within reasonable bounds. Pareto optimization lends itself to the comparative investigation of the behavior of two alternative scoring schemes for the same purpose. For the above scoring schemes, we observe that the Pareto front can be seen as a composition of a few macrostates, each consisting of several microstates that differ in the same limited way. We also study the relationship between abstract shape analysis and the Pareto front, and find that they extract information of a different nature from the folding space and can be meaningfully combined
    corecore