62 research outputs found

    Finding common RNA pseudoknot structures in polynomial time

    Get PDF
    AbstractThis paper presents the first polynomial time algorithm for finding common RNA substructures that include pseudoknots and similar structures. While a more general problem is known to be NP-hard, this algorithm exploits special features of RNA structures to match RNA bonds correctly in polynomial time. Although the theoretical upper bound on the algorithmʼs time and space usage is high, the data-driven nature of its computation enables it to avoid computing unnecessary cases, dramatically reducing the actual running time. The algorithm works well in practice, and has been tested on sample RNA structures that include pseudoknots and pseudoknot-like tertiary structures

    A Progressive Folding Algorithm for RNA Secondary Structure Prediction

    Get PDF
    RNA secondary structure prediction is an area where computational techniques have shown great promise. Most RNA secondary structure prediction algorithms use dynamic programming to compute a secondary structure with minimum free energy. Energy minimization algorithms are less accurate on larger RNA molecules. One potential reason is that larger RNA molecules do not fold instantaneously. Instead, several studies show that RNA molecules fold progressively during transcription. This process could encourage the molecule to fold into a structure that is not at the global lowest energy level. Additionally, dynamic programming algorithms do not allow for a important type of structure called a pseudoknot. Secondary structure prediction allowing pseudoknots was recently shown to be NP-complete. We have created a simulation that captures these biological insights. Our simulation uses a probabilistic approach to fold the molecule progressively as it is synthesized. This thesis evaluates the performance of the simulation and presents several enhancements to improve efficiency and accuracy. Our results show that our progressive folding algorithm did not improve on current techniques. Additionally, we found that a simulated annealing algorithm using our probability models was more accurate than our progressive folding algorithm

    Structural relation matching: an algorithm to identify structural patterns into RNAs and their interactions

    Get PDF
    RNA molecules play crucial roles in various biological processes. Their three-dimensional configurations determine the functions and, in turn, influences the interaction with other molecules. RNAs and their interaction structures, the so-called RNA-RNA interactions, can be abstracted in terms of secondary structures, i.e., a list of the nucleotide bases paired by hydrogen bonding within its nucleotide sequence. Each secondary structure, in turn, can be abstracted into cores and shadows. Both are determined by collapsing nucleotides and arcs properly. We formalize all of these abstractions as arc diagrams, whose arcs determine loops. A secondary structure, represented by an arc diagram, is pseudoknot-free if its arc diagram does not present any crossing among arcs otherwise, it is said pseudoknotted. In this study, we face the problem of identifying a given structural pattern into secondary structures or the associated cores or shadow of both RNAs and RNA-RNA interactions, characterized by arbitrary pseudoknots. These abstractions are mapped into a matrix, whose elements represent the relations among loops. Therefore, we face the problem of taking advantage of matrices and submatrices. The algorithms, implemented in Python, work in polynomial time. We test our approach on a set of 16S ribosomal RNAs with inhibitors of Thermus thermophilus, and we quantify the structural effect of the inhibitors

    Tree decomposition and parameterized algorithms for RNA structure-sequence alignment including tertiary interactions and pseudoknots

    Get PDF
    We present a general setting for structure-sequence comparison in a large class of RNA structures that unifies and generalizes a number of recent works on specific families on structures. Our approach is based on tree decomposition of structures and gives rises to a general parameterized algorithm, where the exponential part of the complexity depends on the family of structures. For each of the previously studied families, our algorithm has the same complexity as the specific algorithm that had been given before.Comment: (2012

    Inverse folding of RNA pseudoknot structures

    Get PDF
    Background: RNA exhibits a variety of structural configurations. Here we consider a structure to be tantamount to the noncrossing Watson-Crick and \pairGU-base pairings (secondary structure) and additional cross-serial base pairs. These interactions are called pseudoknots and are observed across the whole spectrum of RNA functionalities. In the context of studying natural RNA structures, searching for new ribozymes and designing artificial RNA, it is of interest to find RNA sequences folding into a specific structure and to analyze their induced neutral networks. Since the established inverse folding algorithms, {\tt RNAinverse}, {\tt RNA-SSD} as well as {\tt INFO-RNA} are limited to RNA secondary structures, we present in this paper the inverse folding algorithm {\tt Inv} which can deal with 3-noncrossing, canonical pseudoknot structures. Results: In this paper we present the inverse folding algorithm {\tt Inv}. We give a detailed analysis of {\tt Inv}, including pseudocodes. We show that {\tt Inv} allows to design in particular 3-noncrossing nonplanar RNA pseudoknot 3-noncrossing RNA structures-a class which is difficult to construct via dynamic programming routines. {\tt Inv} is freely available at \url{http://www.combinatorics.cn/cbpc/inv.html}. Conclusions: The algorithm {\tt Inv} extends inverse folding capabilities to RNA pseudoknot structures. In comparison with {\tt RNAinverse} it uses new ideas, for instance by considering sets of competing structures. As a result, {\tt Inv} is not only able to find novel sequences even for RNA secondary structures, it does so in the context of competing structures that potentially exhibit cross-serial interactions.Comment: 20 pages, 26 figure

    RNA structure analysis : algorithms and applications

    Get PDF
    In this doctoral thesis, efficient algorithms for aligning RNA secondary structures and mining unknown RNA motifs are presented. As the major contribution, a structure alignment algorithm, which combines both primary and secondary structure information, can find the optimal alignment between two given structures where one of them could be either a pattern structure of a known motif or a real query structure and the other be a subject structure. Motivated by widely used algorithms for RNA folding, the proposed algorithm decomposes an RNA secondary structure into a set of atomic structural components that can be further organized in a tree model to capture the structural particularities. The novel structure alignment algorithm is implemented using dynamic programming techniques coupled by position-independent scoring matrices. The algorithm can find the optimal global and local alignments between two RNA secondary structures at quadratic time complexity. When applied to searching a structure database, the algorithm can find similar RNA substructures and therefore can be used to identify functional RNA motifs. Extension of the algorithm has also been accomplished to deal with position-dependent scoring matrix in the purpose of aligning multiple structures. All algorithms have been implemented in a package under the name RSmatch and applied to searching mRNA UTR structure database and mining RNA motifs. The experimental results showed high efficiency and effectiveness of the proposed techniques

    Algorithms for RNA secondary structure analysis : prediction of pseudoknots and the consensus shapes approach

    Get PDF
    Reeder J. Algorithms for RNA secondary structure analysis : prediction of pseudoknots and the consensus shapes approach. Bielefeld (Germany): Bielefeld University; 2007.Our understanding of the role of RNA has undergone a major change in the last decade. Once believed to be only a mere carrier of information and structural component of the ribosomal machinery in the advent of the genomic age, it is now clear that RNAs play a much more active role. RNAs can act as regulators and can have catalytic activity - roles previously only attributed to proteins. There is still much speculation in the scientific community as to what extent RNAs are responsible for the complexity in higher organisms which can hardly be explained with only proteins as regulators. In order to investigate the roles of RNA, it is therefore necessary to search for new classes of RNA. For those and already known classes, analyses of their presence in different species of the tree of life will provide further insight about the evolution of biomolecules and especially RNAs. Since RNA function often follows its structure, the need for computer programs for RNA structure prediction is an immanent part of this procedure. The secondary structure of RNA - the level of base pairing - strongly determines the tertiary structure. As the latter is computationally intractable and experimentally expensive to obtain, secondary structure analysis has become an accepted substitute. In this thesis, I present two new algorithms (and a few variations thereof) for the prediction of RNA secondary structures. The first algorithm addresses the problem of predicting a secondary structure from a single sequence including RNA pseudoknots. Pseudoknots have been shown to be functionally relevant in many RNA mediated processes. However, pseudoknots are excluded from considerations by state-of-the-art RNA folding programs for reasons of computational complexity. While folding a sequence of length n into unknotted structures requires O(n^3) time and O(n^2) space, finding the best structure including arbitrary pseudoknots has been proven to be NP-complete. Nevertheless, I demonstrate in this work that certain types of pseudoknots can be included in the folding process with only a moderate increase of computational cost. In analogy to protein coding RNA, where a conserved encoded protein hints at a similar metabolic function, structural conservation in RNA may give clues to RNA function and to finding of RNA genes. However, structure conservation is more complex to deal with computationally than sequence conservation. The method considered to be at least conceptually the ideal approach in this situation is the Sankoff algorithm. It simultaneously aligns two sequences and predicts a common secondary structure. Unfortunately, it is computationally rather expensive - O(n^6) time and O(n^4) space for two sequences, and for more than two sequences it becomes exponential in the number of sequences! Therefore, several heuristic implementations emerged in the last decade trying to make the Sankoff approach practical by introducing pragmatic restrictions on the search space. In this thesis, I propose to redefine the consensus structure prediction problem in a way that does not imply a multiple sequence alignment step. For a family of RNA sequences, my method explicitly and independently enumerates the near-optimal abstract shape space and predicts an abstract shape as the consensus for all sequences. For each sequence, it delivers the thermodynamically best structure which has this shape. The technique of abstract shapes analysis is employed here for a synoptic view of the suboptimal folding space. As the shape space is much smaller than the structure space, and identification of common shapes can be done in linear time (in the number of shapes considered), the method is essentially linear in the number of sequences. Evaluations show that the new method compares favorably with available alternatives

    Automated Design of Dynamic Programming Schemes for RNA Folding with Pseudoknots

    Get PDF
    Despite being a textbook application of dynamic programming (DP) and routine task in RNA structure analysis, RNA secondary structure prediction remains challenging whenever pseudoknots come into play. To circumvent the NP-hardness of energy minimization in realistic energy models, specialized algorithms have been proposed for restricted conformation classes that capture the most frequently observed configurations. While these methods rely on hand-crafted DP schemes, we generalize and fully automatize the design of DP pseudoknot prediction algorithms. We formalize the problem of designing DP algorithms for an (infinite) class of conformations, modeled by (a finite number of) fatgraphs, and automatically build DP schemes minimizing their algorithmic complexity. We propose an algorithm for the problem, based on the tree-decomposition of a well-chosen representative structure, which we simplify and reinterpret as a DP scheme. The algorithm is fixed-parameter tractable for the tree-width tw of the fatgraph, and its output represents a ?(n^{tw+1}) algorithm for predicting the MFE folding of an RNA of length n. Our general framework supports general energy models, partition function computations, recursive substructures and partial folding, and could pave the way for algebraic dynamic programming beyond the context-free case

    ConStruct: Improved construction of RNA consensus structures

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Aligning homologous non-coding RNAs (ncRNAs) correctly in terms of sequence and structure is an unresolved problem, due to both mathematical complexity and imperfect scoring functions. High quality alignments, however, are a prerequisite for most consensus structure prediction approaches, homology searches, and tools for phylogeny inference. Automatically created ncRNA alignments often need manual corrections, yet this manual refinement is tedious and error-prone.</p> <p>Results</p> <p>We present an extended version of CONSTRUCT, a semi-automatic, graphical tool suitable for creating RNA alignments correct in terms of both consensus sequence and consensus structure. To this purpose CONSTRUCT combines sequence alignment, thermodynamic data and various measures of covariation.</p> <p>One important feature is that the user is guided during the alignment correction step by a consensus dotplot, which displays all thermodynamically optimal base pairs and the corresponding covariation. Once the initial alignment is corrected, optimal and suboptimal secondary structures as well as tertiary interaction can be predicted. We demonstrate CONSTRUCT's ability to guide the user in correcting an initial alignment, and show an example for optimal secondary consensus structure prediction on very hard to align SECIS elements. Moreover we use CONSTRUCT to predict tertiary interactions from sequences of the internal ribosome entry site of CrP-like viruses. In addition we show that alignments specifically designed for benchmarking can be easily be optimized using CONSTRUCT, although they share very little sequence identity.</p> <p>Conclusion</p> <p>CONSTRUCT's graphical interface allows for an easy alignment correction based on and guided by predicted and known structural constraints. It combines several algorithms for prediction of secondary consensus structure and even tertiary interactions. The CONSTRUCT package can be downloaded from the URL listed in the Availability and requirements section of this article.</p
    corecore