25 research outputs found

    Automated rendering of multi-stranded DNA complexes with pseudoknots

    We present a general method for rendering representations of multi-stranded DNA complexes from textual descriptions into 2D diagrams. The complexes can be arbitrarily pseudoknotted, and if a planar rendering is possible, the method will determine one in time which is essentially linear in the size of the textual description. (That is, except for a final stochastic fine-tuning step.) If a planar rendering is not possible, the method will compute a visually pleasing approximate rendering in quadratic time. Examples of diagrams produced by the method are presented in the paper. Comment: 12 pages, 7 figures

    Automated Design of Dynamic Programming Schemes for RNA Folding with Pseudoknots

    Despite being a textbook application of dynamic programming (DP) and a routine task in RNA structure analysis, RNA secondary structure prediction remains challenging whenever pseudoknots come into play. To circumvent the NP-hardness of energy minimization in realistic energy models, specialized algorithms have been proposed for restricted conformation classes that capture the most frequently observed configurations. While these methods rely on hand-crafted DP schemes, we generalize and fully automatize the design of DP pseudoknot prediction algorithms. We formalize the problem of designing DP algorithms for an (infinite) class of conformations, modeled by (a finite number of) fatgraphs, and automatically build DP schemes minimizing their algorithmic complexity. We propose an algorithm for the problem, based on the tree-decomposition of a well-chosen representative structure, which we simplify and reinterpret as a DP scheme. The algorithm is fixed-parameter tractable for the tree-width tw of the fatgraph, and its output represents an O(n^{tw+1}) algorithm for predicting the MFE folding of an RNA of length n. Our general framework supports general energy models, partition function computations, recursive substructures and partial folding, and could pave the way for algebraic dynamic programming beyond the context-free case.

    RNA secondary structure prediction from multi-aligned sequences

    It has been well accepted that the RNA secondary structures of most functional non-coding RNAs (ncRNAs) are closely related to their functions and are conserved during evolution. Hence, prediction of conserved secondary structures from evolutionarily related sequences is one important task in RNA bioinformatics; the methods are useful not only to further functional analyses of ncRNAs but also to improve the accuracy of secondary structure predictions and to find novel functional RNAs from the genome. In this review, I focus on common secondary structure prediction from a given aligned RNA sequence, in which one secondary structure whose length is equal to that of the input alignment is predicted. I systematically review and classify existing tools and algorithms for the problem, by utilizing the information employed in the tools and by adopting a unified viewpoint based on maximum expected gain (MEG) estimators. I believe that this classification will allow a deeper understanding of each tool and provide users with useful information for selecting tools for common secondary structure predictions. Comment: A preprint of an invited review manuscript that will be published in a chapter of the book 'Methods in Molecular Biology'. Note that this version of the manuscript may differ from the published version.

    Algorithms for RNA secondary structure analysis : prediction of pseudoknots and the consensus shapes approach

    Reeder J. Algorithms for RNA secondary structure analysis : prediction of pseudoknots and the consensus shapes approach. Bielefeld (Germany): Bielefeld University; 2007. Our understanding of the role of RNA has undergone a major change in the last decade. Once believed to be only a mere carrier of information and structural component of the ribosomal machinery in the advent of the genomic age, it is now clear that RNAs play a much more active role. RNAs can act as regulators and can have catalytic activity - roles previously only attributed to proteins. There is still much speculation in the scientific community as to what extent RNAs are responsible for the complexity in higher organisms which can hardly be explained with only proteins as regulators. In order to investigate the roles of RNA, it is therefore necessary to search for new classes of RNA. For those and already known classes, analyses of their presence in different species of the tree of life will provide further insight about the evolution of biomolecules and especially RNAs. Since RNA function often follows its structure, the need for computer programs for RNA structure prediction is an immanent part of this procedure. The secondary structure of RNA - the level of base pairing - strongly determines the tertiary structure. As the latter is computationally intractable and experimentally expensive to obtain, secondary structure analysis has become an accepted substitute. In this thesis, I present two new algorithms (and a few variations thereof) for the prediction of RNA secondary structures. The first algorithm addresses the problem of predicting a secondary structure from a single sequence including RNA pseudoknots. Pseudoknots have been shown to be functionally relevant in many RNA mediated processes. However, pseudoknots are excluded from considerations by state-of-the-art RNA folding programs for reasons of computational complexity. 
While folding a sequence of length n into unknotted structures requires O(n^3) time and O(n^2) space, finding the best structure including arbitrary pseudoknots has been proven to be NP-complete. Nevertheless, I demonstrate in this work that certain types of pseudoknots can be included in the folding process with only a moderate increase of computational cost. In analogy to protein coding RNA, where a conserved encoded protein hints at a similar metabolic function, structural conservation in RNA may give clues to RNA function and to finding of RNA genes. However, structure conservation is more complex to deal with computationally than sequence conservation. The method considered to be at least conceptually the ideal approach in this situation is the Sankoff algorithm. It simultaneously aligns two sequences and predicts a common secondary structure. Unfortunately, it is computationally rather expensive - O(n^6) time and O(n^4) space for two sequences, and for more than two sequences it becomes exponential in the number of sequences! Therefore, several heuristic implementations emerged in the last decade trying to make the Sankoff approach practical by introducing pragmatic restrictions on the search space. In this thesis, I propose to redefine the consensus structure prediction problem in a way that does not imply a multiple sequence alignment step. For a family of RNA sequences, my method explicitly and independently enumerates the near-optimal abstract shape space and predicts an abstract shape as the consensus for all sequences. For each sequence, it delivers the thermodynamically best structure which has this shape. The technique of abstract shapes analysis is employed here for a synoptic view of the suboptimal folding space. As the shape space is much smaller than the structure space, and identification of common shapes can be done in linear time (in the number of shapes considered), the method is essentially linear in the number of sequences. 
Evaluations show that the new method compares favorably with available alternatives.
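
The O(n^3) time and O(n^2) space bound quoted above for folding into unknotted structures can be illustrated with a minimal sketch. It assumes a Nussinov-style base-pair maximization, which is a textbook stand-in and not the thermodynamic model or the algorithms of the thesis. The function name `max_base_pairs`, the `min_loop` parameter and the allowed pair set are illustrative assumptions.

```python
# Assumption: Watson-Crick pairs plus the G-U wobble pair count as valid pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested (pseudoknot-free) base pairs in seq.

    min_loop is the minimum number of unpaired bases enclosed by a pair
    (a hypothetical default of 3, as commonly used for hairpin loops).
    """
    n = len(seq)
    if n == 0:
        return 0
    # table[i][j] holds the best pair count for seq[i..j]: O(n^2) space.
    table = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # O(n) interval lengths
        for i in range(n - span):                # O(n) start positions
            j = i + span
            best = table[i][j - 1]               # case 1: j stays unpaired
            for k in range(i, j - min_loop):     # case 2: j pairs with k, O(n)
                if (seq[k], seq[j]) in PAIRS:
                    left = table[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + table[k + 1][j - 1])
            table[i][j] = best
    return table[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

The three nested loops give the cubic running time and the table gives the quadratic space. Crossing pairs (pseudoknots) cannot be expressed in this recurrence, because each interval is split into independent sub-intervals.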

    A Study of Pseudo-Periodic and Pseudo-Bordered Words for Functions Beyond Identity and Involution

    Periodicity, primitivity and borderedness are some of the fundamental notions in combinatorics on words. Motivated by the Watson-Crick complementarity of DNA strands wherein a word (strand) over the DNA alphabet $\{A, G, C, T\}$ and its Watson-Crick complement are informationally "identical", these notions have been extended to consider pseudo-periodicity and pseudo-borderedness obtained by replacing the "identity" function with "pseudo-identity" functions (antimorphic involution in case of Watson-Crick complementarity). For a given alphabet $\Sigma$, an antimorphic involution $\theta$ is an antimorphism, i.e., $\theta(uv)=\theta(v)\theta(u)$ for all $u,v \in \Sigma^{*}$ and an involution, i.e., $\theta(\theta(u))=u$ for all $u \in \Sigma^{*}$. In this thesis, we continue the study of pseudo-periodic and pseudo-bordered words for pseudo-identity functions including involutions. To start with, we propose a binary word operation, $\theta$-catenation, that generates $\theta$-powers (pseudo-powers) of a word for any morphic or antimorphic involution $\theta$. We investigate various properties of this operation including closure properties of various classes of languages under it, and its connection with the previously defined notion of $\theta$-primitive words. A non-empty word $u$ is said to be $\theta$-bordered if there exists a non-empty word $v$ which is a prefix of $u$ while $\theta(v)$ is a suffix of $u$. We investigate the properties of $\theta$-bordered (pseudo-bordered) and $\theta$-unbordered (pseudo-unbordered) words for pseudo-identity functions $\theta$ with the property that $\theta$ is either a morphism or an antimorphism with $\theta^{n}=I$, for a given $n \geq 2$, or $\theta$ is a literal morphism or an antimorphism. Lastly, we initiate a new line of study by exploring the disjunctivity properties of sets of pseudo-bordered and pseudo-unbordered words and some other related languages for various pseudo-identity functions. 
In particular, we consider such properties for morphic involutions $\theta$ and prove that, for any $i \geq 2$, the set of all words with exactly $i$ $\theta$-borders is disjunctive (under certain conditions).
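
The definition of a θ-bordered word given above can be made concrete with a small sketch for the Watson-Crick case, where θ reverses a word and complements each letter. The helper names `theta` and `theta_borders` are illustrative, not taken from the thesis. The abstract does not say whether the prefix v must be proper; this sketch assumes proper prefixes only.

```python
from typing import List

# Watson-Crick complement on the DNA alphabet {A, G, C, T}.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word: str) -> str:
    """Watson-Crick antimorphic involution: reverse, then complement."""
    return "".join(COMPLEMENT[c] for c in reversed(word))


def theta_borders(u: str) -> List[str]:
    """All non-empty proper prefixes v of u such that theta(v) is a suffix of u.

    Assumption: only proper prefixes are considered (v != u).
    """
    return [u[:k] for k in range(1, len(u)) if u.endswith(theta(u[:k]))]


def is_theta_bordered(u: str) -> bool:
    """True if u has at least one theta-border under the assumption above."""
    return bool(theta_borders(u))


if __name__ == "__main__":
    # Antimorphism and involution properties from the definition.
    assert theta("AC" + "GG") == theta("GG") + theta("AC")
    assert theta(theta("GATTACA")) == "GATTACA"
    print(theta_borders("ACGT"))
    print(is_theta_bordered("AAAA"))
```

For example, "ACGT" has the θ-border "A" because θ("A") = "T" is a suffix, while "AAAA" has none because no suffix consists of the letter T.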

    Posets and spaces of k-noncrossing RNA structures

    RNA molecules are single-stranded analogues of DNA that can fold into various structures which influence their biological function within the cell. RNA structures can be modelled combinatorially in terms of a certain type of graph called an RNA diagram. In this paper we introduce a new poset of RNA diagrams $\mathcal{B}^r_{f,k}$, $r \ge 0$, $k \ge 1$ and $f \ge 3$, which we call the Penner-Waterman poset, and, using results from the theory of multitriangulations, we show that this is a pure poset of rank $k(2f-2k+1)+r-f-1$, whose geometric realization is the join of a simplicial sphere of dimension $k(f-2k)-1$ and an $\left((f+1)(k-1)-1\right)$-simplex in case $r=0$. As a corollary for the special case $k=1$, we obtain a result due to Penner and Waterman concerning the topology of the space of RNA secondary structures. These results could eventually lead to new ways to investigate landscapes of RNA $k$-noncrossing structures.
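
As a quick check of the formulas quoted above, the special case $k=1$ can be worked out by direct substitution. This is plain arithmetic on the stated expressions. Reading the $(-1)$-simplex as empty is a standard convention assumed here, not a statement taken from the paper.

```latex
\begin{align*}
\text{rank: } & k(2f-2k+1)+r-f-1 \,\big|_{k=1} = (2f-1)+r-f-1 = f+r-2,\\
\text{sphere dimension: } & k(f-2k)-1 \,\big|_{k=1} = f-3,\\
\text{simplex dimension: } & (f+1)(k-1)-1 \,\big|_{k=1} = -1.
\end{align*}
```

Under that convention, for $r=0$ and $k=1$ the join reduces to the sphere of dimension $f-3$ alone.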

    Kisses, ambivalent models and more: Contributions to the analysis of RNA secondary structure.

    Janssen S. Kisses, ambivalent models and more: Contributions to the analysis of RNA secondary structure. Bielefeld: Universitätsbibliothek; 2014. The full functional role of RNA in all domains of life is yet to be explored. Deep sequencing technologies generate massive data about RNA transcripts with functional potential. To decipher this information, bioinformatics methods for structural analysis are in demand. With this thesis at hand, we want to improve current secondary structure prediction in different respects. The introductory chapter explains ADP with a focus on its comfortable, but atypical style of specifying algorithms. Then, we present five contributions to the analysis of RNA secondary structures. 1. It is the nature of models to abstract and simplify reality in order to master its complexity. Chapter 3 is an in-depth analysis of four popular computational models of RNA secondary structure (Programs RNAshapes and RNAalishapes). 2. The secondary structure of RNA is too dynamic to be described by a single structure and in turn, there is no single optimal secondary structure. Thus, we compute the most likely abstract shape of a given RNA sequence. Improvements of the algorithms for computing the likelihood of abstract shapes are discussed in Chapter 4, specifically with regards to computational speed (Program RapidShapes). 3. For computational complexity reasons, models of RNA structures commonly exclude crossing base-pairs, the so-called "pseudoknots", from the secondary structure. In Chapter 5, we introduce a heuristic for mastering a frequent type of pseudoknots: "kissing-hairpins" (Program pKiss). 4. In Chapter 6 we revisit the old algorithmic idea of outside-in computation for the new programming framework Bellman’s GAP. This broadens the arsenal of rapid prototyping algorithms for RNA and other sequential problems. It adds "outside" and "MEA" functionality to RNAshapes and RNAalishapes. 
5. Covariance Models representing RNA families assume a single consensus secondary structure for a set of related RNAs and serve as statistical tools to search for additional members. In Chapter 7, we evaluate CM scorings that are more structure-specific than the standard sequence-to-model alignments. Furthermore, we introduce a technique to incorporate "ambivalent" consensus structures into covariance models (Program aCMs). The results of this work are available at the Bielefeld Bioinformatic Server. The RNA Studio (http://bibiserv.cebitec.uni-bielefeld.de/rna) supports ready-to-use web-submissions, web-services and cloud computing for the programs developed in this thesis. Debian packages foster a simple way to install our software on your local machine. Developers can benefit from our algorithmic analyses or use our sources for rapid prototyping as a primer for new implementations: http://bibiserv.cebitec.uni-bielefeld.de/fold-grammars

    Dagstuhl Reports : Volume 1, Issue 2, February 2011

    Online Privacy: Towards Informational Self-Determination on the Internet (Dagstuhl Perspectives Workshop 11061) : Simone Fischer-Hübner, Chris Hoofnagle, Kai Rannenberg, Michael Waidner, Ioannis Krontiris and Michael Marhöfer; Self-Repairing Programs (Dagstuhl Seminar 11062) : Mauro Pezzé, Martin C. Rinard, Westley Weimer and Andreas Zeller; Theory and Applications of Graph Searching Problems (Dagstuhl Seminar 11071) : Fedor V. Fomin, Pierre Fraigniaud, Stephan Kreutzer and Dimitrios M. Thilikos; Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081) : Maxime Crochemore, Lila Kari, Mehryar Mohri and Dirk Nowotka; Packing and Scheduling Algorithms for Information and Communication Services (Dagstuhl Seminar 11091) : Klaus Jansen, Claire Mathieu, Hadas Shachnai and Neal E. Young

    Accelerated probabilistic inference of RNA structure evolution

    BACKGROUND: Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm. It is therefore desirable to constrain such algorithms, by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered. RESULTS: We demonstrate how flexible classes of constraint can be imposed, greatly reducing the computational costs while maintaining a high quality of structural homology prediction. Any score-attributed context-free grammar (e.g. energy-based scoring schemes, or conditionally normalized Pair SCFGs) is amenable to this treatment. It is now possible to combine independent structural and alignment constraints of unprecedented general flexibility in Pair SCFG alignment algorithms. We outline several applications to the bioinformatics of RNA sequence and structure, including Waterman-Eggert N-best alignments and progressive multiple alignment. We evaluate the performance of the algorithm on test examples from the RFAM database. CONCLUSION: A program, Stemloc, that implements these algorithms for efficient RNA sequence alignment and structure prediction is available under the GNU General Public License