12,640 research outputs found

    The RNA Newton Polytope and Learnability of Energy Parameters

    Full text link
    Despite nearly two scores of research on RNA secondary structure and RNA-RNA interaction prediction, the accuracy of the state-of-the-art algorithms are still far from satisfactory. Researchers have proposed increasingly complex energy models and improved parameter estimation methods in anticipation of endowing their methods with enough power to solve the problem. The output has disappointingly been only modest improvements, not matching the expectations. Even recent massively featured machine learning approaches were not able to break the barrier. In this paper, we introduce the notion of learnability of the parameters of an energy model as a measure of its inherent capability. We say that the parameters of an energy model are learnable iff there exists at least one set of such parameters that renders every known RNA structure to date the minimum free energy structure. We derive a necessary condition for the learnability and give a dynamic programming algorithm to assess it. Our algorithm computes the convex hull of the feature vectors of all feasible structures in the ensemble of a given input sequence. Interestingly, that convex hull coincides with the Newton polytope of the partition function as a polynomial in energy parameters. We demonstrated the application of our theory to a simple energy model consisting of a weighted count of A-U and C-G base pairs. Our results show that this simple energy model satisfies the necessary condition for less than one third of the input unpseudoknotted sequence-structure pairs chosen from the RNA STRAND v2.0 database. For another one third, the necessary condition is barely violated, which suggests that augmenting this simple energy model with more features such as the Turner loops may solve the problem. The necessary condition is severely violated for 8%, which provides a small set of hard cases that require further investigation

    Thermodynamic Analysis of Interacting Nucleic Acid Strands

    Get PDF
    Motivated by the analysis of natural and engineered DNA and RNA systems, we present the first algorithm for calculating the partition function of an unpseudoknotted complex of multiple interacting nucleic acid strands. This dynamic program is based on a rigorous extension of secondary structure models to the multistranded case, addressing representation and distinguishability issues that do not arise for single-stranded structures. We then derive the form of the partition function for a fixed volume containing a dilute solution of nucleic acid complexes. This expression can be evaluated explicitly for small numbers of strands, allowing the calculation of the equilibrium population distribution for each species of complex. Alternatively, for large systems (e.g., a test tube), we show that the unique complex concentrations corresponding to thermodynamic equilibrium can be obtained by solving a convex programming problem. Partition function and concentration information can then be used to calculate equilibrium base-pairing observables. The underlying physics and mathematical formulation of these problems lead to an interesting blend of approaches, including ideas from graph theory, group theory, dynamic programming, combinatorics, convex optimization, and Lagrange duality

    A new procedure to analyze RNA non-branching structures

    Get PDF
    RNA structure prediction and structural motifs analysis are challenging tasks in the investigation of RNA function. We propose a novel procedure to detect structural motifs shared between two RNAs (a reference and a target). In particular, we developed two core modules: (i) nbRSSP_extractor, to assign a unique structure to the reference RNA encoded by a set of non-branching structures; (ii) SSD_finder, to detect structural motifs that the target RNA shares with the reference, by means of a new score function that rewards the relative distance of the target non-branching structures compared to the reference ones. We integrated these algorithms with already existing software to reach a coherent pipeline able to perform the following two main tasks: prediction of RNA structures (integration of RNALfold and nbRSSP_extractor) and search for chains of matches (integration of Structator and SSD_finder)

    A Graph Grammar for Modelling RNA Folding

    Full text link
    We propose a new approach for modelling the process of RNA folding as a graph transformation guided by the global value of free energy. Since the folding process evolves towards a configuration in which the free energy is minimal, the global behaviour resembles the one of a self-adaptive system. Each RNA configuration is a graph and the evolution of configurations is constrained by precise rules that can be described by a graph grammar.Comment: In Proceedings GaM 2016, arXiv:1612.0105

    Paradigms for computational nucleic acid design

    Get PDF
    The design of DNA and RNA sequences is critical for many endeavors, from DNA nanotechnology, to PCR‐based applications, to DNA hybridization arrays. Results in the literature rely on a wide variety of design criteria adapted to the particular requirements of each application. Using an extensively studied thermodynamic model, we perform a detailed study of several criteria for designing sequences intended to adopt a target secondary structure. We conclude that superior design methods should explicitly implement both a positive design paradigm (optimize affinity for the target structure) and a negative design paradigm (optimize specificity for the target structure). The commonly used approaches of sequence symmetry minimization and minimum free‐energy satisfaction primarily implement negative design and can be strengthened by introducing a positive design component. Surprisingly, our findings hold for a wide range of secondary structures and are robust to modest perturbation of the thermodynamic parameters used for evaluating sequence quality, suggesting the feasibility and ongoing utility of a unified approach to nucleic acid design as parameter sets are refined further. Finally, we observe that designing for thermodynamic stability does not determine folding kinetics, emphasizing the opportunity for extending design criteria to target kinetic features of the energy landscape

    Prediction of RNA pseudoknots by Monte Carlo simulations

    Full text link
    In this paper we consider the problem of RNA folding with pseudoknots. We use a graphical representation in which the secondary structures are described by planar diagrams. Pseudoknots are identified as non-planar diagrams. We analyze the non-planar topologies of RNA structures and propose a classification of RNA pseudoknots according to the minimal genus of the surface on which the RNA structure can be embedded. This classification provides a simple and natural way to tackle the problem of RNA folding prediction in presence of pseudoknots. Based on that approach, we describe a Monte Carlo algorithm for the prediction of pseudoknots in an RNA molecule.Comment: 22 pages, 14 figure

    Exact Learning of RNA Energy Parameters From Structure

    Full text link
    We consider the problem of exact learning of parameters of a linear RNA energy model from secondary structure data. A necessary and sufficient condition for learnability of parameters is derived, which is based on computing the convex hull of union of translated Newton polytopes of input sequences. The set of learned energy parameters is characterized as the convex cone generated by the normal vectors to those facets of the resulting polytope that are incident to the origin. In practice, the sufficient condition may not be satisfied by the entire training data set; hence, computing a maximal subset of training data for which the sufficient condition is satisfied is often desired. We show that problem is NP-hard in general for an arbitrary dimensional feature space. Using a randomized greedy algorithm, we select a subset of RNA STRAND v2.0 database that satisfies the sufficient condition for separate A-U, C-G, G-U base pair counting model. The set of learned energy parameters includes experimentally measured energies of A-U, C-G, and G-U pairs; hence, our parameter set is in agreement with the Turner parameters

    Model-guided design of ligand-regulated RNAi for programmable control of gene expression

    Get PDF
    Progress in constructing biological networks will rely on the development of more advanced components that can be predictably modified to yield optimal system performance. We have engineered an RNA-based platform, which we call an shRNA switch, that provides for integrated ligand control of RNA interference (RNAi) by modular coupling of an aptamer, competing strand, and small hairpin (sh) RNA stem into a single component that links ligand concentration and target gene expression levels. A combined experimental and mathematical modelling approach identified multiple tuning strategies and moves towards a predictable framework for the forward design of shRNA switches. The utility of our platform is highlighted by the demonstration of fine-tuning, multi-input control, and model-guided design of shRNA switches with an optimized dynamic range. Thus, shRNA switches can serve as an advanced component for the construction of complex biological systems and offer a controlled means of activating RNAi in disease therapeutics
    corecore