2,686 research outputs found

    Thermodynamic Analysis of Interacting Nucleic Acid Strands

    Get PDF
    Motivated by the analysis of natural and engineered DNA and RNA systems, we present the first algorithm for calculating the partition function of an unpseudoknotted complex of multiple interacting nucleic acid strands. This dynamic program is based on a rigorous extension of secondary structure models to the multistranded case, addressing representation and distinguishability issues that do not arise for single-stranded structures. We then derive the form of the partition function for a fixed volume containing a dilute solution of nucleic acid complexes. This expression can be evaluated explicitly for small numbers of strands, allowing the calculation of the equilibrium population distribution for each species of complex. Alternatively, for large systems (e.g., a test tube), we show that the unique complex concentrations corresponding to thermodynamic equilibrium can be obtained by solving a convex programming problem. Partition function and concentration information can then be used to calculate equilibrium base-pairing observables. The underlying physics and mathematical formulation of these problems lead to an interesting blend of approaches, including ideas from graph theory, group theory, dynamic programming, combinatorics, convex optimization, and Lagrange duality

    The RNA Newton Polytope and Learnability of Energy Parameters

    Full text link
    Despite nearly two scores of research on RNA secondary structure and RNA-RNA interaction prediction, the accuracy of the state-of-the-art algorithms are still far from satisfactory. Researchers have proposed increasingly complex energy models and improved parameter estimation methods in anticipation of endowing their methods with enough power to solve the problem. The output has disappointingly been only modest improvements, not matching the expectations. Even recent massively featured machine learning approaches were not able to break the barrier. In this paper, we introduce the notion of learnability of the parameters of an energy model as a measure of its inherent capability. We say that the parameters of an energy model are learnable iff there exists at least one set of such parameters that renders every known RNA structure to date the minimum free energy structure. We derive a necessary condition for the learnability and give a dynamic programming algorithm to assess it. Our algorithm computes the convex hull of the feature vectors of all feasible structures in the ensemble of a given input sequence. Interestingly, that convex hull coincides with the Newton polytope of the partition function as a polynomial in energy parameters. We demonstrated the application of our theory to a simple energy model consisting of a weighted count of A-U and C-G base pairs. Our results show that this simple energy model satisfies the necessary condition for less than one third of the input unpseudoknotted sequence-structure pairs chosen from the RNA STRAND v2.0 database. For another one third, the necessary condition is barely violated, which suggests that augmenting this simple energy model with more features such as the Turner loops may solve the problem. The necessary condition is severely violated for 8%, which provides a small set of hard cases that require further investigation

    A Unified Dynamic Programming Framework for the Analysis of Interacting Nucleic Acid Strands: Enhanced Models, Scalability, and Speed

    Get PDF
    Dynamic programming algorithms within the NUPACK software suite enable analysis of nucleic acid sequences over complex and test tube ensembles containing arbitrary numbers of interacting strand species, serving the needs of researchers in molecular programming, nucleic acid nanotechnology, synthetic biology, and across the life sciences. Here, to enhance the underlying physical model, ensure scalability for large calculations, and achieve dramatic speedups when calculating diverse physical quantities over complex and test tube ensembles, we introduce a unified dynamic programming framework that combines three ingredients: (1) recursions that specify the dependencies between subproblems and incorporate the details of the structural ensemble and the free energy model, (2) evaluation algebras that define the mathematical form of each subproblem, (3) operation orders that specify the computational trajectory through the dependency graph of subproblems. The physical model is enhanced using new recursions that operate over the complex ensemble including coaxial and dangle stacking subensembles. The recursions are coded generically and then compiled with a quantity-specific evaluation algebra and operation order to generate an executable for each physical quantity: partition function, equilibrium base-pairing probabilities, MFE energy and proxy structure, suboptimal proxy structures, and Boltzmann sampled structures. For large complexes (e.g., 30 000 nt), scalability is achieved for partition function calculations using an overflow-safe evaluation algebra, and for equilibrium base-pairing probabilities using a backtrack-free operation order. A new blockwise operation order that treats subcomplex blocks for the complex species in a test tube ensemble enables dramatic speedups (e.g., 20–120× ) using vectorization and caching. With these performance enhancements, equilibrium analysis of substantial test tube ensembles can be performed in ≤ 1 min on a single computational core (e.g., partition function and equilibrium concentration for all complex species of up to six strands formed from two strand species of 300 nt each, or for all complex species of up to two strands formed from 80 strand species of 100 nt each). A new sampling algorithm simultaneously samples multiple structures from the complex ensemble to yield speedups of an order of magnitude or more as the number of structures increases above ≈10³. These advances are available within the NUPACK 4.0 code base (www.nupack.org) which can be flexibly scripted using the all-new NUPACK Python module

    Predicting Minimum Free Energy Structures of Multi-Stranded Nucleic Acid Complexes Is APX-Hard

    Get PDF
    Given multiple nucleic acid strands, what is the minimum free energy (MFE) secondary structure that they can form? As interacting nucleic acid strands are the basis for DNA computing and molecular programming, e.g., in DNA self-assembly and DNA strand displacement systems, determining the MFE structure is an important step in the design and verification of these systems. Efficient dynamic programming algorithms are well known for predicting the MFE pseudoknot-free secondary structure of a single nucleic acid strand. In contrast, we prove that for a simple energy model, the problem of predicting the MFE pseudoknot-free secondary structure formed from multiple interacting nucleic acid strands is NP-hard and also APX-hard. The latter result implies that there does not exist a polynomial time approximation scheme for this problem, unless ? = NP, and it suggests that heuristic methods should be investigated

    Fast prediction of RNA-RNA interaction

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Regulatory antisense RNAs are a class of ncRNAs that regulate gene expression by prohibiting the translation of an mRNA by establishing stable interactions with a target sequence. There is great demand for efficient computational methods to predict the specific interaction between an ncRNA and its target mRNA(s). There are a number of algorithms in the literature which can predict a variety of such interactions - unfortunately at a very high computational cost. Although some existing target prediction approaches are much faster, they are specialized for interactions with a single binding site.</p> <p>Methods</p> <p>In this paper we present a novel algorithm to accurately predict the minimum free energy structure of RNA-RNA interaction under the most general type of interactions studied in the literature. Moreover, we introduce a fast heuristic method to predict the specific (multiple) binding sites of two interacting RNAs.</p> <p>Results</p> <p>We verify the performance of our algorithms for joint structure and binding site prediction on a set of known interacting RNA pairs. Experimental results show our algorithms are highly accurate and outperform all competitive approaches.</p

    Minimum Free Energy, Partition Function and Kinetics Simulation Algorithms for a Multistranded Scaffolded DNA Computer

    Get PDF
    Polynomial time dynamic programming algorithms play a crucial role in the design, analysis and engineering of nucleic acid systems including DNA computers and DNA/RNA nanostructures. However, in complex multistranded or pseudoknotted systems, computing the minimum free energy (MFE), and partition function of nucleic acid systems is NP-hard. Despite this, multistranded and/or pseudoknotted systems represent some of the most utilised and successful systems in the field. This leaves open the tempting possibility that many of the kinds of multistranded and/or pseudoknotted systems we wish to engineer actually fall into restricted classes, that do in fact have polynomial time algorithms, but we\u27ve just not found them yet. Here, we give polynomial time algorithms for MFE and partition function calculation for a restricted kind of multistranded system called the 1D scaffolded DNA computer. This model of computation thermodynamically favours correct outputs over erroneous states, simulates finite state machines in 1D and Boolean circuits in 2D, and is amenable to DNA storage applications. In an effort to begin to ask the question of whether we can naturally compare the expressivity of nucleic acid systems based on the computational complexity of prediction of their preferred energetic states, we show our MFE problem is in logspace (the complexity class L), making it perhaps one of the simplest known, natural, nucleic acid MFE problems. Finally, we provide a stochastic kinetic simulator for the 1D scaffolded DNA computer and evaluate strategies for efficiently speeding up this thermodynamically favourable system in a constant-temperature kinetic regime
    corecore