    Paradigms for computational nucleic acid design

    The design of DNA and RNA sequences is critical for many endeavors, from DNA nanotechnology, to PCR‐based applications, to DNA hybridization arrays. Results in the literature rely on a wide variety of design criteria adapted to the particular requirements of each application. Using an extensively studied thermodynamic model, we perform a detailed study of several criteria for designing sequences intended to adopt a target secondary structure. We conclude that superior design methods should explicitly implement both a positive design paradigm (optimize affinity for the target structure) and a negative design paradigm (optimize specificity for the target structure). The commonly used approaches of sequence symmetry minimization and minimum free‐energy satisfaction primarily implement negative design and can be strengthened by introducing a positive design component. Surprisingly, our findings hold for a wide range of secondary structures and are robust to modest perturbation of the thermodynamic parameters used for evaluating sequence quality, suggesting the feasibility and ongoing utility of a unified approach to nucleic acid design as parameter sets are refined further. Finally, we observe that designing for thermodynamic stability does not determine folding kinetics, emphasizing the opportunity for extending design criteria to target kinetic features of the energy landscape
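One common way to write down the two paradigms named in this abstract is sketched below. The symbols are assumptions introduced here for illustration and do not appear in the abstract: \phi is a candidate sequence, s^* the target secondary structure, \Delta G(\phi, s) the free energy of \phi in structure s, and Q(\phi) the partition function over all structures available to \phi.

```latex
% Boltzmann probability that sequence \phi adopts the target structure s^*
Q(\phi) = \sum_{s} e^{-\Delta G(\phi, s)/RT},
\qquad
p(\phi, s^*) = \frac{e^{-\Delta G(\phi, s^*)/RT}}{Q(\phi)}

% Positive design (affinity): make the target structure itself favorable
\min_{\phi} \; \Delta G(\phi, s^*)

% Negative design (specificity): make the target dominate its competitors
\max_{\phi} \; p(\phi, s^*)
```

Under this reading, a sequence can score well on affinity alone while still having many competing low-energy structures, which is why the abstract argues for implementing both criteria explicitly.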

    Thermodynamic Analysis of Interacting Nucleic Acid Strands

    Motivated by the analysis of natural and engineered DNA and RNA systems, we present the first algorithm for calculating the partition function of an unpseudoknotted complex of multiple interacting nucleic acid strands. This dynamic program is based on a rigorous extension of secondary structure models to the multistranded case, addressing representation and distinguishability issues that do not arise for single-stranded structures. We then derive the form of the partition function for a fixed volume containing a dilute solution of nucleic acid complexes. This expression can be evaluated explicitly for small numbers of strands, allowing the calculation of the equilibrium population distribution for each species of complex. Alternatively, for large systems (e.g., a test tube), we show that the unique complex concentrations corresponding to thermodynamic equilibrium can be obtained by solving a convex programming problem. Partition function and concentration information can then be used to calculate equilibrium base-pairing observables. The underlying physics and mathematical formulation of these problems lead to an interesting blend of approaches, including ideas from graph theory, group theory, dynamic programming, combinatorics, convex optimization, and Lagrange duality

    RNAalifold: improved consensus structure prediction for RNA alignments

    Background: The prediction of a consensus structure for a set of related RNAs is an important first step for subsequent analyses. RNAalifold, which computes the minimum energy structure that is simultaneously formed by a set of aligned sequences, is one of the oldest and most widely used tools for this task. In recent years, several alternative approaches have been advocated, pointing to several shortcomings of the original RNAalifold approach. Results: We show that the accuracy of RNAalifold predictions can be improved substantially by introducing a different, more rational handling of alignment gaps, and by replacing the rather simplistic model of covariance scoring with more sophisticated RIBOSUM-like scoring matrices. These improvements are achieved without compromising the computational efficiency of the algorithm. We show here that the new version of RNAalifold not only outperforms the old one, but also several other tools recently developed, on different datasets. Conclusion: The new version of RNAalifold not only can replace the old one for almost any application but it is also competitive with other approaches including those based on SCFGs, maximum expected accuracy, or hierarchical nearest neighbor classifiers.

    Computational Investigations on Polymerase Actions in Gene Transcription and Replication Combining Physical Modeling and Atomistic Simulations

    Polymerases are protein enzymes that move along nucleic acid chains and catalyze template-based polymerization reactions during gene transcription and replication. The polymerases also substantially improve transcription or replication fidelity through non-equilibrium enzymatic cycles. We briefly review computational efforts toward understanding the mechano-chemical coupling and fidelity control mechanisms of polymerase elongation. The polymerases are regarded as molecular information motors during the elongation process. Understanding the full polymerase functional cycle requires a full spectrum of computational approaches across multiple time and length scales. We leave aside quantum-mechanics-based approaches to polymerase catalysis, which are covered by abundant earlier surveys, and address only statistical physics modeling and all-atom molecular dynamics simulation. We organize this review around our own modeling and simulation practices on the single-subunit T7 RNA polymerase, and summarize commensurate studies on structurally similar DNA polymerases. For multi-subunit RNA polymerases, which have been intensively studied in recent years, we leave detailed discussion of the simulation achievements to other computational chemistry surveys and introduce only very recently published representative studies, including our own preliminary work on structure-based modeling of yeast RNA polymerase II. Finally, we briefly cover kinetic modeling of elongation pauses and backtracking activities. We emphasize the fluctuation and control mechanisms of polymerase actions, highlight the non-equilibrium physical nature of the system, and offer some perspectives on understanding replication and transcription regulation from single-molecule details to a genome-wide scale.

    A Combinatorial Framework for Designing (Pseudoknotted) RNA Algorithms

    We extend a hypergraph representation, introduced by Finkelstein and Roytberg, to unify dynamic programming algorithms in the context of RNA folding with pseudoknots. Classic applications of RNA dynamic programming (energy minimization, partition function, base-pair probabilities...) are reformulated within this framework, giving rise to very simple algorithms. This reformulation allows one to conceptually detach the conformation space/energy model, captured by the hypergraph model, from the specific application, assuming unambiguity of the decomposition. To ensure the latter property, we propose a new combinatorial methodology based on generating functions. We extend the set of generic applications by proposing an exact algorithm for extracting generalized moments in weighted distribution, generalizing a prior contribution by Miklos et al. Finally, we illustrate our full-fledged programme on three exemplary conformation spaces (secondary structures, Akutsu's simple type pseudoknots and kissing hairpins). This readily gives sets of algorithms that are either novel or have complexity comparable to classic implementations for minimization and Boltzmann ensemble applications of dynamic programming.
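The idea of detaching the conformation space from the application can be illustrated with a minimal sketch. It is not the hypergraph framework of the paper: it covers only pseudoknot-free secondary structures, uses a toy model in which every base pair carries the same value, and the names `evaluate`, `CAN_PAIR`, `min_loop`, and the energy of -1.0 kcal/mol per pair are assumptions made for this example. One unambiguous decomposition is written once, and the choice of "plus" and "times" operations decides whether it counts structures, maximizes base pairs, or sums Boltzmann weights.

```python
import math
import operator
from functools import lru_cache

# Watson-Crick and wobble pairs (toy pairing rule, assumed for this sketch).
CAN_PAIR = {"AU", "UA", "GC", "CG", "GU", "UG"}

def evaluate(seq, plus, times, one, pair_value, min_loop=3):
    """Evaluate one unambiguous decomposition under a chosen algebra.

    Each structure on seq[i..j] either leaves i unpaired or pairs i with
    exactly one k, so every structure is generated exactly once.
    """
    @lru_cache(maxsize=None)
    def f(i, j):
        if i > j:
            return one  # empty interval: only the empty structure
        total = f(i + 1, j)  # case 1: position i is unpaired
        # case 2: i pairs with k, enclosing at least min_loop bases
        for k in range(i + 1 + min_loop, j + 1):
            if seq[i] + seq[k] in CAN_PAIR:
                inside_outside = times(f(i + 1, k - 1), f(k + 1, j))
                total = plus(total, times(pair_value, inside_outside))
        return total

    return f(0, len(seq) - 1)

seq = "GGGAAACCC"

# Counting: (+, *) with every pair weighted 1.
n_structures = evaluate(seq, operator.add, operator.mul, 1, 1)

# Maximum number of base pairs: (max, +) with every pair scoring 1.
max_pairs = evaluate(seq, max, operator.add, 0, 1)

# Partition function: (+, *) with a Boltzmann weight per pair.
RT = 0.616            # kcal/mol, roughly 37 degrees C
PAIR_ENERGY = -1.0    # kcal/mol per pair, hypothetical value
z = evaluate(seq, operator.add, operator.mul, 1.0, math.exp(-PAIR_ENERGY / RT))

print(n_structures, max_pairs, z)
```

Because the decomposition never produces the same structure twice, the counting and partition-function results are exact for this toy model; an ambiguous decomposition would still give the right maximum but would overcount.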

    Safe and Complete Prediction of RNA Secondary Structure

    Ribonucleic acid, RNA, is an essential type of molecule for all known forms of life. It is a nucleic acid, like DNA. However, where DNA appears as two complementary strands that join and twist into a double helix structure, RNA has only a single strand. This strand can fold upon itself, pairing complementary bases. The resulting set of base pairs is the RNA secondary structure, also known as a folding. A prediction algorithm typically gives a large number of optimal or near-optimal foldings for an RNA sequence. Only in the simplest cases is it possible to go through all of these foldings manually, and in hard cases it is infeasible even to generate the full set of optimal foldings. In fact, we observe that the number of optimal foldings may be exponential in the sequence length, and that some naturally occurring RNA sequences of 2000–3000 bases in length have well over 10^100 optimal foldings, under the model of maximizing the number of base pairs. To help analyze the full set of optimal foldings, we apply the concept of safe and complete algorithms. In the presence of multiple optimal solutions, any partial solution that appears in all optimal solutions is called a safe part, and a safe and complete algorithm finds all of the safe parts. We show a trivial safe and complete algorithm that computes safety by going through the full set of optimal foldings. However, this algorithm is only practical for short RNA sequences that do not have too many optimal foldings. In order to analyze the harder RNA sequences, we develop and implement a novel polynomial-time safe and complete algorithm for RNA secondary structure prediction, using the model of maximizing base pairs. Using the dynamic programming approach, this new algorithm can compute how often each base pair and unpaired base appears in the full set of optimal foldings without having to produce the actual foldings.
Our experimental evaluation shows that the safe parts of a folding are more likely to be biologically correct than the non-safe parts. We observe this both by using our implementation of the efficient safe and complete algorithm and by combining an existing predictor program with the trivial algorithm. As this existing predictor uses a modern minimum free energy model for predicting the RNA foldings, tests using this combination show that safety is a useful property, even beyond the simple maximum pairs model in our implementation
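The trivial safe and complete algorithm described in this abstract can be sketched for short sequences as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names `optimal_foldings` and `safe_parts`, the pairing rule `CAN_PAIR`, and the `min_loop` parameter are all introduced here. It enumerates every folding with the maximum number of base pairs and keeps what all of them share; the polynomial-time algorithm in the paper avoids this enumeration.

```python
from functools import lru_cache

# Watson-Crick and wobble pairs (toy pairing rule, assumed for this sketch).
CAN_PAIR = {"AU", "UA", "GC", "CG", "GU", "UG"}

def optimal_foldings(seq, min_loop=3):
    """Return every folding of seq that has the maximum number of base pairs."""
    @lru_cache(maxsize=None)
    def best(i, j):
        # Maximum number of pairs on seq[i..j] (inclusive).
        if i >= j:
            return 0
        score = best(i + 1, j)  # i unpaired
        for k in range(i + 1 + min_loop, j + 1):
            if seq[i] + seq[k] in CAN_PAIR:
                score = max(score, 1 + best(i + 1, k - 1) + best(k + 1, j))
        return score

    @lru_cache(maxsize=None)
    def foldings(i, j):
        # All optimal foldings of seq[i..j], each a frozenset of (i, k) pairs.
        if i >= j:
            return frozenset({frozenset()})
        found = set()
        if best(i + 1, j) == best(i, j):
            found |= foldings(i + 1, j)
        for k in range(i + 1 + min_loop, j + 1):
            if (seq[i] + seq[k] in CAN_PAIR
                    and 1 + best(i + 1, k - 1) + best(k + 1, j) == best(i, j)):
                for left in foldings(i + 1, k - 1):
                    for right in foldings(k + 1, j):
                        found.add(left | right | {(i, k)})
        return frozenset(found)

    return foldings(0, len(seq) - 1)

def safe_parts(seq, min_loop=3):
    """Base pairs and unpaired positions shared by all optimal foldings."""
    all_optimal = optimal_foldings(seq, min_loop)
    safe_pairs = frozenset.intersection(*all_optimal)
    paired_somewhere = {pos for fold in all_optimal for pair in fold for pos in pair}
    safe_unpaired = set(range(len(seq))) - paired_somewhere
    return safe_pairs, safe_unpaired

pairs, unpaired = safe_parts("GGGAAACCC")
print(sorted(pairs), sorted(unpaired))
```

For "GGGAAACCC" there is a single optimal folding, so every part of it is safe. For "GCG" with `min_loop=0` the two optimal foldings pair the C with different partners, so no base pair is safe even though the C is paired in both.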