355 research outputs found
Paradigms for computational nucleic acid design
The design of DNA and RNA sequences is critical for many endeavors, from DNA nanotechnology, to PCR-based applications, to DNA hybridization arrays. Results in the literature rely on a wide variety of design criteria adapted to the particular requirements of each application. Using an extensively studied thermodynamic model, we perform a detailed study of several criteria for designing sequences intended to adopt a target secondary structure. We conclude that superior design methods should explicitly implement both a positive design paradigm (optimize affinity for the target structure) and a negative design paradigm (optimize specificity for the target structure). The commonly used approaches of sequence symmetry minimization and minimum free-energy satisfaction primarily implement negative design and can be strengthened by introducing a positive design component. Surprisingly, our findings hold for a wide range of secondary structures and are robust to modest perturbation of the thermodynamic parameters used for evaluating sequence quality, suggesting the feasibility and ongoing utility of a unified approach to nucleic acid design as parameter sets are refined further. Finally, we observe that designing for thermodynamic stability does not determine folding kinetics, emphasizing the opportunity for extending design criteria to target kinetic features of the energy landscape.
Thermodynamic Analysis of Interacting Nucleic Acid Strands
Motivated by the analysis of natural and engineered DNA and RNA systems, we present the first algorithm for calculating the partition function of an unpseudoknotted complex of multiple interacting nucleic acid strands. This dynamic program is based on a rigorous extension of secondary structure models to the multistranded case, addressing representation and distinguishability issues that do not arise for single-stranded structures. We then derive the form of the partition function for a fixed volume containing a dilute solution of nucleic acid complexes. This expression can be evaluated explicitly for small numbers of strands, allowing the calculation of the equilibrium population distribution for each species of complex. Alternatively, for large systems (e.g., a test tube), we show that the unique complex concentrations corresponding to thermodynamic equilibrium can be obtained by solving a convex programming problem. Partition function and concentration information can then be used to calculate equilibrium base-pairing observables. The underlying physics and mathematical formulation of these problems lead to an interesting blend of approaches, including ideas from graph theory, group theory, dynamic programming, combinatorics, convex optimization, and Lagrange duality.
RNAalifold: improved consensus structure prediction for RNA alignments
Background: The prediction of a consensus structure for a set of related RNAs is an important first step for subsequent analyses. RNAalifold, which computes the minimum energy structure that is simultaneously formed by a set of aligned sequences, is one of the oldest and most widely used tools for this task. In recent years, several alternative approaches have been advocated, pointing to several shortcomings of the original RNAalifold approach. Results: We show that the accuracy of RNAalifold predictions can be improved substantially by introducing a different, more rational handling of alignment gaps, and by replacing the rather simplistic model of covariance scoring with more sophisticated RIBOSUM-like scoring matrices. These improvements are achieved without compromising the computational efficiency of the algorithm. We show here that the new version of RNAalifold not only outperforms the old one, but also several other tools recently developed, on different datasets. Conclusion: The new version of RNAalifold not only can replace the old one for almost any application but it is also competitive with other approaches including those based on SCFGs, maximum expected accuracy, or hierarchical nearest neighbor classifiers.
Computational Investigations on Polymerase Actions in Gene Transcription and Replication Combining Physical Modeling and Atomistic Simulations
Polymerases are protein enzymes that move along nucleic acid chains and
catalyze template-based polymerization reactions during gene transcription and
replication. The polymerases also substantially improve transcription or
replication fidelity through the non-equilibrium enzymatic cycles. We briefly
review computational efforts that have been made toward understanding
mechano-chemical coupling and fidelity control mechanisms of the polymerase
elongation. The polymerases are regarded as molecular information motors during
the elongation process. Understanding the full polymerase functional cycle
requires a full spectrum of computational approaches spanning multiple time and
length scales. We leave aside quantum mechanics based approaches to
polymerase catalysis, because earlier surveys are abundant, and address only
the statistical physics modeling approach and the all-atom molecular dynamics
simulation approach. We organize this review around our own modeling and
simulation practices on a single-subunit T7 RNA polymerase, and summarize
commensurate studies on structurally similar DNA polymerases. For multi-subunit
RNA polymerases that have been intensively studied in recent years, we leave
detailed discussions of the simulation achievements to other computational
chemical surveys, and introduce only very recently published representative
studies, including our own preliminary work on structure-based modeling of
yeast RNA polymerase II. In the end, we quickly go through kinetic modeling of
elongation pauses and backtracking activities. We emphasize the fluctuation and
control mechanisms of the polymerase actions, highlight the non-equilibrium
physical nature of the system, and try to bring some perspectives toward
understanding replication and transcription regulation from single-molecule
details to a genome-wide scale.
A Combinatorial Framework for Designing (Pseudoknotted) RNA Algorithms
We extend a hypergraph representation, introduced by Finkelstein and
Roytberg, to unify dynamic programming algorithms in the context of RNA folding
with pseudoknots. Classic applications of RNA dynamic programming (energy
minimization, partition function, base-pair probabilities...) are reformulated
within this framework, giving rise to very simple algorithms. This
reformulation allows one to conceptually detach the conformation space/energy
model -- captured by the hypergraph model -- from the specific application,
assuming unambiguity of the decomposition. To ensure the latter property, we
propose a new combinatorial methodology based on generating functions. We
extend the set of generic applications by proposing an exact algorithm for
extracting generalized moments in weighted distribution, generalizing a prior
contribution by Miklos et al. Finally, we illustrate our full-fledged
programme on three exemplary conformation spaces (secondary structures,
Akutsu's simple type pseudoknots and kissing hairpins). This readily gives sets
of algorithms that are either novel or have complexity comparable to classic
implementations for minimization and Boltzmann ensemble applications of dynamic
programming.
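The separation described in this abstract, one decomposition of the conformation space reused across applications, can be illustrated with a minimal sketch. It is not the authors' hypergraph framework: it covers only pseudoknot-free secondary structures, and the function name `evaluate`, the pairing set `PAIRS`, the minimum hairpin loop size of 3, and the per-pair weight are all assumptions chosen for illustration. A single unambiguous recursion (the first base is either unpaired or paired with some later base) is evaluated under different "plus" and "times" operations.

```python
import operator
from functools import lru_cache

# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def evaluate(seq, plus, times, one, pair_weight, min_loop=3):
    """Evaluate one unambiguous decomposition under a chosen algebra.

    plus        combines alternative structures (e.g. max, or addition)
    times       combines independent parts of a structure
    one         value of the empty structure
    pair_weight contribution of a single base pair
    """

    @lru_cache(maxsize=None)
    def S(i, j):
        # Value of all structures on the half-open interval seq[i:j].
        if j - i <= 0:
            return one
        # Case 1: base i is unpaired.
        total = S(i + 1, j)
        # Case 2: base i pairs with k, splitting the interval in two.
        for k in range(i + min_loop + 1, j):
            if (seq[i], seq[k]) in PAIRS:
                term = times(times(pair_weight, S(i + 1, k)), S(k + 1, j))
                total = plus(total, term)
        return total

    return S(0, len(seq))


if __name__ == "__main__":
    seq = "GGGAAAUCC"
    # (max, +): maximum number of base pairs.
    print(evaluate(seq, max, operator.add, 0, 1))
    # (+, *) with weight 1: number of admissible structures.
    print(evaluate(seq, operator.add, operator.mul, 1, 1))
    # (+, *) with a Boltzmann-like weight per pair: a toy partition function.
    print(evaluate(seq, operator.add, operator.mul, 1.0, 2.0))
```

Because each structure is generated exactly once by the recursion, swapping the algebra is enough to switch between optimization, counting, and a toy partition function. An ambiguous decomposition would still give the right maximum but would overcount in the other two cases, which is why the abstract stresses unambiguity.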
Safe and Complete Prediction of RNA Secondary Structure
Ribonucleic acid, RNA, is an essential type of molecule for all known forms of life. It is a nucleic acid, like DNA. However, where DNA appears as two complementary strands that join and twist into a double helix structure, RNA has only a single strand. This strand can fold upon itself, pairing complementary bases. The resulting set of base pairs is the RNA secondary structure, also known as folding.
It is typical that a prediction algorithm gives a large number of optimal or near-optimal foldings for an RNA sequence. Only in the simplest cases is it possible to manually go through all of these foldings, and in hard cases it is infeasible to even generate the full set of optimal foldings. In fact, we observe that the number of optimal foldings may be exponential in the sequence length, and that some naturally occurring RNA sequences of 2000–3000 bases in length have well over 10^100 optimal foldings, under the model of maximizing the number of base pairs.
To help analyze the full set of optimal foldings, we apply the concept of safe and complete algorithms. In the presence of multiple optimal solutions, any partial solution that appears in all optimal solutions is called a safe part, and a safe and complete algorithm finds all of the safe parts.
We show a trivial safe and complete algorithm that computes safety by going through the full set of optimal foldings. However, this algorithm is only practical for short RNA sequences that do not have too many optimal foldings. In order to analyze the harder RNA sequences, we develop and implement a novel polynomial-time safe and complete algorithm for RNA secondary structure prediction, using the model of maximizing base pairs. Using the dynamic programming approach, this new algorithm can compute how often each base pair and unpaired base appears in the full set of optimal foldings without having to produce the actual foldings.
Our experimental evaluation shows that the safe parts of a folding are more likely to be biologically correct than the non-safe parts. We observe this both by using our implementation of the efficient safe and complete algorithm and by combining an existing predictor program with the trivial algorithm. As this existing predictor uses a modern minimum free energy model for predicting the RNA foldings, tests using this combination show that safety is a useful property, even beyond the simple maximum pairs model in our implementation.
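The trivial safe and complete algorithm mentioned in this abstract can be sketched as follows, under the maximum base pairs model. This is an illustration only, not the authors' implementation and not their polynomial-time algorithm: it enumerates every optimal folding and keeps what they all share, so it is practical only for short sequences. The names `optimal_foldings` and `safe_parts`, the pairing set `PAIRS`, and the minimum hairpin loop size of 3 are assumptions.

```python
from functools import lru_cache

# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def optimal_foldings(seq, min_loop=3):
    """Return (max_pairs, optimal foldings), each folding a frozenset of (i, k)."""

    @lru_cache(maxsize=None)
    def best(i, j):
        # Maximum number of base pairs on the half-open interval seq[i:j].
        if j - i <= 0:
            return 0
        score = best(i + 1, j)  # base i unpaired
        for k in range(i + min_loop + 1, j):
            if (seq[i], seq[k]) in PAIRS:
                score = max(score, 1 + best(i + 1, k) + best(k + 1, j))
        return score

    @lru_cache(maxsize=None)
    def enum(i, j):
        # Every optimal folding of seq[i:j], each generated exactly once.
        if j - i <= 0:
            return (frozenset(),)
        out = []
        if best(i + 1, j) == best(i, j):
            out.extend(enum(i + 1, j))
        for k in range(i + min_loop + 1, j):
            if (seq[i], seq[k]) in PAIRS and \
                    1 + best(i + 1, k) + best(k + 1, j) == best(i, j):
                for inner in enum(i + 1, k):
                    for outer in enum(k + 1, j):
                        out.append(inner | outer | {(i, k)})
        return tuple(out)

    return best(0, len(seq)), list(enum(0, len(seq)))


def safe_parts(seq, min_loop=3):
    """Return (max_pairs, n_optimal, safe_pairs, safe_unpaired_positions)."""
    score, foldings = optimal_foldings(seq, min_loop)
    # A base pair is safe if it appears in every optimal folding.
    safe_pairs = sorted(frozenset.intersection(*foldings))
    # A position is safely unpaired if no optimal folding pairs it.
    paired_somewhere = {p for f in foldings for pair in f for p in pair}
    safe_unpaired = [p for p in range(len(seq)) if p not in paired_somewhere]
    return score, len(foldings), safe_pairs, safe_unpaired


if __name__ == "__main__":
    print(safe_parts("GGGAAAUCC"))
    print(safe_parts("GAAAUC"))
```

Under these assumptions, `GGGAAAUCC` has a single optimal folding, so all three of its base pairs are safe. `GAAAUC` has two optimal foldings that pair the leading G with different partners, so no base pair is safe and only the three A positions are safely unpaired.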