1,322 research outputs found
A Seeded Genetic Algorithm for RNA Secondary Structural Prediction with Pseudoknots
This work explores a new approach in using genetic algorithm to predict RNA secondary structures with pseudoknots. Since only a small portion of most RNA structures is comprised of pseudoknots, the majority of structural elements from an optimal pseudoknot-free structure are likely to be part of the true structure. Thus seeding the genetic algorithm with optimal pseudoknot-free structures will more likely lead it to the true structure than a randomly generated population. The genetic algorithm uses the known energy models with an additional augmentation to allow complex pseudoknots. The nearest-neighbor energy model is used in conjunction with Turner’s thermodynamic parameters for pseudoknot-free structures, and the H-type pseudoknot energy estimation for simple pseudoknots. Testing with known pseudoknot sequences from PseudoBase shows that it out performs some of the current popular algorithms
Correlation between nucleotide composition and folding energy of coding sequences with special attention to wobble bases
Background: The secondary structure and complexity of mRNA influences its
accessibility to regulatory molecules (proteins, micro-RNAs), its stability and
its level of expression. The mobile elements of the RNA sequence, the wobble
bases, are expected to regulate the formation of structures encompassing coding
sequences.
Results: The sequence/folding energy (FE) relationship was studied by
statistical, bioinformatic methods in 90 CDS containing 26,370 codons. I found
that the FE (dG) associated with coding sequences is significant and negative
(407 kcal/1000 bases, mean +/- S.E.M.) indicating that these sequences are able
to form structures. However, the FE has only a small free component, less than
10% of the total. The contribution of the 1st and 3rd codon bases to the FE is
larger than the contribution of the 2nd (central) bases. It is possible to
achieve a ~ 4-fold change in FE by altering the wobble bases in synonymous
codons. The sequence/FE relationship can be described with a simple algorithm,
and the total FE can be predicted solely from the sequence composition of the
nucleic acid. The contributions of different synonymous codons to the FE are
additive and one codon cannot replace another. The accumulated contributions of
synonymous codons of an amino acid to the total folding energy of an mRNA is
strongly correlated to the relative amount of that amino acid in the translated
protein.
Conclusion: Synonymous codons are not interchangable with regard to their
role in determining the mRNA FE and the relative amounts of amino acids in the
translated protein, even if they are indistinguishable in respect of amino acid
coding.Comment: 14 pages including 6 figures and 1 tabl
Geometric combinatorics and computational molecular biology: branching polytopes for RNA sequences
Questions in computational molecular biology generate various discrete
optimization problems, such as DNA sequence alignment and RNA secondary
structure prediction. However, the optimal solutions are fundamentally
dependent on the parameters used in the objective functions. The goal of a
parametric analysis is to elucidate such dependencies, especially as they
pertain to the accuracy and robustness of the optimal solutions. Techniques
from geometric combinatorics, including polytopes and their normal fans, have
been used previously to give parametric analyses of simple models for DNA
sequence alignment and RNA branching configurations. Here, we present a new
computational framework, and proof-of-principle results, which give the first
complete parametric analysis of the branching portion of the nearest neighbor
thermodynamic model for secondary structure prediction for real RNA sequences.Comment: 17 pages, 8 figure
A Graph Grammar for Modelling RNA Folding
We propose a new approach for modelling the process of RNA folding as a graph
transformation guided by the global value of free energy. Since the folding
process evolves towards a configuration in which the free energy is minimal,
the global behaviour resembles the one of a self-adaptive system. Each RNA
configuration is a graph and the evolution of configurations is constrained by
precise rules that can be described by a graph grammar.Comment: In Proceedings GaM 2016, arXiv:1612.0105
The RNA Newton Polytope and Learnability of Energy Parameters
Despite nearly two scores of research on RNA secondary structure and RNA-RNA
interaction prediction, the accuracy of the state-of-the-art algorithms are
still far from satisfactory. Researchers have proposed increasingly complex
energy models and improved parameter estimation methods in anticipation of
endowing their methods with enough power to solve the problem. The output has
disappointingly been only modest improvements, not matching the expectations.
Even recent massively featured machine learning approaches were not able to
break the barrier. In this paper, we introduce the notion of learnability of
the parameters of an energy model as a measure of its inherent capability. We
say that the parameters of an energy model are learnable iff there exists at
least one set of such parameters that renders every known RNA structure to date
the minimum free energy structure. We derive a necessary condition for the
learnability and give a dynamic programming algorithm to assess it. Our
algorithm computes the convex hull of the feature vectors of all feasible
structures in the ensemble of a given input sequence. Interestingly, that
convex hull coincides with the Newton polytope of the partition function as a
polynomial in energy parameters. We demonstrated the application of our theory
to a simple energy model consisting of a weighted count of A-U and C-G base
pairs. Our results show that this simple energy model satisfies the necessary
condition for less than one third of the input unpseudoknotted
sequence-structure pairs chosen from the RNA STRAND v2.0 database. For another
one third, the necessary condition is barely violated, which suggests that
augmenting this simple energy model with more features such as the Turner loops
may solve the problem. The necessary condition is severely violated for 8%,
which provides a small set of hard cases that require further investigation
Paradigms for computational nucleic acid design
The design of DNA and RNA sequences is critical for many endeavors, from DNA nanotechnology, to PCR‐based applications, to DNA hybridization arrays. Results in the literature rely on a wide variety of design criteria adapted to the particular requirements of each application. Using an extensively studied thermodynamic model, we perform a detailed study of several criteria for designing sequences intended to adopt a target secondary structure. We conclude that superior design methods should explicitly implement both a positive design paradigm (optimize affinity for the target structure) and a negative design paradigm (optimize specificity for the target structure). The commonly used approaches of sequence symmetry minimization and minimum free‐energy satisfaction primarily implement negative design and can be strengthened by introducing a positive design component. Surprisingly, our findings hold for a wide range of secondary structures and are robust to modest perturbation of the thermodynamic parameters used for evaluating sequence quality, suggesting the feasibility and ongoing utility of a unified approach to nucleic acid design as parameter sets are refined further. Finally, we observe that designing for thermodynamic stability does not determine folding kinetics, emphasizing the opportunity for extending design criteria to target kinetic features of the energy landscape
- …