9 research outputs found

    Asymptotics of Canonical and Saturated RNA Secondary Structures

    It is a classical result of Stein and Waterman that the asymptotic number of RNA secondary structures is $1.104366 \cdot n^{-3/2} \cdot 2.618034^n$. In this paper, we study combinatorial asymptotics for two special subclasses of RNA secondary structures: canonical and saturated structures. Canonical secondary structures were introduced by Bompfünewerer et al., who noted that the run time of the Vienna RNA Package is substantially reduced when restricting computations to canonical structures. Here we provide an explanation for the speed-up. Saturated secondary structures have the property that no base pairs can be added without violating the definition of secondary structure (i.e. introducing a pseudoknot or base triple). Here we compute the asymptotic number of saturated structures, show that the asymptotic expected number of base pairs is $0.337361\,n$, and show that the asymptotic number of saturated stem-loop structures is $0.323954 \cdot 1.69562^n$, in contrast to the number $2^{n-2}$ of (arbitrary) stem-loop structures as classically computed by Stein and Waterman. Finally, we show that the density of states for [all resp. canonical resp. saturated] secondary structures is asymptotically Gaussian. We introduce a stochastic greedy method to sample random saturated structures, called quasi-random saturated structures, and show that their expected number of base pairs is $0.340633\,n$. Comment: accepted, Journal of Bioinformatics and Computational Biology (2009), 22 pages
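The Stein and Waterman asymptotic quoted in this abstract can be checked numerically. The following is a minimal sketch, assuming the homopolymer model (any two positions may pair) with at least one unpaired base inside every hairpin loop; the function names and the `min_loop` parameter are illustrative and do not come from the paper.

```python
def count_secondary_structures(n_max, min_loop=1):
    """Exact number of secondary structures on 0..n_max nucleotides.

    Assumption: any two positions i < j may pair as long as at least
    `min_loop` unpaired positions lie between them (homopolymer model).
    """
    counts = [1] * (n_max + 1)  # lengths too short for any pair have 1 structure
    for n in range(min_loop + 2, n_max + 1):
        total = counts[n - 1]  # case 1: the last position is unpaired
        # case 2: the last position pairs with position j (1-based);
        # j - 1 positions lie outside the pair, n - j - 1 lie inside it
        for j in range(1, n - min_loop):
            total += counts[j - 1] * counts[n - j - 1]
        counts[n] = total
    return counts


def stein_waterman_asymptotic(n):
    """Asymptotic estimate 1.104366 * n^(-3/2) * 2.618034^n from the abstract."""
    return 1.104366 * n ** -1.5 * 2.618034 ** n


if __name__ == "__main__":
    counts = count_secondary_structures(200)
    for n in (10, 50, 100, 200):
        ratio = counts[n] / stein_waterman_asymptotic(n)
        print(f"n={n:4d}  exact/asymptotic = {ratio:.4f}")
```

The ratio of the exact count to the estimate should approach 1 as n grows, which gives a quick sanity check on the constants in the formula.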

    Asymptotic structural properties of quasi-random saturated structures of RNA

    Background: RNA folding depends on the distribution of kinetic traps in the landscape of all secondary structures. Kinetic traps in the Nussinov energy model are precisely those secondary structures that are saturated, meaning that no base pair can be added without introducing either a pseudoknot or base triple. In previous work, we investigated asymptotic combinatorics of both random saturated structures and of quasi-random saturated structures, where the latter are constructed by a natural stochastic process. Results: We prove that for quasi-random saturated structures with the uniform distribution, the asymptotic expected number of external loops is $O(\log n)$ and the asymptotic expected maximum stem length is $O(\log n)$, while under the Zipf distribution, the asymptotic expected number of external loops is $O(\log^2 n)$ and the asymptotic expected maximum stem length is $O(\log n / \log \log n)$. Conclusions: Quasi-random saturated structures are generated by a stochastic greedy method, which is simple to implement. Structural features of random saturated structures appear to resemble those of quasi-random saturated structures, and the latter appear to constitute a class for which both the generation of sampled structures as well as a combinatorial investigation of structural features may be simpler to undertake
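To illustrate what a stochastic greedy construction of a saturated structure can look like, here is a minimal sketch. It is an assumption-laden illustration, not the authors' exact process: it repeatedly adds a base pair chosen uniformly at random among all pairs that can still be added, in a homopolymer model with at least `min_loop` unpaired bases inside each hairpin. The function names are hypothetical.

```python
import random


def addable_pairs(n, pairs, min_loop=1):
    """All pairs (i, j) that can be added to `pairs` without creating
    a base triple or a pseudoknot (crossing pair)."""
    paired = {position for pair in pairs for position in pair}
    candidates = []
    for i in range(n):
        if i in paired:
            continue
        for j in range(i + min_loop + 1, n):
            if j in paired:
                continue
            # (i, j) crosses (k, l) if exactly one endpoint lies inside (k, l)
            if any(k < i < l < j or i < k < j < l for k, l in pairs):
                continue
            candidates.append((i, j))
    return candidates


def sample_greedy_saturated(n, rng, min_loop=1):
    """Add uniformly chosen addable pairs until none remains (saturation)."""
    pairs = []
    while True:
        candidates = addable_pairs(n, pairs, min_loop)
        if not candidates:
            return sorted(pairs)
        pairs.append(rng.choice(candidates))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    structure = sample_greedy_saturated(40, rng)
    print(len(structure), "base pairs:", structure)
```

By construction the loop stops only when no further pair can be added, so the result is saturated in the sense used above. Because the selection rule here is an assumption, statistics gathered from this sketch need not match the constants reported in the papers.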

    Computing the Partition Function for Kinetically Trapped RNA Secondary Structures

    An RNA secondary structure is locally optimal if there is no lower energy structure that can be obtained by the addition or removal of a single base pair, where energy is defined according to the widely accepted Turner nearest neighbor model. Locally optimal structures form kinetic traps, since any evolution away from a locally optimal structure must involve energetically unfavorable folding steps. Here, we present a novel, efficient algorithm to compute the partition function over all locally optimal secondary structures of a given RNA sequence. Our software, RNAlocopt runs in time and space. Additionally, RNAlocopt samples a user-specified number of structures from the Boltzmann subensemble of all locally optimal structures. We apply RNAlocopt to show that (1) the number of locally optimal structures is far smaller than the total number of structures: indeed, the number of locally optimal structures is approximately equal to the square root of the number of all structures, (2) the structural diversity of this subensemble may be either similar to or quite different from the structural diversity of the entire Boltzmann ensemble, a situation that depends on the type of input RNA, (3) the (modified) maximum expected accuracy structure, computed by taking into account base pairing frequencies of locally optimal structures, is a more accurate prediction of the native structure than other current thermodynamics-based methods. The software RNAlocopt constitutes a technical breakthrough in our study of the folding landscape for RNA secondary structures. For the first time, locally optimal structures (kinetic traps in the Turner energy model) can be rapidly generated for long RNA sequences, previously impossible with methods that involved exhaustive enumeration. Use of locally optimal structures leads to state-of-the-art secondary structure prediction, as benchmarked against methods involving the computation of minimum free energy and of maximum expected accuracy.
Web server and source code available at http://bioinformatics.bc.edu/clotelab/RNAlocopt/

    Unfolding RNA 3D structures for secondary structure prediction benchmarking

    Ribonucleic acids (RNA) adopt complex three-dimensional structures which are stabilized by the formation of base pairs, also known as the secondary (2D) structure. Predicting where and how many of these interactions occur has been the focus of many computational methods called 2D structure prediction algorithms. To simplify the computation, these methods generally restrict the types of base pairs and the topology of the predicted 2D structures, which makes it difficult to know how well a 2D structure represents an RNA structure, especially when large numbers of base pairs are ignored. MC-Unfold was created to remove interactions violating the assumptions used by prediction methods. This process, named unfolding, extends previous planarization and pseudoknot removal methods. To evaluate how well computational methods can predict experimental structures, a set of 321 RNA monomers corresponding to more than 4223 experimental structures was acquired. These structures were mostly determined using nuclear magnetic resonance and X-ray crystallography. MC-Unfold was used to remove interactions the prediction algorithms were not expected to predict. These structures were then compared with the predicted structures. MC-Unfold performed very well on the test set it was given. In less than five minutes, 96% of the 227 structures could be exhaustively unfolded. The few remaining structures are very large and could not be unfolded in reasonable time. MC-Unfold is therefore a practical alternative to the current methods. As for the evaluation of prediction methods, MC-Unfold demonstrated that the computational methods do find experimental structures, especially for small molecules. However, when considering large or pseudoknotted molecules, the results are not so encouraging. As a consequence, 2D structure prediction methods should be used with caution, especially for large structures

    Combinatorics of locally optimal RNA secondary structures

    It is a classical result of Stein and Waterman that the asymptotic number of RNA secondary structures is $1.104366 \cdot n^{-3/2} \cdot 2.618034^n$. Motivated by the kinetics of RNA secondary structure formation, we are interested in determining the asymptotic number of secondary structures that are locally optimal, with respect to a particular energy model. In the Nussinov energy model, where each base pair contributes -1 towards the energy of the structure, locally optimal structures are exactly the saturated structures, for which we have previously shown that asymptotically, there are $1.07427 \cdot n^{-3/2} \cdot 2.35467^n$ saturated structures for a sequence of length $n$. In this paper, we consider the base stacking energy model, a mild variant of the Nussinov model, where each stacked base pair contributes -1 toward the energy of the structure. Locally optimal structures with respect to the base stacking energy model are exactly those secondary structures whose stems cannot be extended. Such structures were first considered by Evers and Giegerich, who described a dynamic programming algorithm to enumerate all locally optimal structures. In this paper, we apply methods from enumerative combinatorics to compute the asymptotic number of such structures. Additionally, we consider analogous combinatorial problems for secondary structures with annotated single-stranded, stacking nucleotides (dangles). Comment: 27 pages

    The Folding Kinetics of RNA

    RNAs are biomolecules ubiquitous in all living cells. Usually, they fold into complex molecular structures, which often mediate their biological function. In this work, models of RNA folding have been studied in detail. One can distinguish two fundamentally different approaches to RNA folding. The first one is the thermodynamic approach, which yields information about the distribution of structures in the ensemble at equilibrium. The second approach, which is required to study the dynamics of folding over time, is the kinetic folding analysis. It is much more computationally expensive, but allows changing environmental parameters as well as time-dependent effects to be incorporated into the analysis. Building on these methods, the BarMap framework (Hofacker, Flamm, et al., 2010) allows several pre-computed models to be chained and thus folding reactions to be simulated in a dynamically changing environment, e.g., to model co-transcriptional folding. However, there is no obvious way to identify spurious output, let alone to assess the quality of the simulation results. As a remedy, BarMap-QA, a semi-automatic software pipeline for the analysis of co-transcriptional folding, has been developed. For a given input sequence, it automatically generates the models for every step of the RNA elongation, applies BarMap to link them together, and runs the simulation. Post-processing scripts, visualizations, and an integrated viewer are provided to facilitate the evaluation of the unwieldy BarMap output. Three novel, complementary quality measures are computed on-the-fly, allowing the analyst to evaluate the coverage of the computed models, the exactness of the computed mapping between the individual states of each model, and the fraction of correctly mapped population during the simulation run. In case of deficiencies, the output is automatically re-rendered after parameter adjustment.
Statistical evidence is presented that, even when coarse graining the ensemble, kinetic simulations quickly become infeasible for longer RNAs. However, within the individual gradient basins, most high-energy structures only have a marginal probability and could safely be excluded from the analysis. To tell relevant and irrelevant structures apart, a precise knowledge of the distribution of probability mass within a basin is necessary. Both a theoretical result concerning the shape of its density and possible applications, such as the prediction of a basin's partition function, are given. To demonstrate the applicability of computational folding simulations to a real-world task of the life sciences, we conducted an in silico design process for a synthetic, transcriptional riboswitch responding to the ligand neomycin. The designed constructs were then transfected into the bacterium Escherichia coli by a collaborative partner and could successfully regulate a fluorescent reporter gene depending on the presence of its ligand. Additionally, it was shown that the sequence context of the riboswitch could have detrimental effects on its functionality, but also that RNA folding simulations are often able to predict these interactions and provide solutions in the form of decoupling spacer elements. Taken together, this thesis offers the reader deep insights into the world of RNA folding and its models, and how these can be applied to design novel biomolecules