
    Transat—A Method for Detecting the Conserved Helices of Functional RNA Structures, Including Transient, Pseudo-Knotted and Alternative Structures

    The prediction of functional RNA structures has attracted increased interest, as it allows us to study the potential functional roles of many genes. RNA structure prediction methods, however, assume that there is a unique functional RNA structure and also do not predict functional features required for in vivo folding. In order to understand how functional RNA structures form in vivo, we require sophisticated experiments or reliable prediction methods. So far, there exist only a few experimentally validated transient RNA structures. On the computational side, there exist several computer programs which aim to predict the co-transcriptional folding pathway in vivo, but these make a range of simplifying assumptions and do not capture all features known to influence RNA folding in vivo. We want to investigate if evolutionarily related RNA genes fold in a similar way in vivo. To this end, we have developed a new computational method, Transat, which detects conserved helices of high statistical significance. We introduce the method, present a comprehensive performance evaluation and show that Transat is able to predict the structural features of known reference structures including pseudo-knotted ones as well as those of known alternative structural configurations. Transat can also identify unstructured sub-sequences bound by other molecules and provides evidence for new helices which may define folding pathways, supporting the notion that homologous RNA sequences not only assume a similar reference RNA structure, but also fold similarly. Finally, we show that the structural features predicted by Transat differ from those assuming thermodynamic equilibrium. Unlike the existing methods for predicting folding pathways, our method works in a comparative way. This has the disadvantage of not being able to predict features as a function of time, but has the considerable advantage of highlighting conserved features and of not requiring a detailed knowledge of the cellular environment.

    Deciphering the universe of RNA structures and trans RNA-RNA interactions of transcriptomes in vivo: from experimental protocols to computational analyses

    The last few years have seen an explosion of experimental and computational methods for investigating RNA structures of entire transcriptomes in vivo. Very recent experimental protocols now also allow trans RNA–RNA interactions to be probed in a transcriptome-wide manner. All of the experimental strategies require comprehensive computational pipelines for analysing the raw data and converting it back into actual RNA structure features or trans RNA–RNA interactions. The overall performance of these methods thus strongly depends on the experimental and the computational protocols employed. In order to get the best out of both worlds, both aspects need to be optimised simultaneously. This review introduces the methods and proposes ideas for how they could be further improved.

    Computational approaches for RNA structure ensemble deconvolution from structure probing data

    RNA structure probing experiments have emerged over the last decade as a straightforward way to determine the structure of RNA molecules in a number of different contexts. Although these experiments are powerful, the ability of RNA to dynamically interconvert between, and to simultaneously populate, alternative structural configurations poses a nontrivial challenge to the interpretation of the data derived from them. Recent efforts aimed at developing computational methods for the reconstruction of coexisting alternative RNA conformations from structure probing data are paving the way to the study of RNA structure ensembles, even in the context of living cells. In this review, we critically discuss these methods, their limitations and possible future improvements.

    TurboFold: Iterative probabilistic estimation of secondary structures for multiple RNA sequences

    Background: The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented. Results: TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based probabilities for co-incidence with each of the other sequences, along with estimated base pairing probabilities, from the previous iteration, for the other sequences. The extrinsic information is introduced as free energy modifications for base pairing in a partition function computation based on the nearest neighbor thermodynamic model. This process yields updated estimates of base pairing probability. The updated base pairing probabilities in turn are used to recompute extrinsic information, resulting in the overall iterative estimation procedure that defines TurboFold. TurboFold is benchmarked on a number of ncRNA datasets and compared against alternative secondary structure prediction methods. The iterative procedure in TurboFold is shown to improve estimates of base pairing probability with each iteration, though only small gains are obtained beyond three iterations. Secondary structures composed of base pairs with estimated probabilities higher than a significance threshold are shown to be more accurate for TurboFold than for alternative methods that estimate base pairing probabilities. TurboFold-MEA, which uses base pairing probabilities from TurboFold in a maximum expected accuracy algorithm for secondary structure prediction, has accuracy comparable to the best performing secondary structure prediction methods. The computational and memory requirements for TurboFold are modest and, in terms of sequence length and number of sequences, scale much more favorably than joint alignment and folding algorithms. Conclusions: TurboFold is an iterative probabilistic method for predicting secondary structures for multiple RNA sequences that efficiently and accurately combines the information from the comparative analysis between sequences with the thermodynamic folding model. Unlike most other multi-sequence structure prediction methods, TurboFold does not enforce strict commonality of structures and is therefore useful for predicting structures for homologous sequences that have diverged significantly. TurboFold can be downloaded as part of the RNAstructure package at http://rna.urmc.rochester.edu.
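    The idea of building a structure from base pairs whose estimated probability exceeds a threshold can be illustrated with a minimal sketch. This is not TurboFold's code: the function name `confident_pairs`, the input dictionary, and the example probabilities are hypothetical, and the threshold actually used in the paper is not stated in the abstract. The sketch assumes a threshold of at least 0.5, which guarantees that no nucleotide ends up in two selected pairs as long as the probabilities of all pairs involving one nucleotide sum to at most 1.

```python
from collections import defaultdict


def confident_pairs(pair_probs, threshold=0.5):
    """Return the base pairs (i, j) whose probability exceeds `threshold`.

    `pair_probs` maps (i, j) index tuples to estimated pairing probabilities.
    Restricting the threshold to [0.5, 1) keeps the result a valid secondary
    structure: a nucleotide cannot have two partners each with p > 0.5.
    """
    if not 0.5 <= threshold < 1.0:
        raise ValueError("threshold must lie in [0.5, 1)")

    # Sanity check (an assumption of this sketch): per-nucleotide
    # pairing probabilities must not sum to more than 1.
    totals = defaultdict(float)
    for (i, j), p in pair_probs.items():
        totals[i] += p
        totals[j] += p
    if any(total > 1.0 + 1e-9 for total in totals.values()):
        raise ValueError("pairing probabilities for a nucleotide exceed 1")

    return sorted(pair for pair, p in pair_probs.items() if p > threshold)


# Hypothetical probabilities for a 10-nucleotide sequence.
example = {(0, 9): 0.92, (1, 8): 0.85, (2, 7): 0.40, (2, 6): 0.55, (3, 6): 0.30}
print(confident_pairs(example))
```

    Raising the threshold trades sensitivity for positive predictive value: fewer pairs are reported, but each is better supported.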

    Revisiting Hybridization Kinetics with Improved Elementary Step Simulation

    Nucleic acid strands, which react by forming and breaking Watson-Crick base pairs, can be designed to form complex nanoscale structures or devices. Controlling such systems requires accurate predictions of the reaction rate and of the folding pathways of interacting strands. Simulators such as Multistrand model these kinetic properties using continuous-time Markov chains (CTMCs), whose states and transitions correspond to secondary structures and elementary base pair changes, respectively. The transient dynamics of a CTMC are determined by a kinetic model, which assigns transition rates to pairs of states, and the rate of a reaction can be estimated using the mean first passage time (MFPT) of its CTMC. However, use of Multistrand is limited by its slow runtime, particularly on rare events, and the quality of its rate predictions is compromised by a poorly-calibrated and simplistic kinetic model. The former limitation can be addressed by constructing truncated CTMCs, which only include a small subset of states and transitions, selected either manually or through simulation. As a first step to address the latter limitation, Bayesian posterior inference in an Arrhenius-type kinetic model was performed in earlier work, using a small experimental dataset of DNA reaction rates and a fixed set of manually truncated CTMCs, which we refer to as Assumed Pathway (AP) state spaces. In this work we extend this approach, by introducing a new prior model that is directly motivated by the physical meaning of the parameters and that is compatible with experimental measurements of elementary rates, and by using a larger dataset of 1105 reactions as well as larger truncated state spaces obtained from the recently introduced stochastic Pathway Elaboration (PE) method. We assess the quality of the resulting posterior distribution over kinetic parameters, as well as the quality of the posterior reaction rates predicted using AP and PE state spaces. 
Finally, we use the newly parameterised PE state spaces and Multistrand simulations to investigate the strong variation of helix hybridization reaction rates in a dataset of Hata et al. While we find strong evidence for the nucleation-zippering model of hybridization, in the classical sense that the rate-limiting phase is composed of elementary steps reaching a small "nucleus" of critical stability, the strongly sequence-dependent structure of the trajectory ensemble up to nucleation appears to be much richer than assumed in the model by Hata et al. In particular, rather than being dominated by the collision probability of nucleation sites, the trajectory segment between first binding and nucleation tends to visit numerous secondary structures involving misnucleation and hairpins, and has a sizeable effect on the probability of overcoming the nucleation barrier.
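    The mean first passage time (MFPT) mentioned above has a standard linear-algebra formulation for a small, truncated CTMC, sketched below. This is not Multistrand's implementation: the function name, the three-state chain, and its rate values are hypothetical and chosen only so the result can be checked by hand. For every non-target state i, the MFPT τ satisfies Σ_j k_ij (τ_j − τ_i) = −1, with τ = 0 on the target states.

```python
def mean_first_passage_times(rates, targets):
    """Solve for the MFPT from every state to the target set.

    rates[i][j] is the transition rate from state i to state j (i != j).
    Returns a list tau with tau[s] = 0 for each target state s.
    """
    n = len(rates)
    free = [s for s in range(n) if s not in targets]
    index = {s: k for k, s in enumerate(free)}
    m = len(free)

    # Linear system: (sum_j k_ij) * tau_i - sum_{j non-target} k_ij * tau_j = 1
    a = [[0.0] * m for _ in range(m)]
    b = [1.0] * m
    for i in free:
        row = index[i]
        for j in range(n):
            if j == i:
                continue
            a[row][row] += rates[i][j]
            if j in index:
                a[row][index[j]] -= rates[i][j]

    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            raise ValueError("target set is unreachable from some state")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    tau_free = [0.0] * m
    for r in range(m - 1, -1, -1):
        rest = sum(a[r][c] * tau_free[c] for c in range(r + 1, m))
        tau_free[r] = (b[r] - rest) / a[r][r]

    tau = [0.0] * n
    for s, k in index.items():
        tau[s] = tau_free[k]
    return tau


# Hypothetical chain: 0 (unbound) <-> 1 (nucleated) -> 2 (duplex, target).
toy_rates = [
    [0.0, 1.0, 0.0],
    [2.0, 0.0, 3.0],
    [0.0, 0.0, 0.0],
]
mfpt = mean_first_passage_times(toy_rates, targets={2})
print(mfpt)
```

    A reaction rate estimate can then be derived from the MFPT of the initial state. A dense solve like this is only practical when the state space is small, which is one reason the truncated state spaces described above matter.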

    Computational investigations of structure probing experiments for RNA structure prediction

    Ribonucleic acid (RNA) transcripts, and in particular non-coding RNAs, play fundamental roles in cellular metabolism, as they are involved in protein synthesis, catalysis, and regulation of gene expression. In some cases, an RNA’s biological function is mostly dependent on a specific active conformation, making the identification of this single stable structure crucial to identify the role of the RNA and the relationships between its mutations and diseases. On the other hand, RNAs are often found in a dynamic equilibrium of multiple interconverting conformations, which is necessary to regulate their functional activity. In these cases it becomes fundamental to gain knowledge of the RNA’s structural ensemble, in order to fully determine its mechanism of action. The current structure determination techniques, both for single-state models such as X-ray crystallography, and for multi-state models such as nuclear magnetic resonance and single-molecule methods, despite proving accurate and reliable in many cases, are extremely slow and costly. In contrast, chemical probing is a class of experimental techniques that provide structural information at single-nucleotide resolution at significantly lower costs in terms of time and required infrastructure. In particular, selective 2′-hydroxyl acylation analyzed via primer extension (SHAPE) has proved a valid chemical mapping technique to probe RNA structure even in vivo. This thesis reports a systematic investigation of chemical probing experiments based on two different approaches. The first approach, presented in Chapter 2, relies on machine-learning techniques to optimize a model for mapping experimental data into structural information. The model relies also on co-evolutionary data, in the form of direct coupling analysis (DCA) couplings. The inclusion of this kind of data is chosen in the same spirit of reducing the costs of structure probing, as co-evolutionary analysis relies only on sequencing techniques. The resulting model is proposed as a candidate standard tool for prediction of RNA secondary structure, and some insight into the mechanism of chemical probing is gained by interpreting its features. Importantly, this work has been developed with a view to building a framework for future refinement and improvement. In this spirit, all data and scripts used are available at https://github.com/bussilab/shape-dca-data, and the model can be easily retrained and adapted to incorporate arbitrary experimental information. As the interpretation of the model features suggests the possible emergence of cooperative effects involving RNA nucleotides interacting with SHAPE reagents, a second approach based on Molecular Dynamics simulations is proposed to investigate this hypothesis. The results, along with an originally developed methodology to analyse Molecular Dynamics simulations at variable number of particles, are presented in Chapter 3.