
    Computing folding pathways between RNA secondary structures

    Given an RNA sequence and two designated secondary structures A, B, we describe a new algorithm that computes a nearly optimal folding pathway from A to B. The algorithm, RNAtabupath, employs a tabu semi-greedy heuristic, known to be an effective search strategy in combinatorial optimization. Folding pathways, sometimes called routes or trajectories, are computed by RNAtabupath in a fraction of the time required by the barriers program of the Vienna RNA Package. We benchmark RNAtabupath against other algorithms that compute low energy folding pathways between experimentally known structures of several conformational switches. The RNApathfinder web server, source code for algorithms to compute and analyze pathways, and supplementary data are available at http://bioinformatics.bc.edu/clotelab/RNApathfinder
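To make the notion of a folding pathway concrete, the following is a minimal sketch, not the RNAtabupath algorithm. It builds the naive "direct" pathway between two dot-bracket structures, first removing the base pairs found only in A and then adding those found only in B, and reports the barrier of that pathway. The function names (`pairs`, `direct_path`, `toy_energy`, `barrier`) are hypothetical, and the energy function is a toy assumption (minus one unit per base pair) standing in for a real free energy model.

```python
def pairs(db):
    """Parse a dot-bracket string into a set of (i, j) base pairs."""
    stack, out = [], set()
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            out.add((stack.pop(), i))
    return out


def to_dot_bracket(pair_set, n):
    """Render a set of base pairs as a dot-bracket string of length n."""
    s = ["."] * n
    for i, j in pair_set:
        s[i], s[j] = "(", ")"
    return "".join(s)


def direct_path(a, b):
    """Naive pathway: remove pairs only in A, then add pairs only in B.

    Each step changes exactly one base pair. Every intermediate is a subset
    of A or of B, so it remains a valid nested structure.
    """
    pa, pb = pairs(a), pairs(b)
    current = set(pa)
    path = [a]
    for p in sorted(pa - pb):
        current.remove(p)
        path.append(to_dot_bracket(current, len(a)))
    for p in sorted(pb - pa):
        current.add(p)
        path.append(to_dot_bracket(current, len(a)))
    return path


def toy_energy(db):
    """Toy assumption: each base pair contributes -1 energy unit."""
    return -len(pairs(db))


def barrier(path, energy=toy_energy):
    """Highest energy along the path, relative to the starting structure."""
    return max(energy(s) for s in path) - energy(path[0])


if __name__ == "__main__":
    path = direct_path("((..))....", "....((..))")
    for s in path:
        print(s, toy_energy(s))
    print("barrier:", barrier(path))
```

A heuristic search such as the one the abstract describes would explore other orderings of these moves, and moves outside A and B, to find a pathway with a lower barrier than this direct one.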

    Predicting folding pathways between RNA conformational structures guided by RNA stacks

    Background: Accurately predicting low energy barrier folding pathways between conformational secondary structures of an RNA molecule can provide valuable information for understanding its catalytic and regulatory functions. Most existing heuristic algorithms guide the construction of folding pathways by free energies of intermediate structures in the next move during the folding. However due to the size and ruggedness of RNA energy landscape, energy-guided search can become trapped in local optima. Results: In this paper, we propose an algorithm that guides the construction of folding pathways through the formation and destruction of RNA stacks. Guiding the construction of folding pathways by coarse grained movements of RNA stacks can help reduce the search space and make it easier to jump out of local optima. RNAEAPath is able to find lower energy barrier folding pathways between secondary structures of conformational switches and outperforms the existing heuristic algorithms in most test cases. Conclusions: RNAEAPath provides an alternate approach for predicting low-barrier folding pathways between RNA conformational secondary structures. The source code of RNAEAPath and the test data sets are available at http://genome.ucf.edu/RNAEAPath

    Interruptional Activity and Simulation of Transposable Elements

    Transposable elements (TEs) are interspersed DNA sequences that can move or copy to new positions within a genome. The active TEs along with the remnants of many transposition events over millions of years constitute 46.69% of the human genome. TEs are believed to promote speciation and their activities play a significant role in human disease. The 22 AluY and 6 AluS TE subfamilies have been the most active TEs in recent human history, whose transposition has been implicated in several inherited human diseases and in various forms of cancer by integrating into genes. Therefore, understanding the transposition activities is very important. Recently, there has been some work done to quantify the activity levels of active Alu transposable elements based on variation in the sequence. Here, given this activity data, an analysis of TE activity based on the position of mutations is conducted. Two different methods/simulations are created to computationally predict so-called harmful mutation regions in the consensus sequence of a TE; that is, mutations that occur in these regions decrease the transposition activities dramatically. The methods are applied to AluY, the youngest and most active Alu subfamily, to identify the harmful regions lying in its consensus, and verifications are presented using the activity of AluY elements and the secondary structure of the AluYa5 RNA, providing evidence that the method is successfully identifying harmful mutation regions. A supplementary simulation also shows that the identified harmful regions covering the AluYa5 RNA functional regions are not occurring by chance. Therefore, mutations within the harmful regions alter the mobile activity levels of active AluY elements. One of the methods is then applied to two additional TE families, the Alu family and the L1 family, to detect the harmful regions in these elements computationally.
Understanding and predicting the evolution of these TEs is of interest in understanding their powerful evolutionary force in shaping their host genomes. In this thesis, a formal model of TE fragments and their interruptions is devised that provides definitions that are compatible with biological nomenclature, while still providing a suitable formal foundation for computational analysis. Essentially, this model is used for fixing terminology that was misleading in the literature, and it helps to describe further TE problems in a precise way. Indeed, later chapters include two other models built on top of this model: the sequential interruption model and the recursive interruption model, both used to analyze TE activity throughout evolution. The sequential interruption model is defined between TEs that occur in a genomic sequence to estimate how often TEs interrupt other TEs, which has been shown to be useful in predicting their ages and their activity throughout evolution. Here, this prediction from the sequential interruptions is shown to be closely related to a classic matrix optimization problem: the Linear Ordering Problem (LOP). By applying a well-studied method of solving the LOP, Tabu search, to the sequential interruption model, a relative age order of all TEs in the human genome is predicted from a single genome. A comparison of the TE ordering between Tabu search and the method used in [47] shows that Tabu search solves the TE problem far more efficiently, while still achieving a more accurate result. As a result of the improved efficiency, a prediction on all human TEs is constructed, whereas it was previously only predicted for a minority of the human TEs. When many insertions occurred throughout the evolution of a genomic sequence, the interruptions nest in a recursive pattern. The nested TEs are very helpful in revealing the age of the TEs, but cannot be fully represented by the sequential interruption model.
In the recursive interruption model, a specific context-free grammar is defined, describing a general and simple way to capture the recursive nature in which TEs nest themselves into other TEs. Then, each production of the context-free grammar is associated with a probability to convert the context-free grammar into a stochastic context-free grammar that maximizes the applications of the productions corresponding to TE interruptions. A modified version of an algorithm to parse context-free grammars, the CYK algorithm, that takes into account these probabilities is then used to find the most likely parse tree(s) predicting the TE nesting in an efficient fashion. The recursive interruption model produces small parse trees representing local TE interruptions in a genome. These parse trees are a natural way of grouping TE fragments in a genomic sequence together to form interruptions. Next, some tree adjustment operations are given to simplify these parse trees and obtain more standard evolutionary trees. Then an overall TE-interaction network is created by merging these standard evolutionary trees into a weighted directed graph. This TE-interaction network is a rich representation of the predicted interactions between all TEs throughout evolution and is a powerful tool to predict the insertion evolution of these TEs. It is applied to the human genome, but can be easily applied to other genomes. Furthermore, it can also be applied to multiple related genomes where common TEs exist in order to study the interactions between TEs and the genomes. Lastly, a simulation of TE transpositions throughout evolution is developed. This is especially helpful in understanding the dynamics of how TEs evolve and impact their host genomes. Also, it is used as a verification technique for the previous theoretical models in the thesis.
By feeding the simulated TE remnants and activity data into the theoretical models, a relative age order is predicted using the sequential interruption model, and a quantified correlation between this predicted order and the input age order in the simulation can be calculated. Then, a TE-interaction network is constructed using the recursive interruption model on the simulated data, which can also be converted into a linear age order by feeding the adjacency matrix of the network to Tabu search. Another correlation is calculated between the predicted age order from the recursive interruption model and the input age order. An average correlation of ten simulations is calculated for each model, which suggests that in general, the recursive interruption model performs better than the sequential interruption model in predicting a correct relative age order of TEs. Indeed, the recursive interruption model achieves an average correlation value of ρ = 0.939 with the correct simulated answer
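The link between interruption counts and the Linear Ordering Problem mentioned in this abstract can be illustrated with a small sketch. It is not the thesis's implementation: the matrix convention (entry `m[a][b]` counts how often element `a` should precede element `b`), the swap neighbourhood, and the names `lop_score` and `tabu_lop` are all assumptions chosen for brevity. The search looks for an ordering that maximizes the sum of matrix entries placed above the diagonal, forbidding recently used swaps for a few iterations unless they improve on the best score seen so far.

```python
from itertools import combinations


def lop_score(order, m):
    """Sum of m[a][b] over all pairs where a is placed before b."""
    n = len(order)
    return sum(m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n))


def tabu_lop(m, iterations=200, tenure=3):
    """Tabu search over orderings using pairwise swaps (illustrative only)."""
    n = len(m)
    current = list(range(n))
    best, best_score = current[:], lop_score(current, m)
    tabu = {}  # swapped element pair -> last iteration at which it is forbidden
    for it in range(iterations):
        candidates = []
        for i, j in combinations(range(n), 2):
            neighbor = current[:]
            neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
            score = lop_score(neighbor, m)
            move = frozenset((current[i], current[j]))
            # Skip tabu moves unless they beat the best score (aspiration).
            if tabu.get(move, -1) >= it and score <= best_score:
                continue
            candidates.append((score, i, j, move))
        if not candidates:
            break
        # Take the best admissible neighbor, even if it is worse than current.
        score, i, j, move = max(candidates, key=lambda c: c[0])
        current[i], current[j] = current[j], current[i]
        tabu[move] = it + tenure
        if score > best_score:
            best, best_score = current[:], score
    return best, best_score


if __name__ == "__main__":
    # Toy matrix: element a should precede element b whenever a > b.
    m = [[1 if a > b else 0 for b in range(4)] for a in range(4)]
    print(tabu_lop(m))
```

Accepting the best admissible neighbor even when it lowers the score is what lets the search leave local optima; the tabu list prevents it from immediately undoing that move.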

    Evolving better RNAfold structure prediction

    Grow and graft genetic programming (GGGP) evolves more than 50000 parameters in a state-of-the-art C program to make functional source code changes which give more accurate predictions of how RNA molecules fold up. Genetic improvement updates 29% of the dynamic programming free energy model parameters. In most cases (50.3%) GI gives better results on 4655 known secondary structures from RNA_STRAND (29.0% are worse and 20.7% are unchanged). Indeed it also does better than parameters recommended by Andronescu, M., et al.: Bioinformatics 23(13) (2007) i19–i28

    Functional nucleic acids as substrate for information processing

    Information processing applications driven by self-assembly and conformation dynamics of nucleic acids are possible. These underlying paradigms (self-assembly and conformation dynamics) are essential for natural information processors as illustrated by proteins. A key advantage in utilising nucleic acids as information processors is the availability of computational tools to support the design process. This provides us with a platform to develop an integrated environment in which an orchestration of molecular building blocks can be realised. Strict arbitrary control over the design of these computational nucleic acids is not feasible. The microphysical behaviour of these molecular materials must be taken into consideration during the design phase. This thesis investigated to what extent the construction of molecular building blocks for a particular purpose is possible with the support of a software environment. In this work we developed a computational protocol that functions on a multi-molecular level, which enables us to directly incorporate the dynamic characteristics of nucleic acid molecules. To allow the implementation of this computational protocol, we developed a designer that is able to solve the nucleic acid inverse prediction problem, not only at the level of multi-stable states, but also including the interactions among molecules that occur in each meta-stable state. The realisation of our computational protocol is evaluated by generating computational nucleic acid units that resemble synthetic RNA devices that have been successfully implemented in the laboratory. Furthermore, we demonstrated the feasibility of the protocol to design various types of computational units. The accuracy and diversity of the generated candidates are significantly better than those of the best candidates produced by conventional designers. With the computational protocol, the design of a nucleic acid information processor using a network of interconnecting nucleic acids is now feasible

    Computational Methods For Analyzing RNA Folding Landscapes And Its Applications

    Non-protein-coding RNAs play critical regulatory roles in cellular life. Many ncRNAs fold into specific structures in order to perform their biological functions. Some of the RNAs, such as riboswitches, can even fold into alternative structural conformations in order to participate in different biological processes. In addition, these RNAs can transit dynamically between different functional structures along folding pathways on their energy landscapes. These alternative functional structures are usually energetically favored and are stable in their local energy landscapes. Moreover, conformational transitions between any pair of alternate structures usually involve high energy barriers, such that RNAs can become kinetically trapped by these stable and local optimal structures. We have proposed a suite of computational approaches for analyzing and discovering regulatory RNAs through studying folding pathways, alternative structures and energy landscapes associated with conformational transitions of regulatory RNAs. First, we developed an approach, RNAEAPath, which can predict low-barrier folding pathways between two conformational structures of a single RNA molecule. Using RNAEAPath, we can analyze folding pathways between two functional RNA structures, and therefore study the mechanism behind RNA functional transitions from a thermodynamic perspective. Second, we introduced an approach, RNASLOpt, for finding all the stable and local optimal structures on the energy landscape of a single RNA molecule. We can use the generated stable and local optimal structures to represent the RNA energy landscape in a compact manner. In addition, we applied RNASLOpt to several known riboswitches and predicted their alternate functional structures accurately. Third, we integrated a comparative approach with RNASLOpt, and developed RNAConSLOpt, which can find all the consensus stable and local optimal structures that are conserved among a set of homologous regulatory RNAs.
We can use RNAConSLOpt to predict alternate functional structures for regulatory RNA families. Finally, we have proposed a pipeline making use of RNAConSLOpt to computationally discover novel riboswitches in bacterial genomes. An application of the proposed pipeline to a set of bacteria in Bacillus genus results in the re-discovery of many known riboswitches, and the detection of several novel putative riboswitch elements

    Computational Design and Experimental Validation of Functional Ribonucleic Acid Nanostructures

    In living cells, two major classes of ribonucleic acid (RNA) molecules can be found. The first class, called messenger RNA (mRNA), contains the genetic information that the ribosome reads and translates into proteins. The second class, called non-coding RNA (ncRNA), does not code for proteins and is involved in key cellular processes, such as gene expression regulation, splicing, differentiation, and development. NcRNAs fold into an ensemble of thermodynamically stable secondary structures, which will eventually lead the molecule to fold into a specific 3D structure. It is widely known that ncRNAs carry out their functions via their 3D structures as well as their molecular composition. The secondary structure of ncRNAs is composed of different types of structural elements (motifs) such as stacking base pairs, internal loops, hairpin loops and pseudoknots. Pseudoknots are particularly difficult to model, are abundant in nature and are known to stabilize the functional form of the molecule. Due to the diverse range of functions of ncRNAs, their computational design and analysis have numerous applications in nano-technology, therapeutics, synthetic biology, and materials engineering. The RNA design problem is to find novel RNA sequences that are predicted to fold into target structure(s) while satisfying specific qualitative characteristics and constraints. RNA design can be modeled as a combinatorial optimization problem (COP) and is known to be computationally challenging or, more precisely, NP-hard. Numerous algorithms to solve the RNA design problem have been developed over the past two decades; however, most ignore pseudoknots and are therefore applicable to only a slice of real-world modeling and design problems. Moreover, the few existing pseudoknot design methods, which were developed only recently, do not provide any evidence about the applicability of their proposed design methodology in biological contexts.
The two objectives of this thesis are set to address these two shortcomings. First, we are interested in developing an efficient computational method for the design of RNA secondary structures including pseudoknots that show significantly better in-silico quality characteristics than the state of the art. Second, we are interested in showing the real-world worthiness of the proposed method by validating it experimentally. More precisely, our aim is to design instances of certain types of RNA enzymes (i.e. ribozymes) and demonstrate that they are functionally active. This would likely only happen if their predicted folding matched their actual folding in the in-vitro experiments. In this thesis, we present four contributions. First, we propose a novel adaptive defect weighted sampling algorithm to efficiently solve the RNA secondary structure design problem where pseudoknots are included. We compare the performance of our design algorithm with the state of the art and show that our method generates molecules that are thermodynamically more stable and less defective than those generated by state of the art methods. Moreover, we show that when the effect of fitness evaluation is decoupled from the search and optimization process, our optimization method converges faster than the non-dominated sorting genetic algorithm (NSGA II) and the ant colony optimization (ACO) algorithm do. Second, we use our algorithmic development to implement an RNA design pipeline called Enzymer and make it available as an open source package useful for wet lab practitioners and RNA bioinformaticians. Enzymer uses multiple sequence alignment (MSA) data to generate initial design templates for further optimization. Our design pipeline can then be used to re-engineer naturally occurring RNA enzymes such as ribozymes and riboswitches. Our first and second contributions are published in the RNA section of the Journal of Frontiers in Genetics.
Third, we use Enzymer to reengineer three different species of pseudoknotted ribozymes: a hammerhead ribozyme from the mouse gut metagenome, a hammerhead ribozyme from Yarrowia lipolytica and a glmS ribozyme from Thermoanaerobacter tengcongensis. We designed a total of 18 ribozyme sequences and showed that 16 of them were active in-vitro. Our experimental results have been submitted to the RNA journal and strongly suggest that Enzymer is a reliable tool to design pseudoknotted ncRNAs with desired secondary structure. Finally, we propose a novel architecture for a new ribozyme-based gene regulatory network where a hammerhead ribozyme modulates expression of a reporter gene when an external stimulus, IPTG, is present. Our in-vivo experiments show the expected results in 7 out of 12 cases