9 research outputs found

    Prediction of RNA pseudoknots by Monte Carlo simulations

    In this paper we consider the problem of RNA folding with pseudoknots. We use a graphical representation in which secondary structures are described by planar diagrams and pseudoknots are identified as non-planar diagrams. We analyze the non-planar topologies of RNA structures and propose a classification of RNA pseudoknots according to the minimal genus of the surface on which the RNA structure can be embedded. This classification provides a simple and natural way to tackle the problem of RNA folding prediction in the presence of pseudoknots. Based on this approach, we describe a Monte Carlo algorithm for predicting pseudoknots in an RNA molecule. Comment: 22 pages, 14 figures
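The genus of a structure can be computed directly from its list of base pairs. The Python sketch below is an illustration, not the paper's code: it collapses the backbone to a single vertex, so Euler's formula V - E + F = 2 - 2g with V = 1 and E equal to the number of base pairs gives g = (E - F + 1) / 2, where F is the number of boundary components (cycles of the map "jump across an arc, then take one step along the closed backbone"). The function name `rna_genus` and the pair-list input format are assumptions made for illustration.

```python
from typing import List, Tuple


def rna_genus(pairs: List[Tuple[int, int]]) -> int:
    """Genus of the arc diagram defined by base pairs (i, j)."""
    # Unpaired bases do not change the genus, so keep only paired positions.
    positions = sorted(p for pair in pairs for p in pair)
    if len(positions) != len(set(positions)):
        raise ValueError("each base may pair at most once")
    index = {pos: k for k, pos in enumerate(positions)}
    m = len(positions)  # number of arc endpoints, 2 * len(pairs)
    partner = [0] * m
    for i, j in pairs:
        a, b = index[i], index[j]
        partner[a], partner[b] = b, a
    # Boundary components = cycles of k -> partner[k] + 1 (mod m):
    # jump across an arc, then take one step along the closed backbone.
    seen = [False] * m
    faces = 0
    for start in range(m):
        if not seen[start]:
            faces += 1
            k = start
            while not seen[k]:
                seen[k] = True
                k = (partner[k] + 1) % m
    # Euler characteristic with V = 1 and E = number of arcs: 1 - E + F = 2 - 2g.
    return (len(pairs) - faces + 1) // 2


print(rna_genus([(1, 10), (2, 9)]))  # nested stem -> 0
print(rna_genus([(1, 5), (3, 8)]))   # H-type pseudoknot -> 1
```

Any nested (planar) structure comes out as genus 0, while the simplest crossing patterns, such as the H-type pseudoknot or a kissing hairpin, come out as genus 1.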

    Investigation of Multi-Objective Optimization criteria for RNA design

    RNA design is the inverse of RNA folding, and it appears to be NP-hard. In RNA design, a secondary structure is given and the goal is to find a nucleotide sequence that folds into this structure. Finding such sequences involves exploring an exponentially large sequence space. In the literature, heuristic algorithms are the standard technique for tackling RNA design; they enable effective and efficient exploration of the high-dimensional sequence-structure space when searching for candidates that fold into a given target structure. The main goal of this paper is to investigate the use of multi-objective criteria in SIMARD and its Quality Pre-selection Strategy (QPS). The objectives that we optimize are the Hamming distance between the designed structure and the target structure, and the thermodynamic free energy. We examine different combinations of these optimization criteria and attempt to draw conclusions about the relationships between them. We find that energy is a poor primary objective but an excellent secondary objective. We also find that using multi-objective pre-selection produces viable solutions in far fewer steps than was previously possible with SIMARD. © 2016 IEEE
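The two ways of combining the objectives can be contrasted in a few lines of Python. This is a hedged sketch, not SIMARD's code: it assumes each designed sequence has already been folded by some external tool (not shown) into a dot-bracket structure with a free energy, and the candidate values below are made up. It shows Pareto dominance over (Hamming distance, free energy) and the lexicographic ordering that treats energy only as a tie-breaker, which corresponds to the "energy as secondary objective" setting the abstract favors.

```python
from typing import List, Sequence, Tuple

Score = Tuple[int, float]  # (Hamming distance, free energy in kcal/mol); both minimized


def hamming(s1: str, s2: str) -> int:
    """Positions at which two equal-length dot-bracket structures differ."""
    if len(s1) != len(s2):
        raise ValueError("structures must have equal length")
    return sum(a != b for a, b in zip(s1, s2))


def dominates(a: Score, b: Score) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: Sequence[Score]) -> List[int]:
    """Indices of candidates that no other candidate dominates."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]


def lexicographic_order(scores: Sequence[Score]) -> List[int]:
    """Hamming distance as the primary objective, free energy as the tie-breaker."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


# Hypothetical (predicted structure, free energy) pairs for four designed sequences.
target = "((....))"
candidates = [("((....))", -1.2), ("(......)", -2.5), ("((....))", -3.0), ("........", 0.0)]
scores = [(hamming(s, target), e) for s, e in candidates]
print(pareto_front(scores))         # [2]
print(lexicographic_order(scores))  # [2, 0, 1, 3]
```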

    Computational Design and Experimental Validation of Functional Ribonucleic Acid Nanostructures

    In living cells, two major classes of ribonucleic acid (RNA) molecules can be found. The first class, messenger RNA (mRNA), carries the genetic information that the ribosome reads and translates into proteins. The second class, non-coding RNA (ncRNA), does not code for proteins and is involved in key cellular processes such as gene expression regulation, splicing, differentiation, and development. ncRNAs fold into an ensemble of thermodynamically stable secondary structures, which eventually lead the molecule to fold into a specific 3D structure. It is widely known that ncRNAs carry out their functions through their 3D structures as well as their molecular composition. The secondary structure of an ncRNA is composed of different types of structural elements (motifs), such as stacked base pairs, internal loops, hairpin loops, and pseudoknots. Pseudoknots are particularly difficult to model, are abundant in nature, and are known to stabilize the functional form of the molecule. Because ncRNAs have such a diverse range of functions, their computational design and analysis have numerous applications in nanotechnology, therapeutics, synthetic biology, and materials engineering. The RNA design problem is to find novel RNA sequences that are predicted to fold into one or more target structures while satisfying specific qualitative characteristics and constraints. RNA design can be modeled as a combinatorial optimization problem (COP) and is computationally challenging; more precisely, it is NP-hard. Numerous algorithms for the RNA design problem have been developed over the past two decades; however, most ignore pseudoknots and are therefore limited to only a slice of real-world modeling and design problems. Moreover, the few existing pseudoknot design methods, which were developed only recently, provide no evidence that their design methodology is applicable in biological contexts.
This thesis has two objectives, which address these two shortcomings. First, we aim to develop an efficient computational method for designing RNA secondary structures, including pseudoknots, with significantly better in-silico quality characteristics than the state of the art. Second, we aim to show the real-world value of the proposed method by validating it experimentally. More precisely, our aim is to design instances of certain types of RNA enzymes (i.e., ribozymes) and demonstrate that they are functionally active, which would likely happen only if their predicted folding matched their actual folding in in-vitro experiments. This thesis presents four contributions. First, we propose a novel adaptive defect-weighted sampling algorithm that efficiently solves the RNA secondary structure design problem with pseudoknots included. We compare the performance of our design algorithm with the state of the art and show that our method generates molecules that are thermodynamically more stable and less defective than those generated by state-of-the-art methods. Moreover, we show that when the effect of fitness evaluation is decoupled from the search and optimization process, our optimization method converges faster than the non-dominated sorting genetic algorithm (NSGA-II) and the ant colony optimization (ACO) algorithm. Second, we use our algorithmic development to implement an RNA design pipeline called Enzymer and make it available as an open-source package for wet-lab practitioners and RNA bioinformaticians. Enzymer uses multiple sequence alignment (MSA) data to generate initial design templates for further optimization, so the pipeline can be used to re-engineer naturally occurring functional RNAs such as ribozymes and riboswitches. Our first and second contributions are published in the RNA section of Frontiers in Genetics.
Third, we use Enzymer to re-engineer three different pseudoknotted ribozymes: a hammerhead ribozyme from the mouse gut metagenome, a hammerhead ribozyme from Yarrowia lipolytica, and a glmS ribozyme from Thermoanaerobacter tengcongensis. We designed a total of 18 ribozyme sequences and showed that 16 of them were active in-vitro. Our experimental results have been submitted to the journal RNA and strongly suggest that Enzymer is a reliable tool for designing pseudoknotted ncRNAs with a desired secondary structure. Finally, we propose a novel architecture for a ribozyme-based gene regulatory network in which a hammerhead ribozyme modulates expression of a reporter gene in the presence of an external stimulus, IPTG. Our in-vivo experiments gave the expected result in 7 of 12 cases.
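The abstract names an adaptive defect-weighted sampling algorithm but does not give its rules, so the Python below only sketches the general idea and is not Enzymer's method. It assumes a per-position defect of 1 wherever the predicted partner differs from the target partner (the predicted structure would come from a pseudoknot-capable folding tool, not shown), and it picks positions to mutate with probability proportional to that defect plus a small floor. The function names and the `floor` parameter are hypothetical.

```python
import random
from typing import List, Optional, Sequence

CLOSE_TO_OPEN = {")": "(", "]": "[", "}": "{", ">": "<"}


def pair_table(db: str) -> List[Optional[int]]:
    """Partner index per position (None if unpaired); [] {} <> encode pseudoknots."""
    stacks = {opener: [] for opener in CLOSE_TO_OPEN.values()}
    table: List[Optional[int]] = [None] * len(db)
    for i, ch in enumerate(db):
        if ch in stacks:
            stacks[ch].append(i)
        elif ch in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {i}")
            j = stack.pop()
            table[i], table[j] = j, i
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return table


def position_defects(predicted: str, target: str) -> List[int]:
    """1 where the predicted partner differs from the target partner, else 0."""
    return [int(p != t) for p, t in zip(pair_table(predicted), pair_table(target))]


def sample_positions(defects: Sequence[int], k: int, floor: float = 0.05,
                     rng: Optional[random.Random] = None) -> List[int]:
    """Draw k positions to mutate, with probability proportional to defect + floor."""
    rng = rng or random.Random()
    weights = [d + floor for d in defects]
    return rng.choices(range(len(defects)), weights=weights, k=k)


target = "((..[[..))..]]"     # H-type pseudoknot
predicted = "((....))......"  # hypothetical folding-tool output
defects = position_defects(predicted, target)
print(defects)  # [1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1]
print(sample_positions(defects, k=5, rng=random.Random(1)))
```

The floor keeps correctly folded positions mutable, so the search can still escape cases where fixing a defect requires changing a neighboring, already-correct base.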

    SIMARD: A simulated annealing based RNA design algorithm with quality pre-selection strategies

    Many biological processes within cells, including gene expression and the translation of messenger RNA into proteins, depend on RNA sequences, and the structure of an RNA plays a vital role in its function. The RNA design problem refers to designing an RNA sequence that folds into a given secondary structure. The number of possible nucleotide combinations is vast, and the problem is NP-hard. To solve the RNA design problem, researchers have implemented algorithms based on local stochastic search, context-free grammars, global sampling, or evolutionary programming. In this paper, we examine SIMARD, an RNA design algorithm that implements simulated annealing techniques. We also propose QPS, a mutation operator for SIMARD that pre-selects high-quality sequences. Furthermore, we present experimental results comparing SIMARD with eight other RNA design algorithms on the Rfam dataset. The results indicate that SIMARD performs promisingly in terms of the Hamming distance between the structure of the designed sequence and the target structure, and that it outperforms ERD in terms of free energy. © 2016 IEEE
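Simulated annealing combined with a pre-selecting mutation operator can be sketched as follows. This is not SIMARD's implementation: the pre-selection step (score several single-site mutants and propose only the best) is our assumed reading of the QPS idea, and `toy_cost` is a hypothetical stand-in that counts target base pairs whose two nucleotides cannot form a Watson-Crick or G-U pair, used only because a real cost would require folding every candidate with an external tool.

```python
import math
import random
from typing import Callable, List, Tuple

# Watson-Crick and G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def target_pairs(db: str) -> List[Tuple[int, int]]:
    """Base pairs of a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def toy_cost(seq: str, pairs: List[Tuple[int, int]]) -> int:
    """Hypothetical stand-in cost: target pairs whose bases cannot pair."""
    return sum((seq[i], seq[j]) not in PAIRS for i, j in pairs)


def mutate(seq: str, rng: random.Random) -> str:
    """Replace one random position with a different nucleotide."""
    i = rng.randrange(len(seq))
    new = rng.choice([b for b in "ACGU" if b != seq[i]])
    return seq[:i] + new + seq[i + 1:]


def anneal(seq: str, cost: Callable[[str], float], steps: int = 2000,
           t0: float = 2.0, alpha: float = 0.995, n_candidates: int = 4,
           seed: int = 0) -> Tuple[str, float]:
    rng = random.Random(seed)
    current, current_cost = seq, cost(seq)
    best, best_cost = current, current_cost
    t = t0
    for _ in range(steps):
        # Assumed QPS-like pre-selection: propose only the best of several mutants.
        proposal = min((mutate(current, rng) for _ in range(n_candidates)), key=cost)
        proposal_cost = cost(proposal)
        delta = proposal_cost - current_cost
        # Metropolis rule: accept improvements; accept worse moves with prob exp(-delta/t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = proposal, proposal_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t *= alpha  # geometric cooling
    return best, best_cost


target = "((((....))))"
pairs = target_pairs(target)
best, best_cost = anneal("A" * len(target), lambda s: toy_cost(s, pairs))
print(best, best_cost)
```

Swapping `toy_cost` for a cost built on a real folding engine (for example, the Hamming distance between the folded and target structures) would turn this into a genuine, if slow, design loop.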

    Examining the annealing schedules for RNA design algorithm

    RNA structures are important for many biological processes in the cell. One important role of RNA is as a catalytic element: ribozymes are RNA sequences that fold into active structures that catalyze important chemical reactions. The folded structures of these RNAs are crucial, because only specific conformations maintain the active structures, so the RNA must fold in a specific way. The RNA design problem is to predict an RNA sequence that will fold into a given RNA structure. Solving this problem allows researchers to design RNA: they decide which folded secondary structure is required to accomplish a task, and the algorithm gives them a primary sequence to assemble. However, there are far too many possible primary sequences to test one by one whether they fold into the structure, so heuristic algorithms must be employed. This paper introduces SIMARD, an evolutionary algorithm that uses an optimization technique called simulated annealing to solve the RNA design problem. We analyze three different cooling schedules for the annealing process: 1) an adaptive cooling schedule, 2) a geometric cooling schedule, and 3) a geometric cooling schedule with warm-up. Our results show that an adaptive annealing schedule may not be more effective than geometric schedules at minimizing the Hamming distance between the target structure and the structure of our folded sequence. The results also show that warming up in a geometric cooling schedule may be useful for optimizing SIMARD. © 2016 IEEE
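The three schedule families can be written as temperature rules in terms of the step index k. The geometric form T_k = T0 * alpha^k is standard; the warm-up variant (temperature raised linearly for the first `warmup` steps, then cooled geometrically) and the acceptance-rate-driven adaptive rule are assumptions chosen for illustration, since the abstract does not give SIMARD's exact formulas. All function names and default values are hypothetical.

```python
def geometric(k: int, t0: float = 1.0, alpha: float = 0.99) -> float:
    """Classic geometric cooling: T_k = t0 * alpha**k."""
    return t0 * alpha ** k


def geometric_with_warmup(k: int, t0: float = 1.0, alpha: float = 0.99,
                          warmup: int = 50, t_peak: float = 2.0) -> float:
    """Raise T linearly from t0 to t_peak during warm-up, then cool geometrically."""
    if k < warmup:
        return t0 + (t_peak - t0) * k / warmup
    return t_peak * alpha ** (k - warmup)


def adaptive_update(t: float, acceptance_rate: float, target_rate: float = 0.3,
                    fast: float = 0.95, slow: float = 0.995) -> float:
    """Assumed adaptive rule: cool fast when many moves are accepted, slowly otherwise."""
    return t * (fast if acceptance_rate > target_rate else slow)


for k in (0, 25, 50, 100):
    print(k, round(geometric(k), 4), round(geometric_with_warmup(k), 4))
```

The warm-up phase briefly makes the search more permissive before cooling begins, which is one plausible reason it could help an annealer escape a poor starting sequence.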

    Analysis, Design, and Construction of Nucleic Acid Devices

    Nucleic acids present great promise as building blocks for nanoscale devices. To achieve this potential, methods for the analysis and design of DNA and RNA need to be improved. In this thesis, traditional algorithms for analyzing nucleic acids at equilibrium are extended to handle a class of pseudoknots, with examples relevant to biologists and bioengineers. With these analytical tools in hand, nucleic acid sequences are designed to maximize the equilibrium probability of a desired fold. Upon analysis, it is concluded that both affinity and specificity are important when choosing a sequence; this conclusion holds for a wide range of target structures and is robust to random perturbations of the energy model. Applying the intuition gained from these studies, a process called hybridization chain reaction (HCR) is invented, and sequences are chosen that experimentally verify this phenomenon. In HCR, a small number of DNA or RNA molecules trigger a system-wide configurational change, allowing the amplification and detection of specific nucleic acid sequences. As an extension, HCR is combined with a pre-existing aptamer domain to successfully construct an ATP sensor, and the groundwork is laid for the future development of sensors for other small molecules. In addition, recent studies on multi-stranded algorithms and improvements to HCR are included in the appendices. Not only will these advancements increase our understanding of biological RNAs, but they will also provide valuable tools for the future development of nucleic acid nanotechnologies.
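The quantity being maximized is the Boltzmann probability of the target fold s*, P(s*) = exp(-ΔG(s*)/RT) / Σ_s exp(-ΔG(s)/RT). The Python sketch below computes it from an explicitly listed ensemble of structure free energies, which is a simplifying assumption: real tools obtain the denominator (the partition function) by dynamic programming rather than enumeration. The energies are made-up numbers chosen to show why affinity (a low target free energy) is not enough without specificity (no competing fold of similar energy).

```python
import math
from typing import Dict

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def target_probability(energies: Dict[str, float], target: str,
                       temp_c: float = 37.0) -> float:
    """Equilibrium probability of `target` given free energies (kcal/mol) of all listed structures."""
    rt = R_KCAL * (temp_c + 273.15)
    # Shift by the minimum energy for numerical stability; the ratio is unchanged.
    e_min = min(energies.values())
    weights = {s: math.exp(-(e - e_min) / rt) for s, e in energies.items()}
    return weights[target] / sum(weights.values())


# Hypothetical ensembles: the target has the same affinity (-10 kcal/mol) in both,
# but in the second one a competing fold is almost as stable.
specific = {"target": -10.0, "alt1": -4.0, "alt2": -3.5}
unspecific = {"target": -10.0, "alt1": -9.8, "alt2": -3.5}
print(round(target_probability(specific, "target"), 4))    # ~0.9999
print(round(target_probability(unspecific, "target"), 2))  # ~0.58
```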

    The identification of biologically important secondary structures in disease-causing RNA viruses

    Master of Science. Viral genomes consist of either deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). Viral RNA molecules serve two functions: first, their sequences contain the genetic code that encodes the viral proteins, and second, they may form structural elements important in regulating the viral life-cycle. Using a range of computational and bioinformatics techniques, we investigated how predicted secondary structure may influence the evolutionary dynamics of a group of single-stranded RNA viruses from the Picornaviridae family. We detected significant and marginally significant correlations between regions predicted to be structured and synonymous substitution constraints in these regions, suggesting that selection may be acting on those sites to maintain the integrity of certain structures. Additionally, coevolution analysis showed that nucleotides predicted to be base-paired tended to co-evolve with one another in a complementary fashion in four of the eleven species examined. We then focused our analyses on individual structural elements within the genome-wide predicted structures, ranking the predicted secondary structural elements by their degree of evolutionary conservation, their associated synonymous substitution rates, and the degree to which nucleotides predicted to be base-paired coevolved with one another. Top-ranking structures coincided with well-characterized secondary structures previously described in the literature. We also assessed the impact of genomic secondary structures on the recombinational dynamics of picornavirus genomes, observing a strong tendency for recombination breakpoints to occur in non-coding regions. However, we did not detect convincing evidence of an association between the distribution of predicted RNA structural elements and breakpoint clustering.
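Complementary co-evolution of a predicted base pair can be illustrated with a simple count over an alignment. The Python sketch below is not the thesis's method (the abstract does not name the co-evolution test used). It assumes gap-free aligned RNA sequences and a hypothetical helper that, for alignment columns i and j, reports the fraction of sequences in which the two nucleotides can pair, and how many sequences change both sides of the pair relative to the first sequence while keeping it pairable, the signature of a compensatory substitution.

```python
from typing import List, Tuple

# Watson-Crick and G-U wobble pairs.
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_support(alignment: List[str], i: int, j: int) -> Tuple[float, int]:
    """(fraction of sequences that can pair at columns i, j;
    number of sequences with pairing-preserving double changes vs. the first sequence)."""
    ref = alignment[0]
    pairable = 0
    compensatory = 0
    for seq in alignment:
        ok = (seq[i], seq[j]) in CAN_PAIR
        pairable += ok
        if ok and seq[i] != ref[i] and seq[j] != ref[j]:
            compensatory += 1
    return pairable / len(alignment), compensatory


# Hypothetical four-sequence alignment; columns 0 and 4 are a predicted pair.
aln = ["GAAAC", "CAAAG", "AAAAU", "GAAAA"]
print(pair_support(aln, 0, 4))  # (0.75, 2)
```

Real co-evolution tests compare such counts against a phylogenetic null model; raw counts like these are only a first-pass signal.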

    Identification and ranking of pervasive secondary structures in positive sense single-stranded ribonucleic acid viral genomes

    Philosophiae Doctor - PhD. The plasticity of single-stranded viral genomes permits the formation of secondary structures through complementary base-pairing of their component nucleotides. Such structures have been shown to regulate a number of biological processes during the viral life-cycle, including replication, translation, transcription, post-transcriptional editing, and genome packaging. However, even randomly generated single-stranded nucleotide sequences can form stable secondary structures; therefore, among the numerous secondary structures formed in large viral genomes, only a few elements are likely to be biologically relevant. While it is possible to identify functional elements through a series of laboratory experiments, this is both excessively resource- and time-intensive and therefore not always feasible. A more efficient approach uses computational comparative analysis methods to study signals of molecular evolution that are consistent with selection acting to preserve particular structural elements. In this study, I systematically deploy a collection of computational molecular evolution detection methods to analyse the genomes of viruses belonging to a number of single-stranded RNA (ssRNA) viral families (Alphaflexiviridae, Arteriviridae, Caliciviridae, Closteroviridae, Coronavirinae, Flaviviridae, Luteoviridae, Picornaviridae, Potyviridae, Togaviridae and Virgaviridae) for evidence of selectively stabilised secondary structures. To identify potentially important structural elements, the approach combines structure prediction data with signals of natural selection, sequence co-evolution, and genetic recombination.
In addition, auxiliary computational tools were used to: 1) quantitatively rank the identified structures in order of their likely biological importance, 2) plot the co-ordinates of structures onto viral genome maps, and 3) visualise individual structures overlaid with estimates from the molecular evolution analyses. I show that in many of these viruses purifying selection tends to be stronger at sites predicted to be base-paired within secondary structures, and that base-paired sites are strongly associated with sites that are complementarily co-evolving. Lastly, I show that in recombinant genomes, breakpoint locations are weakly associated with the co-ordinates of secondary structures. Collectively, these findings suggest that natural selection acting to maintain potentially functional secondary structures has been a major theme in the evolution of these ssRNA viruses.
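The abstract does not state how the evidence measures were combined into a single ranking, so the following Python sketch uses one simple, assumed scheme, mean-rank aggregation: each structure is ranked separately on every measure (for example conservation, synonymous substitution rate, co-evolution), ties share the average rank, and the final order is by mean rank. The measure names, their directions (here a lower synonymous rate is treated as stronger evidence of constraint), and the example values are hypothetical.

```python
from typing import Dict, List


def ranks(values: List[float], higher_is_better: bool) -> List[float]:
    """1-based ranks (1 = strongest evidence); ties receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=higher_is_better)
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result


def mean_rank_order(scores: Dict[str, List[float]],
                    higher_is_better: Dict[str, bool]) -> List[int]:
    """Indices of structures sorted by their average rank across all measures."""
    per_measure = [ranks(v, higher_is_better[m]) for m, v in scores.items()]
    n = len(next(iter(scores.values())))
    mean = [sum(r[i] for r in per_measure) / len(per_measure) for i in range(n)]
    return sorted(range(n), key=lambda i: mean[i])


# Hypothetical evidence for three predicted structures.
scores = {"conservation": [0.9, 0.4, 0.7], "dS": [0.2, 0.8, 0.5], "coevolution": [5, 1, 5]}
directions = {"conservation": True, "dS": False, "coevolution": True}
print(mean_rank_order(scores, directions))  # [0, 2, 1]
```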

    RNA inverse folding and synthetic design

    Thesis advisors: Welkin E. Johnson and Peter G. Clote. Synthetic biology is a rapidly emerging discipline in which innovative, interdisciplinary work has led to promising results. Synthetic design of RNA requires novel methods to study and analyze known functional molecules, as well as to generate design candidates that have a high likelihood of being functional. This thesis focuses primarily on the development of novel algorithms for the design of synthetic RNAs. Previous strategies, such as RNAinverse and NUPACK-DESIGN, use heuristic methods, such as adaptive walk, ensemble defect optimization (a form of simulated annealing), and genetic algorithms, to generate sequences that optimize specific measures (maximizing the probability of the target structure or minimizing the ensemble defect). In contrast, our approach is to generate a large number of sequences whose minimum free energy structure is identical to the target design structure, and then to filter them with respect to different criteria in order to select the most promising candidates for biochemical validation. In addition, our software must be accessible and user-friendly, so that researchers from different backgrounds can use it in their work. The work presented in this thesis therefore concerns three areas: creating a potent, versatile, and user-friendly RNA inverse folding algorithm suited to the specific requirements of each project; implementing tools to analyze the properties that differentiate known functional RNA structures; and using these methods for the synthetic design of de novo functional RNA molecules. Thesis (PhD), Boston College, 2016. Submitted to: Boston College, Graduate School of Arts and Sciences. Discipline: Biology.
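The generate-then-filter strategy can be outlined in Python as follows. This is an illustration under stated assumptions, not the thesis's software: the sampler only guarantees that paired positions of a pseudoknot-free target receive complementary (Watson-Crick or G-U) nucleotides, whereas the approach described above keeps sequences whose minimum free energy structure actually equals the target. That check needs a folding program, so it is left as a hypothetical `folds_to_target` callback; GC content serves as an example filter criterion.

```python
import random
from typing import Callable, List, Optional, Tuple

PAIR_CHOICES = ["AU", "UA", "GC", "CG", "GU", "UG"]


def parse_pairs(db: str) -> List[Tuple[int, int]]:
    """Base pairs of a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def sample_compatible(db: str, rng: random.Random) -> str:
    """Random sequence whose target-paired positions can form canonical or G-U pairs."""
    seq = [rng.choice("ACGU") for _ in db]
    for i, j in parse_pairs(db):
        seq[i], seq[j] = rng.choice(PAIR_CHOICES)
    return "".join(seq)


def gc_content(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def generate_and_filter(db: str, n: int = 1000,
                        gc_range: Tuple[float, float] = (0.4, 0.6),
                        folds_to_target: Optional[Callable[[str, str], bool]] = None,
                        seed: int = 0) -> List[str]:
    """Sample n candidates; keep those passing the (hypothetical) MFE check and GC filter."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        seq = sample_compatible(db, rng)
        if folds_to_target is not None and not folds_to_target(seq, db):
            continue  # a real check would fold seq with an external program
        if gc_range[0] <= gc_content(seq) <= gc_range[1]:
            kept.append(seq)
    return kept


designs = generate_and_filter("((((....))))", n=200)
print(len(designs), designs[:3])
```

Further criteria from the thesis's filtering stage would slot in as extra predicates alongside the GC test.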