207 research outputs found

    A thermodynamic approach to designing structure-free combinatorial DNA word sets

    An algorithm is presented for the generation of sets of non-interacting DNA sequences, employing existing thermodynamic models for the prediction of duplex stabilities and secondary structures. A DNA 'word' structure is employed in which individual DNA 'words' of a given length (e.g. 12mer and 16mer) may be concatenated into longer sequences (e.g. four tandem words and six tandem words). This approach, where multiple word variants are used at each tandem word position, allows very large sets of non-interacting DNA strands to be assembled from combinations of the individual words. Word sets were generated and their figures of merit were compared to sets described previously in the literature (e.g. 4, 8, 12, 15 and 16mer). The predicted hybridization behavior was experimentally verified on selected members of the sets using standard UV hyperchromism measurements of duplex melting temperatures (Tm values). Additional experimental validation was obtained by using the sequences in formulating and solving a small example of a DNA computing problem.

    Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics

    BACKGROUND: The general problem of RNA secondary structure prediction under the widely used thermodynamic model is known to be NP-complete when the structures considered include arbitrary pseudoknots. For restricted classes of pseudoknots, several polynomial time algorithms have been designed, where the O(n⁶) time and O(n⁴) space algorithm by Rivas and Eddy is currently the best available program. RESULTS: We introduce the class of canonical simple recursive pseudoknots and present an algorithm that requires O(n⁴) time and O(n²) space to predict the energetically optimal structure of an RNA sequence, possibly containing such pseudoknots. Evaluation against a large collection of known pseudoknotted structures shows the adequacy of the canonization approach and our algorithm. CONCLUSIONS: RNA pseudoknots of medium size can now be predicted reliably as well as efficiently by the new algorithm.

    Automated Design of Dynamic Programming Schemes for RNA Folding with Pseudoknots

    Despite being a textbook application of dynamic programming (DP) and a routine task in RNA structure analysis, RNA secondary structure prediction remains challenging whenever pseudoknots come into play. To circumvent the NP-hardness of energy minimization in realistic energy models, specialized algorithms have been proposed for restricted conformation classes that capture the most frequently observed configurations. While these methods rely on hand-crafted DP schemes, we generalize and fully automate the design of DP pseudoknot prediction algorithms. We formalize the problem of designing DP algorithms for an (infinite) class of conformations, modeled by (a finite number of) fatgraphs, and automatically build DP schemes minimizing their algorithmic complexity. We propose an algorithm for the problem, based on the tree-decomposition of a well-chosen representative structure, which we simplify and reinterpret as a DP scheme. The algorithm is fixed-parameter tractable for the tree-width tw of the fatgraph, and its output represents an O(n^{tw+1}) algorithm for predicting the MFE folding of an RNA of length n. Our general framework supports general energy models, partition function computations, recursive substructures and partial folding, and could pave the way for algebraic dynamic programming beyond the context-free case.
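
    For readers unfamiliar with the "textbook" dynamic programming that the abstract above refers to, the following is a minimal sketch of a Nussinov-style recursion. It is a deliberate simplification and not the method of any paper listed here: it maximizes the number of nested base pairs rather than minimizing free energy, and it cannot represent pseudoknots. The function name `max_base_pairs` and the `min_loop` parameter are illustrative assumptions.

```python
def max_base_pairs(seq, min_loop=3):
    """Nussinov-style DP: maximum number of nested (pseudoknot-free) base pairs.

    Simplified stand-in for energy minimization: every allowed pair scores 1.
    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best score for the subsequence seq[i..j] (inclusive)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: j pairs with some k, splitting the interval in two
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

    The triple loop gives O(n³) time and O(n²) space. Pseudoknotted classes require richer subproblems than intervals [i, j], which is where the higher exponents quoted in these abstracts come from.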

    Prediction of secondary structures for large RNA molecules

    The prediction of correct secondary structures of large RNAs is one of the unsolved challenges of computational molecular biology. Among the major obstacles is the fact that accurate calculations scale as O(n⁴), so the computational requirements become prohibitive as the length increases. We present a new parallel multicore and scalable program called GTfold, which is one to two orders of magnitude faster than the de facto standard programs mfold and RNAfold for folding large RNA viral sequences and achieves comparable accuracy of prediction. We analyze the algorithm's concurrency and describe the parallelism for a shared memory environment such as a symmetric multiprocessor or multicore chip. We are seeing a paradigm shift to multicore chips, and parallelism must be explicitly addressed to continue gaining performance with each new generation of systems. We provide a rigorous proof of correctness of an optimized algorithm for internal loop calculations called the internal loop speedup algorithm (ILSA), which reduces the time complexity of internal loop computations from O(n⁴) to O(n³), and show that exact algorithms such as ILSA are executed with our method in an affordable amount of time. The proof gives insight into solving these kinds of combinatorial problems. We have documented detailed pseudocode of the algorithm for predicting minimum free energy secondary structures, which provides a base for implementing future algorithmic improvements and improved thermodynamic models in GTfold. GTfold is written in C/C++ and freely available as open source from our website. M.S. Committee Chair: Bader, David; Committee Co-Chair: Heitsch, Christine; Committee Member: Harvey, Stephen; Committee Member: Vuduc, Richar

    Computational Design and Experimental Validation of Functional Ribonucleic Acid Nanostructures

    In living cells, two major classes of ribonucleic acid (RNA) molecules can be found. The first class, messenger RNA (mRNA), contains the genetic information that the ribosome reads and translates into proteins. The second class, non-coding RNA (ncRNA), does not code for proteins and is involved in key cellular processes such as gene expression regulation, splicing, differentiation, and development. NcRNAs fold into an ensemble of thermodynamically stable secondary structures, which eventually lead the molecule to fold into a specific 3D structure. It is widely known that ncRNAs carry out their functions via their 3D structures as well as their molecular composition. The secondary structure of ncRNAs is composed of different types of structural elements (motifs) such as stacking base pairs, internal loops, hairpin loops and pseudoknots. Pseudoknots are particularly difficult to model, are abundant in nature and are known to stabilize the functional form of the molecule. Due to the diverse range of functions of ncRNAs, their computational design and analysis have numerous applications in nanotechnology, therapeutics, synthetic biology, and materials engineering. The RNA design problem is to find novel RNA sequences that are predicted to fold into target structure(s) while satisfying specific qualitative characteristics and constraints. RNA design can be modeled as a combinatorial optimization problem (COP) and is known to be computationally challenging, or more precisely NP-hard. Numerous algorithms to solve the RNA design problem have been developed over the past two decades; however, they mostly ignore pseudoknots and are therefore applicable to only a slice of real-world modeling and design problems. Moreover, the few existing pseudoknot design methods, which were developed only recently, do not provide any evidence about the applicability of their proposed design methodology in biological contexts.
The two objectives of this thesis address these two shortcomings. First, we are interested in developing an efficient computational method for the design of RNA secondary structures including pseudoknots that shows significantly better in-silico quality characteristics than the state of the art. Second, we are interested in showing the real-world worthiness of the proposed method by validating it experimentally. More precisely, our aim is to design instances of certain types of RNA enzymes (i.e. ribozymes) and demonstrate that they are functionally active. This would likely only happen if their predicted folding matched their actual folding in the in-vitro experiments. In this thesis, we present four contributions. First, we propose a novel adaptive defect weighted sampling algorithm to efficiently solve the RNA secondary structure design problem where pseudoknots are included. We compare the performance of our design algorithm with the state of the art and show that our method generates molecules that are thermodynamically more stable and less defective than those generated by state-of-the-art methods. Moreover, we show that when the effect of fitness evaluation is decoupled from the search and optimization process, our optimization method converges faster than the non-dominated sorting genetic algorithm (NSGA II) and the ant colony optimization (ACO) algorithm do. Second, we use our algorithmic development to implement an RNA design pipeline called Enzymer and make it available as an open source package useful for wet lab practitioners and RNA bioinformaticians. Enzymer uses multiple sequence alignment (MSA) data to generate initial design templates for further optimization. Our design pipeline can then be used to re-engineer naturally occurring RNA enzymes such as ribozymes and riboswitches. Our first and second contributions are published in the RNA section of the Journal of Frontiers in Genetics.
Third, we use Enzymer to re-engineer three different species of pseudoknotted ribozymes: a hammerhead ribozyme from the mouse gut metagenome, a hammerhead ribozyme from Yarrowia lipolytica and a glmS ribozyme from Thermoanaerobacter tengcongensis. We designed a total of 18 ribozyme sequences and showed that 16 of them were active in-vitro. Our experimental results have been submitted to the RNA journal and strongly suggest that Enzymer is a reliable tool to design pseudoknotted ncRNAs with a desired secondary structure. Finally, we propose a novel architecture for a new ribozyme-based gene regulatory network in which a hammerhead ribozyme modulates expression of a reporter gene when an external stimulus, IPTG, is present. Our in-vivo experiments show the expected behavior in 7 out of 12 cases.

    RNA folding kinetics including pseudoknots

    RNA molecules are essential components of living cells. Their wide range of different functions depends on the sequence of nucleotides and the corresponding structure. The majority of known RNA molecules fold into their energetically most stable conformation, as well as structurally similar suboptimal conformations that do not alter the specific task of the molecule. However, there are RNA molecules which can switch between two structurally distant conformations, one of which is functional while the other is not. The best known examples are riboswitches, which usually sense various kinds of metabolites from their environment that trigger the refolding from one conformation into the other. The rather new field of synthetic biology led to the construction of an example of a new type of riboswitch, which refolds upon interaction with other RNA molecules [1]. Such RNA-triggered riboswitches are not aimed at sensing the environment, but expand the repertoire of gene regulation. Inspired by this example, we present RNAscout.pl, a new program to study refolding between two RNA conformations, which can be used to estimate the performance of RNA-triggered riboswitches. The underlying algorithm heuristically computes a set of intermediate conformations that are energetically favorable and structurally related to both stable conformations of the riboswitch. Based on this refolding network, we show kinetic simulations that support the expected refolding path for our riboswitch example. Moreover, we present pk findpath, a breadth-first search algorithm to estimate direct paths (i.e. a small subset of all possible paths) between two different RNA conformations. Both programs, RNAscout.pl and pk findpath, are used to estimate whether natural RNA molecules are optimized to fold into their energetically most stable conformation. In doing so, we compare the new programs against existing programs of the Vienna RNA package [2].
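
    To make the idea of a breadth-first search over direct refolding paths concrete, here is a minimal sketch under stated assumptions. It is not the pk findpath implementation: structures are represented as sets of base pairs (i, j), a "direct" path only removes pairs absent from the target and adds pairs present in the target, the `energy` callback is a hypothetical placeholder for a real thermodynamic model, and `width` is an assumed beam size that keeps the search tractable.

```python
def direct_path(start, target, energy, width=10):
    """Breadth-first beam search for a low-barrier direct path.

    start, target: iterables of base pairs (i, j).
    energy: hypothetical callback mapping a frozenset of pairs to a number.
    Returns (barrier, path), where barrier is the highest energy on the path.
    """
    start, target = frozenset(start), frozenset(target)
    # frontier maps each reachable structure to (barrier so far, path so far)
    frontier = {start: (energy(start), [start])}
    # every direct move shrinks the symmetric difference by exactly one pair
    for _ in range(len(start ^ target)):
        candidates = {}
        for s, (barrier, path) in frontier.items():
            paired = {pos for bp in s for pos in bp}
            # remove a pair that the target does not contain
            moves = [s - {bp} for bp in s - target]
            # add a target pair whose two positions are currently free
            moves += [s | {bp} for bp in target - s
                      if bp[0] not in paired and bp[1] not in paired]
            for nxt in moves:
                b = max(barrier, energy(nxt))
                if nxt not in candidates or b < candidates[nxt][0]:
                    candidates[nxt] = (b, path + [nxt])
        # keep only the `width` structures with the lowest barrier
        best = sorted(candidates.items(), key=lambda kv: kv[1][0])[:width]
        frontier = dict(best)
    return frontier[target]
```

    With a toy energy such as `lambda s: -len(s)` (each pair contributes -1), calling `direct_path({(0, 8), (1, 7)}, {(1, 9), (2, 8)}, lambda s: -len(s))` explores paths that alternate removals and additions instead of opening both pairs first. Because crossing pairs are not excluded, pseudoknotted intermediates are permitted in this sketch.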

    ViennaRNA Package 2.0

    BACKGROUND: Secondary structure forms an important intermediate level of description of nucleic acids that encapsulates the dominating part of the folding energy, is often well conserved in evolution, and is routinely used as a basis to explain experimental findings. Based on carefully measured thermodynamic parameters, exact dynamic programming algorithms can be used to compute ground states, base pairing probabilities, as well as thermodynamic properties. RESULTS: The ViennaRNA Package has been a widely used compilation of RNA secondary structure related computer programs for nearly two decades. Major changes in the structure of the standard energy model, the Turner 2004 parameters, the pervasive use of multi-core CPUs, and an increasing number of algorithmic variants prompted a major technical overhaul of both the underlying RNAlib and the interactive user programs. New features include an expanded repertoire of tools to assess RNA-RNA interactions and restricted ensembles of structures, additional output information such as centroid structures and maximum expected accuracy structures derived from base pairing probabilities, or z-scores for locally stable secondary structures, and support for input in fasta format. Updates were implemented without compromising the computational efficiency of the core algorithms and ensuring compatibility with earlier versions. CONCLUSIONS: The ViennaRNA Package 2.0, supporting concurrent computations via OpenMP, can be downloaded from http://www.tbi.univie.ac.at/RNA.

    Enhanced Algorithms for Analysis and Design of Nucleic Acid Reaction Pathways

    Nucleic acids provide a powerful platform for programming at the molecular level. This is possible because the free energy of nucleic acid structures is dominated by the local interactions of base pairing and base pair stacking. The nearest neighbor secondary structure model implied by these energetics has enabled development of a set of algorithms for calculating thermodynamic quantities of nucleic acid sequences. Molecular programmers and synthetic biologists continue to extend their reach to larger, more complicated nucleic acid complexes, reaction pathways, and systems. This necessitates a focus on new algorithm development and efficient implementations to enable analysis and design of such systems. Concerning analysis of nucleic acids, we collect seemingly diverse algorithms under a unified three-component dynamic programming framework consisting of: 1) recursions that specify the dependencies between subproblems and incorporate the details of the structural ensemble and the free energy model, 2) evaluation algebras that define the mathematical form of each subproblem, and 3) operation orders that specify the computational trajectory through the dependency graph of subproblems. Changes to the set of recursions allow operation over the complex ensemble including coaxial and dangle stacking states, affecting all thermodynamic quantities. An updated operation order for structure sampling allows simultaneous generation of a set of structures sampled from the Boltzmann distribution in time that scales empirically sublinearly in the number of samples and leads to an order of magnitude or more speedup over repeated single-structure sampling.
For the problem of sequence design for reaction pathway engineering, we introduce an optimization algorithm to minimize the multistate test tube ensemble defect, which simultaneously designs for reactant, intermediate, and product states along the reaction pathway (positive design) and against crosstalk interactions (negative design). Each of these on-pathway or crosstalk states is represented as a target test tube ensemble containing arbitrary numbers of on-target complexes, each with a target secondary structure and target concentration, and arbitrary numbers of off-target complexes, each with vanishing target concentration. Our test tube specification formalism enables conversion of a reaction pathway specification into a set of target test tubes. Sequences are designed subject to a set of hard constraints allowing specification of properties such as sequence composition, sequence complementarity, prevention of unwanted sequence patterns, and inclusion of biological sequences. We then extend this algorithm with soft constraints, enhancing flexibility through new constraint types and reducing design cost by up to two orders of magnitude in the most highly constrained cases. These soft constraints enable multiobjective design of the multistate test tube ensemble defect simultaneously with heuristics for avoiding kinetic traps and equalizing reaction rates to further aid reaction pathway engineering.

    Computation and programmability at the nano-bio interface

    PhD Thesis. The manipulation of physical reality on the molecular level and construction of devices operating on the nanoscale has been the focal point of nanotechnology. In particular, nanotechnology based on DNA and RNA has the potential to find applications in the field of Synthetic Biology thanks to the inherent compatibility of nucleic acids with biological systems. Scaffolded DNA origami, proposed by P. Rothemund, is one of the leading and most successful methods in which nanostructures are realised through rational programming of short 'staple' oligomers which fold a long single-stranded DNA called the 'scaffold' strand into a variety of desired shapes. DNA origami already has many applications, including intelligent drug delivery, miniaturisation of logic circuits and computation in vivo. However, one of the factors that are limiting the complexity, applicability and scalability of this approach is the source of the scaffold, which commonly originates from viruses or phages. Furthermore, developing a robust and orthogonal interface between DNA nanotechnology and biological parts remains a significant challenge. The first part of this thesis tackles these issues by challenging the fundamental assumption in the field, namely that a viral sequence is to be used as the DNA origami scaffold. A method is introduced for de novo generation of long synthetic sequences based on De Bruijn sequences, which have been previously proposed in combinatorics. The thesis presents a collection of algorithms which allow the construction of custom-made sequences that are uniquely addressable and biologically orthogonal (i.e. they do not code for any known biological function). Synthetic scaffolds generated by these algorithms are computationally analysed and compared with their natural counterparts with respect to: repetition in sequence, secondary structure and thermodynamic addressability.
This also aids the design of wet lab experiments pursuing justification and verification of this novel approach by empirical evidence. The second part of this thesis discusses the possibility of applying evolutionary optimisation to synthetic DNA sequences under constraints dictated by the biological interface. A multi-strand system is introduced based on an alternative approach to DNA self-assembly, which relies on strand-displacement cascades, for molecular data storage. The thesis demonstrates how a genetic algorithm can be used to generate viable solutions to this sequence optimisation problem which favours the target self-assembly configuration. Additionally, the kinetics of strand-displacement reactions are analysed with existing coarse-grained DNA models (oxDNA). This thesis is motivated by the application of scientific computing to problems which lie on the boundary of Computer Science and the fields of DNA Nanotechnology, DNA Computing and Synthetic Biology, and thus I endeavour to the best of my ability to establish this work within the context of these disciplines.
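
    The De Bruijn sequences mentioned in the abstract above can be generated with a well-known recursive construction that concatenates Lyndon words. The sketch below shows that standard combinatorial construction only. It is not the thesis's scaffold-design algorithm, which adds constraints such as biological orthogonality that are not modeled here, and the function name `de_bruijn` is an illustrative choice.

```python
def de_bruijn(alphabet, n):
    """Return a cyclic De Bruijn sequence B(k, n) over `alphabet`.

    Read cyclically, every length-n word over the alphabet occurs
    exactly once. Built by concatenating Lyndon words whose length
    divides n, in lexicographic order.
    """
    k = len(alphabet)
    a = [0] * (k * n)   # working buffer of symbol indices
    sequence = []

    def db(t, p):
        if t > n:
            # emit the current prefix if it is a Lyndon word of suitable length
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)
```

    For the DNA alphabet "ACGT" and word length n, the result has 4^n symbols, so each n-mer addresses a unique position on the (cyclic) sequence. That uniqueness is the property that makes such sequences attractive as uniquely addressable scaffolds.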