39 research outputs found

    Prediction of RNA pseudoknots by Monte Carlo simulations

    In this paper we consider the problem of RNA folding with pseudoknots. We use a graphical representation in which the secondary structures are described by planar diagrams. Pseudoknots are identified as non-planar diagrams. We analyze the non-planar topologies of RNA structures and propose a classification of RNA pseudoknots according to the minimal genus of the surface on which the RNA structure can be embedded. This classification provides a simple and natural way to tackle the problem of RNA folding prediction in the presence of pseudoknots. Based on that approach, we describe a Monte Carlo algorithm for the prediction of pseudoknots in an RNA molecule. Comment: 22 pages, 14 figures
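
The genus classification mentioned in this abstract can be illustrated with a short sketch. It is not taken from the paper: it uses the standard Euler-characteristic argument for a chord diagram whose backbone is collapsed to a single vertex, and the function name `genus` and the input format (a list of 1-based `(i, j)` base pairs) are assumptions made for illustration.

```python
def genus(pairs):
    """Genus of the chord diagram defined by a list of (i, j) base pairs."""
    if not pairs:
        return 0
    # Rank the paired positions along the backbone; unpaired bases do not
    # change the topology, so only the order of arc endpoints matters.
    ends = sorted(pos for pair in pairs for pos in pair)
    rank = {pos: k for k, pos in enumerate(ends)}
    m = len(ends)
    # alpha maps every arc endpoint to its partner endpoint.
    alpha = [0] * m
    for i, j in pairs:
        alpha[rank[i]] = rank[j]
        alpha[rank[j]] = rank[i]
    # Faces are the cycles of "cross the arc, then step to the next endpoint".
    seen = [False] * m
    faces = 0
    for start in range(m):
        if seen[start]:
            continue
        faces += 1
        k = start
        while not seen[k]:
            seen[k] = True
            k = (alpha[k] + 1) % m
    # One vertex, n edges: 1 - n + faces = 2 - 2g.
    n = len(pairs)
    return (1 + n - faces) // 2


nested = [(1, 10), (2, 9), (3, 8)]          # plain hairpin stem
h_type = [(1, 12), (2, 11), (6, 18), (7, 17)]  # two crossing stems
print(genus(nested), genus(h_type))
```

Under this convention a pseudoknot-free structure has genus 0, and a structure with two crossing stems (an H-type pseudoknot) has genus 1.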

    Unfolding RNA 3D structures for secondary structure prediction benchmarking

    Ribonucleic acids (RNA) adopt complex three-dimensional structures which are stabilized by the formation of base pairs, also known as the secondary (2D) structure. Predicting where and how many of these interactions occur has been the focus of many computational methods called 2D structure prediction algorithms. These methods disregard some interactions, which makes it difficult to know how well a 2D structure represents an RNA structure, especially when large numbers of base pairs are ignored. MC-Unfold was created to remove interactions violating the assumptions used by prediction methods. This process, named unfolding, extends previous planarization and pseudoknot removal methods. To evaluate how well computational methods can predict experimental structures, a set of 321 RNA monomers corresponding to more than 4223 experimental structures was acquired. These structures were mostly determined using nuclear magnetic resonance and X-ray crystallography. MC-Unfold was used to remove interactions the prediction algorithms were not expected to predict. The unfolded structures were then compared with the predicted structures. MC-Unfold performed very well on the test set it was given. In less than five minutes, 96% of the 227 structures could be exhaustively unfolded. The few remaining structures are very large and could not be unfolded in reasonable time. MC-Unfold is therefore a practical alternative to the current methods. As for the evaluation of prediction methods, MC-Unfold demonstrated that the computational methods do find experimental structures, especially for small molecules. However, when considering large or pseudoknotted molecules, the results are not so encouraging. As a consequence, 2D structure prediction methods should be used with caution, especially for large structures.
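
The abstract does not say how predicted and experimental structures were compared. A common choice in such benchmarks is to score base-pair sets by sensitivity and positive predictive value (PPV); the sketch below assumes that choice, and the function name `compare_pairs` is hypothetical.

```python
def compare_pairs(reference, predicted):
    """Return (sensitivity, ppv, f1) for two collections of (i, j) base pairs."""
    # Normalise each pair so that (i, j) and (j, i) count as the same pair.
    ref = {tuple(sorted(p)) for p in reference}
    pred = {tuple(sorted(p)) for p in predicted}
    true_positives = len(ref & pred)
    sensitivity = true_positives / len(ref) if ref else 0.0
    ppv = true_positives / len(pred) if pred else 0.0
    total = sensitivity + ppv
    f1 = 2 * sensitivity * ppv / total if total else 0.0
    return sensitivity, ppv, f1


reference = [(1, 10), (2, 9), (3, 8)]
predicted = [(2, 9), (8, 3), (4, 7)]
print(compare_pairs(reference, predicted))
```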

    From RNA folding to inverse folding: a computational study: Folding and design of RNA molecules

    Since the discovery of the structure of DNA in the early 1950s and its double-chained complement of information hinting at its means of replication, biologists have recognized the strong connection between molecular structure and function. In the past two decades, there has been a surge of research on an ever-growing class of RNA molecules that are non-coding but whose various folded structures allow a diverse array of vital functions. From the well-known splicing and modification of ribosomal RNA, non-coding RNAs (ncRNAs) are now known to be intimately involved in possibly every stage of DNA transcription and protein translation, as well as RNA signalling and gene regulation processes. Despite the rapid development and declining cost of modern molecular methods, they typically can only describe ncRNAs' structural conformations in vitro, which differ from their in vivo counterparts. Moreover, it is estimated that only a tiny fraction of known ncRNAs has been documented experimentally, often at a high cost. There is thus a growing realization that computational methods must play a central role in the analysis of ncRNAs. Not only do computational approaches hold the promise of rapidly characterizing many ncRNAs yet to be described, but there is also the hope that by understanding the rules that determine their structure, we will gain better insight into their function and design. Many studies have revealed that ncRNA functions are performed by high-level structures that often depend on their low-level structures, such as the secondary structure. This thesis studies the computational folding mechanism and inverse folding of ncRNAs at the secondary level. We describe the development of two bioinformatic tools that have the potential to improve our understanding of RNA secondary structure. 
These tools are as follows: (1) RAFFT, for efficient prediction of pseudoknot-free RNA folding pathways using the fast Fourier transform (FFT); (2) aRNAque, an evolutionary algorithm inspired by Lévy flights for RNA inverse folding with or without pseudoknots (a secondary structure motif that often poses difficulties for bio-computational detection). The first tool, RAFFT, implements a novel heuristic to predict RNA secondary structure formation pathways that has two components: (i) a folding algorithm and (ii) a kinetic ansatz. When considering the best prediction in the ensemble of 50 secondary structures predicted by RAFFT, its performance matches the recent deep-learning-based structure prediction methods. RAFFT also acts as a folding kinetic ansatz, which we tested on two RNAs: the CFSE and a classic bi-stable sequence. In both test cases, fewer structures were required to reproduce the full kinetics, whereas known methods (such as Treekin) required a sample of 20,000 structures or more. The second tool, aRNAque, implements an evolutionary algorithm (EA) inspired by the Lévy flight, which allows both local and global search and supports pseudoknotted target structures. The number of point mutations at every step of aRNAque's EA is drawn from a Zipf distribution. Therefore, our proposed method increases the diversity of designed RNA sequences and reduces the average number of evaluations of the evolutionary algorithm. The overall performance showed improved empirical results compared to existing tools through intensive benchmarks on both pseudoknotted and pseudoknot-free datasets. In conclusion, we highlight some promising extensions of the versatile RAFFT method to RNA-RNA interaction studies. We also provide an outlook on both tools' implications in studying evolutionary dynamics.
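
The Zipf-distributed mutation step described for aRNAque can be sketched as follows. This is an illustration only, not aRNAque's code: the exponent value, the truncation of the distribution at the sequence length, and the names `sample_zipf` and `mutate` are all assumptions.

```python
import random

NUCLEOTIDES = "ACGU"


def sample_zipf(rng, exponent, max_k):
    """Draw k in 1..max_k with probability proportional to k ** -exponent."""
    ks = list(range(1, max_k + 1))
    weights = [k ** -exponent for k in ks]
    return rng.choices(ks, weights=weights, k=1)[0]


def mutate(seq, rng, exponent=1.5):
    """Apply a Zipf-distributed number of point mutations to seq."""
    # Most draws give one or two mutations (local search); the heavy tail
    # occasionally gives many mutations at once (a long, Levy-like jump).
    k = sample_zipf(rng, exponent, len(seq))
    positions = rng.sample(range(len(seq)), k)
    out = list(seq)
    for pos in positions:
        # Always switch to a different nucleotide so each mutation is real.
        out[pos] = rng.choice([b for b in NUCLEOTIDES if b != out[pos]])
    return "".join(out)


rng = random.Random(0)
parent = "GGGAAACCCGGGAAACCC"
print(parent)
print(mutate(parent, rng))
```

A heavy-tailed step size is what distinguishes this scheme from a fixed one-mutation-per-step search: small steps dominate, but large jumps remain possible.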

    RNA secondary structure prediction including pseudoknots

    RNAs are very important biological molecules. Previously they were thought of as being only the intermediary between DNA, which carries the genetic information, and proteins, which catalyze biochemical reactions. Today we know about the existence of diverse classes of RNAs which exhibit catalytic functions themselves. The function of an RNA molecule is dependent on its three-dimensional structure (the tertiary structure), which is in turn dependent on the base pairing within the RNA molecule (the secondary structure). In order to draw functional conclusions from the linear sequence of an RNA molecule (the primary structure), one would ideally be able to predict the whole three-dimensional fold based on the sequence alone. But because the folding process of RNA is mainly a hierarchical process, with the secondary structure forming before any tertiary interactions, the secondary structure can already be used as a starting point for functional analysis. Therefore prediction of the secondary structure of RNAs is a central problem in bioinformatics. 
The majority of all RNA base pairs are perfectly nested, meaning that all nucleotides enclosed by a specific base pair do not interact with any nucleotides outside of this base pair. This property allows the decomposition of the whole RNA secondary structure into simpler and independent substructures called loops, for which free energy parameters exist. The most common approach to predicting RNA secondary structures is based on dynamic programming, which relies heavily on this loop decomposition. A certain group of RNA secondary structures called pseudoknots, of which more and more have been discovered in recent years, do not allow this simplification. In a pseudoknot, nucleotides within a loop form base pairs with nucleotides outside of the loop, violating the condition of perfectly nested secondary structures. Pseudoknots are therefore more difficult and more expensive to handle computationally, and the standard RNA secondary structure prediction algorithms simply do not take pseudoknots into account. Approaches for predicting pseudoknots have only been developed in recent years, some of them based on dynamic programming, others on heuristic methods. In this diploma thesis I present PKplex, a new dynamic-programming-based algorithm for the prediction of RNA secondary structures including pseudoknots. After describing the basic idea behind PKplex and its implementation, the algorithm is evaluated against a large set of known RNA pseudoknots and its performance is compared with other published algorithms.
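
To show why nestedness makes dynamic programming possible, here is a minimal sketch of the classic Nussinov-style recursion, which maximises the number of nested base pairs. It is not PKplex and uses no free energy parameters; the pairing rules, the minimum hairpin loop size of 3, and the function names are assumptions of this illustration.

```python
def can_pair(a, b):
    """Watson-Crick and G-U wobble pairs."""
    return {a, b} in ({"A", "U"}, {"G", "C"}, {"G", "U"})


def max_nested_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs, in O(n^3) time and O(n^2) space."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            # case 2: j pairs with k. Because pairs are nested, the parts
            # left of k and inside (k, j) are independent subproblems.
            for k in range(i, j - min_loop):
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(max_nested_pairs("GGGAAACCC"))
```

A pseudoknot breaks the independence used in case 2, because a base inside the pair (k, j) could then pair with a base outside it; this is why pseudoknot-aware algorithms need larger or more restricted recursions.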

    Algorithms for RNA secondary structure analysis : prediction of pseudoknots and the consensus shapes approach

    Reeder J. Algorithms for RNA secondary structure analysis : prediction of pseudoknots and the consensus shapes approach. Bielefeld (Germany): Bielefeld University; 2007. Our understanding of the role of RNA has undergone a major change in the last decade. Once believed to be only a mere carrier of information and structural component of the ribosomal machinery in the advent of the genomic age, it is now clear that RNAs play a much more active role. RNAs can act as regulators and can have catalytic activity - roles previously only attributed to proteins. There is still much speculation in the scientific community as to what extent RNAs are responsible for the complexity in higher organisms, which can hardly be explained with only proteins as regulators. In order to investigate the roles of RNA, it is therefore necessary to search for new classes of RNA. For those and already known classes, analyses of their presence in different species of the tree of life will provide further insight into the evolution of biomolecules and especially RNAs. Since RNA function often follows its structure, computer programs for RNA structure prediction are an integral part of this procedure. The secondary structure of RNA - the level of base pairing - strongly determines the tertiary structure. As the latter is computationally intractable and experimentally expensive to obtain, secondary structure analysis has become an accepted substitute. In this thesis, I present two new algorithms (and a few variations thereof) for the prediction of RNA secondary structures. The first algorithm addresses the problem of predicting a secondary structure from a single sequence including RNA pseudoknots. Pseudoknots have been shown to be functionally relevant in many RNA mediated processes. However, pseudoknots are excluded from consideration by state-of-the-art RNA folding programs for reasons of computational complexity. 
While folding a sequence of length n into unknotted structures requires O(n^3) time and O(n^2) space, finding the best structure including arbitrary pseudoknots has been proven to be NP-complete. Nevertheless, I demonstrate in this work that certain types of pseudoknots can be included in the folding process with only a moderate increase of computational cost. In analogy to protein coding RNA, where a conserved encoded protein hints at a similar metabolic function, structural conservation in RNA may give clues to RNA function and to finding RNA genes. However, structure conservation is more complex to deal with computationally than sequence conservation. The method considered to be at least conceptually the ideal approach in this situation is the Sankoff algorithm. It simultaneously aligns two sequences and predicts a common secondary structure. Unfortunately, it is computationally rather expensive - O(n^6) time and O(n^4) space for two sequences, and for more than two sequences it becomes exponential in the number of sequences. Therefore, several heuristic implementations emerged in the last decade trying to make the Sankoff approach practical by introducing pragmatic restrictions on the search space. In this thesis, I propose to redefine the consensus structure prediction problem in a way that does not imply a multiple sequence alignment step. For a family of RNA sequences, my method explicitly and independently enumerates the near-optimal abstract shape space and predicts an abstract shape as the consensus for all sequences. For each sequence, it delivers the thermodynamically best structure which has this shape. The technique of abstract shapes analysis is employed here for a synoptic view of the suboptimal folding space. As the shape space is much smaller than the structure space, and identification of common shapes can be done in linear time (in the number of shapes considered), the method is essentially linear in the number of sequences. 
Evaluations show that the new method compares favorably with available alternatives.
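
The idea of an abstract shape can be sketched on dot-bracket strings. The code below assumes the most abstract shape level, in which unpaired bases are dropped and helices interrupted only by bulges or internal loops are merged into one bracket pair; it handles pseudoknot-free input only. The function names, the `_` symbol for an unpaired chain, and the set-intersection consensus step are assumptions of this illustration, not the thesis's implementation.

```python
def shape5(dot_bracket):
    """Most abstract shape of a pseudoknot-free dot-bracket string."""
    # Build the tree of base pairs; each node is the list of its child pairs.
    stack = [[]]
    for char in dot_bracket:
        if char == "(":
            stack.append([])
        elif char == ")":
            children = stack.pop()
            stack[-1].append(children)

    def render(children):
        out = ""
        for node in children:
            # A pair enclosing exactly one pair belongs to the same helix
            # (stacking, bulge or internal loop), so the chain collapses.
            while len(node) == 1:
                node = node[0]
            out += "[" + render(node) + "]"
        return out

    return render(stack[0]) or "_"


def common_shapes(structures_per_sequence):
    """Shapes that occur among the candidate structures of every sequence."""
    shape_sets = [{shape5(s) for s in structs} for structs in structures_per_sequence]
    return set.intersection(*shape_sets) if shape_sets else set()


print(shape5("((..((...))..((...))..))"))
```

Because many structures map to the same shape, comparing shapes across sequences needs no sequence alignment, which is the point the abstract makes about avoiding the Sankoff-style joint alignment.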

    RNA folding kinetics including pseudoknots

    RNA molecules are essential components of living cells. Their wide range of different functions depends on the sequence of nucleotides and the corresponding structure. The majority of known RNA molecules fold into their energetically most stable conformation, as well as structurally similar suboptimal conformations that do not alter the specific task of the molecule. However, there are RNA molecules which can switch between two structurally distant conformations, one of which is functional while the other is not. The best known examples are riboswitches, which usually sense various kinds of metabolites from their environment that trigger the refolding from one conformation into the other. The rather new field of synthetic biology led to the construction of an example of a new type of riboswitch, which refolds upon interaction with other RNA molecules [1]. Such RNA-triggered riboswitches are not aimed at sensing the environment, but expand the repertoire of gene regulation. Inspired by this example, we present RNAscout.pl, a new program to study refolding between two RNA conformations, which can be used to estimate the performance of RNA-triggered riboswitches. The underlying algorithm heuristically computes a set of intermediate conformations that are energetically favorable and structurally related to both stable conformations of the riboswitch. Based on this refolding network, we show kinetic simulations that support the expected refolding path for our riboswitch example. Moreover, we present pk findpath, a breadth-first search algorithm to estimate direct paths (i.e. a small subset of all possible paths) between two different RNA conformations. Both programs, RNAscout.pl and pk findpath, are used to estimate whether natural RNA molecules are optimized to fold into their energetically most stable conformation. In doing so, we compare the new programs against existing programs of the Vienna RNA package [2].