
    Graph-distance distribution of the Boltzmann ensemble of RNA secondary structures

    BACKGROUND: Large RNA molecules are often composed of multiple functional domains whose spatial arrangement strongly influences their function. Pre-mRNA splicing, for instance, relies on the spatial proximity of the splice junctions, which can be separated by very long introns. Similar effects appear in the processing of RNA virus genomes. Although it is a crude measure, the distribution of spatial distances in thermodynamic equilibrium carries useful information on the shape of the molecule, which in turn can give insights into the interplay of its functional domains. RESULTS: Spatial distance can be approximated by the graph-distance in the RNA secondary structure. We show here that the equilibrium distribution of graph-distances between a fixed pair of nucleotides can be computed in polynomial time by means of dynamic programming. While a naïve implementation would yield recursions with a very high time complexity of $O(n^6 D^5)$ for sequence length $n$ and $D$ distinct distance values, it is possible to reduce this to $O(n^4)$ for practical applications in which predominantly small distances are of interest. Further reductions, however, seem to be difficult. Therefore, we introduce sampling approaches that are much easier to implement. They are also theoretically favorable for several real-life applications, in particular since these primarily concern long-range interactions in very large RNA molecules. CONCLUSIONS: The graph-distance distribution can be computed using a dynamic programming approach. Although the graph-distance is a crude approximation of reality, our initial results indicate that it can be related to smFRET data. The additional file and the software of our paper are available from http://www.rna.uni-jena.de/RNAgraphdist.html
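    The abstract does not spell out the graph model or the sampler, and the sketch below is not the paper's dynamic-programming method. It illustrates only the sampling route, under the assumption that the graph has one node per nucleotide, with unit-length edges between backbone neighbours and between paired bases. The graph-distance between two fixed nucleotides is then found by breadth-first search in each sampled structure, and the relative frequencies over a Boltzmann sample approximate the equilibrium distribution. The function names and the dot-bracket input are illustrative assumptions; generating the sample itself (for instance by stochastic backtracking with ViennaRNA's `RNAsubopt -p`) is not shown.

```python
from collections import Counter, deque


def pair_table(dot_bracket: str) -> list[int]:
    """Map each position to its pairing partner (0-based), or -1 if unpaired."""
    partner = [-1] * len(dot_bracket)
    stack = []
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            opener = stack.pop()
            partner[pos], partner[opener] = opener, pos
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return partner


def graph_distance(dot_bracket: str, i: int, j: int) -> int:
    """Breadth-first search over backbone and base-pair edges (all of length 1)."""
    partner = pair_table(dot_bracket)
    n = len(dot_bracket)
    dist = [-1] * n
    dist[i] = 0
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if u == j:
            return dist[u]
        for v in (u - 1, u + 1, partner[u]):  # backbone neighbours + pairing partner
            if 0 <= v < n and dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise ValueError("target position not reachable (index out of range?)")


def distance_distribution(samples: list[str], i: int, j: int) -> dict[int, float]:
    """Relative frequency of each graph-distance across sampled structures."""
    counts = Counter(graph_distance(s, i, j) for s in samples)
    total = sum(counts.values())
    return {d: c / total for d, c in sorted(counts.items())}


# Toy usage: three made-up structures standing in for a Boltzmann sample.
sample = ["((....))", "........", "(......)"]
print(distance_distribution(sample, 0, 7))
```

    In this toy sample, nucleotides 0 and 7 are directly paired (distance 1) in two of the three structures and seven backbone steps apart in the open chain, so the estimate puts two thirds of the mass on distance 1. A real estimate needs many sampled structures; the sampling error shrinks with sample size, which is what makes this route easier than the full recursions.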

    From RNA folding to inverse folding: a computational study: Folding and design of RNA molecules

    Since the discovery of the structure of DNA in 1953, whose complementary double strand hinted at its means of replication, biologists have recognized the strong connection between molecular structure and function. In the past two decades, there has been a surge of research on an ever-growing class of RNA molecules that are non-coding but whose various folded structures enable a diverse array of vital functions. Beyond their well-known roles in splicing and in the modification of ribosomal RNA, non-coding RNAs (ncRNAs) are now known to be intimately involved in possibly every stage of DNA transcription and protein translation, as well as in RNA signalling and gene regulation. Despite the rapid development and declining cost of modern molecular methods, these methods can typically describe ncRNAs' structural conformations only in vitro, and these conformations differ from their in vivo counterparts. Moreover, it is estimated that only a tiny fraction of known ncRNAs has been documented experimentally, often at high cost. There is thus a growing realization that computational methods must play a central role in the analysis of ncRNAs. Not only do computational approaches hold the promise of rapidly characterizing many ncRNAs yet to be described, but there is also the hope that by understanding the rules that determine their structure, we will gain better insight into their function and design. Many studies have shown that ncRNA functions are carried out by higher-order structures that often depend on lower-level structures such as the secondary structure. This thesis studies the computational folding mechanism and inverse folding of ncRNAs at the secondary-structure level. In it, we describe the development of two bioinformatic tools that have the potential to improve our understanding of RNA secondary structure.
These tools are as follows: (1) RAFFT, for efficient prediction of pseudoknot-free RNA folding pathways using the fast Fourier transform (FFT); (2) aRNAque, an evolutionary algorithm inspired by Lévy flights for RNA inverse folding with or without pseudoknots (structural motifs that are often difficult to detect computationally). The first tool, RAFFT, implements a novel heuristic for predicting RNA secondary structure formation pathways that has two components: (i) a folding algorithm and (ii) a kinetic ansatz. When the best prediction among the ensemble of 50 secondary structures predicted by RAFFT is considered, its performance matches that of recent deep-learning-based structure prediction methods. RAFFT also serves as a folding kinetic ansatz, which we tested on two RNAs: the CFSE and a classic bi-stable sequence. In both test cases, RAFFT required fewer structures to reproduce the full kinetics, whereas known methods (such as Treekin) required a sample of 20,000 structures or more. The second tool, aRNAque, implements an evolutionary algorithm (EA) inspired by Lévy flights, which allows both local and global search and supports pseudoknotted target structures. The number of point mutations at every step of aRNAque's EA is drawn from a Zipf distribution, so most steps make few mutations while occasional steps make many. As a result, the method increases the diversity of designed RNA sequences and reduces the average number of evaluations of the evolutionary algorithm. Extensive benchmarks on both pseudoknotted and pseudoknot-free datasets showed improved empirical results compared to existing tools. In conclusion, we highlight some promising extensions of the versatile RAFFT method to RNA-RNA interaction studies. We also provide an outlook on both tools' implications for studying evolutionary dynamics.
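    The Zipf-distributed mutation step can be illustrated with a short sketch. It assumes a Zipf law truncated to 1..n, with P(k) proportional to k^(-c), where n is the sequence length and c is a hypothetical exponent parameter (called `exponent` here). This is not aRNAque's actual code: the function names are hypothetical, and the selection and fitness parts of the EA are omitted.

```python
import random

NUCLEOTIDES = "ACGU"


def zipf_mutation_count(max_k: int, exponent: float, rng: random.Random) -> int:
    """Draw k from {1, ..., max_k} with P(k) proportional to k ** -exponent."""
    ks = range(1, max_k + 1)
    weights = [k ** -exponent for k in ks]
    return rng.choices(ks, weights=weights, k=1)[0]


def levy_mutate(seq: str, exponent: float, rng: random.Random) -> str:
    """Apply a Zipf-distributed number of point mutations at distinct positions."""
    k = zipf_mutation_count(len(seq), exponent, rng)
    chars = list(seq)
    for pos in rng.sample(range(len(seq)), k):
        # Pick a different nucleotide so every chosen site really changes.
        chars[pos] = rng.choice([b for b in NUCLEOTIDES if b != chars[pos]])
    return "".join(chars)


rng = random.Random(42)
parent = "GGGAAACCC"
child = levy_mutate(parent, exponent=2.0, rng=rng)
print(parent, "->", child)
```

    The heavy tail is what produces the Lévy-flight-like behaviour: with a large exponent, single-point mutations dominate and the search stays local, while a smaller exponent makes multi-point jumps more frequent and the search more global.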

    Disease Associated Mutations and Functional Variants that Significantly Disrupt RNA Structure

    Genome-Wide Association Studies (GWAS) have revealed a great many trait- and disease-associated Single Nucleotide Polymorphisms (SNPs) that fall in noncoding or intergenic regions of the human genome. This is congruent with the current understanding that many of these regions are actively transcribed, and that many transcripts and transcript regions that do not code for protein have important roles in the cell. RNA structure plays a critical role in carrying out the functions of many transcripts. We hypothesized that a subset of noncoding disease-associated SNPs significantly change RNA structure. We developed a program called SNPfold to identify SNPs that cause significant RNA structural rearrangement and applied it to a set of 514 disease-associated SNPs in 350 unique noncoding regions of the human transcriptome. We identified six disease states (Hyperferritinemia Cataract Syndrome, β-Thalassemia, Cartilage-Hair Hypoplasia, Retinoblastoma, Chronic Obstructive Pulmonary Disease, and Hypertension) in which multiple SNPs significantly alter RNA structural ensembles. We then conducted Selective 2’-OH Acylation and Primer Extension (SHAPE) to confirm the predicted structural change caused by SNPs associated with Hyperferritinemia Cataract Syndrome (U22G and A56U in the FTL 5’ UTR). Both mutations are shown to disrupt the formation of an Iron Response Element stem-loop that is critical to translational regulation of the mRNA. We identified compensatory mutations that were able to restore these mutant structures to that of the wild-type FTL 5’ UTR. We then identified from human haplotype data several regions where SNP pairs inherited together conserve structure. Lastly, we explored the functional effect of common SNPs associated with changes in RNA expression level by calculating the enrichment of their overlap with experimentally derived binding sites for 14 different RNA-binding proteins.
Consistent with a subset of these SNPs altering structure at functionally important sites of mRNA transcripts, we identified several proteins whose binding sites are enriched for proximal overlap with these SNPs. Taken together, these results indicate that both rare disease-associated and common SNPs that significantly change RNA structure are present in human populations, and that such a functional effect may account for a subset of phenotypic differences and complex-disease propensities among individuals.
    Doctor of Philosophy
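    The abstract does not state how SNPfold scores a structural rearrangement. One plausible approach, assumed here and not necessarily SNPfold's exact procedure, compares per-nucleotide pairing probabilities of the wild-type and mutant sequences: each nucleotide's pairing probability is the row sum of the base-pair probability matrix, and a low Pearson correlation between the two profiles indicates a large change in the structural ensemble. The base-pair probability matrices are taken as input (they could come from a partition-function tool such as ViennaRNA's `RNAfold -p`), and all function names are hypothetical.

```python
import math


def pairing_profile(bpp: list[list[float]]) -> list[float]:
    """Probability that each nucleotide is paired: row sums of a symmetric
    base-pair probability matrix (diagonal assumed to be zero)."""
    return [sum(row) for row in bpp]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sx * sy)


def structure_change_correlation(bpp_wt, bpp_mut) -> float:
    """Near 1: ensembles alike; low or negative: large structural rearrangement."""
    return pearson(pairing_profile(bpp_wt), pairing_profile(bpp_mut))


# Toy 4-nt matrices: the wild type pairs mostly 0-3, the mutant mostly 0-1.
wt = [[0, 0, 0, 0.9], [0, 0, 0.2, 0], [0, 0.2, 0, 0], [0.9, 0, 0, 0]]
mut = [[0, 0.7, 0, 0], [0.7, 0, 0, 0], [0, 0, 0, 0.1], [0, 0, 0.1, 0]]
print(structure_change_correlation(wt, mut))
```

    In this made-up example the two pairing profiles are uncorrelated (r close to 0), which under this scoring would flag the mutation as strongly rearranging the ensemble; deciding what counts as "significant" would additionally require a reference distribution, for example scores of other possible substitutions at the same site.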