    Expected distance between terminal nucleotides of RNA secondary structures.

    In "The ends of a large RNA molecule are necessarily close", Yoffe et al. (Nucleic Acids Res 39(1):292-299, 2011) used the programs RNAfold [resp. RNAsubopt] from the Vienna RNA Package to calculate the distance between the 5' and 3' ends of the minimum free energy secondary structure [resp. thermal equilibrium structures] of viral and random RNA sequences. Here, the 5'-3' distance is defined to be the length of the shortest path from the 5' node to the 3' node in the undirected graph whose edge set consists of edges {i, i + 1} corresponding to covalent backbone bonds and of edges {i, j} corresponding to canonical base pairs. From repeated simulations and using a heuristic theoretical argument, Yoffe et al. conclude that the 5'-3' distance is less than a fixed constant, independent of RNA sequence length. In this paper, we provide a rigorous mathematical framework to study the expected distance from the 5' to the 3' end of an RNA sequence. We present recurrence relations that precisely define this expected distance, both for the Turner nearest neighbor energy model and for a simple homopolymer model first defined by Stein and Waterman. We implement dynamic programming algorithms to compute (rather than approximate by repeated application of the Vienna RNA Package) the expected distance between the 5' and 3' ends of a given RNA sequence, with respect to the Turner energy model. Using methods of analytic combinatorics, which depend on complex analysis, we prove that the asymptotic expected 5'-3' distance of length-n homopolymers is approximately equal to the constant 5.47211, while the asymptotic distance is 6.771096 if hairpins have a minimum of 3 unpaired bases and the probability that any two positions can form a base pair is 1/4. Finally, we analyze the 5'-3' distance for secondary structures from the STRAND database, and conclude that the 5'-3' distance is correlated with RNA sequence length.
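    The graph definition of the 5'-3' distance in the abstract above can be illustrated with a short breadth-first search. This is only a sketch under stated assumptions: it takes a single secondary structure in dot-bracket notation (the format RNAfold prints), it does not call the Vienna RNA Package, and the function names `parse_dot_bracket` and `end_to_end_distance` are hypothetical. It computes the distance for one given structure, not the expected distance over the ensemble that the paper's recurrences address.

```python
from collections import deque


def parse_dot_bracket(structure):
    """Return the base pairs (i, j), 0-indexed, encoded by a dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % j)
            pairs.append((stack.pop(), j))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def end_to_end_distance(structure):
    """Shortest-path length from the 5' node to the 3' node.

    Edges are backbone bonds {i, i + 1} and base pairs {i, j},
    each counted with length 1, as in the definition above.
    """
    n = len(structure)
    if n == 0:
        raise ValueError("empty structure")
    adjacency = [[] for _ in range(n)]
    for i in range(n - 1):  # covalent backbone bonds
        adjacency[i].append(i + 1)
        adjacency[i + 1].append(i)
    for i, j in parse_dot_bracket(structure):  # base pairs
        adjacency[i].append(j)
        adjacency[j].append(i)
    # Breadth-first search from the 5' end (node 0).
    dist = [-1] * n
    dist[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        if u == n - 1:
            break
        for v in adjacency[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist[n - 1]
```

    For example, `end_to_end_distance(".....")` is 4 (backbone only), while `end_to_end_distance(".(((...))).")` is 3, because the outermost base pair provides a shortcut between the two dangling ends.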

    The ends of a large RNA molecule are necessarily close

    We show on general theoretical grounds that the two ends of single-stranded (ss) RNA molecules (consisting of roughly equal proportions of A, C, G and U) are necessarily close together, largely independent of their length and sequence. This is demonstrated to be a direct consequence of two generic properties of the equilibrium secondary structures, namely that the average proportion of bases in pairs is ∼60% and that the average duplex length is ∼4. Based on mfold and Vienna computations on large numbers of ssRNAs of various lengths (1000–10 000 nt) and sequences (both random and biological), we find that the 5′–3′ distance, defined as the sum of H-bond and covalent (ss) links separating the ends of the RNA chain, is small, averaging 15–20 for each set of viral sequences tested. For random sequences this distance is ∼12, consistent with the theory. We discuss the relevance of these results to evolved sequence complementarity and specific protein binding effects that are known to be important for keeping the two ends of viral and messenger RNAs in close proximity. Finally, we speculate on how our conclusions imply indistinguishability in size and shape of equilibrated forms of linear and covalently circularized ssRNA molecules.

    Is There a Classical Nonsense-Mediated Decay Pathway in Trypanosomes?

    In many eukaryotes, messenger RNAs with premature termination codons are destroyed by a process called “nonsense-mediated decay”, which requires the RNA helicase Upf1 and also, usually, an interacting factor, Upf2. Recognition of premature termination codons may rely on their distance from either a splice site or the polyadenylation site, and long 3′-untranslated regions can trigger mRNA decay. The protist Trypanosoma brucei relies heavily on mRNA degradation to determine mRNA levels, and 3′-untranslated regions play a major role in the control of mRNA decay. We show here that trypanosomes have a homologue of Upf1, TbUPF1, which interacts with TbUPF2 and (in an RNA-dependent fashion) with poly(A) binding protein 1, PABP1. Introduction of a premature termination codon in either an endogenous gene or a reporter gene decreased mRNA abundance, as expected for nonsense-mediated decay, but a dependence of this effect on TbUPF1 could not be demonstrated, and depletion of TbUPF1 by over 95% had no effect on parasite growth or the mRNA transcriptome. Further investigations of the reporter mRNA revealed that increases in open reading frame length tended to increase mRNA abundance. In contrast, inhibition of translation, either using 5′-secondary structures or by lengthening the 5′-untranslated region, usually decreased reporter mRNA abundance. Meanwhile, changing the length of the 3′-untranslated region had no consistent effect on mRNA abundance. We suggest that in trypanosomes, translation per se may inhibit mRNA decay, and interactions with multiple RNA-binding proteins preclude degradation based on 3′-untranslated region length alone.

    Computational Methods for Comparative Non-coding RNA Analysis: From Structural Motif Identification to Genome-wide Functional Classification

    Recent advances in biological research show that many ribonucleic acids (RNAs) are transcribed from the genome to perform a variety of cellular functions, rather than merely acting as information carriers for protein synthesis. These RNAs are usually referred to as non-coding RNAs (ncRNAs). The versatile regulatory mechanisms and functionalities of ncRNAs contribute to the amazing complexity of biological systems. ncRNAs perform their biological functions by folding into specific structures, so the comparative study of ncRNA structures is key to inferring their molecular and cellular functions. We are especially interested in two computational problems for the comparative analysis of ncRNA structures: the alignment of ncRNA structures and their classification. Specifically, we aim to develop algorithms to align and cluster RNA structural motifs (recurrent RNA 3D fragments), as well as RNA secondary structures. A thorough understanding of RNA structural motifs will help us disassemble large RNA 3D structures into functional modules, which can significantly facilitate the analysis of detailed molecular functions. Efficient alignment and clustering of RNA secondary structures, in turn, will provide insight into ncRNA expression and interaction at a genomic scale. In this dissertation, we present a suite of computational algorithms and software packages to solve the RNA structural motif alignment and clustering problem, as well as the RNA secondary structure alignment and clustering problem. The contributions of this dissertation are summarized as follows.
    (1) We developed RNAMotifScan for comparing and searching RNA structural motifs. Recent studies have shown that RNA structural motifs play an essential role in RNA folding and interaction with other molecules. Computational identification and analysis of RNA structural motifs remain challenging tasks. Existing motif identification methods based on 3D structure may not properly compare motifs with high structural variations. We present a novel RNA structural alignment method for RNA structural motif identification, RNAMotifScan, which takes into consideration the isosteric (both canonical and non-canonical) base pairs and multi-pairings in RNA structural motifs. The utility and accuracy of RNAMotifScan are demonstrated by searching for kink-turn, C-loop, sarcin-ricin, reverse kink-turn and E-loop motifs against a 23S rRNA (PDB ID: 1S72), which is well characterized for the occurrences of these motifs.
    (2) We improved upon RNAMotifScan by incorporating base-stacking information and devising a new branch-and-bound algorithm called RNAMotifScanX. Model-based search for RNA structural motifs has focused on finding instances with similar 3D geometry and base-pairing patterns. Although these methods have successfully identified many of the true motif instances, each of them has its own limitations, and their accuracy and sensitivity can be further improved. We introduce a novel approach to modeling RNA structural motifs that incorporates both base-pairing and base-stacking information. We also develop a new algorithm to search for known motif instances that considers both kinds of information. Benchmarking of RNAMotifScanX on searching for known RNA structural motifs, including kink-turn, C-loop, sarcin-ricin, reverse kink-turn, and E-loop, clearly shows improved performance compared to its predecessor RNAMotifScan and other state-of-the-art RNA structural motif search tools.
    (3) We develop an RNA structural motif clustering and de novo identification pipeline called RNAMSC. RNA structural motifs are the building blocks of complex RNA architecture. Identification of non-coding RNA structural motifs is a critical step towards understanding their structures and functionalities. We present a clustering approach for de novo RNA structural motif identification. We applied our approach to a data set containing 5S, 16S and 23S rRNAs and rediscovered many known motifs, including GNRA tetraloop, kink-turn, C-loop, sarcin-ricin, reverse kink-turn, hook-turn, E-loop and tandem-sheared motifs, with higher accuracy than the current state-of-the-art clustering method. More importantly, several novel structural motif families have been revealed by our clustering analysis.
    (4) We propose an improved RNA structural clustering pipeline that takes into account the length-dependent distribution of the structural similarity measure. We also devise a more efficient and robust CLique finding CLustering algorithm (CLCL) to replace the traditional hierarchical clustering approach. Benchmarking of the proposed pipeline on Rfam data clearly demonstrates a performance gain of over 10% compared to a traditional hierarchical clustering pipeline. We applied this new computational pipeline to cluster the post-transcriptional control elements in fly 3’-UTRs. The ncRNA elements in the 3’ untranslated regions (3’-UTRs) are known to participate in the post-transcriptional regulation of genes, affecting transcript stability, translation efficiency, and subcellular localization. Inferring co-expression patterns of genes by clustering their 3’-UTR ncRNA elements will provide invaluable knowledge for further studies of their functionalities and interactions under specific physiological processes.
    (5) We develop an ultra-efficient RNA secondary structure alignment algorithm, ERA, using a sparse dynamic programming technique. Advances in next-generation sequencing technology have revealed a large number of un-annotated RNA transcripts. Comparative study of the RNA structurome is an important approach to assessing the biological functionalities of these RNA transcripts. Due to the large sizes and abundance of the RNA transcripts, an efficient and accurate RNA structure-structure alignment algorithm is urgently needed to facilitate the comparative study. By using the sparse dynamic programming technique, we devised a new alignment algorithm that is as efficient as the tree-based alignment algorithms and as accurate as the general edit-distance alignment algorithms. We implemented the new algorithm in a program called ERA (Efficient RNA Alignment). Benchmark results indicate that ERA can significantly speed up RNA structure-structure alignments compared to other state-of-the-art RNA alignment tools, while maintaining high alignment accuracy.
    These novel algorithms have led to the discovery of many novel RNA structural motif instances, which have significantly deepened our understanding of RNA molecular functions. The genome-wide clustering of ncRNA elements in fly 3’-UTRs has predicted a cluster of genes that are responsible for the spermatogenesis process. More importantly, these genes are very likely to be co-regulated by their common 3’-UTR elements. We anticipate that these algorithms and the corresponding software tools will significantly promote comparative ncRNA research in the future.