2,879 research outputs found

    PETcofold: predicting conserved interactions and structures of two multiple alignments of RNA sequences

    Motivation: Predicting RNA–RNA interactions is essential for determining the function of putative non-coding RNAs. Existing methods for the prediction of interactions are all based on single sequences. Since comparative methods have already been useful in RNA structure determination, we assume that conserved RNA–RNA interactions also imply conserved function. We further assume that a non-negligible fraction of the existing RNA–RNA interactions have also acquired compensating base changes throughout evolution. We implement a method, PETcofold, that can take covariance information in intra-molecular and inter-molecular base pairs into account to predict interactions and secondary structures of two multiple alignments of RNA sequences.
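The idea of compensating base changes can be illustrated with a small sketch. It is not PETcofold's scoring model; it only assumes that each alignment column is given as a string with one character per sequence, and that rows of the two columns refer to the same species. The function name `covariation_support` and the choice of canonical pairs (Watson-Crick plus G-U wobble) are illustrative assumptions.

```python
# Canonical RNA base pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def covariation_support(col_i, col_j):
    """Summarise how well two alignment columns support a conserved base pair.

    col_i and col_j hold one character per aligned sequence. For an
    inter-molecular pair, col_i comes from the first alignment and col_j
    from the second, with rows matched by species.

    Returns (fraction of rows that form a canonical pair,
             number of distinct canonical pair types observed).
    """
    pairs = [(a.upper(), b.upper()) for a, b in zip(col_i, col_j)]
    canonical = [p for p in pairs if p in CANONICAL]
    fraction = len(canonical) / len(pairs) if pairs else 0.0
    return fraction, len(set(canonical))


if __name__ == "__main__":
    # Four species, each with a different but still pairing combination.
    print(covariation_support("GCAU", "CGUA"))
```

A column pair in which every row can pair and several distinct pair types occur is the kind of signal described above: the bases changed, but the ability to pair was preserved.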

    The PETfold and PETcofold web servers for intra- and intermolecular structures of multiple RNA sequences

    The function of non-coding RNA genes largely depends on their secondary structure and the interaction with other molecules. Thus, an accurate prediction of secondary structure and RNA–RNA interaction is essential for the understanding of biological roles and pathways associated with a specific RNA gene. We present web servers to analyze multiple RNA sequences for common RNA structure and for RNA interaction sites. The web servers are based on the recent PET (Probabilistic Evolutionary and Thermodynamic) models PETfold and PETcofold, but add user-friendly features ranging from a graphical layer to interactive usage of the predictors. Additionally, the web servers provide direct access to annotated RNA alignments, such as the Rfam 10.0 database and multiple alignments of 16 vertebrate genomes with human. The web servers are freely available at: http://rth.dk/resources/petfold

    RNA-RNA interaction prediction based on multiple sequence alignments

    Many computerized methods for RNA-RNA interaction structure prediction have been developed. Recently, O(N^6) time and O(N^4) space dynamic programming algorithms have become available that compute the partition function of RNA-RNA interaction complexes. However, few of these methods incorporate knowledge of related sequences, so relevant evolutionary information is often neglected in structure determination. Therefore, it is of considerable practical interest to introduce a method that takes into consideration both thermodynamic stability and sequence covariation. We present the a priori folding algorithm ripalign, whose input consists of two (given) multiple sequence alignments (MSA). ripalign outputs (1) the partition function, (2) base-pairing probabilities, (3) hybrid probabilities and (4) a set of Boltzmann-sampled suboptimal structures consisting of canonical joint structures that are compatible with the alignments. Compared to the single sequence-pair folding algorithm rip, ripalign requires negligible additional memory resources. Furthermore, we incorporate possible structure constraints as input parameters into our algorithm. The algorithm described here is implemented in C as part of the rip package. The supplemental material, source code and input/output files can freely be downloaded from http://www.combinatorics.cn/cbpc/ripalign.html. Contact: Christian Reidys, [email protected]. Comment: 8 pages, 9 figures

    RNA structure analysis : algorithms and applications

    In this doctoral thesis, efficient algorithms for aligning RNA secondary structures and mining unknown RNA motifs are presented. As the major contribution, a structure alignment algorithm, which combines both primary and secondary structure information, can find the optimal alignment between two given structures, where one of them can be either a pattern structure of a known motif or a real query structure and the other is a subject structure. Motivated by widely used algorithms for RNA folding, the proposed algorithm decomposes an RNA secondary structure into a set of atomic structural components that can be further organized in a tree model to capture the structural particularities. The novel structure alignment algorithm is implemented using dynamic programming techniques coupled with position-independent scoring matrices. The algorithm can find the optimal global and local alignments between two RNA secondary structures in quadratic time. When applied to searching a structure database, the algorithm can find similar RNA substructures and therefore can be used to identify functional RNA motifs. The algorithm has also been extended to handle position-dependent scoring matrices for the purpose of aligning multiple structures. All algorithms have been implemented in a package named RSmatch and applied to searching an mRNA UTR structure database and mining RNA motifs. The experimental results showed high efficiency and effectiveness of the proposed techniques.
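The decomposition step described above can be sketched in a simplified form. This is not RSmatch's actual decomposition or tree model; it only assumes the structure is given in dot-bracket notation without pseudoknots, and it groups directly stacked base pairs into stems as one possible kind of structural component. The function names `base_pairs` and `stems` are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the sorted list of 0-based base pairs (i, j) in a dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return sorted(pairs)


def stems(pairs):
    """Group directly stacked pairs (i, j), (i+1, j-1), ... into stems.

    Expects pairs sorted by their 5' position, as returned by base_pairs.
    """
    result = []
    for i, j in pairs:
        if result and result[-1][-1] == (i - 1, j + 1):
            result[-1].append((i, j))  # extends the current stem
        else:
            result.append([(i, j)])    # starts a new stem
    return result


if __name__ == "__main__":
    # Two hairpins separated by two unpaired bases.
    print(stems(base_pairs("((..))..((...))")))
```

Components of this kind could then be arranged by nesting into a tree, which is the role the tree model plays in the description above.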

    Computational Methods for Comparative Non-coding RNA Analysis: from Secondary Structures to Tertiary Structures

    Unlike messenger RNAs (mRNAs), whose information is encoded in the primary sequences, the cellular roles of non-coding RNAs (ncRNAs) originate from their structures. Therefore, studying the structural conservation in ncRNAs is important for an in-depth understanding of their functions. In the past years, many computational methods have been proposed to analyze the common structural patterns in ncRNAs using comparative methods. However, RNA structural comparison is not a trivial task, and the existing approaches still have numerous issues in efficiency and accuracy. In this dissertation, we introduce a suite of novel computational tools that extend the classic models for ncRNA secondary and tertiary structure comparisons. For RNA secondary structure analysis, we first developed a computational tool, named PhyloRNAalifold, to integrate phylogenetic information into consensus structural folding. The underlying idea of this algorithm is that the importance of a co-varying mutation should be determined by its position on the phylogenetic tree. By assigning high scores to the critical covariances, the prediction of RNA secondary structure can be made more accurate. Besides structure prediction, we also developed a computational tool, named ProbeAlign, to improve the efficiency of genome-wide ncRNA screening by using high-throughput RNA structural probing data. It treats the chemical reactivities embedded in the probing information as pairing attributes of the search targets. This approach avoids the time-consuming base pair matching in secondary structure alignment. The application of ProbeAlign to the FragSeq datasets shows its capability for genome-wide ncRNA analysis. For RNA tertiary structure analysis, we first developed a computational tool, named STAR3D, to find the global conservation in RNA 3D structures. STAR3D aims at finding the consensus of stacks by using 2D topology and 3D geometry together.
Then, the loop regions can be ordered and aligned according to their relative positions in the consensus. This stack-guided alignment method brings the divide-and-conquer strategy to RNA 3D structural alignment, which improves its efficiency dramatically. Furthermore, we clustered all loop regions in non-redundant RNA 3D structures to detect plausible RNA structural motifs de novo. The computational pipeline, named RNAMSC, was extended to handle large-scale PDB datasets, and thorough downstream analysis was performed to ensure that the clustering results are valid and easy to apply in further research. The final results contain many interesting variations of known motifs, such as GNAA tetraloop, kink-turn, sarcin-ricin and t-loops. We also discovered novel functional motifs that are conserved in a wide range of ncRNAs, including ribosomal RNA, sgRNA, SRP RNA, GlmS riboswitch and twister ribozyme.

    A bioinformatics framework for RNA structure mining, motif discovery and polyadenylation analysis

    RNA molecules play various important roles in the cell, and their functionality depends not only on the sequence information but to a large extent on their structure. The development of computational and predictive approaches to study RNA molecules is extremely valuable. In this research, a tool named RADAR was developed that provides a multitude of functionality for RNA data analysis and research. It aligns structure-annotated RNA sequences so that both the sequence and the structure information are taken into consideration. This tool is capable of performing pair-wise structure alignment, multiple structure alignment, database search and clustering. In addition, it provides two salient features: (i) constrained alignment of RNA secondary structures, and (ii) prediction of consensus structure for a set of RNA sequences. This tool is also hosted on the web and can be freely accessed, and the software can be downloaded from http://datalab.njit.edu/biodata/rna/RSmatch/server.htm . The RADAR software has been applied to various datasets (genomes of various mammals, viruses and parasites), and our experimental results show that this approach is capable of detecting functionally important regions. As an application of RADAR, a systematic data mining approach, termed GLEAN-UTR, was developed to identify small stem loop RNA structure elements in the untranslated regions (UTRs) that are conserved between human and mouse orthologs and exist in multiple genes with common Gene Ontology terms. This study resulted in 90 distinct RNA structure groups containing 748 structures, with the 3' histone stem loop (HSL3) and the iron response element (IRE) among the top hits. Further, the role played by structure in mRNA polyadenylation was investigated. Polyadenylation is an important step towards the maturation of almost all cellular mRNAs in eukaryotes.
Studies have identified several cis-elements besides the widely known polyadenylation signal (PAS) element (AATAAA or ATTAAA or a close variant) which may have a role to play in poly(A) site identification. In this study, the differences in structural stability of sequences surrounding poly(A) sites were investigated, and it was found that for genes containing a single poly(A) site, the surrounding sequence is more stable than the sequences surrounding alternative poly(A) sites. This indicates that structure may provide an evolutionary advantage for single poly(A) sites that prevents multiple poly(A) sites from arising. In addition, the study found that the structural stability of the region surrounding a polyadenylation site correlates with its distance from the next gene, with shorter distances corresponding to greater structural stability.
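The PAS element mentioned above lends itself to a short illustration. The sketch below is not part of RADAR or of the study's pipeline; it only scans for the two hexamers named in the abstract (AATAAA and ATTAAA) upstream of a given cleavage position. The function name `find_pas`, the 0-based `cleavage_site` coordinate and the default `window` of 40 nucleotides are assumptions made for this example, and close variants of the hexamers are deliberately not handled.

```python
# The two PAS hexamers named in the abstract; close variants are not covered here.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def find_pas(sequence, cleavage_site, window=40):
    """Find PAS hexamers in the `window` nucleotides upstream of a poly(A) site.

    sequence      : DNA or RNA string (U is treated as T).
    cleavage_site : 0-based index of the cleavage position (hypothetical convention).
    window        : how far upstream to look; 40 nt is an assumed default.

    Returns a list of (offset, hexamer), where offset is the start of the
    hexamer relative to the cleavage site (negative means upstream).
    """
    seq = sequence.upper().replace("U", "T")
    start = max(0, cleavage_site - window)
    hits = []
    # The last admissible start position leaves room for a full hexamer
    # ending at the cleavage site.
    for pos in range(start, cleavage_site - 5):
        hexamer = seq[pos:pos + 6]
        if hexamer in PAS_HEXAMERS:
            hits.append((pos - cleavage_site, hexamer))
    return hits


if __name__ == "__main__":
    example = "GGGG" + "AATAAA" + "C" * 20
    print(find_pas(example, cleavage_site=len(example)))
```

A scan like this only locates the signal. The stability comparison described above would additionally require folding the surrounding sequence with a structure prediction tool, which is outside this sketch.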

    Computational Identification of Four Spliceosomal snRNAs from the Deep-Branching Eukaryote Giardia intestinalis

    Funding: Marsden Fund New Zealand Allan Wilson Centre. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. RNAs processing other RNAs is very general in eukaryotes, but it is not clear to what extent it is ancestral to eukaryotes. Here we focus on pre-mRNA splicing, one of the most important RNA-processing mechanisms in eukaryotes. In most eukaryotes splicing is predominantly catalysed by the major spliceosome complex, which consists of five uridine-rich small nuclear RNAs (U-snRNAs) and over 200 proteins in humans. Three major spliceosomal introns have been found experimentally in Giardia; one Giardia U-snRNA (U5) and a number of spliceosomal proteins have also been identified. However, because of the low sequence similarity between the Giardia ncRNAs and those of other eukaryotes, the other U-snRNAs of Giardia had not been found. Using two computational methods, candidates for Giardia U1, U2, U4 and U6 snRNAs were identified in this study and shown by RT-PCR to be expressed. We found that identifying a U2 candidate helped identify U6 and U4 based on interactions between them. Secondary structural modelling of the Giardia U-snRNA candidates revealed typical features of eukaryotic U-snRNAs. We demonstrate a successful approach to combine computational and experimental methods to identify expected ncRNAs in a highly divergent protist genome. Our findings reinforce the conclusion that spliceosomal small-nuclear RNAs existed in the last common ancestor of eukaryotes.

    Computational Methods For Comparative Non-coding Rna Analysis: From Structural Motif Identification To Genome-wide Functional Classification

    Recent advances in biological research point out that many ribonucleic acids (RNAs) are transcribed from the genome to perform a variety of cellular functions, rather than merely acting as information carriers for protein synthesis. These RNAs are usually referred to as non-coding RNAs (ncRNAs). The versatile regulation mechanisms and functionalities of the ncRNAs contribute to the amazing complexity of the biological system. The ncRNAs perform their biological functions by folding into specific structures. Consequently, the comparative study of ncRNA structures is key to the inference of their molecular and cellular functions. We are especially interested in two computational problems for the comparative analysis of ncRNA structures: the alignment of ncRNA structures and their classification. Specifically, we aim to develop algorithms to align and cluster RNA structural motifs (recurrent RNA 3D fragments), as well as RNA secondary structures. A thorough understanding of RNA structural motifs will help us to disassemble the huge RNA 3D structures into functional modules, which can significantly facilitate the analysis of the detailed molecular functions. On the other hand, efficient alignment and clustering of RNA secondary structures will provide insights into ncRNA expression and interaction on a genomic scale. In this dissertation, we present a suite of computational algorithms and software packages to solve the RNA structural motif alignment and clustering problem, as well as the RNA secondary structure alignment and clustering problem. The contributions of this dissertation are summarized as follows. (1) We developed RNAMotifScan for comparing and searching RNA structural motifs. Recent studies have shown that RNA structural motifs play an essential role in RNA folding and interaction with other molecules. Computational identification and analysis of RNA structural motifs remain challenging tasks.
Existing motif identification methods based on 3D structure may not properly compare motifs with high structural variations. We present a novel RNA structural alignment method for RNA structural motif identification, RNAMotifScan, which takes into consideration the isosteric (both canonical and non-canonical) base-pairs and multi-pairings in RNA structural motifs. The utility and accuracy of RNAMotifScan are demonstrated by searching for Kink-turn, C-loop, Sarcin-ricin, Reverse Kink-turn and E-loop motifs against a 23S rRNA (PDBid: 1S72), which is well characterized for the occurrences of these motifs. (2) We improved upon RNAMotifScan by incorporating base-stacking information and devising a new branch-and-bound algorithm called RNAMotifScanX. Model-based search of RNA structural motifs has focused on finding instances with similar 3D geometry and base-pairing patterns. Although these methods have successfully identified many of the true motif instances, each of them has its own limitations, and their accuracy and sensitivity can be further improved. We introduce a novel approach to model RNA structural motifs, which incorporates both base-pairing and base-stacking information. We also develop a new algorithm to search for known motif instances with the consideration of both base-pairing and base-stacking information. Benchmarking of RNAMotifScanX on searching known RNA structural motifs, including kink-turn, C-loop, sarcin-ricin, reverse kink-turn, and E-loop, clearly shows improved performance compared to its predecessor RNAMotifScan and other state-of-the-art RNA structural motif search tools. (3) We develop an RNA structural motif clustering and de novo identification pipeline called RNAMSC. RNA structural motifs are the building blocks of the complex RNA architecture. Identification of non-coding RNA structural motifs is a critical step towards understanding their structures and functionalities.
We present a clustering approach for de novo RNA structural motif identification. We applied our approach to a data set containing 5S, 16S and 23S rRNAs and rediscovered many known motifs, including GNRA tetraloop, kink-turn, C-loop, sarcin-ricin, reverse kink-turn, hook-turn, E-loop and tandem-sheared motifs, with higher accuracy than the current state-of-the-art clustering method. More importantly, several novel structural motif families have been revealed by our clustering analysis. (4) We propose an improved RNA structural clustering pipeline that takes into account the length-dependent distribution of the structural similarity measure. We also devise a more efficient and robust CLique finding CLustering algorithm (CLCL) to replace the traditional hierarchical clustering approach. A benchmark of the proposed pipeline on Rfam data clearly demonstrates over 10% performance gain when compared to a traditional hierarchical clustering pipeline. We applied this new computational pipeline to cluster the post-transcriptional control elements in fly 3’-UTR. The ncRNA elements in the 3’ untranslated regions (3’-UTRs) are known to participate in the genes’ post-transcriptional regulation, such as their stability, translation efficiency, and subcellular localization. Inferring co-expression patterns of the genes by clustering their 3’-UTR ncRNA elements will provide invaluable knowledge for further studies of their functionalities and interactions under specific physiological processes. (5) We develop an ultra-efficient RNA secondary structure alignment algorithm, ERA, by using a sparse dynamic programming technique. Current advances in next-generation sequencing technology have revealed a large number of un-annotated RNA transcripts. Comparative study of the RNA structurome is an important approach to assess the biological functionalities of these RNA transcripts.
Due to the large sizes and abundance of the RNA transcripts, an efficient and accurate RNA structure-structure alignment algorithm is urgently needed to facilitate the comparative study. By using the sparse dynamic programming technique, we devised a new alignment algorithm that is as efficient as the tree-based alignment algorithms and as accurate as the general edit-distance alignment algorithms. We implemented the new algorithm in a program called ERA (Efficient RNA Alignment). Benchmark results indicate that ERA can significantly speed up RNA structure-structure alignments compared to other state-of-the-art RNA alignment tools, while maintaining high alignment accuracy. These novel algorithms have led to the discovery of many novel RNA structural motif instances, which have significantly deepened our understanding of RNA molecular functions. The genome-wide clustering of ncRNA elements in fly 3’-UTR has predicted a cluster of genes that are responsible for the spermatogenesis process. More importantly, these genes are very likely to be co-regulated by their common 3’-UTR elements. We anticipate that these algorithms and the corresponding software tools will significantly promote comparative ncRNA research in the future.