21 research outputs found

    Structural Alignment of RNAs Using Profile-csHMMs and Its Application to RNA Homology Search: Overview and New Results

    Systematic research on noncoding RNAs (ncRNAs) has revealed that many ncRNAs are actively involved in various biological networks. Therefore, in order to fully understand the mechanisms of these networks, it is crucial to understand the roles of ncRNAs. Unfortunately, the annotation of ncRNA genes that give rise to functional RNA molecules has begun only recently, and it is far from being complete. Considering the huge amount of genome sequence data, we need efficient computational methods for finding ncRNA genes. One effective way of finding ncRNA genes is to look for regions that are similar to known ncRNA genes. As many ncRNAs have well-conserved secondary structures, we need statistical models that can represent such structures for this purpose. In this paper, we propose a new method for representing RNA sequence profiles and finding structural alignment of RNAs based on profile context-sensitive hidden Markov models (profile-csHMMs). Unlike existing models, the proposed approach can handle any kind of RNA secondary structures, including pseudoknots. We show that profile-csHMMs can provide an effective framework for the computational analysis of RNAs and the identification of ncRNA genes
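For readers unfamiliar with the term, a pseudoknot is commonly defined as a structure in which at least two base pairs cross: pairs (i, j) and (k, l) with i < k < j < l. The sketch below checks a list of base pairs for such crossings. It only illustrates that definition and is not part of the profile-csHMM method; the function names and the example structures are hypothetical.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return every pair of base pairs (i, j), (k, l) that cross: i < k < j < l."""
    # Normalize so that each pair is (smaller index, larger index), then sort
    # so that the first pair of any combination starts no later than the second.
    normalized = sorted(tuple(sorted(p)) for p in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(normalized, 2):
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


def has_pseudoknot(pairs):
    """True if at least two base pairs cross."""
    return bool(crossing_pairs(pairs))


# Hypothetical example structures (0-based sequence positions).
nested = [(0, 9), (1, 8), (2, 7)]            # a plain hairpin stem
h_type = [(0, 6), (1, 5), (3, 10), (4, 9)]   # two interleaved stems

print(has_pseudoknot(nested))
print(has_pseudoknot(h_type))
```

Nested structures like the first example can be described by context-free grammars, while the crossing dependencies in the second are what motivate context-sensitive models.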

    Computational identification and analysis of noncoding RNAs - Unearthing the buried treasures in the genome

    The central dogma of molecular biology states that the genetic information flows from DNA to RNA to protein. This dogma has exerted a substantial influence on our understanding of the genetic activities in the cells. Under this influence, the prevailing assumption until the recent past was that genes are basically repositories for protein coding information, and proteins are responsible for most of the important biological functions in all cells. Meanwhile, the importance of RNAs has remained rather obscure, and RNA was mainly viewed as a passive intermediary that bridges the gap between DNA and protein. Except for classic examples such as tRNAs (transfer RNAs) and rRNAs (ribosomal RNAs), functional noncoding RNAs were considered to be rare. However, this view has experienced a dramatic change during the last decade, as systematic screening of various genomes identified myriads of noncoding RNAs (ncRNAs), which are RNA molecules that function without being translated into proteins [11], [40]. It has been realized that many ncRNAs play important roles in various biological processes. As RNAs can interact with other RNAs and DNAs in a sequence-specific manner, they are especially useful in tasks that require highly specific nucleotide recognition [11]. Good examples are the miRNAs (microRNAs) that regulate gene expression by targeting mRNAs (messenger RNAs) [4], [20], and the siRNAs (small interfering RNAs) that take part in the RNAi (RNA interference) pathways for gene silencing [29], [30]. Recent developments show that ncRNAs are extensively involved in many gene regulatory mechanisms [14], [17]. The roles of ncRNAs known to this day are truly diverse. These include transcription and translation control, chromosome replication, RNA processing and modification, and protein degradation and translocation [40], just to name a few.
These days, it is even claimed that ncRNAs dominate the genomic output of the higher organisms such as mammals, and it is being suggested that the greater portion of their genome (which does not encode proteins) is dedicated to the control and regulation of cell development [27]. As more and more evidence piles up, greater attention is paid to ncRNAs, which have been neglected for a long time. Researchers began to realize that the vast majority of the genome that was regarded as “junk,” mainly because it was not well understood, may indeed hold the key for the best kept secrets in life, such as the mechanism of alternative splicing, the control of epigenetic variations and so forth [27]. The complete range and extent of the role of ncRNAs are not so obvious at this point, but it is certain that a comprehensive understanding of cellular processes is not possible without understanding the functions of ncRNAs [47]

    A Seeded Genetic Algorithm for RNA Secondary Structural Prediction with Pseudoknots

    This work explores a new approach to predicting RNA secondary structures with pseudoknots using a genetic algorithm. Since only a small portion of most RNA structures is comprised of pseudoknots, the majority of structural elements from an optimal pseudoknot-free structure are likely to be part of the true structure. Thus, seeding the genetic algorithm with optimal pseudoknot-free structures is more likely to lead it to the true structure than starting from a randomly generated population. The genetic algorithm uses the known energy models with an additional augmentation to allow complex pseudoknots. The nearest-neighbor energy model is used in conjunction with Turner’s thermodynamic parameters for pseudoknot-free structures, and the H-type pseudoknot energy estimation for simple pseudoknots. Testing with known pseudoknot sequences from PseudoBase shows that it outperforms some of the current popular algorithms.

    Homology modeling of complex structural RNAs

    Aligning macromolecules such as proteins, DNAs and RNAs in order to reveal, or conversely exploit, their functional homology is a classic challenge in bioinformatics, with far-reaching applications in structure modelling and genome annotations. In the specific context of complex RNAs, featuring pseudoknots, multiple interactions and non-canonical base pairs, multiple algorithmic solutions and tools have been proposed for the structure/sequence alignment problem. However, such tools are seldom used in practice, due in part to their extreme computational demands, and because of their inability to support general types of structures. Recently, a general parameterized algorithm based on tree decomposition of the query structure has been designed by Rinaudo et al. We present an implementation of the algorithm within a tool named LiCoRNA. We compare it against state-of-the-art algorithms. We show that it both gracefully specializes into a practical algorithm for simple classes of pseudoknots, and offers a general solution for complex pseudoknots, which are explicitly out of reach of competing software.

    Prediction of RNA secondary structure with pseudoknots using integer programming

    Background: RNA secondary structure prediction is one major task in bioinformatics, and various computational methods have been proposed so far. Pseudoknot is one of the typical substructures appearing in several RNAs, and plays an important role in some biological processes. Prediction of RNA secondary structure with pseudoknots is still challenging since the problem is NP-hard when arbitrary pseudoknots are taken into consideration. Results: We introduce a new method of predicting RNA secondary structure with pseudoknots based on integer programming. In our formulation, we aim at minimizing the value of the objective function that reflects free energy of a folding structure of an input RNA sequence. We focus on a practical class of pseudoknots by setting constraints appropriately. Experimental results for a set of real RNA sequences show that our proposed method outperforms several existing methods in sensitivity. Furthermore, for a set of sequences of small length, our approach achieved good performance in both sensitivity and specificity. Conclusion: Our integer programming-based approach for RNA structure prediction is flexible and extensible.
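To make the idea of an integer-programming formulation concrete, here is a generic sketch of how base-pair selection can be written as a 0-1 program. It is not the formulation used in the paper, which the abstract does not spell out. The symbols are assumptions introduced for illustration: $n$ is the sequence length, $x_{ij}$ indicates that positions $i$ and $j$ are paired, and $e_{ij}$ is a hypothetical energy contribution of that pair (negative for stabilizing pairs).

```latex
\begin{aligned}
\min_{x}\quad & \sum_{1 \le i < j \le n} e_{ij}\, x_{ij} \\
\text{s.t.}\quad & \sum_{j < i} x_{ji} + \sum_{j > i} x_{ij} \le 1, \qquad i = 1, \dots, n, \\
& x_{ij} \in \{0, 1\}, \qquad 1 \le i < j \le n.
\end{aligned}
```

The first constraint lets each nucleotide take part in at most one base pair. Restricting the solution to a particular class of pseudoknots, as the abstract describes, would require further constraints on which crossing pairs may be selected together, and realistic energy models score loops and stacks, not isolated pairs.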

    On the combinatorics of sparsification

    Background: We study the sparsification of dynamic programming folding algorithms of RNA structures. Sparsification applies to the mfe-folding of RNA structures and can lead to a significant reduction of time complexity. Results: We analyze the sparsification of a particular decomposition rule, $\Lambda^*$, that splits an interval for RNA secondary and pseudoknot structures of fixed topological genus. Essential for quantifying the sparsification is the size of its so-called candidate set. We present a combinatorial framework that uses the probabilities of irreducible substructures to obtain the expected size of the set of $\Lambda^*$-candidates. We compute these expectations for arc-based energy models via energy-filtered generating functions (GF) for RNA secondary structures as well as RNA pseudoknot structures. For RNA secondary structures we also consider a simplified loop-energy model. This combinatorial analysis is then compared to the expected number of $\Lambda^*$-candidates obtained from folding mfe-structures. In case of the mfe-folding of RNA secondary structures with a simplified loop energy model our results imply that sparsification provides a reduction of time complexity by a constant factor of 91% (theory) versus a 96% reduction (experiment). For the "full" loop-energy model there is a reduction of 98% (experiment).

    Efficient known ncRNA search including pseudoknots

    BACKGROUND: Searching for members of characterized ncRNA families containing pseudoknots is an important component of genome-scale ncRNA annotation. However, the state-of-the-art known ncRNA search is based on context-free grammar (CFG), which cannot effectively model pseudoknots. Thus, existing CFG-based ncRNA identification tools usually ignore pseudoknots during search. As a result, dozens of sequences that do not contain the native pseudoknots are reported by these tools. When pseudoknot structures are vital to the functions of the ncRNAs, these sequences may not be true members. RESULTS: In this work, we design a pseudoknot search tool using multiple simple sub-structures, which are derived from knot-free and bifurcation-free structural motifs in the underlying family. We test our tool on a contiguous 22-Mb region of the maize genome. The experimental results show that our work competes favorably with other pseudoknot search methods. CONCLUSIONS: Our sub-structure based tool can conduct genome-scale pseudoknot-containing ncRNA search effectively and efficiently. It provides a complementary pseudoknot search tool to Infernal. The source code is available at http://www.cse.msu.edu/~chengy/knotsearch

    Automatic generation of pseudoknotted RNAs taxonomy

    Background: The ability to compare RNA secondary structures is important in understanding their biological function and for grouping similar organisms into families by looking at evolutionarily conserved sequences such as 16S rRNA. Most comparison methods and benchmarks in the literature focus on pseudoknot-free structures due to the difficulty of mapping pseudoknots in classical tree representations. Some approaches can cluster pseudoknotted RNAs, but there is no general framework for evaluating their performance. Results: We introduce an evaluation framework based on a similarity/dissimilarity measure obtained by a comparison method and agglomerative clustering. Their combination automatically partitions a set of molecules into groups. To illustrate the framework we define and make available a benchmark of pseudoknotted (16S and 23S) and pseudoknot-free (5S) rRNA secondary structures belonging to Archaea, Bacteria and Eukaryota. We also consider five different comparison methods from the literature that are able to manage pseudoknots. For each method we cluster the molecules in the benchmark to obtain the taxa at the rank of phylum according to the European Nucleotide Archive curated taxonomy. We compute appropriate metrics for each method and we compare their suitability to reconstruct the taxa.
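The combination of a dissimilarity measure with agglomerative clustering can be sketched in a few lines. The example below is a minimal illustration, not the authors' framework: the abstract does not state the linkage criterion, so average linkage is an assumption, and the function name and the distance values are made up for the example.

```python
def agglomerative(dist, n_clusters):
    """Average-linkage agglomerative clustering on a symmetric distance matrix.

    dist is a list of lists with dist[i][j] the dissimilarity of items i and j.
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")

    # Start with every item in its own cluster.
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average pairwise dissimilarity between two clusters (assumed criterion).
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the two closest clusters under the linkage criterion.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Made-up dissimilarities between four structures: 0 and 1 are close, as are 2 and 3.
toy_dist = [
    [0, 1, 8, 9],
    [1, 0, 7, 8],
    [8, 7, 0, 2],
    [9, 8, 2, 0],
]
print(agglomerative(toy_dist, 2))
```

In an evaluation like the one described, the resulting groups would then be compared against the reference taxonomy with a clustering-agreement metric.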

    Software.ncrna.org: web servers for analyses of RNA sequences

    We present web servers for analysis of non-coding RNA sequences on the basis of their secondary structures. Software tools for structural multiple sequence alignment, structural pairwise sequence alignment and structural motif finding are available from the integrated web server and the individual stand-alone web servers. The servers are located at http://software.ncrna.org, along with information on evaluation and downloading. This website is freely available to all users and there is no login requirement.

    ANALYZING THE AMBIGUITY IN RNA STRUCTURE USING PROBABILISTIC APPROACH

    ABSTRACT: RNA is the second major form of nucleic acid in human cells and plays an intermediary role between DNA and functional protein. Several classes of RNAs are found in cells, each with a distinct function. Understanding the storage and utilization of a cell's genetic information is based on the structure of RNA. Many experimental results have shown that RNA plays a greater role in cells. RNA sequences contain signals at the structure level that can be exploited to detect functional motifs common to all or a portion of those sequences. Different types of structural analysis can provide functional information in different degrees of detail. In this paper, various types of RNA secondary structure representation are discussed, and an appropriate structure is adopted for a probabilistic approach that shows unambiguity.