
    Efficient Algorithms for Local Forest Similarity

    An ordered labelled tree is a tree in which the left-to-right order among siblings is significant. Ordered labelled forests are sequences of ordered labelled trees. Given two ordered labelled forests F and G, the local forest similarity problem is to find two sub-forests F' and G' of F and G, respectively, that are the most similar over all possible F' and G'. In this thesis, we present efficient algorithms for the local forest similarity problem for two types of sub-forests: sibling subforests and closed subforests. Our algorithms can be used to locate the structural regions in RNA secondary structures, since the secondary structures of RNA molecules can be represented as ordered labelled forests.

    Efficient algorithms for local forest similarity and forest pattern matching

    Ordered labelled trees are trees in which each node has a label and the left-to-right order among siblings is significant. Ordered labelled forests are sequences of ordered labelled trees. Ordered labelled trees and forests are useful structures for hierarchical data representation. Given two ordered labelled forests F and G, the local forest similarity problem is to compute two sub-forests F' and G' of F and G, respectively, that are the most similar over all possible F' and G'. Given a target forest F and a pattern forest G, the forest pattern matching problem is to compute a sub-forest F' of F that is the most similar to G over all possible F'. This thesis presents novel, efficient algorithms for the local forest similarity problem and the forest pattern matching problem for sub-forests. One application of the algorithms is locating the structural regions in RNA secondary structures, which are necessary data for RNA secondary structure prediction and function investigation. RNA is a chain molecule; mathematically, it is a string over a four-letter alphabet. In computational molecular biology, labelled ordered trees are used to represent RNA secondary structures.
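
The representation of an RNA secondary structure as an ordered labelled forest can be sketched as follows. This is a minimal illustration, not the encoding used in the thesis: it assumes the structure is given in dot-bracket notation without pseudoknots, and the names `Node`, `dot_bracket_to_forest`, and `show` are hypothetical. Each base pair becomes an internal node labelled with its two bases, and each unpaired base becomes a leaf.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of an ordered labelled tree; children keep left-to-right order."""
    label: str
    children: List["Node"] = field(default_factory=list)


def dot_bracket_to_forest(sequence: str, structure: str) -> List[Node]:
    """Build an ordered labelled forest from a sequence and its dot-bracket string.

    Assumption: only '(', ')' and '.' occur (no pseudoknots).
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    forest: List[Node] = []
    stack: List[Node] = []  # currently open base pairs, innermost last
    for base, symbol in zip(sequence, structure):
        # New nodes attach to the innermost open pair, or to the top-level forest.
        siblings = stack[-1].children if stack else forest
        if symbol == "(":
            node = Node(label=base)  # second base is appended when the pair closes
            siblings.append(node)
            stack.append(node)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unexpected ')'")
            stack.pop().label += base
        elif symbol == ".":
            siblings.append(Node(label=base))
        else:
            raise ValueError(f"unsupported symbol: {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure: missing ')'")
    return forest


def show(forest: List[Node]) -> str:
    """Render a forest as a compact bracketed string, preserving sibling order."""
    parts = []
    for node in forest:
        inner = f"({show(node.children)})" if node.children else ""
        parts.append(node.label + inner)
    return " ".join(parts)


print(show(dot_bracket_to_forest("GGAAACCU", "((...)).")))
# prints: GC(GC(A A A)) U
```

In this example the forest has two trees: a stem of two stacked G-C pairs enclosing a three-base loop, followed by a single unpaired U. Sub-forests such as the loop `A A A` are the kind of sibling sequences over which a local similarity search would range.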

    Web-Beagle: a web server for the alignment of RNA secondary structures

    Web-Beagle (http://beagle.bio.uniroma2.it) is a web server for the pairwise global or local alignment of RNA secondary structures. The server exploits a new encoding for RNA secondary structure and a substitution matrix of RNA structural elements to perform RNA structural alignments. The web server allows the user to compute up to 10 000 alignments in a single run, taking as input sets of RNA sequences and structures or primary sequences alone. In the latter case, the server computes the secondary structure prediction for the RNAs on-the-fly using RNAfold (free energy minimization). The user can also compare a set of input RNAs to one of five pre-compiled RNA datasets, including lncRNAs and 3' UTRs. All types of comparison output the pairwise alignments along with structural similarity and statistical significance measures for each resulting alignment. A graphical color-coded representation of the alignments allows the user to easily identify structural similarities between RNAs. Web-Beagle can be used for finding structurally related regions in two or more RNAs, for the identification of homologous regions, or for functional annotation. Benchmark tests show that Web-Beagle has lower computational complexity and running time, and better performance, than other available methods.

    Local Similarity Between Quotiented Ordered Trees

    In this paper we propose a dynamic programming algorithm to evaluate local similarity between ordered quotiented trees using a constrained edit scoring scheme. A quotiented tree is a tree defined with an additional equivalence relation on vertices and such that the quotient graph is also a tree. The core of the method relies on two adaptations of an algorithm proposed by Zhang et al. [K. Zhang, D. Shasha, Simple fast algorithms for the editing distance between trees and related problems (1989) 1245-1262] for comparing ordered rooted trees. After some preliminary definitions and the description of this tree edit algorithm, we propose extensions to globally and locally compare two quotiented trees. The local method makes it possible to find the region in each tree with the highest similarity. The algorithms are currently being used in genomic analysis to evaluate variability between RNA secondary structures.

    Structural Alignment of RNAs Using Profile-csHMMs and Its Application to RNA Homology Search: Overview and New Results

    Systematic research on noncoding RNAs (ncRNAs) has revealed that many ncRNAs are actively involved in various biological networks. Therefore, in order to fully understand the mechanisms of these networks, it is crucial to understand the roles of ncRNAs. Unfortunately, the annotation of ncRNA genes that give rise to functional RNA molecules has begun only recently, and it is far from complete. Considering the huge amount of genome sequence data, we need efficient computational methods for finding ncRNA genes. One effective way of finding ncRNA genes is to look for regions that are similar to known ncRNA genes. As many ncRNAs have well-conserved secondary structures, we need statistical models that can represent such structures for this purpose. In this paper, we propose a new method for representing RNA sequence profiles and finding structural alignments of RNAs based on profile context-sensitive hidden Markov models (profile-csHMMs). Unlike existing models, the proposed approach can handle any kind of RNA secondary structure, including pseudoknots. We show that profile-csHMMs can provide an effective framework for the computational analysis of RNAs and the identification of ncRNA genes.

    Novel algorithms to analyze RNA secondary structure evolution and folding kinetics

    Thesis advisor: Peter Clote. RNA molecules play important roles in living organisms, such as protein translation, gene regulation, and RNA processing. RNA secondary structure is known to be a scaffold for tertiary structure, which has led to extensive interest in RNA secondary structure. This thesis is primarily focused on the development of novel algorithms for the analysis of RNA secondary structure evolution and folding kinetics. We describe RNAsampleCDS, a software tool that generates mRNA sequences coding user-specified peptides overlapping in up to six open reading frames. Sampled mRNAs are then analyzed with other tools to provide an estimate of their secondary structure properties. We also investigate homology of RNAs with respect to both sequence and secondary structure information. We present RNAmountAlign, an efficient software package for multiple global, local, and semiglobal alignment of RNAs using a weighted combination of sequence and structural similarity, with statistical support. Furthermore, we approach RNA folding kinetics from a novel network perspective, presenting algorithms for the shortest path and expected degree of nodes in the network of all secondary structures of an RNA. In these algorithms we consider the move set MS2, which allows addition, removal, and shift of base pairs and is used by several widely used RNA secondary structure folding kinetics programs that implement Gillespie's algorithm. We describe the MS2distance software, which computes the shortest MS2 folding trajectory between any two given RNA secondary structures. Moreover, the RNAdegree software implements the first algorithm to efficiently compute the expected degree of an RNA MS2 network of secondary structures. The source code for all the software, and web servers for RNAmountAlign, MS2distance, and RNAdegree, are publicly available at http://bioinformatics.bc.edu/clotelab/. Thesis (PhD), Boston College, 2018. Submitted to: Boston College, Graduate School of Arts and Sciences. Discipline: Biology.

    Computational identification and analysis of noncoding RNAs - Unearthing the buried treasures in the genome

    The central dogma of molecular biology states that genetic information flows from DNA to RNA to protein. This dogma has exerted a substantial influence on our understanding of the genetic activities in cells. Under this influence, the prevailing assumption until the recent past was that genes are basically repositories for protein-coding information, and that proteins are responsible for most of the important biological functions in all cells. Meanwhile, the importance of RNAs remained rather obscure, and RNA was mainly viewed as a passive intermediary that bridges the gap between DNA and protein. Except for classic examples such as tRNAs (transfer RNAs) and rRNAs (ribosomal RNAs), functional noncoding RNAs were considered to be rare. However, this view has changed dramatically during the last decade, as systematic screening of various genomes identified myriads of noncoding RNAs (ncRNAs), which are RNA molecules that function without being translated into proteins [11], [40]. It has been realized that many ncRNAs play important roles in various biological processes. As RNAs can interact with other RNAs and DNAs in a sequence-specific manner, they are especially useful in tasks that require highly specific nucleotide recognition [11]. Good examples are the miRNAs (microRNAs) that regulate gene expression by targeting mRNAs (messenger RNAs) [4], [20], and the siRNAs (small interfering RNAs) that take part in the RNAi (RNA interference) pathways for gene silencing [29], [30]. Recent developments show that ncRNAs are extensively involved in many gene regulatory mechanisms [14], [17]. The roles of ncRNAs known to this day are truly diverse. These include transcription and translation control, chromosome replication, RNA processing and modification, and protein degradation and translocation [40], to name just a few.
    These days, it is even claimed that ncRNAs dominate the genomic output of higher organisms such as mammals, and it is being suggested that the greater portion of their genome (which does not encode proteins) is dedicated to the control and regulation of cell development [27]. As more and more evidence piles up, greater attention is being paid to ncRNAs, which were neglected for a long time. Researchers have begun to realize that the vast majority of the genome that was regarded as "junk," mainly because it was not well understood, may indeed hold the key to the best-kept secrets of life, such as the mechanism of alternative splicing, the control of epigenetic variations, and so forth [27]. The complete range and extent of the roles of ncRNAs are not yet clear, but it is certain that a comprehensive understanding of cellular processes is not possible without understanding the functions of ncRNAs [47].