
    The ARTS web server for aligning RNA tertiary structures

    RNA molecules with common structural features may share similar functional properties. Structural comparison of RNAs and detection of common substructures is, thus, a highly important task. Nevertheless, the currently available tools in the RNA community provide only a partial solution, since they either work at the 2D level or are suitable only for detecting predefined or local contiguous tertiary motifs. Here, we describe a web server built around ARTS, a method for aligning tertiary structures of nucleic acids (both RNA and DNA). ARTS receives a pair of 3D nucleic acid structures and searches for a priori unknown common substructures. The search is truly 3D and irrespective of the order of the nucleotides on the chain. The identified common substructures can be large global folds with hundreds and even thousands of nucleotides as well as small local motifs with at least two successive base pairs. The method is highly efficient and has been used to conduct an all-against-all comparison of all the RNA structures in the Protein Data Bank. The web server together with a software package for download are freely accessible at

    Alignment-free local structural search by writhe decomposition

    Motivation: Rapid methods for protein structure search enable biological discoveries based on flexibly defined structural similarity, unleashing the power of the ever greater number of solved protein structures. Projection methods show promise for the development of fast structural database search solutions. Projection methods map a structure to a point in a high-dimensional space and compare two structures by measuring distance between their projected points. These methods offer a tremendous increase in speed over residue-level structural alignment methods. However, current projection methods are not practical, partly because they are unable to identify local similarities
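
To make the projection idea concrete, here is a minimal sketch, assuming each structure is given as a list of C-alpha (x, y, z) coordinates. It does not implement the writhe decomposition named in the title: as a hypothetical stand-in, `project` maps a structure to a normalized histogram of its pairwise distances (the 8 bins and the 32 Å cutoff are arbitrary choices), and `nearest` compares structures by the Euclidean distance between their projected vectors. Like the global projections the abstract criticizes, this toy projection cannot detect local similarities.

```python
from __future__ import annotations

import math
from itertools import combinations


def project(coords: list[tuple[float, float, float]],
            bins: int = 8, max_dist: float = 32.0) -> list[float]:
    """Map a structure to a fixed-length vector: the normalized histogram of
    its pairwise distances (hypothetical stand-in for a writhe-based projection)."""
    hist = [0.0] * bins
    width = max_dist / bins
    pairs = list(combinations(coords, 2))
    for a, b in pairs:
        # Distances at or beyond max_dist are clamped into the last bin.
        idx = min(int(math.dist(a, b) / width), bins - 1)
        hist[idx] += 1
    total = len(pairs) or 1  # avoid dividing by zero for single-point inputs
    return [h / total for h in hist]


def nearest(query: list[float], database: dict[str, list[float]]) -> str:
    """Return the name of the database entry whose projected vector is closest
    to the query vector (plain Euclidean distance in projection space)."""
    return min(database, key=lambda name: math.dist(query, database[name]))
```

Because every structure is projected once, a database search reduces to vector comparisons, which is where the speed advantage of projection methods comes from.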

    Computational Methods for Comparative Non-coding RNA Analysis: from Secondary Structures to Tertiary Structures

    Unlike messenger RNAs (mRNAs), whose information is encoded in their primary sequences, non-coding RNAs (ncRNAs) derive their cellular roles from their structures. Studying structural conservation in ncRNAs is therefore important for an in-depth understanding of their functions. In recent years, many comparative computational methods have been proposed to analyze common structural patterns in ncRNAs. However, RNA structural comparison is not a trivial task, and existing approaches still have numerous issues with efficiency and accuracy. In this dissertation, we introduce a suite of novel computational tools that extend the classic models for ncRNA secondary and tertiary structure comparison. For RNA secondary structure analysis, we first developed a computational tool, named PhyloRNAalifold, that integrates phylogenetic information into consensus structure folding. The underlying idea of this algorithm is that the importance of a co-varying mutation should be determined by its position on the phylogenetic tree. By assigning high scores to the critical covariations, RNA secondary structure can be predicted more accurately. Besides structure prediction, we also developed a computational tool, named ProbeAlign, to improve the efficiency of genome-wide ncRNA screening by using high-throughput RNA structural probing data. It treats the chemical reactivities embedded in the probing information as pairing attributes of the search targets. This approach avoids time-consuming base pair matching in secondary structure alignment. Applying ProbeAlign to the FragSeq datasets demonstrates its capability for genome-wide ncRNA analysis. For RNA tertiary structure analysis, we first developed a computational tool, named STAR3D, to find global conservation in RNA 3D structures. STAR3D aims to find the consensus of stacks by using 2D topology and 3D geometry together.
Then, the loop regions can be ordered and aligned according to their relative positions in the consensus. This stack-guided alignment method brings a divide-and-conquer strategy to RNA 3D structural alignment, which improves its efficiency dramatically. Furthermore, we clustered all loop regions in non-redundant RNA 3D structures to detect plausible RNA structural motifs de novo. The computational pipeline, named RNAMSC, was extended to handle large-scale PDB datasets, and thorough downstream analysis was performed to ensure that the clustering results are valid and easy to apply in further research. The final results contain many interesting variations of known motifs, such as the GNAA tetraloop, kink-turn, sarcin-ricin and T-loop motifs. We also discovered novel functional motifs that are conserved in a wide range of ncRNAs, including ribosomal RNA, sgRNA, SRP RNA, the GlmS riboswitch and the twister ribozyme.

    Weighted graph matching approaches to structure comparison and alignment and their application to biological problems

    In pattern recognition and machine learning, comparing and contrasting are the most fundamental operations: from similarities we derive common rules encoded in the systems, while from differences we infer what makes each system unique. The biological sciences are no exception and, in fact, rely heavily on these operations. More recently, the emergence of high-throughput measurement technologies has highlighted the need for novel approaches capable of enhancing our ability to understand complex relationships in these data sets. Often, these relationships are best represented using graphs (or networks), where nodes are biochemical components such as genes, RNAs, proteins or metabolites, and edges indicate the types (and often the quality) of relationships. Relationships are generally compared by aligning the networks of interest. For example, for protein-protein interaction (PPI) networks, the goal of network alignment is to find mappings between nodes (proteins), which are highly useful for identifying signaling pathways or protein complexes and for annotating genes of unknown function from subnetworks conserved across multiple species. Phylogenetic trees are also graph structures; they describe evolutionary relationships among groups of organisms and their hypothetical ancestors. As has been shown in a large body of previous work, comparing trees also makes it possible to support or build new evolutionary hypotheses: for example, the detection of host-parasite symbiosis, gene coevolution as a signal of physical interactions among genes, or nonstandard events such as horizontal gene transfer. The goal of this thesis is to develop and implement a flexible set of algorithms and methodologies for aligning trees and/or networks of various sizes and properties.
We first define a new relaxed model of graph isomorphism in which shortest path lengths are preserved between corresponding pairs of nodes. Then, based on Google's PageRank model, we present a new tree matching approach, phyloAligner, which resolves several weaknesses of previous approaches. We further generalize this tree matching algorithm into a broader, flexible framework, MCS-Finder, a scalable and error-tolerant approximation for identifying the maximum common substructure between weighted graphs or distance matrices of different sizes. For phylogenetic trees with weighted edges and strictly labeled nodes, multidimensional scaling-based methods (xCEED) can effectively evaluate structural similarity and identify which regions are congruent or incongruent. These methods successfully detected coevolutionary signals as well as nonstandard evolutionary events such as horizontal gene transfer, and recovered interaction specificity between multigene families.
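
The relaxed isomorphism condition can be checked directly for a given node mapping. The sketch below is a hypothetical illustration, not phyloAligner or MCS-Finder: it assumes undirected weighted graphs with nodes numbered 0..n-1 and edges given as (u, v, weight) triples, computes all-pairs shortest path lengths with Floyd-Warshall, and tests whether a (possibly partial) injective mapping preserves those lengths. Searching for a good mapping, which is the hard problem the thesis addresses, is not shown.

```python
from __future__ import annotations

import math
from itertools import combinations


def shortest_paths(n: int, edges: list[tuple[int, int, float]]) -> list[list[float]]:
    """All-pairs shortest path lengths (Floyd-Warshall) for an undirected
    weighted graph with nodes 0..n-1; unreachable pairs stay at infinity."""
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v, w in edges:
        dist[u][v] = dist[v][u] = min(dist[u][v], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def preserves_path_lengths(mapping: dict[int, int], d1, d2, tol: float = 1e-9) -> bool:
    """True if the mapping is injective and every pair of mapped nodes keeps its
    shortest path length, i.e. d1[u][v] == d2[f(u)][f(v)] up to tol."""
    if len(set(mapping.values())) != len(mapping):
        return False
    for u, v in combinations(mapping, 2):
        a, b = d1[u][v], d2[mapping[u]][mapping[v]]
        # The a != b test also treats two unreachable pairs (inf == inf) as preserved.
        if a != b and abs(a - b) > tol:
            return False
    return True
```

For example, the path 0-1-2 with weights 1 and 2 maps onto a relabeled copy of itself, and any subset of that mapping also passes, which is what makes the condition usable for common substructures.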

    Riemann-Roch theory for sublattices of the root lattice A_n, graph automorphisms and counting cycles in graphs

    This thesis consists of two independent parts. In the first part of the thesis, we develop a Riemann-Roch theory for sublattices of the root lattice A_n, extending the work of Baker and Norine (Advances in Mathematics, 215(2): 766-788, 2007), and study questions that arise from this theory. Our theory is based on the study of critical points of a certain simplicial distance function on a lattice and establishes connections between the Riemann-Roch theory and the Voronoi diagrams of lattices under certain simplicial distance functions. In particular, we provide a new geometric approach for the study of the Laplacian of graphs. As a consequence, we obtain a geometric proof of the Riemann-Roch theorem for graphs and generalise the result to other sublattices of A_n. Furthermore, we use the geometric approach to study the problem of computing the rank of a divisor on a finite multigraph G and obtain an algorithm that runs in polynomial time for a fixed number of vertices, in particular with running time 2^{O(n log n)} · poly(size(G)), where n is the number of vertices of G. Motivated by this theory, we study a dimensionality reduction approach to the graph automorphism problem, and we also obtain an algorithm, based on exponential sums, for the related problem of counting automorphisms of graphs. In the second part of the thesis, we develop an approach, based on complex-valued hash functions, to count cycles in graphs in the data streaming model. Our algorithm is based on the idea of computing instances of complex-valued random variables over the given stream and improves drastically upon the naive sampling algorithm.
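
The rank of a divisor can be illustrated with a brute-force sketch of the Baker and Norine definition: r(D) = -1 if D is not linearly equivalent to an effective divisor, and otherwise r(D) is the largest k such that D - E is equivalent to an effective divisor for every effective E of degree k. Equivalence to an effective divisor is tested with the standard q-reduced divisor, computed with Dhar's burning algorithm: D is equivalent to an effective divisor exactly when its q-reduced form is non-negative at q. This is not the thesis's geometric algorithm. It assumes a connected, loopless multigraph given as a symmetric matrix of edge multiplicities, and its running time grows exponentially with the degree of D.

```python
from collections import deque
from itertools import combinations_with_replacement


def fire(D, adj, S):
    """Fire every vertex in S once: one chip moves along each edge leaving S."""
    D = list(D)
    for u in S:
        for v in range(len(D)):
            if v not in S and adj[u][v]:
                D[u] -= adj[u][v]
                D[v] += adj[u][v]
    return D


def q_reduced(D, adj, q=0):
    """Return the q-reduced divisor linearly equivalent to D (graph assumed connected)."""
    n = len(D)
    dist = [-1] * n  # BFS layers around q
    dist[q] = 0
    queue = deque([q])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if adj[u][v] and dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Step 1: make every v != q non-negative, outermost layer first. Firing all
    # vertices closer than layer k feeds layer k and leaves outer layers unchanged.
    for k in range(max(dist), 0, -1):
        inner = {v for v in range(n) if dist[v] < k}
        while any(D[v] < 0 for v in range(n) if dist[v] == k):
            D = fire(D, adj, inner)
    # Step 2: Dhar's burning algorithm. Fire spreads from q; a vertex burns once its
    # edges to burnt vertices outnumber its chips. Fire the unburnt set until all burn.
    while True:
        burnt = {q}
        grew = True
        while grew:
            grew = False
            for v in range(n):
                if v not in burnt and sum(adj[v][u] for u in burnt) > D[v]:
                    burnt.add(v)
                    grew = True
        if len(burnt) == n:
            return D
        D = fire(D, adj, set(range(n)) - burnt)


def equivalent_to_effective(D, adj, q=0):
    """D is equivalent to an effective divisor iff its q-reduced form is >= 0 at q."""
    return q_reduced(D, adj, q)[q] >= 0


def rank(D, adj):
    """Baker-Norine rank by brute force over all effective E of each degree."""
    if not equivalent_to_effective(D, adj):
        return -1
    k = 0
    while True:
        # Every effective E of degree k has passed; now try degree k + 1.
        for E in combinations_with_replacement(range(len(D)), k + 1):
            d_minus_e = list(D)
            for v in E:
                d_minus_e[v] -= 1
            if not equivalent_to_effective(d_minus_e, adj):
                return k
        k += 1
```

On the triangle C3 (genus 1), for instance, rank([1, 1, 0], C3) should be 1 = deg(D) - g, as the Riemann-Roch theorem predicts for deg(D) > 2g - 2.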

    Exact algorithms for pairwise protein structure alignment

    Klau, G.W. [Promotor]

    New Approaches to Protein Structure Prediction

    Protein structure prediction is concerned with predicting a protein's three-dimensional structure from its amino acid sequence. Such predictions are commonly performed by searching the space of possible structures and evaluating each candidate with a scoring function. If the target protein structure is assumed to resemble the structure of a known protein, the search space can be reduced significantly; this approach is referred to as comparative structure prediction. When no such assumption is made, the approach is known as ab initio structure prediction. There are several difficulties in devising efficient searches and in computing the scoring function. Many of these problems have ready solutions from known mathematical methods; however, the problems that remain unsolved have kept structure prediction methods from producing better predictions. The objective of this study is to present a complete framework for ab initio protein structure prediction. To this end, a new search strategy is proposed, and better techniques are devised for computing the known scoring functions. Some of the remaining problems in protein structure prediction are revisited, and several of them are shown to be intractable; in many of these cases, approximation methods are suggested as alternative solutions. The primary issues addressed in this thesis are local structure prediction, structure assembly or sampling, side chain packing, model comparison, and structural alignment. For brevity, we do not elaborate on these problems here; a concise introduction is given in the first section of this thesis. These studies led to the development of several programs that form a utility suite for ab initio protein structure prediction. Because these programs are generally useful, some of them have been released under open source licenses to benefit the community.
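
For the model comparison step, one widely used superposition-free measure is the distance RMSD (dRMSD), which compares the intramolecular distance matrices of two models of the same sequence: dRMSD = sqrt(mean over pairs i < j of (d_ij^A - d_ij^B)^2). The sketch below is an assumption about what such a comparison could look like, not the thesis's own code; models are assumed to be equal-length lists of (x, y, z) coordinates with residues in corresponding order.

```python
import math
from itertools import combinations


def drmsd(model_a, model_b):
    """Distance RMSD between two equal-length models given as (x, y, z) lists.
    Only intramolecular distances are compared, so no superposition is needed."""
    if len(model_a) != len(model_b) or len(model_a) < 2:
        raise ValueError("models need the same length and at least two residues")
    squared = [
        (math.dist(model_a[i], model_a[j]) - math.dist(model_b[i], model_b[j])) ** 2
        for i, j in combinations(range(len(model_a)), 2)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

Because dRMSD depends only on internal distances, a rigidly rotated and translated copy of a model scores (numerically) zero; the same invariance makes it blind to mirror images, which superposition-based RMSD would distinguish.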