11 research outputs found

    Cost-based optimization of graph queries in relational database management systems

    Graphs occur in many areas of life. We are interested in graphs in biology, where nodes are chemical compounds, enzymes, reactions, or interactions that are connected by edges. Efficiently querying these graphs is a challenging task. In this thesis we present GRIcano, a system that efficiently executes graph queries. For GRIcano we assume that graphs are stored and queried using relational database management systems (RDBMS). We propose an extended version of the Pathway Query Language (PQL) to express graph queries. The core of GRIcano is a cost-based query optimizer. This thesis contributes to all three required components of the optimizer: the relational algebra, the implementations, and the cost model. Relational algebra operators alone are not sufficient to express graph queries. Thus, we first present new operators for rewriting PQL queries into algebra expressions: the reachability, distance, path length, and path operators. In addition, we provide rewrite rules for the newly proposed operators in combination with standard relational algebra operators. Secondly, we present implementations for each proposed operator. The main contribution is GRIPP, an index structure that allows us to answer reachability queries on very large graphs. GRIPP has advantages over other existing index structures, which we review in this work. In addition, we show how to employ GRIPP and the recursive query strategy as implementations for all four proposed operators. The third component of GRIcano is the cost model, which requires cardinality estimates for operators and cost functions for implementations. Based on extensive experimental evaluation of our proposed algorithms, we present functions to estimate the cardinality of operators and the cost of executing a query. The novelty of our approach is that these functions use only key figures of the graph. Finally, we demonstrate the effectiveness of GRIcano using example graph queries on real biological networks.
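    To illustrate what a reachability query over a graph stored in an RDBMS can look like, here is a minimal sketch of a recursive query strategy using a recursive common table expression in SQLite. It is not GRIPP and not the thesis's implementation; the table name `edges`, its columns `src` and `dst`, the function name `reachable`, and the sample data are all assumptions made for this example.

```python
import sqlite3


def reachable(conn, source, target):
    """Return True if `target` is reachable from `source` in edges(src, dst).

    Assumption: a node counts as reachable from itself (path of length zero).
    """
    row = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?                      -- start from the source node
            UNION                         -- UNION drops duplicates, so cycles terminate
            SELECT e.dst
            FROM edges AS e
            JOIN reach AS r ON e.src = r.node
        )
        SELECT 1 FROM reach WHERE node = ? LIMIT 1
        """,
        (source, target),
    ).fetchone()
    return row is not None


# Hypothetical toy graph: a short chain with a cycle, plus an unrelated edge.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("glucose", "g6p"), ("g6p", "f6p"), ("f6p", "g6p"), ("atp", "adp")],
)

print(reachable(conn, "glucose", "f6p"))
print(reachable(conn, "f6p", "glucose"))
```

    A recursive query of this kind traverses the graph at query time, which is the cost that an index structure for reachability aims to avoid on very large graphs.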

    A structural keystone for drug design

    3D structures of proteins and potential ligands are the cornerstones of rational drug design. The first brick to build upon is selecting a protein target and finding out whether biologically active compounds are known. Both tasks require more information than the structures themselves provide. For this purpose we have built a web resource bridging protein and ligand databases. It consists of three parts: i) a data warehouse of protein structure annotations that integrates many well-known databases such as Swiss-Prot, SCOP, ENZYME and others; ii) a conformational library of structures of approved drugs; iii) a conformational library of ligands from the PDB, linking the realms of proteins and small molecules. The data collection contains structures of 30,000 proteins, 5,000 different ligands from 70,000 ligand-protein complexes, and 2,500 known drugs. Sets of protein structures can be refined by criteria such as protein fold, family, metabolic pathway, resolution and textual annotation. The structures of organic compounds (drugs and ligands) can be searched by chemical formula, trivial and trade names, and medical classification codes for drugs (ATC). Retrieval of structures by 2D similarity has been implemented for all small molecules using Tanimoto coefficients. For the drug structures, 110,000 structural conformers have been calculated to account for structural flexibility. Two substances can be compared online by 3D superimposition, in which the best-fitting pair of conformers is detected. Together, these web-accessible resources can be used to identify promising drug candidates. They have been used in-house to find alternatives to substances with a known binding activity but adverse side effects.
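    As a small illustration of the Tanimoto coefficient mentioned above, the sketch below computes it for two fingerprints represented as sets of "on" bit positions: the number of shared bits divided by the number of bits set in either fingerprint. The set representation, the function name `tanimoto`, the sample fingerprints, and the convention of returning 0.0 for two empty fingerprints are assumptions for this example, not details taken from the resource described.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of set-bit indices."""
    shared = len(a & b)
    total = len(a) + len(b) - shared  # size of the union
    if total == 0:
        return 0.0  # assumption: two empty fingerprints are treated as dissimilar
    return shared / total


# Hypothetical fingerprints: 2 shared bits out of 5 distinct bits overall.
fp_a = {1, 2, 3, 4}
fp_b = {3, 4, 5}
print(tanimoto(fp_a, fp_b))  # 0.4
```

    Values range from 0 (no bits in common) to 1 (identical fingerprints), so a similarity search can rank candidate molecules by this score.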
