7,630 research outputs found

    A survey of parallel execution strategies for transitive closure and logic programs

    Get PDF
    An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods

    Data fragmentation for parallel transitive closure strategies

    Get PDF
    Addresses the problem of fragmenting a relation to make the parallel computation of the transitive closure efficient, based on the disconnection set approach. To better understand this design problem, the authors focus on transportation networks. These are characterized by loosely interconnected clusters of nodes with a high internal connectivity rate. Three requirements that have to be fulfilled by a fragmentation are formulated, and three different fragmentation strategies are presented, each emphasizing one of these requirements. Some test results are presented to show the performance of the various fragmentation strategie

    Algebraic optimization of recursive queries

    Get PDF
    Over the past few years, much attention has been paid to deductive databases. They offer a logic-based interface, and allow formulation of complex recursive queries. However, they do not offer appropriate update facilities, and do not support existing applications. To overcome these problems an SQL-like interface is required besides a logic-based interface.\ud \ud In the PRISMA project we have developed a tightly-coupled distributed database, on a multiprocessor machine, with two user interfaces: SQL and PRISMAlog. Query optimization is localized in one component: the relational query optimizer. Therefore, we have defined an eXtended Relational Algebra that allows recursive query formulation and can also be used for expressing executable schedules, and we have developed algebraic optimization strategies for recursive queries. In this paper we describe an optimization strategy that rewrites regular (in the context of formal grammars) mutually recursive queries into standard Relational Algebra and transitive closure operations. We also describe how to push selections into the resulting transitive closure operations.\ud \ud The reason we focus on algebraic optimization is that, in our opinion, the new generation of advanced database systems will be built starting from existing state-of-the-art relational technology, instead of building a completely new class of systems

    Practical Reasoning for Very Expressive Description Logics

    Full text link
    Description Logics (DLs) are a family of knowledge representation formalisms mainly characterised by constructors to build complex concepts and roles from atomic ones. Expressive role constructors are important in many applications, but can be computationally problematical. We present an algorithm that decides satisfiability of the DL ALC extended with transitive and inverse roles and functional restrictions with respect to general concept inclusion axioms and role hierarchies; early experiments indicate that this algorithm is well-suited for implementation. Additionally, we show that ALC extended with just transitive and inverse roles is still in PSPACE. We investigate the limits of decidability for this family of DLs, showing that relaxing the constraints placed on the kinds of roles used in number restrictions leads to the undecidability of all inference problems. Finally, we describe a number of optimisation techniques that are crucial in obtaining implementations of the decision procedures, which, despite the worst-case complexity of the problem, exhibit good performance with real-life problems

    High-Performance Reachability Query Processing under Index Size Restrictions

    Full text link
    In this paper, we propose a scalable and highly efficient index structure for the reachability problem over graphs. We build on the well-known node interval labeling scheme where the set of vertices reachable from a particular node is compactly encoded as a collection of node identifier ranges. We impose an explicit bound on the size of the index and flexibly assign approximate reachability ranges to nodes of the graph such that the number of index probes to answer a query is minimized. The resulting tunable index structure generates a better range labeling if the space budget is increased, thus providing a direct control over the trade off between index size and the query processing performance. By using a fast recursive querying method in conjunction with our index structure, we show that in practice, reachability queries can be answered in the order of microseconds on an off-the-shelf computer - even for the case of massive-scale real world graphs. Our claims are supported by an extensive set of experimental results using a multitude of benchmark and real-world web-scale graph datasets.Comment: 30 page

    Approximating the Minimum Equivalent Digraph

    Full text link
    The MEG (minimum equivalent graph) problem is, given a directed graph, to find a small subset of the edges that maintains all reachability relations between nodes. The problem is NP-hard. This paper gives an approximation algorithm with performance guarantee of pi^2/6 ~ 1.64. The algorithm and its analysis are based on the simple idea of contracting long cycles. (This result is strengthened slightly in ``On strongly connected digraphs with bounded cycle length'' (1996).) The analysis applies directly to 2-Exchange, a simple ``local improvement'' algorithm, showing that its performance guarantee is 1.75.Comment: conference version in ACM-SIAM Symposium on Discrete Algorithms (1994

    Fully Dynamic Single-Source Reachability in Practice: An Experimental Study

    Full text link
    Given a directed graph and a source vertex, the fully dynamic single-source reachability problem is to maintain the set of vertices that are reachable from the given vertex, subject to edge deletions and insertions. It is one of the most fundamental problems on graphs and appears directly or indirectly in many and varied applications. While there has been theoretical work on this problem, showing both linear conditional lower bounds for the fully dynamic problem and insertions-only and deletions-only upper bounds beating these conditional lower bounds, there has been no experimental study that compares the performance of fully dynamic reachability algorithms in practice. Previous experimental studies in this area concentrated only on the more general all-pairs reachability or transitive closure problem and did not use real-world dynamic graphs. In this paper, we bridge this gap by empirically studying an extensive set of algorithms for the single-source reachability problem in the fully dynamic setting. In particular, we design several fully dynamic variants of well-known approaches to obtain and maintain reachability information with respect to a distinguished source. Moreover, we extend the existing insertions-only or deletions-only upper bounds into fully dynamic algorithms. Even though the worst-case time per operation of all the fully dynamic algorithms we evaluate is at least linear in the number of edges in the graph (as is to be expected given the conditional lower bounds) we show in our extensive experimental evaluation that their performance differs greatly, both on generated as well as on real-world instances
    corecore