20,119 research outputs found

    A survey of parallel execution strategies for transitive closure and logic programs

    Get PDF
    An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods

    Data fragmentation for parallel transitive closure strategies

    Get PDF
    Addresses the problem of fragmenting a relation to make the parallel computation of the transitive closure efficient, based on the disconnection set approach. To better understand this design problem, the authors focus on transportation networks. These are characterized by loosely interconnected clusters of nodes with a high internal connectivity rate. Three requirements that have to be fulfilled by a fragmentation are formulated, and three different fragmentation strategies are presented, each emphasizing one of these requirements. Some test results are presented to show the performance of the various fragmentation strategie

    Logic Programming Applications: What Are the Abstractions and Implementations?

    Full text link
    This article presents an overview of applications of logic programming, classifying them based on the abstractions and implementations of logic languages that support the applications. The three key abstractions are join, recursion, and constraint. Their essential implementations are for-loops, fixed points, and backtracking, respectively. The corresponding kinds of applications are database queries, inductive analysis, and combinatorial search, respectively. We also discuss language extensions and programming paradigms, summarize example application problems by application areas, and touch on example systems that support variants of the abstractions with different implementations

    On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark

    Full text link
    Querying very large RDF data sets in an efficient manner requires a sophisticated distribution strategy. Several innovative solutions have recently been proposed for optimizing data distribution with predefined query workloads. This paper presents an in-depth analysis and experimental comparison of five representative and complementary distribution approaches. For achieving fair experimental results, we are using Apache Spark as a common parallel computing framework by rewriting the concerned algorithms using the Spark API. Spark provides guarantees in terms of fault tolerance, high availability and scalability which are essential in such systems. Our different implementations aim to highlight the fundamental implementation-independent characteristics of each approach in terms of data preparation, load balancing, data replication and to some extent to query answering cost and performance. The presented measures are obtained by testing each system on one synthetic and one real-world data set over query workloads with differing characteristics and different partitioning constraints.Comment: 16 pages, 3 figure
    • 

    corecore