92 research outputs found

    Instance and Output Optimal Parallel Algorithms for Acyclic Joins

    Full text link
    Massively parallel join algorithms have received much attention in recent years, while most prior work has focused on worst-optimal algorithms. However, the worst-case optimality of these join algorithms relies on hard instances having very large output sizes, which rarely appear in practice. A stronger notion of optimality is {\em output-optimal}, which requires an algorithm to be optimal within the class of all instances sharing the same input and output size. An even stronger optimality is {\em instance-optimal}, i.e., the algorithm is optimal on every single instance, but this may not always be achievable. In the traditional RAM model of computation, the classical Yannakakis algorithm is instance-optimal on any acyclic join. But in the massively parallel computation (MPC) model, the situation becomes much more complicated. We first show that for the class of r-hierarchical joins, instance-optimality can still be achieved in the MPC model. Then, we give a new MPC algorithm for an arbitrary acyclic join with load O ({\IN \over p} + {\sqrt{\IN \cdot \OUT} \over p}), where \IN,\OUT are the input and output sizes of the join, and pp is the number of servers in the MPC model. This improves the MPC version of the Yannakakis algorithm by an O (\sqrt{\OUT \over \IN} ) factor. Furthermore, we show that this is output-optimal when \OUT = O(p \cdot \IN), for every acyclic but non-r-hierarchical join. Finally, we give the first output-sensitive lower bound for the triangle join in the MPC model, showing that it is inherently more difficult than acyclic joins

    A Simple Parallel Algorithm for Natural Joins on Binary Relations

    Get PDF

    Towards an Efficient Evaluation of General Queries

    Get PDF
    Database applications often require to evaluate queries containing quantifiers or disjunctions, e.g., for handling general integrity constraints. Existing efficient methods for processing quantifiers depart from the relational model as they rely on non-algebraic procedures. Looking at quantified query evaluation from a new angle, we propose an approach to process quantifiers that makes use of relational algebra operators only. Our approach performs in two phases. The first phase normalizes the queries producing a canonical form. This form permits to improve the translation into relational algebra performed during the second phase. The improved translation relies on a new operator - the complement-join - that generalizes the set difference, on algebraic expressions of universal quantifiers that avoid the expensive division operator in many cases, and on a special processing of disjunctions by means of constrained outer-joins. Our method achieves an efficiency at least comparable with that of previous proposals, better in most cases. Furthermore, it is considerably simpler to implement as it completely relies on relational data structures and operators

    Robust and Skew-resistant Parallel Joins in Shared-Nothing Systems

    Get PDF
    The performance of joins in parallel database management systems is critical for data intensive operations such as querying. Since data skew is common in many applications, poorly engineered join operations result in load imbalance and performance bottlenecks. State-of-the-art methods designed to handle this problem offer significant improvements over naive implementations. However, performance could be further improved by removing the dependency on global skew knowledge and broadcasting. In this paper, we propose PRPQ (partial redistribution & partial query), an efficient and robust join algorithm for processing large-scale joins over distributed systems. We present the detailed implementation and a quantitative evaluation of our method. The experimental results demonstrate that the proposed PRPQ algorithm is indeed robust and scalable under a wide range of skew conditions. Specifically, compared to the state-of-art PRPD method, we achieve 16% - 167% performance improvement and 24% - 54% less network communication under different join workloads

    Dynamic Range Partitioning in Multiprocessor Database Implementations

    Get PDF
    Multiprocessor implementation of the relational database operators has recently received great attention in literature [1-4, 8, 11]. As the complexity of implementing the relational operators rests on the inter-node communication patterns involved in an operation, greater research attention has been focused on Join algorithms. The Join traffic patterns subsume those of the remaining relational operators. To effectively exploit parallelism in bucket based join implementations, the domain of the joining attributes must be partitioned into equal subranges. That is, the processing of each subrange requires roughly the same amount of time. A skewed distribution of workload significantly hinders performance. As relations exhibit a non-uniform attribute value distribution, possibly resulting from a previous operation, a priori determination of subrange boundary conditions results in a non-balanced workload across the processors. Performance degradation in parallel systems employing such static boundary subrange partitioning is demonstrated in Lakshmi and Yu [6]. That study exemplified that even a low degree of attribute skew results in a significant performance penalty. This paper proposes a statistical algorithm for dynamic determination of domain partitioning in bucket based join implementations. This statistics-based approach guarantees a near-uniform processor workload. A parameterization of the sample size versus the number of tuples is developed, and a proof of the validity of the approach is discussed. A simple illustrative example is presented

    A survey of parallel execution strategies for transitive closure and logic programs

    Get PDF
    An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods

    Efficient permutation-based range-join algorithms on N-dimensionalmeshes using data-shifting

    Get PDF
    ©2001 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.In this paper, we present two efficient parallel algorithms for computing a non-equijoin, range-join, of two relations an N-dimensional mesh-connected computers. The proposed algorithms uses the data-shifting approach to effectively permute every sorted subset of relation S to each processor in turn recursively in dimensions from low to high, where it is joined with the local subset of relation RShao Dong Chen, Hong Shen, Rodeny Topo

    Earlier stage for straggler detection and handling using combined CPU test and LATE methodology

    Get PDF
    Using MapReduce in Hadoop helps in lowering the execution time and power consumption for large scale data. However, there can be a delay in job processing in circumstances where tasks are assigned to bad or congested machines called "straggler tasks"; which increases the time, power consumptions and therefore increasing the costs and leading to a poor performance of computing systems. This research proposes a hybrid MapReduce framework referred to as the combinatory late-machine (CLM) framework. Implementation of this framework will facilitate early and timely detection and identification of stragglers thereby facilitating prompt appropriate and effective actions
    • …
    corecore