784 research outputs found

    Robust and Skew-resistant Parallel Joins in Shared-Nothing Systems

    Get PDF
    The performance of joins in parallel database management systems is critical for data intensive operations such as querying. Since data skew is common in many applications, poorly engineered join operations result in load imbalance and performance bottlenecks. State-of-the-art methods designed to handle this problem offer significant improvements over naive implementations. However, performance could be further improved by removing the dependency on global skew knowledge and broadcasting. In this paper, we propose PRPQ (partial redistribution & partial query), an efficient and robust join algorithm for processing large-scale joins over distributed systems. We present the detailed implementation and a quantitative evaluation of our method. The experimental results demonstrate that the proposed PRPQ algorithm is indeed robust and scalable under a wide range of skew conditions. Specifically, compared to the state-of-art PRPD method, we achieve 16% - 167% performance improvement and 24% - 54% less network communication under different join workloads

    A scalable analysis framework for large-scale RDF data

    Get PDF
    With the growth of the Semantic Web, the availability of RDF datasets from multiple domains as Linked Data has taken the corpora of this web to a terabyte-scale, and challenges modern knowledge storage and discovery techniques. Research and engineering on RDF data management systems is a very active area with many standalone systems being introduced. However, as the size of RDF data increases, such single-machine approaches meet performance bottlenecks, in terms of both data loading and querying, due to the limited parallelism inherent to symmetric multi-threaded systems and the limited available system I/O and system memory. Although several approaches for distributed RDF data processing have been proposed, along with clustered versions of more traditional approaches, their techniques are limited by the trade-off they exploit between loading complexity and query efficiency in the presence of big RDF data. This thesis then, introduces a scalable analysis framework for processing large-scale RDF data, which focuses on various techniques to reduce inter-machine communication, computation and load-imbalancing so as to achieve fast data loading and querying on distributed infrastructures. The first part of this thesis focuses on the study of RDF store implementation and parallel hashing on big data processing. (1) A system-level investigation of RDF store implementation has been conducted on the basis of a comparative analysis of runtime characteristics of a representative set of RDF stores. The detailed time cost and system consumption is measured for data loading and querying so as to provide insight into different triple store implementation as well as an understanding of performance differences between different platforms. (2) A high-level structured parallel hashing approach over distributed memory is proposed and theoretically analyzed. The detailed performance of hashing implementations using different lock-free strategies has been characterized through extensive experiments, thereby allowing system developers to make a more informed choice for the implementation of their high-performance analytical data processing systems. The second part of this thesis proposes three main techniques for fast processing of large RDF data within the proposed framework. (1) A very efficient parallel dictionary encoding algorithm, to avoid unnecessary disk-space consumption and reduce computational complexity of query execution. The presented implementation has achieved notable speedups compared to the state-of-art method and also has achieved excellent scalability. (2) Several novel parallel join algorithms, to efficiently handle skew over large data during query processing. The approaches have achieved good load balancing and have been demonstrated to be faster than the state-of-art techniques in both theoretical and experimental comparisons. (3) A two-tier dynamic indexing approach for processing SPARQL queries has been devised which keeps loading times low and decreases or in some instances removes intermachine data movement for subsequent queries that contain the same graph patterns. The results demonstrate that this design can load data at least an order of magnitude faster than a clustered store operating in RAM while remaining within an interactive range for query processing and even outperforms current systems for various queries

    OLTP on Hardware Islands

    Get PDF
    Modern hardware is abundantly parallel and increasingly heterogeneous. The numerous processing cores have non-uniform access latencies to the main memory and to the processor caches, which causes variability in the communication costs. Unfortunately, database systems mostly assume that all processing cores are the same and that microarchitecture differences are not significant enough to appear in critical database execution paths. As we demonstrate in this paper, however, hardware heterogeneity does appear in the critical path and conventional database architectures achieve suboptimal and even worse, unpredictable performance. We perform a detailed performance analysis of OLTP deployments in servers with multiple cores per CPU (multicore) and multiple CPUs per server (multisocket). We compare different database deployment strategies where we vary the number and size of independent database instances running on a single server, from a single shared-everything instance to fine-grained shared-nothing configurations. We quantify the impact of non-uniform hardware on various deployments by (a) examining how efficiently each deployment uses the available hardware resources and (b) measuring the impact of distributed transactions and skewed requests on different workloads. Finally, we argue in favor of shared-nothing deployments that are topology- and workload-aware and take advantage of fast on-chip communication between islands of cores on the same socket.Comment: VLDB201

    Design and evaluation of parallel hashing over large-scale data

    Get PDF
    High-performance analytical data processing systems often run on servers with large amounts of memory. A common data structure used in such environment is the hash tables. This paper focuses on investigating efficient parallel hash algorithms for processing large-scale data. Currently, hash tables on distributed architectures are accessed one key at a time by local or remote threads while shared-memory approaches focus on accessing a single table with multiple threads. A relatively straightforward “bulk-operation” approach seems to have been neglected by researchers. In this work, using such a method, we propose a high-level parallel hashing framework, Structured Parallel Hashing, targeting efficiently processing massive data on distributed memory. We present a theoretical analysis of the proposed method and describe the design of our hashing implementations. The evaluation reveals a very interesting result - the proposed straightforward method can vastly outperform distributed hashing methods and can even offer performance comparable with approaches based on shared memory supercomputers which use specialized hardware predicates. Moreover, we characterize the performance of our hash implementations through extensive experiments, thereby allowing system developers to make a more informed choice for their high-performance applications

    Load-balancing distributed outer joins through operator decomposition

    Get PDF
    High-performance data analytics largely relies on being able to efficiently execute various distributed data operators such as distributed joins. So far, large amounts of join methods have been proposed and evaluated in parallel and distributed environments. However, most of them focus on inner joins, and there is little published work providing the detailed implementations and analysis of outer joins. In this work, we present POPI (Partial Outer join & Partial Inner join), a novel method to load-balance large parallel outer joins by decomposing them into two operations: a large outer join over data that does not present significant skew in the input and an inner join over data presenting significant skew. We present the detailed implementation of our approach and show that POPI is implementable over a variety of architectures and underlying join implementations. Moreover, our experimental evaluation over a distributed memory platform also demonstrates that the proposed method is able to improve outer join performance under varying data skew and present excellent load-balancing properties, compared to current approaches

    Scalable and dynamically balanced shared-everything OLTP with physiological partitioning

    Get PDF
    Scaling the performance of shared-everything transaction processing systems to highly parallel multicore hardware remains a challenge for database system designers. Recent proposals alleviate locking and logging bottlenecks in the system, leaving page latching as the next potential problem. To tackle the page latching problem, we propose physiological partitioning (PLP). PLP applies logical-only partitioning, maintaining the desired properties of sharedeverything designs, and introduces a multi-rooted B+Tree index structure (MRBTree) that enables the partitioning of the accesses at the physical page level. Logical partitioning and MRBTrees together ensure that all accesses to a given index page come from a single thread and, hence, can be entirely latch free; an extended design makes heap page accesses thread private as well. Moreover, MRBTrees offer an infrastructure for easy repartitioning and allow us to have a lightweight dynamic load balancing mechanism (DLB) on top of PLP. Profiling a PLP prototype running on different multicore machines shows that it acquires 85 and 68%fewer contentious critical sections, respectively, than an optimized conventional design and one based on logical-only partitioning. PLP also improves performance up to almost 50 % over the existing systems, while DLB enhances the system with rapid and robust behavior in both detecting and handling load imbalance

    10381 Summary and Abstracts Collection -- Robust Query Processing

    Get PDF
    Dagstuhl seminar 10381 on robust query processing (held 19.09.10 - 24.09.10) brought together a diverse set of researchers and practitioners with a broad range of expertise for the purpose of fostering discussion and collaboration regarding causes, opportunities, and solutions for achieving robust query processing. The seminar strove to build a unified view across the loosely-coupled system components responsible for the various stages of database query processing. Participants were chosen for their experience with database query processing and, where possible, their prior work in academic research or in product development towards robustness in database query processing. In order to pave the way to motivate, measure, and protect future advances in robust query processing, seminar 10381 focused on developing tests for measuring the robustness of query processing. In these proceedings, we first review the seminar topics, goals, and results, then present abstracts or notes of some of the seminar break-out sessions. We also include, as an appendix, the robust query processing reading list that was collected and distributed to participants before the seminar began, as well as summaries of a few of those papers that were contributed by some participants
    • 

    corecore