307 research outputs found

    A New Framework for Join Product Skew

    Different types of data skew can cause load imbalance in parallel joins under the shared-nothing architecture. We study one important type of skew, join product skew (JPS). We propose a static approach based on frequency classes, which assumes that the distribution of the join attribute values is known. It stems from the observation that the join selectivity can be expressed as a sum of products of the frequencies of the join attribute values. Consequently, an assignment of join sub-tasks that takes the magnitude of the frequency products into account can alleviate join product skew. Motivated by this observation, we propose an algorithm, called Handling Join Product Skew (HJPS), to handle join product skew.
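The sum-of-frequency-products observation in this abstract can be illustrated with a small sketch. This is not the HJPS algorithm itself, whose details the abstract does not give; the function names `join_size` and `node_output_load`, the integer keys, and the modulo placement rule are assumptions made only for illustration.

```python
from collections import Counter
from itertools import product


def join_size(r_keys, s_keys):
    """Equi-join output size as a sum of frequency products over shared values."""
    fr, fs = Counter(r_keys), Counter(s_keys)
    return sum(fr[v] * fs[v] for v in fr.keys() & fs.keys())


def node_output_load(r_keys, s_keys, num_nodes):
    """Output tuples per node when each integer join value is placed on
    node (value mod num_nodes). The placement rule is an assumption."""
    fr, fs = Counter(r_keys), Counter(s_keys)
    load = [0] * num_nodes
    for v in fr.keys() & fs.keys():
        load[v % num_nodes] += fr[v] * fs[v]
    return load


# Hypothetical join attribute values of two relations R and S.
r = [1, 1, 1, 2, 3]
s = [1, 1, 2, 2, 4]

# Cross-check against a brute-force nested-loop count.
brute_force = sum(1 for a, b in product(r, s) if a == b)
total = join_size(r, s)
per_node = node_output_load(r, s, 2)
```

Here value 1 alone contributes 3 × 2 = 6 of the 8 output tuples, so the node that receives it carries most of the output load even though both nodes hold a similar number of input tuples. That imbalance in output size is what the abstract calls join product skew.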

    Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster

    The availability of a large number of processing nodes in a parallel and distributed computing environment enables sophisticated real-time processing over high-speed data streams, as required by many emerging applications. Sliding window stream joins are among the most important operators in a stream processing system. In this paper, we consider the issue of parallelizing a sliding window stream join operator over a shared-nothing cluster. We propose a framework, based on a fixed or predefined communication pattern, to distribute the join processing load over the shared-nothing cluster. We consider various overheads that arise when scaling over a large number of nodes, and propose solution methodologies to cope with them. We implement the algorithm over a cluster using a message passing system, and present experimental results showing the effectiveness of the join processing algorithm.
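For readers unfamiliar with the operator being parallelized, here is a minimal single-node sketch of a time-based sliding window equi-join. It does not reproduce the paper's distribution framework or communication pattern. The class name `SlidingWindowJoin`, the tuple layout, and the assumption that timestamps arrive in non-decreasing order across both streams are all hypothetical.

```python
from collections import deque


class SlidingWindowJoin:
    """Symmetric sliding window equi-join over two streams (sides 0 and 1).

    Assumes timestamps are non-decreasing across both streams."""

    def __init__(self, window):
        self.window = window
        # One time-ordered buffer of (timestamp, key, payload) per stream.
        self.buffers = (deque(), deque())

    def insert(self, side, ts, key, payload):
        """Process one arrival on stream `side`; return the matched pairs."""
        own, other = self.buffers[side], self.buffers[1 - side]
        # Evict tuples of the opposite stream that fell out of the window.
        while other and other[0][0] < ts - self.window:
            other.popleft()
        # Probe the opposite window; pairs are always (stream 0, stream 1).
        matches = [
            (payload, p) if side == 0 else (p, payload)
            for (_, k, p) in other
            if k == key
        ]
        own.append((ts, key, payload))
        return matches


join = SlidingWindowJoin(window=10)
m1 = join.insert(0, 1, "a", "r1")   # nothing to match yet
m2 = join.insert(1, 5, "a", "s1")   # r1 is still inside the window
m3 = join.insert(1, 20, "a", "s2")  # r1 has expired by now
m4 = join.insert(0, 21, "a", "r2")  # s1 has expired, s2 has not
```

Each arrival both probes the opposite window and is stored in its own window, so every node in a parallel version would need the tuples of both streams that can match its share of the keys.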

    One Size Cannot Fit All: a Self-Adaptive Dispatcher for Skewed Hash Join in Shared-nothing RDBMSs

    Shared-nothing architecture has been widely adopted in commercial distributed RDBMSs. Thanks to this architecture, queries can be processed in parallel and accelerated by scaling the cluster horizontally on demand. Nevertheless, load balancing remains a challenging issue in all distributed RDBMSs, including shared-nothing ones, which suffer greatly from skewed data distribution. In this work, we focus on one representative operator, Hash Join, and investigate how skew among the nodes of a cluster affects the load balance and eventual efficiency of an arbitrary query in shared-nothing RDBMSs. We found that existing Distributed Hash Join (Dist-HJ) solutions may not provide satisfactory performance when a value is skewed in both the probe and build tables. To address this, we propose a novel Dist-HJ solution, Partition and Replication (PnR). Although PnR provides the best efficiency in some skew scenarios, our exhaustive experiments over a group of shared-nothing RDBMSs show that no single Dist-HJ solution wins in all data-skew scenarios. We therefore further propose a self-adaptive Dist-HJ solution with a built-in sub-operator cost model that dynamically selects the best Dist-HJ implementation strategy at runtime according to the data skew of the target query. We implement the solution in our commercial shared-nothing RDBMS, KaiwuDB (formerly ZNBase), and an empirical study shows that the self-adaptive model achieves the best performance compared with a series of solutions adopted in many existing RDBMSs.

    A scalable analysis framework for large-scale RDF data

    With the growth of the Semantic Web, the availability of RDF datasets from multiple domains as Linked Data has taken the corpora of this web to terabyte scale and challenges modern knowledge storage and discovery techniques. Research and engineering on RDF data management systems is a very active area, with many standalone systems being introduced. However, as the size of RDF data increases, such single-machine approaches meet performance bottlenecks in both data loading and querying, due to the limited parallelism inherent to symmetric multi-threaded systems and the limited available system I/O and system memory. Although several approaches for distributed RDF data processing have been proposed, along with clustered versions of more traditional approaches, their techniques are limited by the trade-off they exploit between loading complexity and query efficiency in the presence of big RDF data. This thesis therefore introduces a scalable analysis framework for processing large-scale RDF data, which focuses on techniques to reduce inter-machine communication, computation and load imbalance so as to achieve fast data loading and querying on distributed infrastructures. The first part of this thesis focuses on the study of RDF store implementation and parallel hashing in big data processing. (1) A system-level investigation of RDF store implementation has been conducted on the basis of a comparative analysis of the runtime characteristics of a representative set of RDF stores. The detailed time cost and system consumption are measured for data loading and querying so as to provide insight into different triple store implementations as well as an understanding of performance differences between platforms. (2) A high-level structured parallel hashing approach over distributed memory is proposed and theoretically analyzed. The detailed performance of hashing implementations using different lock-free strategies has been characterized through extensive experiments, thereby allowing system developers to make a more informed choice for the implementation of their high-performance analytical data processing systems. The second part of this thesis proposes three main techniques for fast processing of large RDF data within the proposed framework. (1) A very efficient parallel dictionary encoding algorithm, to avoid unnecessary disk-space consumption and reduce the computational complexity of query execution. The presented implementation has achieved notable speedups compared with the state-of-the-art method and has also achieved excellent scalability. (2) Several novel parallel join algorithms, to efficiently handle skew over large data during query processing. The approaches have achieved good load balancing and have been demonstrated to be faster than the state-of-the-art techniques in both theoretical and experimental comparisons. (3) A two-tier dynamic indexing approach for processing SPARQL queries, which keeps loading times low and decreases, or in some instances removes, inter-machine data movement for subsequent queries that contain the same graph patterns. The results demonstrate that this design can load data at least an order of magnitude faster than a clustered store operating in RAM while remaining within an interactive range for query processing, and that it even outperforms current systems for various queries.

    An R*-Tree Based Semi-Dynamic Clustering Method for the Efficient Processing of Spatial Join in a Shared-Nothing Parallel Database System

    The growing importance of geospatial databases has made it essential to perform complex spatial queries efficiently. To achieve acceptable performance levels, database systems have been increasingly required to make use of parallelism. The spatial join is a computationally expensive operator, so an efficient implementation of it is desirable. The work presented in this document attempts to improve the performance of spatial join queries by distributing the data set across several nodes of a cluster and executing queries across these nodes in parallel. This document discusses a new parallel algorithm that implements the spatial join efficiently. This algorithm is compared with an existing parallel spatial-join algorithm, the clone join. Both algorithms have been implemented on a Beowulf cluster and compared using real datasets. An extensive experimental analysis reveals that the proposed algorithm exhibits superior performance in both declustering time and the execution time of the join query.

    The End of Slow Networks: It's Time for a Redesign

    Next-generation high-performance RDMA-capable networks will require a fundamental rethinking of the design and architecture of modern distributed DBMSs. These systems are commonly designed and optimized under the assumption that the network is the bottleneck: the network is slow and "thin", and thus needs to be avoided as much as possible. Yet this assumption no longer holds true. With InfiniBand FDR 4x, the bandwidth available to transfer data across the network is in the same ballpark as the bandwidth of one memory channel, and it increases even further with the most recent EDR standard. Moreover, with the increasing advances of RDMA, the latency improves similarly fast. In this paper, we first argue that the "old" distributed database design is not capable of taking full advantage of the network. Second, we propose architectural redesigns for OLTP, OLAP and advanced analytical frameworks to take better advantage of the improved bandwidth, latency and RDMA capabilities. Finally, for each of the workload categories, we show that remarkable performance improvements can be achieved.

    Skew-Insensitive Join Processing in Shared-Disk Database Systems

    Skew effects are still a significant problem for efficient query processing in parallel database systems. Especially in shared-nothing environments, this problem is aggravated by the substantial cost of data redistribution. Shared-disk systems, on the other hand, promise much higher flexibility in the distribution of workload among processing nodes because all input data can be accessed by any node at equal cost. In order to verify this potential for dynamic load balancing, we have devised a new technique for skew-tolerant join processing. In contrast to conventional solutions, our algorithm is not restricted to estimating processing costs in advance and assigning tasks to nodes accordingly. Instead, it monitors the actual progression of work and dynamically allocates tasks to processors, thus capitalizing on the uniform access path length in shared-disk architectures. This approach has the potential to alleviate not only any kind of data-inherent skew, but also execution skew caused by query-external workloads, by disk contention, or simply by inaccurate estimates used in predictive scheduling. We employ a detailed simulation system to evaluate the new algorithm under different types and degrees of skew.

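    Skew-tolerantes, dynamisches LPT-Scheduling zur Join-Verarbeitung in parallelen Shared-Disk-Datenbanksystemen (Skew-Tolerant Dynamic LPT Scheduling for Join Processing in Parallel Shared-Disk Database Systems)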

    In parallel databases used for decision-support tasks such as data warehousing, high throughput, short response times, and therefore load-balancing issues play a decisive role. This holds in particular for complex operations such as the relational join. The biggest problem in its parallel execution is non-uniform data and value distributions (skew), which can be predicted only to a limited extent and must therefore be handled at run time. In the widespread shared-nothing architectures, however, this is difficult to achieve because data redistribution incurs high additional overhead. We therefore propose a dynamic load-balancing method based on a shared-disk architecture which, owing to its uniform access structure, works far more efficiently than is possible in shared-nothing systems. In a simulation study, it proves clearly superior to a conventional predictive algorithm.
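"LPT" in this title refers to longest-processing-time-first scheduling. The sketch below shows only the classic static form of that heuristic, under the assumption that task costs are known in advance. It is not the dynamic variant the authors propose, which reacts to the observed progress of work. The function name `lpt_schedule` and the task costs are hypothetical.

```python
import heapq


def lpt_schedule(task_costs, num_procs):
    """Classic static LPT: take tasks in order of decreasing cost and give
    each one to the processor with the smallest load so far."""
    # Min-heap of (current load, processor id).
    heap = [(0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_procs)}
    ordered = sorted(task_costs.items(), key=lambda kv: kv[1], reverse=True)
    for task, cost in ordered:
        load, proc = heapq.heappop(heap)
        assignment[proc].append(task)
        heapq.heappush(heap, (load + cost, proc))
    loads = {p: sum(task_costs[t] for t in tasks)
             for p, tasks in assignment.items()}
    return assignment, loads


# Hypothetical estimated costs of five join sub-tasks.
costs = {"t1": 7, "t2": 5, "t3": 4, "t4": 3, "t5": 1}
assignment, loads = lpt_schedule(costs, 2)
```

With these costs both processors end up with a load of 10. A static schedule like this is only as good as its cost estimates, which is the weakness a dynamic method addresses by allocating tasks at run time.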

    맵리듀스에서의 병렬 조인을 위한 다차원 범위 분할 기법 (A Multi-Dimensional Range Partitioning Technique for Parallel Joins in MapReduce)

    Doctoral thesis, Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2014. 이상구.
    Joins are fundamental operations for many data analysis tasks, but are not directly supported by the MapReduce framework. This is because 1) the framework is basically designed to process a single input data set, and 2) MapReduce's key-equality based data grouping method makes it difficult to support complex join conditions. As a result, a large number of MapReduce-based join algorithms have been proposed. As in traditional shared-nothing systems, one of the major issues in join algorithms using MapReduce is the handling of data skew. We propose a new skew handling method, called Multi-Dimensional Range Partitioning (MDRP), and show that the proposed method outperforms traditional skew handling methods: range-based and randomized methods. Specifically, the proposed method has the following advantages: 1) Compared with the range-based method, it considers the number of output tuples at each machine, which leads to better handling of join product skew. 2) Compared with the randomized method, it exploits given join conditions before the actual join begins, so that unnecessary input duplication can be reduced. The MDRP method can be used to support advanced join operations such as theta-joins and multi-way joins. With extensive experiments using real and synthetic data sets, we evaluate the effectiveness of the proposed algorithm.
    Contents:
    I. Introduction: 1.1 Motivation; 1.2 Contribution; 1.3 Outline
    II. Backgrounds and Related Work: 2.1 MapReduce; 2.2 Join Algorithms in MapReduce (2.2.1 Two-Way Join Algorithms; 2.2.2 Multi-Way Join Algorithms); 2.3 Data Skew in Join Algorithms; 2.4 Skew Handling Approaches in MapReduce (2.4.1 Hash-Based Approach; 2.4.2 Range-Based Approach; 2.4.3 Randomized Approach)
    III. Our Approach: 3.1 Multi-Dimensional Range Partitioning (3.1.1 Creation of a Partitioning Matrix; 3.1.2 Identifying and Chopping of Heavy Cells; 3.1.3 Assigning Cells to Reducers; 3.1.4 Join Processing using the Partitioning Matrix); 3.2 Theoretical Analysis; 3.3 Complex Join Conditions; 3.4 Experiments (3.4.1 Scalar Skew Experiments; 3.4.2 Zipf's Distribution; 3.4.3 Non-Equijoin Experiments; 3.4.4 Scalability Experiments); 3.5 Discussion (3.5.1 Sampling; 3.5.2 Memory-Awareness; 3.5.3 Handling of Heavy Cells; 3.5.4 Existing Histograms); 3.6 Summary
    IV. Extensions: 4.1 Joining Multiple Relations in a MapReduce Job (4.1.1 Example: SPARQL Basic Graph Pattern; 4.1.2 Example: Matrix Chain Multiplication; 4.1.3 Single-Key Join and Multiple-Key Join Queries); 4.2 Skew Handling for Multi-Way Joins (4.2.1 Skew Handling for SK-Join Queries; 4.2.2 Skew Handling for MK-Join Queries); 4.3 Combinations of SK-Join and MK-Join (4.3.1 Complex Queries; 4.3.2 Iteration-Based Algorithms; 4.3.3 Replication-Based Algorithms; 4.3.4 Iteration-Based vs. Replication-Based); 4.4 Join-Key Selection Algorithms for Complex Queries (4.4.1 Greedy Key Selection; 4.4.2 Multiple Key Selection; 4.4.3 Hybrid Key Selection); 4.5 Experiments (4.5.1 SK-Join Experiments; 4.5.2 MK-Join Experiments; 4.5.3 Analysis of TV Watching Logs); 4.6 Summary
    V. Applications: 5.1 Algorithms for SPARQL Basic Graph Pattern (5.1.1 MR-Selection; 5.1.2 MR-Join; 5.1.3 Performance Evaluation; 5.1.4 Discussion); 5.2 Algorithms for Matrix Chain Multiplication (5.2.1 Serial Two-Way Join (S2); 5.2.2 Parallel M-Way Join (P2, PM); 5.2.3 Serial Two-Way vs. Parallel M-Way; 5.2.4 Performance Evaluation; 5.2.5 Discussion; 5.2.6 Extension: Embedded MapReduce)
    VI. Conclusion
    Bibliography; Index; Abstract in Korean (초록)