A New Framework for Join Product Skew
Different types of data skew can result in load imbalance in the context of
parallel joins under the shared nothing architecture. We study one important
type of skew, join product skew (JPS). We propose a static approach based on
frequency classes, which assumes that the data distribution of join
attribute values is known. It follows from the observation that the join selectivity can
be expressed as a sum of products of frequencies of the join attribute values.
Consequently, an assignment of join sub-tasks that takes into
account the magnitude of the frequency products can alleviate join
product skew. Motivated by this observation, we propose an algorithm
called Handling Join Product Skew (HJPS) to handle join product skew.
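The observation in this abstract can be made concrete with a small sketch. For an equi-join, the output size is the sum over join values v of f_R(v) * f_S(v), so each value's frequency product is a natural weight for its sub-task. The code below computes these products and assigns values to nodes greedily, largest product first. It is an illustration of the general idea only, not the HJPS algorithm; the function names and the greedy rule are assumptions made for this example.

```python
from collections import Counter
import heapq

def join_products(r_keys, s_keys):
    """Per-value output size of an equi-join: f_R(v) * f_S(v)."""
    f_r, f_s = Counter(r_keys), Counter(s_keys)
    return {v: f_r[v] * f_s[v] for v in f_r if v in f_s}

def assign_by_product(products, num_nodes):
    """Greedy assignment (hypothetical rule): largest product first,
    each value goes to the currently least-loaded node."""
    heap = [(0, node) for node in range(num_nodes)]  # (load, node id)
    heapq.heapify(heap)
    assignment = {}
    ordered = sorted(products.items(), key=lambda kv: (-kv[1], str(kv[0])))
    for value, weight in ordered:
        load, node = heapq.heappop(heap)
        assignment[value] = node
        heapq.heappush(heap, (load + weight, node))
    loads = [0] * num_nodes
    for value, node in assignment.items():
        loads[node] += products[value]
    return assignment, loads

# Toy relations: "a" is frequent on both sides, so it dominates the output.
r = ["a"] * 4 + ["b"] * 2 + ["c"] + ["d"]
s = ["a"] * 3 + ["b"] * 2 + ["c"] * 2 + ["e"] * 5
products = join_products(r, s)
assignment, loads = assign_by_product(products, 2)
print(products, assignment, loads)
```

With two nodes, the single heavy value "a" fills one node and the remaining values share the other. A value whose product exceeds the average load cannot be balanced by assignment alone and would need to be split further.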
Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster
The availability of a large number of processing nodes in a parallel and
distributed computing environment enables sophisticated real time processing
over high speed data streams, as required by many emerging applications.
Sliding window stream joins are among the most important operators in a stream
processing system. In this paper, we consider the issue of parallelizing a
sliding window stream join operator over a shared nothing cluster. We propose a
framework, based on a fixed or predefined communication pattern, to distribute
the join processing loads over the shared-nothing cluster. We consider various
overheads while scaling over a large number of nodes, and propose solution
methodologies to cope with the issues. We implement the algorithm over a
cluster using a message passing system, and present the experimental results
showing the effectiveness of the join processing algorithm.
One Size Cannot Fit All: a Self-Adaptive Dispatcher for Skewed Hash Join in Shared-nothing RDBMSs
Shared-nothing architecture has been widely adopted in various commercial
distributed RDBMSs. Thanks to this architecture, queries can be processed in
parallel and accelerated by scaling the cluster horizontally on demand.
Nevertheless, load balancing remains a challenging issue in all distributed
RDBMSs, including shared-nothing ones, which suffer greatly from skewed data
distribution. In this work, we focus on one representative operator,
namely Hash Join, and investigate how skewness among the nodes of a cluster
affects the load balance and eventual efficiency of an arbitrary query in
shared-nothing RDBMSs. We found that existing Distributed Hash Join (Dist-HJ)
solutions may not provide satisfactory performance when a value is skewed in
both the probe and build tables. To address that, we propose a novel Dist-HJ
solution, namely Partition and Replication (PnR). Although PnR provides the best
efficiency in some skew scenarios, our exhaustive experiments over a group
of shared-nothing RDBMSs show that no single Dist-HJ solution
wins in all (data skew) scenarios. To this end, we further propose a
self-adaptive Dist-HJ solution with a built-in sub-operator cost model that
dynamically selects the best Dist-HJ implementation strategy at runtime
according to the data skew of the target query. We implement the solution in
our commercial shared-nothing RDBMS, namely KaiwuDB (formerly ZNBase), and an
empirical study shows that the self-adaptive model achieves the best
performance compared to a series of solutions adopted in many existing RDBMSs.
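To illustrate why a value that is skewed in both tables hurts plain hash partitioning, the sketch below compares per-node join output under two schemes. The abstract does not spell out how PnR works, so the second function is only a generic partition-and-replicate illustration under stated assumptions: probe rows of a hot key are spread round-robin across nodes and the matching build rows are copied to every node. Integer keys, `key % num_nodes` as the hash, and both function names are hypothetical choices for this example.

```python
from collections import Counter

def hash_partition_load(build, probe, num_nodes):
    """Join output per node when both inputs are hash-partitioned on the key."""
    f_b, f_p = Counter(build), Counter(probe)
    loads = [0] * num_nodes
    for key in f_b:
        loads[key % num_nodes] += f_b[key] * f_p[key]
    return loads

def partition_replicate_load(build, probe, num_nodes, hot_keys):
    """Hot keys: spread probe rows round-robin and replicate build rows
    to every node. All other keys: plain hash partitioning."""
    f_b, f_p = Counter(build), Counter(probe)
    loads = [0] * num_nodes
    for key in f_b:
        if key in hot_keys:
            base, extra = divmod(f_p[key], num_nodes)
            for node in range(num_nodes):
                probe_rows = base + (1 if node < extra else 0)
                loads[node] += probe_rows * f_b[key]
        else:
            loads[key % num_nodes] += f_b[key] * f_p[key]
    return loads

# Key 1 is heavy in both the build and the probe table.
build = [1] * 10 + [2, 3, 4]
probe = [1] * 100 + [2] * 4 + [3] * 4 + [4] * 4
hashed = hash_partition_load(build, probe, 4)
spread = partition_replicate_load(build, probe, 4, hot_keys={1})
print(hashed, spread)
```

Both schemes produce the same total number of output rows, but the second spreads the hot key's work evenly at the price of shipping its 10 build rows to all 4 nodes. That extra transfer is the kind of trade-off a cost model would have to weigh.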
A scalable analysis framework for large-scale RDF data
With the growth of the Semantic Web, the availability of RDF datasets from multiple domains
as Linked Data has taken the corpora of this web to terabyte scale and challenges
modern knowledge storage and discovery techniques. Research and engineering on RDF
data management systems is a very active area with many standalone systems being introduced.
However, as the size of RDF data increases, such single-machine approaches meet
performance bottlenecks, in terms of both data loading and querying, due to the limited
parallelism inherent to symmetric multi-threaded systems and the limited available system
I/O and system memory. Although several approaches for distributed RDF data processing
have been proposed, along with clustered versions of more traditional approaches, their
techniques are limited by the trade-off they exploit between loading complexity and query
efficiency in the presence of big RDF data. This thesis therefore introduces a scalable analysis
framework for processing large-scale RDF data, which focuses on various techniques to
reduce inter-machine communication, computation and load imbalance so as to achieve
fast data loading and querying on distributed infrastructures.
The first part of this thesis focuses on the study of RDF store implementation and parallel
hashing in big data processing. (1) A system-level investigation of RDF store implementation
has been conducted on the basis of a comparative analysis of runtime characteristics
of a representative set of RDF stores. The detailed time cost and system consumption are
measured for data loading and querying so as to provide insight into different triple store
implementations as well as an understanding of performance differences between different
platforms. (2) A high-level structured parallel hashing approach over distributed memory is
proposed and theoretically analyzed. The detailed performance of hashing implementations
using different lock-free strategies has been characterized through extensive experiments,
thereby allowing system developers to make a more informed choice for the implementation
of their high-performance analytical data processing systems.
The second part of this thesis proposes three main techniques for fast processing of large
RDF data within the proposed framework. (1) A very efficient parallel dictionary encoding
algorithm, to avoid unnecessary disk-space consumption and reduce the computational
complexity of query execution. The presented implementation has achieved notable speedups
compared to the state-of-the-art method and has also achieved excellent scalability. (2) Several
novel parallel join algorithms, to efficiently handle skew over large data during query processing.
The approaches have achieved good load balancing and have been demonstrated
to be faster than the state-of-the-art techniques in both theoretical and experimental comparisons.
(3) A two-tier dynamic indexing approach for processing SPARQL queries, which
keeps loading times low and decreases or in some instances removes inter-machine
data movement for subsequent queries that contain the same graph patterns. The
results demonstrate that this design can load data at least an order of magnitude faster than
a clustered store operating in RAM while remaining within an interactive range for query
processing, and that it even outperforms current systems for various queries.
An R*-Tree Based Semi-Dynamic Clustering Method for the Efficient Processing of Spatial Join in a Shared-Nothing Parallel Database System
The growing importance of geospatial databases has made it essential to perform complex spatial queries efficiently. To achieve acceptable performance levels, database systems have been increasingly required to make use of parallelism. The spatial join is a computationally expensive operator, so an efficient implementation is desirable. The work presented in this document attempts to improve the performance of spatial join queries by distributing the data set across several nodes of a cluster and executing queries across these nodes in parallel. This document discusses a new parallel algorithm that implements the spatial join efficiently. This algorithm is compared to an existing parallel spatial-join algorithm, the clone join. Both algorithms have been implemented on a Beowulf cluster and compared using real datasets. An extensive experimental analysis reveals that the proposed algorithm exhibits superior performance in both declustering time and the execution time of the join query.
The End of Slow Networks: It's Time for a Redesign
Next generation high-performance RDMA-capable networks will require a
fundamental rethinking of the design and architecture of modern distributed
DBMSs. These systems are commonly designed and optimized under the assumption
that the network is the bottleneck: the network is slow and "thin", and thus
needs to be avoided as much as possible. Yet this assumption no longer holds
true. With InfiniBand FDR 4x, the bandwidth available to transfer data across
the network is in the same ballpark as the bandwidth of one memory channel, and it
increases even further with the most recent EDR standard. Moreover, with the
continuing advances in RDMA, the latency improves similarly fast. In this
paper, we first argue that the "old" distributed database design is not capable
of taking full advantage of the network. Second, we propose architectural
redesigns for OLTP, OLAP and advanced analytical frameworks to take better
advantage of the improved bandwidth, latency and RDMA capabilities. Finally,
for each of the workload categories, we show that remarkable performance
improvements can be achieved.
Skew-Insensitive Join Processing in Shared-Disk Database Systems
Skew effects are still a significant problem for efficient query processing in parallel database systems. Especially in shared-nothing environments, this problem is aggravated by the substantial cost of data redistribution. Shared-disk systems, on the other hand, promise much higher flexibility in the distribution of workload among processing nodes because all input data can be accessed by any node at equal cost. In order to verify this potential for dynamic load balancing, we have devised a new technique for skew-tolerant join processing. In contrast to conventional solutions, our algorithm is not restricted to estimating processing costs in advance and assigning tasks to nodes accordingly. Instead, it monitors the actual progression of work and dynamically allocates tasks to processors, thus capitalizing on the uniform access path length in shared-disk architectures. This approach has the potential to alleviate not only any kind of data-inherent skew, but also execution skew caused by query-external workloads, by disk contention, or simply by inaccurate estimates used in predictive scheduling. We employ a detailed simulation system to evaluate the new algorithm under different types and degrees of skew.
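The contrast between predictive and dynamic task allocation described in this abstract can be sketched with a toy simulation. The code below is a generic work-queue illustration, not the authors' algorithm or their simulation system: `static_makespan` fixes the assignment up front from estimated costs, while `dynamic_makespan` lets whichever worker becomes idle first take the next task. The function names, the flat cost estimates, and the task costs are all assumptions made for this example.

```python
import heapq

def static_makespan(estimated, actual, num_workers):
    """Assign tasks up front by estimated cost (largest first, to the worker
    with the least estimated load), then pay the actual cost."""
    order = sorted(range(len(estimated)), key=lambda i: -estimated[i])
    est_load = [0.0] * num_workers
    real_load = [0.0] * num_workers
    for i in order:
        w = min(range(num_workers), key=lambda k: est_load[k])
        est_load[w] += estimated[i]
        real_load[w] += actual[i]
    return max(real_load)

def dynamic_makespan(actual, num_workers):
    """Workers pull the next task from a shared queue as soon as they are idle."""
    free_at = [0.0] * num_workers  # time at which each worker becomes free
    heapq.heapify(free_at)
    for cost in actual:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + cost)
    return max(free_at)

# Inaccurate estimates: every task looks equally expensive, two are not.
estimated = [4, 4, 4, 4, 4, 4]
actual = [10, 1, 10, 1, 1, 1]
static = static_makespan(estimated, actual, 2)
dynamic = dynamic_makespan(actual, 2)
print(static, dynamic)
```

In this toy run the static plan puts both expensive tasks on the same worker, whereas the dynamic queue ends with both workers finishing at the same time. Pulling from a shared queue is cheap only when every worker can reach every task's input at equal cost, which is the property the abstract attributes to shared-disk systems.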
Skew-Tolerant, Dynamic LPT Scheduling for Join Processing in Parallel Shared-Disk Database Systems
In parallel databases used for decision-support tasks such as data warehousing, high throughput, short response times, and therefore also load-balancing issues play a decisive role. This holds in particular for complex operations such as the relational join. The biggest problem in its parallel execution is non-uniform data and value distributions (skew), which are predictable only to a limited extent and must therefore be handled at runtime. In the widespread shared-nothing architectures, however, this is difficult to achieve, because data redistribution incurs high overhead. We therefore propose a dynamic load-balancing method based on a shared-disk architecture which, owing to the uniform access structure, works far more efficiently than is possible in shared-nothing systems. In a simulation study, it proves clearly superior to a conventional predictive algorithm.
A Multi-Dimensional Range Partitioning Technique for Parallel Joins in MapReduce
Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, August 2014. 이상구 (Sang-goo Lee).
Joins are fundamental operations for many data analysis tasks, but are not directly supported by the MapReduce framework. This is because 1) the framework is basically designed to process a single input data set, and 2) MapReduce's key-equality based data grouping method makes it difficult to support complex join conditions. As a result, a large number of MapReduce-based join algorithms have been proposed.
As in traditional shared-nothing systems, one of the major issues in join algorithms using MapReduce is the handling of data skew. We propose a new skew handling method, called Multi-Dimensional Range Partitioning (MDRP), and show that the proposed method outperforms traditional skew handling methods: range-based and randomized methods. Specifically, the proposed method has the following advantages: 1) Compared to the range-based method, it considers the number of output tuples at each machine, which leads to better handling of join product skew. 2) Compared with the randomized method, it exploits given join conditions before the actual join begins, so that unnecessary input duplication can be reduced.
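The first advantage listed above, accounting for output tuples rather than input tuples, can be seen in a small example. The sketch below does not implement MDRP; it only shows, for an equi-join under one-dimensional range partitioning, that ranges balanced by input size can be badly unbalanced in output size. The function name `cell_weights`, the boundary value, and the toy data are assumptions made for this illustration.

```python
from bisect import bisect_right
from collections import Counter

def cell_weights(r_keys, s_keys, boundaries):
    """Per-range input and output sizes for an equi-join when both
    relations are range-partitioned with the same sorted boundaries."""
    f_r, f_s = Counter(r_keys), Counter(s_keys)
    num_ranges = len(boundaries) + 1
    inputs = [0] * num_ranges
    outputs = [0] * num_ranges
    for key in set(f_r) | set(f_s):
        cell = bisect_right(boundaries, key)  # index of the range holding key
        inputs[cell] += f_r[key] + f_s[key]
        outputs[cell] += f_r[key] * f_s[key]
    return inputs, outputs

# Key 1 appears 6 times on each side; keys 5..10 appear once on each side.
r_keys = [1] * 6 + [5, 6, 7, 8, 9, 10]
s_keys = [1] * 6 + [5, 6, 7, 8, 9, 10]
inputs, outputs = cell_weights(r_keys, s_keys, boundaries=[4])
print(inputs, outputs)
```

Both ranges receive 12 input tuples, yet the first produces 36 output tuples and the second only 6, which is the kind of imbalance an output-aware partitioning is meant to catch.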
The MDRP method can be used to support advanced join operations such as theta-joins and multi-way joins. With extensive experiments using real and synthetic data sets, we evaluate the effectiveness of the proposed algorithm.
Abstract
I. Introduction
1.1 Motivation
1.2 Contribution
1.3 Outline
II. Backgrounds and Related Work
2.1 MapReduce
2.2 Join Algorithms in MapReduce
2.2.1 Two-Way Join Algorithms
2.2.2 Multi-Way Join Algorithms
2.3 Data Skew in Join Algorithms
2.4 Skew Handling Approaches in MapReduce
2.4.1 Hash-Based Approach
2.4.2 Range-Based Approach
2.4.3 Randomized Approach
III. Our Approach
3.1 Multi-Dimensional Range Partitioning
3.1.1 Creation of a Partitioning Matrix
3.1.2 Identifying and Chopping of Heavy Cells
3.1.3 Assigning Cells to Reducers
3.1.4 Join Processing using the Partitioning Matrix
3.2 Theoretical Analysis
3.3 Complex Join Conditions
3.4 Experiments
3.4.1 Scalar Skew Experiments
3.4.2 Zipf's Distribution
3.4.3 Non-Equijoin Experiments
3.4.4 Scalability Experiments
3.5 Discussion
3.5.1 Sampling
3.5.2 Memory-Awareness
3.5.3 Handling of Heavy Cells
3.5.4 Existing Histograms
3.6 Summary
IV. Extensions
4.1 Joining Multiple Relations in a MapReduce Job
4.1.1 Example: SPARQL Basic Graph Pattern
4.1.2 Example: Matrix Chain Multiplication
4.1.3 Single-Key Join and Multiple-Key Join Queries
4.2 Skew Handling for Multi-Way Joins
4.2.1 Skew Handling for SK-Join Queries
4.2.2 Skew Handling for MK-Join Queries
4.3 Combinations of SK-Join and MK-Join
4.3.1 Complex Queries
4.3.2 Iteration-Based Algorithms
4.3.3 Replication-Based Algorithms
4.3.4 Iteration-Based vs. Replication-Based
4.4 Join-Key Selection Algorithms for Complex Queries
4.4.1 Greedy Key Selection
4.4.2 Multiple Key Selection
4.4.3 Hybrid Key Selection
4.5 Experiments
4.5.1 SK-Join Experiments
4.5.2 MK-Join Experiments
4.5.3 Analysis of TV Watching Logs
4.6 Summary
V. Applications
5.1 Algorithms for SPARQL Basic Graph Pattern
5.1.1 MR-Selection
5.1.2 MR-Join
5.1.3 Performance Evaluation
5.1.4 Discussion
5.2 Algorithms for Matrix Chain Multiplication
5.2.1 Serial Two-Way Join (S2)
5.2.2 Parallel M-Way Join (P2, PM)
5.2.3 Serial Two-Way vs. Parallel M-Way
5.2.4 Performance Evaluation
5.2.5 Discussion
5.2.6 Extension: Embedded MapReduce
VI. Conclusion
Bibliography
Index
Abstract (in Korean)