6 research outputs found

Load Balancing and Skew Resilience for Parallel Joins

Author: ElSeidy Mohammed
Koch Christoph
Vitorovic Aleksandar
Publication date: 03/12/2014

We address the problem of load balancing for parallel joins. We show that the distribution of the input data received and of the output data produced by worker machines are both important for performance. As a result, previous work, which optimizes for either input or output alone, is ineffective for load balancing. To that end, we propose a multi-stage load-balancing algorithm which considers the properties of both input and output data through sampling of the original join matrix. To do this efficiently, we propose a novel category of equi-weight histograms. To build them, we exploit state-of-the-art computational geometry algorithms for rectangle tiling. To our knowledge, we are the first to employ tiling algorithms for join load-balancing. In addition, we propose a novel, join-specialized tiling algorithm that has drastically lower time and space complexity than existing algorithms. Experiments show that our scheme outperforms state-of-the-art techniques by up to a factor of 15.
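The claim that input and output both matter can be illustrated with a small sketch. It is not the paper's algorithm: the function `region_load` and the cost weights `w_in` and `w_out` are hypothetical names introduced here, and the sketch assumes an equi-join where a worker's load is a weighted sum of the tuples it receives and the result tuples it produces.

```python
from collections import Counter

def region_load(r_keys, s_keys, w_in=1.0, w_out=1.0):
    """Weight of one join-matrix region for an equi-join.

    r_keys / s_keys: join keys of the R and S tuples routed to this region.
    w_in / w_out: hypothetical per-tuple cost weights (assumptions).
    """
    r_count = Counter(r_keys)
    s_count = Counter(s_keys)
    # Each key contributes (#R tuples with that key) * (#S tuples with that key)
    # result tuples.
    output = sum(n * s_count[k] for k, n in r_count.items())
    inputs = len(r_keys) + len(s_keys)
    return w_in * inputs + w_out * output

# Two regions with identical input size but different output size.
uniform = region_load([1, 2, 3, 4], [1, 2, 3, 4])  # 8 inputs, 4 outputs
skewed = region_load([7, 7, 7, 7], [7, 7, 7, 7])   # 8 inputs, 16 outputs
print(uniform, skewed)
```

A scheme that balances only input would treat these two regions as equal, although the skewed one carries twice the combined weight under these assumed costs.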

Infoscience - École polytechnique fédérale de Lausanne

Scalable and Adaptive Online Joins

Author: Elguindy Abdallah
ElSeidy Mohammed
Koch Christoph
Vitorovic Aleksandar
Publication venue: Hangzhou, China, VLDB
Publication date: 28/10/2013

Scalable join processing in a parallel shared-nothing environment requires a partitioning policy that evenly distributes the processing load while minimizing the size of state maintained and the number of messages communicated. Previous research proposes static partitioning schemes that require statistics beforehand. In an online or streaming environment in which no statistics about the workload are known, traditional static approaches perform poorly. This paper presents a novel parallel online dataflow join operator that supports arbitrary join predicates. The proposed operator continuously adjusts itself to the data dynamics through adaptive dataflow routing and state repartitioning. The operator is resilient to data skew, maintains high throughput rates, avoids blocking behavior during state repartitioning, takes an eventual consistency approach for maintaining its local state, and behaves strongly consistently as a black-box dataflow operator. We prove that the operator ensures a constant competitive ratio of 3.75 in data distribution optimality and that the cost of processing an input tuple is amortized constant, taking into account adaptivity costs. Our evaluation demonstrates that our operator outperforms the state-of-the-art static partitioning schemes in resource utilization, throughput, and execution time.
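The abstract does not spell out the partitioning scheme, so the following is only a sketch under an assumption: the join matrix is split into an n-by-m grid of workers, each worker stores one band of R and one band of S, and the grid is re-chosen as the observed relation sizes change. The function name `best_grid` and the cost model are hypothetical.

```python
def best_grid(num_workers, r_size, s_size):
    """Pick (n, m) with n * m == num_workers minimizing per-worker input.

    Assumed cost model: worker (i, j) stores r_size / n tuples of R
    and s_size / m tuples of S.
    """
    best = None
    for n in range(1, num_workers + 1):
        if num_workers % n:
            continue  # only exact factorizations of the worker count
        m = num_workers // n
        load = r_size / n + s_size / m
        if best is None or load < best[2]:
            best = (n, m, load)
    return best

# Equal relation sizes favor a square grid.
balanced = best_grid(16, 1000, 1000)
# After R grows much faster than S, a different grid is cheaper.
lopsided = best_grid(16, 16000, 1000)
print(balanced, lopsided)
```

Under this assumed model, a static choice made from early statistics can become far from optimal once the size ratio drifts, which is the situation an adaptive operator has to handle.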

Infoscience - École polytechnique fédérale de Lausanne

Peregrine: A Pattern-Aware Graph Mining System

Author: Ahmed Nesreen K.
Bearman Peter S.
Chen Hongzhi
Daniel
Dias Vinicius
Elseidy Mohammed
Gonzalez Joseph E.
Hall Bronwyn
Han Wook-Shin
Hoang Loc
Iyer Anand Padmanabha
Jinghan
Joshua
Julian
Kankanamge Chathura
Kim Jinha
Korshunov Anton
Lai Longbin
Malewicz Grzegorz
Mawhirter Daniel
McSherry Frank
Meysman Pieter
Mugilan
Nguyen Donald
Pradeep
Semih
Serafini Marco
Song Qi
Teixeira Carlos H. C.
Ullmann Julian
Vora Keval
Wang Kai
Yuyi
Zhang Gensheng
Zhu Xiaowei
Publication venue: Association for Computing Machinery (ACM)
Publication date: 05/04/2020

Graph mining workloads aim to extract structural properties of a graph by exploring its subgraph structures. General purpose graph mining systems provide a generic runtime to explore subgraph structures of interest with the help of user-defined functions that guide the overall exploration process. However, the state-of-the-art graph mining systems remain largely oblivious to the shape (or pattern) of the subgraphs that they mine. This causes them to: (a) explore unnecessary subgraphs; (b) perform expensive computations on the explored subgraphs; and (c) hold intermediate partial subgraphs in memory; all of which affect their overall performance. Furthermore, their programming models are often tied to their underlying exploration strategies, which makes it difficult for domain users to express complex mining tasks. In this paper, we develop Peregrine, a pattern-aware graph mining system that directly explores the subgraphs of interest while avoiding exploration of unnecessary subgraphs, and simultaneously bypassing expensive computations throughout the mining process. We design a pattern-based programming model that treats "graph patterns" as first class constructs and enables Peregrine to extract the semantics of patterns, which it uses to guide its exploration. Our evaluation shows that Peregrine outperforms state-of-the-art distributed and single machine graph mining systems, and scales to complex mining tasks on larger graphs, while retaining simplicity and expressivity with its "pattern-first" programming approach. Comment: This is the full version of the paper appearing in the European Conference on Computer Systems (EuroSys), 202
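The difference between pattern-oblivious and pattern-guided exploration can be sketched for the simplest pattern, a triangle. This is not Peregrine's API or algorithm; the function names are hypothetical, and the vertex-ordering trick used to avoid counting the same triangle more than once is a standard technique chosen here for illustration.

```python
from itertools import combinations

def triangles_oblivious(adj):
    """Enumerate every 3-vertex set, then test whether it is a triangle."""
    count = 0
    for a, b, c in combinations(sorted(adj), 3):
        if b in adj[a] and c in adj[a] and c in adj[b]:
            count += 1
    return count

def triangles_pattern_guided(adj):
    """Follow the pattern's edges directly, enforcing u < v < w so that
    each triangle is reached exactly once."""
    count = 0
    for u in adj:
        for v in adj[u]:
            if v <= u:
                continue
            # Only common neighbors of u and v can close the triangle.
            for w in adj[u] & adj[v]:
                if w > v:
                    count += 1
    return count

# Small undirected graph as adjacency sets: edges 0-1, 0-2, 1-2, 1-3, 2-3.
graph = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
print(triangles_oblivious(graph), triangles_pattern_guided(graph))
```

Both functions return the same count, but the guided version never materializes vertex sets that cannot match the pattern, which is the kind of wasted exploration the abstract describes.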

arXiv.org e-Print Archive

Crossref