Search CORE

3 research outputs found

Minimizing Network Traffic for Distributed Joins Using Lightweight Locality-Aware Scheduling

Author: Cheng Long
et al.
Liu Qingzhi
Murphy John
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/07/2018

The 24th International European Conference on Parallel and Distributed Computing (EURO-PAR 2018), Turin, Italy, 27-31 2018

Large computing systems such as data centers are becoming the mainstream infrastructure for big data processing. As one of the key data operators in such scenarios, the distributed join still challenges current techniques because it always incurs a significant network communication cost. Various advanced approaches have been proposed to improve its performance; however, most of them focus only on handling data skew, and algorithms designed specifically to reduce communication have received less attention. Moreover, although the state-of-the-art technique can minimize network traffic, it provides fine-grained optimal schedules for every individual join key, which can result in noticeable overhead. In this paper, we propose a new approach called LAS (Lightweight Locality-Aware Scheduling), which aims to reduce network communication for large distributed joins efficiently and effectively. We present the detailed design and implementation of LAS and conduct an experimental evaluation using large data joins. Our results show that LAS can effectively reduce scheduling overhead while achieving network-traffic reduction comparable to that of the state-of-the-art technique.

European Commission Horizon 202
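The abstract contrasts fine-grained, per-key optimal schedules with a lighter locality-aware approach. The Python sketch below is only a toy illustration of that trade-off under a simplified cost model that is an assumption here: all tuples of a scheduling unit are shipped to one destination node. It is not the LAS algorithm from the paper. The function names `traffic_for_assignment`, `hash_assign`, `locality_assign` and `group_by_partition`, the `key % num_partitions` partitioning, and the example counts are all hypothetical. It compares locality-oblivious hash placement, per-key locality-aware placement, and locality-aware placement of coarser hash partitions.

```python
from collections import defaultdict

# Simplified cost model (an assumption for illustration, not the paper's model):
# all tuples of a scheduling unit (a join key or a partition of keys) are shipped
# to one destination node, so every tuple stored elsewhere crosses the network.


def traffic_for_assignment(counts, assign):
    """Count tuples moved over the network.

    counts: {unit: {node: tuples of that unit (both relations) stored on node}}
    assign: {unit: destination node}
    """
    moved = 0
    for unit, per_node in counts.items():
        dest = assign[unit]
        moved += sum(c for node, c in per_node.items() if node != dest)
    return moved


def hash_assign(counts, num_nodes):
    # Locality-oblivious baseline: place unit u on node u % num_nodes.
    return {unit: unit % num_nodes for unit in counts}


def locality_assign(counts):
    # Send each unit to the node already holding most of its tuples;
    # under this cost model that choice minimizes traffic for each unit.
    return {unit: max(per_node, key=per_node.get) for unit, per_node in counts.items()}


def group_by_partition(key_counts, num_partitions):
    # Coarse-grained view: merge per-key counts into hash partitions so the
    # scheduler makes one decision per partition instead of one per key.
    # key % num_partitions is a stand-in hash function (assumes integer keys).
    part_counts = defaultdict(lambda: defaultdict(int))
    for key, per_node in key_counts.items():
        for node, c in per_node.items():
            part_counts[key % num_partitions][node] += c
    return part_counts


# Made-up example: 6 join keys spread over 3 nodes ({node: tuple count}).
key_counts = {
    0: {0: 1, 1: 5, 2: 0},
    1: {0: 0, 1: 1, 2: 4},
    2: {0: 6, 1: 0, 2: 1},
    3: {0: 0, 1: 2, 2: 3},
    4: {0: 2, 1: 0, 2: 3},
    5: {0: 4, 1: 1, 2: 0},
}

parts = group_by_partition(key_counts, 3)
print(traffic_for_assignment(key_counts, hash_assign(key_counts, 3)))  # 30
print(traffic_for_assignment(key_counts, locality_assign(key_counts)))  # 8
print(traffic_for_assignment(parts, locality_assign(parts)))  # 9
```

With these made-up counts, scheduling three partitions instead of six individual keys moves one extra tuple (9 versus 8), while both locality-aware placements move far fewer tuples than hash placement (30). This mirrors, in miniature, the kind of trade-off the abstract describes: fewer scheduling decisions in exchange for a small loss in traffic reduction.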

Research Repository UCD

Irish Universities

Minimizing network traffic for distributed joins using lightweight locality-aware scheduling

Author: Cheng L
Hao Chunliang
Liu Qingzhi
Murphy John
Theodoropoulos Georgios
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2018

Large computing systems such as data centers are becoming the mainstream infrastructure for big data processing. As one of the key data operators in such scenarios, the distributed join still challenges current techniques because it always incurs a significant network communication cost. Various advanced approaches have been proposed to improve its performance; however, most of them focus only on handling data skew, and algorithms designed specifically to reduce communication have received less attention. Moreover, although the state-of-the-art technique can minimize network traffic, it provides fine-grained optimal schedules for every individual join key, which can result in noticeable overhead. In this paper, we propose a new approach called LAS (Lightweight Locality-Aware Scheduling), which aims to reduce network communication for large distributed joins efficiently and effectively. We present the detailed design and implementation of LAS and conduct an experimental evaluation using large data joins. Our results show that LAS can effectively reduce scheduling overhead while achieving network-traffic reduction comparable to that of the state-of-the-art technique.

Repository TU/e

Crossref

Pure OAI Repository

Minimizing network traffic for distributed joins using lightweight locality-aware scheduling

Author: Aldinucci Marco
Cheng Long
Hao Chunliang
Liu Qingzhi
Murphy John
Padovani Luca
Theodoropoulos Georgios
Torquati Massimo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/08/2018

Large computing systems such as data centers are becoming the mainstream infrastructure for big data processing. As one of the key data operators in such scenarios, the distributed join still challenges current techniques because it always incurs a significant network communication cost. Various advanced approaches have been proposed to improve its performance; however, most of them focus only on handling data skew, and algorithms designed specifically to reduce communication have received less attention. Moreover, although the state-of-the-art technique can minimize network traffic, it provides fine-grained optimal schedules for every individual join key, which can result in noticeable overhead. In this paper, we propose a new approach called LAS (Lightweight Locality-Aware Scheduling), which aims to reduce network communication for large distributed joins efficiently and effectively. We present the detailed design and implementation of LAS and conduct an experimental evaluation using large data joins. Our results show that LAS can effectively reduce scheduling overhead while achieving network-traffic reduction comparable to that of the state-of-the-art technique.