H-WorD: Supporting job scheduling in Hadoop with workload-driven data redistribution

Abelló Gamazo, Alberto; Calders, Toon; Jovanovic, Petar; Romero Moral, Óscar


Publication date: 1 January 2016
Publisher: 'Springer Science and Business Media LLC'

Abstract

The final publication is available at http://link.springer.com/chapter/10.1007/978-3-319-44039-2_21. Peer reviewed. Postprint (author's final draft).

Today’s distributed data processing systems typically follow a query shipping approach and exploit data locality to reduce network traffic. In such systems, the distribution of data over the cluster resources plays a significant role, and when skewed, it can harm the performance of executing applications. In this paper, we address the challenges of automatically adapting the distribution of data in a cluster to the workload imposed by the input applications. We propose a generic algorithm, named H-WorD, which, based on the estimated workload over resources, suggests alternative execution scenarios of tasks, and hence identifies the required transfers of input data a priori, so that data is brought close to the execution in time. We exemplify our algorithm in the context of MapReduce jobs in a Hadoop ecosystem. Finally, we evaluate our approach and demonstrate the performance gains of automatic data redistribution.
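The idea of weighing a data-local execution against an alternative execution on another node, with the input transferred ahead of time, can be illustrated with a small toy planner. This is a minimal sketch and not the H-WorD algorithm itself: the `Task` fields, the `plan_tasks` function, the greedy strategy, and the choice to charge the transfer time to the target node are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    duration: float       # estimated compute time (hypothetical units)
    local_nodes: list     # nodes that already hold the task's input data
    transfer_time: float  # estimated time to copy the input to another node


def plan_tasks(tasks, nodes):
    """Toy greedy planner (illustrative assumption, not the paper's method).

    For each task, pick the node with the earliest estimated finish time.
    Running on a node without the input adds the transfer time and records
    a transfer that would have to be scheduled before execution.
    """
    load = {node: 0.0 for node in nodes}  # estimated workload per node
    placement = {}
    transfers = []
    for task in tasks:
        def finish(node):
            extra = 0.0 if node in task.local_nodes else task.transfer_time
            return load[node] + extra + task.duration

        target = min(nodes, key=finish)
        if target not in task.local_nodes:
            transfers.append((task.name, target))
        load[target] = finish(target)
        placement[task.name] = target
    return placement, transfers, max(load.values())


if __name__ == "__main__":
    # Skewed case: every input lives on n1 only.
    tasks = [Task(f"t{i}", 10.0, ["n1"], 2.0) for i in range(1, 5)]
    placement, transfers, makespan = plan_tasks(tasks, ["n1", "n2"])
    print(placement)
    print(transfers)
    print(makespan)
```

In this skewed example all inputs sit on one node, so a purely data-local schedule would run the four tasks one after another on `n1`. The toy planner instead moves two inputs to `n2` in advance, which shortens the estimated completion time at the cost of two transfers.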
