Spark deployment and performance evaluation on the MareNostrum supercomputer

Ayguadé Parra, Eduard; Becerra Fontal, Yolanda; Carrera Pérez, David; Girona Turell, Sergi; Gounaris, Anastasios; Labarta Mancho, Jesús José; Torres Viñals, Jordi; Tous Liesa, Rubén; Tripiana, Carlos; Valero Cortés, Mateo



Publication date: 1 January 2015
Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Abstract

In this paper we present a framework to enable data-intensive Spark workloads on MareNostrum, a petascale supercomputer designed mainly for compute-intensive applications. As far as we know, this is the first attempt to investigate optimized deployment configurations of Spark on a petascale HPC setup. We detail the design of the framework and present benchmark data that provide insights into the scalability of the system. We examine the impact of different configurations, including parallelism, storage, and networking alternatives, and we discuss several aspects of executing Big Data workloads on a computing system based on the compute-centric paradigm. Further, we derive conclusions that aim to pave the way towards systematic and optimized methodologies for fine-tuning data-intensive applications on large clusters, with an emphasis on parallelism configurations.

Peer reviewed. Postprint (author's final draft).
