Applying big data paradigms to a large scale scientific workflow: lessons learned and future directions

Caino Lores, Silvina; Carretero Pérez, Jesús; Kropf, Peter; Lapin, Andrei

Authors: Silvina Caino Lores
Jesús Carretero Pérez
Peter Kropf
Andrei Lapin
Publication date: 17 April 2018
Publisher: Elsevier BV

Abstract

The increasing amounts of data related to the execution of scientific workflows have raised awareness of their shift towards parallel data-intensive problems. In this paper, we report our experience combining the traditional high-performance computing and grid-based approaches with Big Data analytics paradigms, in the context of scientific ensemble workflows. Our goal was to assess and discuss the suitability of such data-oriented mechanisms for production-ready workflows, especially in terms of scalability. We focused on two key elements in the Big Data ecosystem: the data-centric programming model, and the underlying infrastructure that integrates storage and computation in each node. We experimented with a representative MPI-based iterative workflow from the hydrology domain, EnKF-HGS, which we re-implemented using the Spark data analysis framework. We conducted experiments on a local cluster, a private cloud running OpenNebula, and the Amazon Elastic Compute Cloud (Amazon EC2). We analysed the results to synthesize the lessons learned from this experience and to discuss promising directions for further research.

This work was supported by the Spanish Ministry of Economics and Competitiveness grant TIN-2013-41350-P, the IC1305 COST Action “Network for Sustainable Ultrascale Computing Platforms” (NESUS), and the FPU Training Program for Academic and Teaching Staff Grant FPU15/00422 by the Spanish Ministry of Education.

Available Versions

Universidad Carlos III de Madrid e-Archivo

oai:e-archivo.uc3m.es:10016/33...

Last time updated on 27/10/2022