Design and implementation of Kepler workflows for BioEarth

Abstract

BioEarth is an ongoing research initiative for the development of a regional-scale Earth System Model (EaSM) for the U.S. Pacific Northwest. Building such a model requires coupling multiple stand-alone EaSMs, which were originally developed independently, to capture processes within different realms of the biosphere. Given the complexity of such coupled modeling, and the need to manage numerous complex simulations, the design and deployment of automated workflows becomes essential. The goal of this thesis is to report on the design and development of automated scientific workflows for the Regional HydroEcologic Simulation System (RHESSys) model, using the Kepler workflow development tool. RHESSys is a hydrological model that is at the core of BioEarth's model integration requirements. These Kepler workflows are designed to enable the use of RHESSys in two different modes: i) in a standalone mode (both sequentially and in parallel), and ii) for calibration runs that involve exploring the parameter space through iterative executions. Various Kepler features are utilized, including (but not limited to) its user-friendly interface design functions and its support for parallel execution in cluster-based environments. Experimental results on a 16-core compute cluster demonstrate performance speedups ranging from 7x to 12x over the default standalone sequential runs, while also showing the general effectiveness of the newly designed workflows in streamlining and managing processes efficiently. This study has shown the potential of Kepler to serve as the primary operational software platform for the BioEarth project, with implications for other data- and compute-intensive Earth systems modeling projects.
