29 research outputs found

    Parallel implementation of the SHYFEM (System of HydrodYnamic Finite Element Modules) model

    This paper presents the message passing interface (MPI)-based parallelization of the three-dimensional hydrodynamic model SHYFEM (System of HydrodYnamic Finite Element Modules). The original sequential version of the code was parallelized to reduce the execution time of high-resolution configurations on state-of-the-art high-performance computing (HPC) systems. A distributed-memory approach based on MPI was used. Optimized numerical libraries were used to partition the unstructured grid (with a focus on load balancing) and to solve the sparse linear system of equations in parallel in the case of semi-to-fully implicit time stepping. The parallel implementation of the model was validated by comparing its outputs with those obtained from the sequential version. The performance assessment demonstrates a good level of scalability with a realistic configuration used as a benchmark.

    The NEMO Oceanic Model: Computational Performance Analysis and Optimization

    The NEMO (Nucleus for European Modeling of the Ocean) oceanic model is one of the most widely used by the climate community. It is exploited with different configurations in more than 50 research projects for both long- and short-term simulations. The computational requirements of the model and its implementation limit the exploitation of the emerging computational infrastructure at peta- and exascale. A deep revision and analysis of the model and its implementation were needed. The paper describes the performance evaluation of the latest release of the model, based on MPI parallelization, on the MareNostrum platform at the Barcelona Supercomputing Centre. The scalability analysis took into account different factors, namely the I/O system available on the platform, the domain decomposition of the model, and the level of parallelism. The analysis highlighted different bottlenecks due to communication overhead. The code has been optimized by reducing the communication weight within some frequently called functions, and the parallelization has been improved by introducing a second level of parallelism based on the OpenMP shared-memory paradigm.

    Experience on the parallelization of the OASIS3 coupler

    This work describes the optimization and parallelization of the OASIS3 coupler. Performance evaluation and profiling have been carried out by means of the CMCC-MED coupled model, developed at the Euro-Mediterranean Centre for Climate Change (CMCC) and currently running on a NEC SX9 cluster. Our experiments highlighted that the extrapolation (accomplished by the extrap function) and interpolation (implemented by the scriprmp function) transformations take the most time. Optimization concerned I/O operations, reducing coupling time by 27%. Parallelization of OASIS3 represents a further step towards overall improvement of the whole coupled model. Our proposed parallel approach distributes fields over a pool of available processes. Each process applies coupling transformations to its assigned fields. This approach restricts the level of parallelism to the number of coupling fields. However, it can be fully combined with a parallelization approach based on the distribution of the geographical domain. Finally, a quantitative comparison of the parallel coupler with the OASIS3 pseudo-parallel version is presented.
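
The field-level distribution described in this abstract can be illustrated with a small sketch. It is not the OASIS3 implementation: the round-robin assignment rule, the function names `assign_fields` and `busy_ranks`, and the field names are all assumptions made for illustration, since the abstract does not say how fields are mapped to processes. The sketch only shows why the usable parallelism is capped by the number of coupling fields.

```python
def assign_fields(fields, n_procs):
    """Assign coupling fields to process ranks in round-robin order.

    Hypothetical helper: the assignment rule is an assumption,
    not something stated in the abstract.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    assignment = {rank: [] for rank in range(n_procs)}
    for index, field in enumerate(fields):
        assignment[index % n_procs].append(field)
    return assignment


def busy_ranks(assignment):
    """Count the ranks that received at least one field."""
    return sum(1 for assigned in assignment.values() if assigned)


if __name__ == "__main__":
    # Illustrative field names only.
    fields = ["sst", "taux", "tauy"]
    for n_procs in (2, 3, 5):
        mapping = assign_fields(fields, n_procs)
        print(n_procs, "processes ->", busy_ranks(mapping), "busy:", mapping)
```

With three fields, adding a fourth or fifth process leaves the extra ranks idle, which is the restriction the abstract mentions and the reason for combining this scheme with a geographical domain decomposition.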

    The Roofline Model for Oceanic Climate Applications

    The present work describes the analysis and optimisation of the PELAGOS025 configuration, based on the coupling of the NEMO physical component of the ocean dynamics and the BFM (Biogeochemical Flux Model), a sophisticated biogeochemical model that can simulate both pelagic and benthic processes. The methodology followed here consists of three steps: a performance analysis of the original parallel code in terms of strong scalability; the identification of the bottlenecks that limit scalability as the number of processes increases; and an analysis of the most computationally intensive kernels through the Roofline model, which provides an insightful visual performance model for multicore architectures and makes it possible to measure and compare the performance of one or more computational kernels run on different hardware architectures.
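
For readers unfamiliar with the Roofline model, its basic bound can be sketched as follows: attainable performance is the smaller of the machine's peak floating-point rate and the product of memory bandwidth and the kernel's arithmetic intensity. The function names and the machine figures below are hypothetical and are not taken from the paper.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity_flop_per_byte):
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity_flop_per_byte)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity above which a kernel becomes compute-bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical machine: 100 GFLOP/s peak, 50 GB/s memory bandwidth.
    peak, bandwidth = 100.0, 50.0
    print("ridge point (flop/byte):", ridge_point(peak, bandwidth))
    for intensity in (0.5, 2.0, 4.0):
        bound = attainable_gflops(peak, bandwidth, intensity)
        print(f"intensity {intensity} flop/byte -> bound {bound} GFLOP/s")
```

A kernel whose intensity lies below the ridge point is limited by memory bandwidth, so optimising its data movement matters more than optimising its arithmetic.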

    Near Real-Time Parallel Processing and Advanced Data Management of SAR Images in Grid Environments

    In this paper, we describe the process of parallelizing an existing, production-level, sequential Synthetic Aperture Radar (SAR) processor based on the Range-Doppler algorithmic approach. We show how, taking into account the constraints imposed by the software architecture and the related software engineering costs, it is still possible to parallelize the software with a moderate programming effort, and we present a message-passing interface (MPI) implementation whose speedup is about 8 on 9 processors, achieving near real-time processing of raw SAR data even on a moderately aged parallel platform. Moreover, we discuss a hybrid two-level parallelization approach that uses both MPI and OpenMP. We also present GridStore, a novel data grid service to manage raw, focused and post-processed SAR data in a grid environment. Indeed, another aim of this work is to show how the processed data can be made available in a grid environment to a wide scientific community, through the adoption of a data grid service providing both metadata and data management functionalities. In this way, along with near real-time processing of SAR images, we provide a data grid-oriented system for data storing, publishing, management, etc.
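
The reported speedup of about 8 on 9 processors can be put in context with two standard measures: parallel efficiency (speedup divided by processor count) and the Karp-Flatt experimentally determined serial fraction. The calculation below is an illustration applied to the rounded figures quoted in the abstract; it is not an analysis performed by the authors, and the function names are hypothetical.

```python
def parallel_efficiency(speedup, n_procs):
    """Efficiency = speedup / number of processors."""
    return speedup / n_procs


def karp_flatt_serial_fraction(speedup, n_procs):
    """Karp-Flatt metric: (1/S - 1/p) / (1 - 1/p), defined for p > 1."""
    if n_procs <= 1:
        raise ValueError("n_procs must be greater than 1")
    return (1.0 / speedup - 1.0 / n_procs) / (1.0 - 1.0 / n_procs)


if __name__ == "__main__":
    # Rounded figures from the abstract: speedup of about 8 on 9 processors.
    speedup, n_procs = 8.0, 9
    print(f"efficiency: {parallel_efficiency(speedup, n_procs):.3f}")
    print(f"serial fraction: {karp_flatt_serial_fraction(speedup, n_procs):.4f}")
```

Under these figures the efficiency is roughly 89% and the estimated serial fraction is below 2%, consistent with the abstract's claim that a moderate programming effort was enough to parallelize most of the work.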