The Gaia AVU-GSR parallel solver: preliminary porting with OpenACC parallelization language of a LSQR-based application in perspective of exascale systems

Abstract

The Gaia Astrometric Verification Unit-Global Sphere Reconstruction (AVU-GSR) Parallel Solver aims to find the positions and proper motions of ~10^8 stars in our galaxy, together with the attitude and instrumental settings of the Gaia satellite and the global parameter of the post-Newtonian formalism. To find these parameters, the code solves a system of linear equations, A × x = b, where the coefficient matrix A is large, containing ~10^11 × 10^8 elements, and sparse. The system is solved with a customized implementation of the iterative preconditioned LSQR (PC-LSQR) algorithm and is parallelized on the CPU with MPI+OpenMP: the computation related to different horizontal portions of the coefficient matrix is assigned to different MPI processes and is further parallelized over the OpenMP threads. To improve the code performance, we explored the feasibility of porting this application to a GPU environment by replacing the OpenMP directives with the corresponding OpenACC ones. In this preliminary port, ~95% of the data is copied from the host (CPU) to the device (GPU) before the entire cycle of iterations, making the code compute bound rather than data-transfer bound. The OpenACC code achieves a speedup of ~1.5 over the OpenMP code. The OpenACC application runs on multiple GPUs and was tested on the CINECA supercomputer Marconi100, which has 4 V100 GPUs per node with 16 GB of memory each. A subsequent port, in which OpenACC is replaced with CUDA, was then performed to optimize the preliminary OpenACC port. The CUDA code has just been put into production on Marconi100, and we plan to run it on Leonardo, the future pre-exascale platform of CINECA, with 4 next-generation A100 GPUs per node.
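
The row-wise decomposition described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the AVU-GSR code: the sparse storage (a list of (column, value) pairs per row), the function names, and the serial loop standing in for MPI processes are all assumptions made for illustration. It shows only how the two matrix-vector products that LSQR needs in each iteration, A x and A^T y, can be split over horizontal portions of the coefficient matrix, with the partial A^T y results summed at the end (the role a reduction across processes would play).

```python
from typing import List, Tuple

# Hypothetical sparse format: each row is a list of (column, value) pairs.
SparseRow = List[Tuple[int, float]]


def split_rows(n_rows: int, n_parts: int) -> List[Tuple[int, int]]:
    """Contiguous [start, stop) row ranges, one per simulated process."""
    base, extra = divmod(n_rows, n_parts)
    ranges, start = [], 0
    for p in range(n_parts):
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def local_matvec(rows: List[SparseRow], x: List[float]) -> List[float]:
    """Rows of A x owned by one process: needs all of x, yields a slice of A x."""
    return [sum(v * x[j] for j, v in row) for row in rows]


def local_rmatvec(rows: List[SparseRow], y_local: List[float],
                  n_cols: int) -> List[float]:
    """Partial A^T y from one horizontal block: a full-length vector."""
    out = [0.0] * n_cols
    for row, yi in zip(rows, y_local):
        for j, v in row:
            out[j] += v * yi
    return out


def distributed_products(A: List[SparseRow], x: List[float],
                         y: List[float], n_parts: int):
    """Compute A x and A^T y block by block, as n_parts processes would."""
    n_cols = len(x)
    Ax: List[float] = []
    ATy = [0.0] * n_cols
    for start, stop in split_rows(len(A), n_parts):
        block = A[start:stop]
        Ax.extend(local_matvec(block, x))          # each block owns its rows
        partial = local_rmatvec(block, y[start:stop], n_cols)
        ATy = [a + b for a, b in zip(ATy, partial)]  # stands in for a sum-reduction
    return Ax, ATy
```

In this sketch the result does not depend on the number of blocks, which is the property that lets the work be spread across processes. The data-movement aspect of the GPU port (copying most of the data to the device once, before the iteration loop) is not modeled here.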
