22 research outputs found

    Petascale solvers for anisotropic PDEs in atmospheric modelling on GPU clusters

    Memory-bound applications such as solvers for large sparse systems of equations remain a challenge for GPUs. Fast solvers should be based on numerically efficient algorithms and implemented such that global memory access is minimised. To solve systems with trillions ($\mathcal{O}(10^{12})$) of unknowns the code has to make efficient use of several million individual processor cores on large GPU clusters. We describe the multi-GPU implementation of two algorithmically optimal iterative solvers for anisotropic PDEs which are encountered in (semi-) implicit time stepping procedures in atmospheric modelling. In this application the condition number is large but independent of the grid resolution and both methods are asymptotically optimal, albeit with different absolute performance. In particular, an important constant in the discretisation is the CFL number; only the multigrid solver is robust to changes in this constant. We parallelise the solvers and adapt them to the specific features of GPU architectures, paying particular attention to efficient global memory access. We achieve a performance of up to 0.78 PFLOPs when solving an equation with $0.55\cdot 10^{12}$ unknowns on 16384 GPUs; this corresponds to about 3% of the theoretical peak performance of the machine and we use more than 40% of the peak memory bandwidth with a Conjugate Gradient (CG) solver. Although the other solver, a geometric multigrid algorithm, has a slightly worse performance in terms of FLOPs per second, overall it is faster as it needs fewer iterations to converge; the multigrid algorithm can solve a linear PDE with half a trillion unknowns in about one second.
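
    The headline figures in this abstract can be put on a per-GPU footing with a few lines of arithmetic. The sketch below uses only the numbers quoted above and takes the "about 3%" at face value, so the implied machine peak is a rough estimate and not a published specification; all variable names are illustrative.

```python
# Back-of-envelope arithmetic from the figures quoted in the abstract.
unknowns = 0.55e12          # total unknowns in the largest run
gpus = 16384                # number of GPUs used
sustained_flops = 0.78e15   # 0.78 PFLOPs sustained by the CG solver
peak_fraction = 0.03        # "about 3%" of theoretical peak (approximate)

unknowns_per_gpu = unknowns / gpus              # roughly 3.4e7 unknowns per GPU
sustained_per_gpu = sustained_flops / gpus      # roughly 4.8e10 FLOPs per second per GPU
implied_machine_peak = sustained_flops / peak_fraction  # roughly 2.6e16 FLOPs per second

print(f"unknowns per GPU:      {unknowns_per_gpu:.3e}")
print(f"sustained per GPU:     {sustained_per_gpu:.3e} FLOP/s")
print(f"implied machine peak:  {implied_machine_peak:.3e} FLOP/s")
```

    The low fraction of peak floating point performance alongside the much higher fraction of peak memory bandwidth is what one would expect from a memory-bound solver.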

    Matrix-free GPU implementation of a preconditioned conjugate gradient solver for anisotropic elliptic PDEs

    Many problems in geophysical and atmospheric modelling require the fast solution of elliptic partial differential equations (PDEs) in "flat" three-dimensional geometries. In particular, an anisotropic elliptic PDE for the pressure correction has to be solved at every time step in the dynamical core of many numerical weather prediction (NWP) models, and equations of a very similar structure arise in global ocean models, subsurface flow simulations and gas and oil reservoir modelling. The elliptic solve is often the bottleneck of the forecast, and an algorithmically optimal method has to be used and implemented efficiently. Graphics Processing Units (GPUs) have been shown to be highly efficient for a wide range of applications in scientific computing, and recently iterative solvers have been parallelised on these architectures. We describe the GPU implementation and optimisation of a Preconditioned Conjugate Gradient (PCG) algorithm for the solution of a three-dimensional anisotropic elliptic PDE for the pressure correction in NWP. Our implementation exploits the strong vertical anisotropy of the elliptic operator in the construction of a suitable preconditioner. As the algorithm is memory bound, performance can be improved significantly by reducing the amount of global memory access. We achieve this by using a matrix-free implementation which does not require explicit storage of the matrix and instead recalculates the local stencil. Global memory access can also be reduced by rewriting the algorithm using loop fusion and we show that this further reduces the runtime on the GPU. We demonstrate the performance of our matrix-free GPU code by comparing it to a sequential CPU implementation and to a matrix-explicit GPU code which uses existing libraries. The absolute performance of the algorithm for different problem sizes is quantified in terms of floating point throughput and global memory bandwidth. Comment: 18 pages, 7 figures
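
    The two ideas named in this abstract, a matrix-free operator and a preconditioner that exploits vertical anisotropy, can be illustrated with a small NumPy sketch. This is not the authors' GPU code: the model operator (identity minus a scaled 7-point Laplacian with a stronger vertical coupling), the homogeneous Dirichlet boundaries, and the names `apply_operator`, `vertical_line_solve`, `pcg`, `alpha` and `lam2` are all assumptions made for illustration. The preconditioner solves one tridiagonal system per vertical column, and the sketch does not attempt loop fusion.

```python
import numpy as np

def apply_operator(u, alpha, lam2):
    """Matrix-free stencil: u - alpha * (horizontal + lam2 * vertical differences).

    No matrix is stored; the stencil is recomputed from alpha and lam2.
    A zero halo stands in for homogeneous Dirichlet boundaries (an assumption).
    """
    p = np.pad(u, 1)
    c = p[1:-1, 1:-1, 1:-1]
    horiz = (p[2:, 1:-1, 1:-1] + p[:-2, 1:-1, 1:-1]
             + p[1:-1, 2:, 1:-1] + p[1:-1, :-2, 1:-1] - 4.0 * c)
    vert = p[1:-1, 1:-1, 2:] + p[1:-1, 1:-1, :-2] - 2.0 * c
    return c - alpha * (horiz + lam2 * vert)

def vertical_line_solve(r, alpha, lam2):
    """Preconditioner: exact solve of the vertical (tridiagonal) part per column.

    Uses the Thomas algorithm, vectorised over all horizontal positions.
    """
    nz = r.shape[2]
    diag = 1.0 + alpha * (4.0 + 2.0 * lam2)  # operator diagonal
    off = -alpha * lam2                      # vertical off-diagonal coupling
    cp = np.empty(nz)
    y = np.empty_like(r)
    cp[0] = off / diag
    y[..., 0] = r[..., 0] / diag
    for k in range(1, nz):                   # forward elimination
        denom = diag - off * cp[k - 1]
        cp[k] = off / denom
        y[..., k] = (r[..., k] - off * y[..., k - 1]) / denom
    for k in range(nz - 2, -1, -1):          # back substitution
        y[..., k] -= cp[k] * y[..., k + 1]
    return y

def pcg(b, alpha, lam2, tol=1e-8, max_iter=200):
    """Preconditioned Conjugate Gradient using the two routines above."""
    u = np.zeros_like(b)
    r = b.copy()
    z = vertical_line_solve(r, alpha, lam2)
    p = z.copy()
    rz = np.vdot(r, z)
    b_norm = np.linalg.norm(b)
    for it in range(1, max_iter + 1):
        Ap = apply_operator(p, alpha, lam2)
        step = rz / np.vdot(p, Ap)
        u += step * p
        r -= step * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return u, it
        z = vertical_line_solve(r, alpha, lam2)
        rz_new = np.vdot(r, z)
        p = z + (rz_new / rz) * p
        rz = rz_new
    return u, max_iter

# Usage: a "flat" grid with strong vertical coupling (lam2 >> 1).
rng = np.random.default_rng(0)
b = rng.standard_normal((16, 16, 32))
u, iters = pcg(b, alpha=1.0, lam2=100.0)
residual = np.linalg.norm(b - apply_operator(u, 1.0, 100.0)) / np.linalg.norm(b)
print(f"converged in {iters} iterations, relative residual {residual:.2e}")
```

    Because the vertical coupling dominates, the column-wise solve captures most of the operator, which is why a preconditioner of this kind can keep the iteration count low in strongly anisotropic problems.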

    On Efficiency of Parallel Solvers for the Blood Flow through Aortic Valve

    Mathematical modelling of cardiac haemodynamics presents a great challenge to computational scientists because of its numerous numerical issues and the computational resources it requires. In this paper, we study the parallel performance of 3D simulation software for blood flow through the aortic valve. The fluid flow problem with open aortic valve leaflets is formulated and solved in parallel. The choice between segregated and coupled numerical schemes is discussed and investigated. We present and compare the parallel performance results of both types of parallel solvers. We investigate their strong and weak scalability.

    Efficient multigrid preconditioners for atmospheric flow simulations at high aspect ratio

    Many problems in fluid modelling require the efficient solution of highly anisotropic elliptic partial differential equations (PDEs) in ‘flat’ domains. For example, in numerical weather and climate prediction, an elliptic PDE for the pressure correction has to be solved at every time step in a thin spherical shell representing the global atmosphere. This elliptic solve can be one of the computationally most demanding components in semi‐implicit semi‐Lagrangian time stepping methods, which are very popular as they allow for larger model time steps and better overall performance. With increasing model resolution, algorithmically efficient and scalable algorithms are essential to run the code under tight operational time constraints. We discuss the theory and practical application of bespoke geometric multigrid preconditioners for equations of this type. The algorithms deal with the strong anisotropy in the vertical direction by using the tensor‐product approach originally analysed by Börm and Hiptmair [Numer. Algorithms, 26/3 (2001), pp. 219–234]. We extend the analysis to three dimensions under slightly weakened assumptions and numerically demonstrate its efficiency for the solution of the elliptic PDE for the global pressure correction in atmospheric forecast models. For this, we compare the performance of different multigrid preconditioners on a tensor‐product grid with a semi‐structured and quasi‐uniform horizontal mesh and a one‐dimensional vertical grid. The code is implemented in the Distributed and Unified Numerics Environment, which provides an easy‐to‐use and scalable environment for algorithms operating on tensor‐product grids. Parallel scalability of our solvers on up to 20 480 cores is demonstrated on the HECToR supercomputer.

    HPC-enabling technologies for high-fidelity combustion simulations

    With the increase in computational power in the last decade and the forthcoming Exascale supercomputers, a new horizon in computational modelling and simulation is envisioned in combustion science. Considering the multiscale and multiphysics characteristics of turbulent reacting flows, combustion simulations are considered one of the most computationally demanding applications running on cutting-edge supercomputers. Exascale computing opens new frontiers for the simulation of combustion systems as more realistic conditions can be achieved with high-fidelity methods. However, an efficient use of these computing architectures requires methodologies that can exploit all levels of parallelism. The efficient utilization of the next generation of supercomputers needs to be considered from a global perspective, that is, involving physical modelling and numerical methods with methodologies based on High-Performance Computing (HPC) and hardware architectures. This review introduces recent developments in numerical methods for large-eddy simulations (LES) and direct-numerical simulations (DNS) to simulate combustion systems, with focus on the computational performance and algorithmic capabilities. Due to the broad scope, a first section is devoted to describing the fundamentals of turbulent combustion, which is followed by a general description of state-of-the-art computational strategies for solving these problems. These applications require advanced HPC approaches to exploit modern supercomputers, which is addressed in the third section. The increasing complexity of new computing architectures, with tightly coupled CPUs and GPUs, as well as high levels of parallelism, requires new parallel models and algorithms exposing the required level of concurrency. Advances in terms of dynamic load balancing, vectorization, GPU acceleration and mesh adaptation have made it possible to achieve highly efficient combustion simulations with data-driven methods in HPC environments. Therefore, dedicated sections covering the use of high-order methods for reacting flows, integration of detailed chemistry and two-phase flows are addressed. Final remarks and directions of future work are given at the end. The research leading to these results has received funding from the European Union’s Horizon 2020 Programme under the CoEC project, grant agreement No. 952181, and the CoE RAISE project, grant agreement No. 951733. Peer reviewed. Postprint (published version).

    Multigrid preconditioners for the mixed finite element dynamical core of the LFRic atmospheric model

    Due to the wide separation of time scales in geophysical fluid dynamics, semi-implicit time integrators are commonly used in operational atmospheric forecast models. They guarantee the stable treatment of fast (acoustic and gravity) waves, while not suffering from severe restrictions on the timestep size. To propagate the state of the atmosphere forward in time, a non-linear equation for the prognostic variables has to be solved at every timestep. Since the nonlinearity is typically weak, this is done with a small number of Newton or Picard iterations, which in turn require the efficient solution of a large system of linear equations with $\mathcal{O}(10^6 - 10^9)$ unknowns. This linear solve is often the computationally most costly part of the model. In this paper an efficient linear solver for the LFRic next-generation model, currently developed by the Met Office, is described. The model uses an advanced mimetic finite element discretisation which makes the construction of efficient solvers challenging compared to models using standard finite-difference and finite-volume methods. The linear solver hinges on a bespoke multigrid preconditioner of the Schur-complement system for the pressure correction. By comparing to Krylov-subspace methods, the superior performance and robustness of the multigrid algorithm is demonstrated for standard test cases and realistic model setups. In production mode, the model will have to run in parallel on 100,000s of processing elements. As confirmed by numerical experiments, one particular advantage of the multigrid solver is its excellent parallel scalability due to avoiding expensive global reduction operations.
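
    For readers unfamiliar with the term, the idea behind a Schur-complement system for the pressure can be shown on a generic two-by-two block system. The block names $M_u$, $M_p$, $G$, $D$ and the right-hand sides $r_u$, $r_p$ below are illustrative assumptions and are not taken from the LFRic formulation, which involves more fields and an approximate inverse of the velocity block.

```latex
% Generic velocity-pressure block system (illustrative symbols only)
\[
\begin{pmatrix} M_u & -G \\ D & M_p \end{pmatrix}
\begin{pmatrix} u \\ p \end{pmatrix}
=
\begin{pmatrix} r_u \\ r_p \end{pmatrix}
\]
% Eliminate the velocity using the first block row
\[
u = M_u^{-1}\left(r_u + G\,p\right)
\]
% Substitute into the second block row to obtain the pressure (Schur-complement) equation
\[
\left(M_p + D\,M_u^{-1}\,G\right) p = r_p - D\,M_u^{-1}\,r_u
\]
```

    The operator $M_p + D\,M_u^{-1}\,G$ is elliptic in character, which is what makes a multigrid preconditioner a natural choice for the reduced pressure problem.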

    Software for Exascale Computing - SPPEXA 2016-2019

    This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG) presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.