Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
Doctor of Philosophy
Partial differential equations (PDEs) are widely used in science and engineering to model phenomena such as sound, heat, and electrostatics. In many practical science and engineering applications, the solutions of PDEs require the tessellation of computational domains into unstructured meshes and entail computationally expensive and time-consuming processes. Therefore, efficient and fast PDE solving techniques on unstructured meshes are important in these applications. Relative to CPUs, the faster growth curves in the speed and greater power efficiency of the SIMD streaming processors, such as GPUs, have gained them an increasingly important role in the high-performance computing area. Combining suitable parallel algorithms and these streaming processors, we can develop very efficient numerical solvers of PDEs. The contributions of this dissertation are twofold: proposal of two general strategies to design efficient PDE solvers on GPUs and the specific applications of these strategies to solve different types of PDEs. Specifically, this dissertation consists of four parts. First, we describe the general strategies, the domain decomposition strategy and the hybrid gathering strategy. Next, we introduce a parallel algorithm for solving the eikonal equation on fully unstructured meshes efficiently. Third, we present the algorithms and data structures necessary to move the entire FEM pipeline to the GPU. Fourth, we propose a parallel algorithm for solving the levelset equation on fully unstructured 2D or 3D meshes or manifolds. This algorithm combines a narrowband scheme with domain decomposition for efficient levelset equation solving.
Efficient Multigrid Preconditioners for Atmospheric Flow Simulations at High Aspect Ratio
Many problems in fluid modelling require the efficient solution of highly
anisotropic elliptic partial differential equations (PDEs) in "flat" domains.
For example, in numerical weather- and climate-prediction an elliptic PDE for
the pressure correction has to be solved at every time step in a thin spherical
shell representing the global atmosphere. This elliptic solve can be one of the
computationally most demanding components in semi-implicit semi-Lagrangian time
stepping methods which are very popular as they allow for larger model time
steps and better overall performance. With increasing model resolution,
algorithmically efficient and scalable algorithms are essential to run the code
under tight operational time constraints. We discuss the theory and practical
application of bespoke geometric multigrid preconditioners for equations of
this type. The algorithms deal with the strong anisotropy in the vertical
direction by using the tensor-product approach originally analysed by Börm
and Hiptmair [Numer. Algorithms, 26/3 (2001), pp. 219-234]. We extend the
analysis to three dimensions under slightly weakened assumptions, and
numerically demonstrate its efficiency for the solution of the elliptic PDE for
the global pressure correction in atmospheric forecast models. For this we
compare the performance of different multigrid preconditioners on a
tensor-product grid with a semi-structured and quasi-uniform horizontal mesh
and a one dimensional vertical grid. The code is implemented in the Distributed
and Unified Numerics Environment (DUNE), which provides an easy-to-use and
scalable environment for algorithms operating on tensor-product grids. Parallel
scalability of our solvers on up to 20,480 cores is demonstrated on the HECToR
supercomputer. Comment: 22 pages, 6 figures, 2 tables
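The abstract above says the strong vertical anisotropy is handled by a tensor-product approach on a grid with a one-dimensional vertical direction. A minimal sketch of one building block such methods commonly rely on is shown below: a direct solve of the tridiagonal system that couples the unknowns in a single vertical column (the Thomas algorithm). This is an illustration under that assumption, not the authors' DUNE implementation; the function name `solve_tridiagonal` and the list-based storage are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    lower[i] multiplies x[i-1] in row i (lower[0] is unused),
    diag[i]  multiplies x[i],
    upper[i] multiplies x[i+1] in row i (upper[-1] is unused).
    Assumes the system is well conditioned without pivoting,
    e.g. diagonally dominant (an assumption of this sketch).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Example column: a 1D Laplacian-like stencil (-1, 2, -1) on 4 levels.
column_solution = solve_tridiagonal(
    lower=[0.0, -1.0, -1.0, -1.0],
    diag=[2.0, 2.0, 2.0, 2.0],
    upper=[-1.0, -1.0, -1.0, 0.0],
    rhs=[0.0, 0.0, 0.0, 5.0],
)
```

Each column costs O(n) operations in the number of vertical levels, and columns are independent of one another, which is why a per-column solve of this kind parallelizes naturally over the horizontal mesh.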
Parallel AMG Solver for Three Dimensional Unstructured Grids Using GPUs
Consider a set of points P in three-dimensional Euclidean space. Each point in P represents a
variable whose value depends on the values of its neighbors, scaled by predefined constants.
The problem is to solve for all the variables, which reduces to solving a large sparse system of linear equations.
This kind of representation arises naturally while solving flow equations in Computational
Fluid Dynamics (CFD). Graphics Processing Units (GPUs) have evolved over the years from
graphics accelerators into scalable co-processors. We implement an algebraic multigrid solver for three
dimensional unstructured grids using GPUs. Such a solver has extensive applications in Computational
Fluid Dynamics. Using a combination of vertex coloring, optimized memory representations,
multi-grid and improved coarsening techniques, we obtain considerable speedup in our parallel implementation.
For our implementation, we used Nvidia’s CUDA programming model. Our solver
is used to accelerate solutions to various problems such as heat transfer and Navier-Stokes flow. Our solver
achieves speedups of 2157 and 29 times for the steady-state and unsteady-state heat transfer problems, respectively,
on a grid of size 2.3 million, compared to serial non-multigrid implementation. Our solver
provides significant acceleration for solving the pressure Poisson equation, which is the most time-consuming
part of solving the Navier-Stokes equations. In our experimental study, we solve pressure
Poisson equations for lid-driven cavity flow, laminar flow past a square cylinder, and plane jet
problems. Our implementation achieves 915 times speed up for the lid driven cavity problem on
a grid of size 2.6 million and a speed up of 1020 times for the laminar flow past square cylinder
problem on a grid of size 1.7 million, compared to serial non-multigrid implementations. For the plane
jet problem, our solver achieves a speedup of 47 times, compared to a serial non-multigrid implementation,
on a grid of size 2.7 million. We also implement a multi-GPU AMG solver, which achieves a
speedup of 1.5 times over the single-GPU solver for the heat transfer problem.
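The abstract above names vertex coloring as one ingredient of the parallel solver. A minimal sketch of how coloring exposes parallelism in a Gauss-Seidel-type smoother follows: rows of the same color share no matrix coupling, so they can be updated at the same time. This is a serial Python illustration under assumed data layouts (dictionary adjacency and dictionary-of-rows matrix), not the authors' CUDA code; the names `greedy_coloring` and `multicolor_gauss_seidel` are hypothetical.

```python
def greedy_coloring(adjacency):
    """Assign each vertex the smallest color not used by a colored neighbor."""
    colors = {}
    for v in sorted(adjacency):
        used = {colors[u] for u in adjacency[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


def multicolor_gauss_seidel(matrix, rhs, x, colors, sweeps=1):
    """Gauss-Seidel sweeps ordered by color.

    matrix[i] is a dict {column: value} for row i. Rows of one color are
    not coupled to each other, so their updates are independent; on a GPU
    each color group could be processed by one parallel kernel launch.
    """
    n_colors = max(colors.values()) + 1
    for _ in range(sweeps):
        for c in range(n_colors):
            group = [i for i in colors if colors[i] == c]
            # Compute all updates of this color from the current x first,
            # then apply them, mimicking a simultaneous parallel update.
            updates = {}
            for i in group:
                off_diag = sum(a * x[j] for j, a in matrix[i].items() if j != i)
                updates[i] = (rhs[i] - off_diag) / matrix[i][i]
            for i, value in updates.items():
                x[i] = value
    return x


# Example: a chain of 4 unknowns with stencil (-1, 2, -1).
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
matrix = {
    0: {0: 2.0, 1: -1.0},
    1: {0: -1.0, 1: 2.0, 2: -1.0},
    2: {1: -1.0, 2: 2.0, 3: -1.0},
    3: {2: -1.0, 3: 2.0},
}
colors = greedy_coloring(adjacency)
solution = multicolor_gauss_seidel(matrix, [0.0, 0.0, 0.0, 5.0], [0.0] * 4, colors, sweeps=200)
```

In a multigrid setting only a few such sweeps would be applied per level as a smoother; the 200 sweeps here simply run the iteration to convergence on a tiny system.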
Development of Parallel CFD Solver for Three Dimensional Unstructured Grid
The current work develops a general-purpose semi-implicit Navier-Stokes solver capable of handling
three-dimensional unstructured grids. The flow needs to be laminar and incompressible. A species
transport equation can also be solved using a segregated algorithm. The pressure Poisson equation, which
takes most of the solving time, has been parallelized on the GPU using CUDA,
with algebraic multigrid for orthogonal unstructured grids. Domain decomposition has been done
using a greedy colouring method. Single-phase jets in the presence of walls, which are
of interest in internal combustion engines, have been studied. Large eddy simulation (LES) modelling
has been employed for simulating turbulent flows using the static Smagorinsky model. Validations have
been presented for turbulent round and plane jets. Laminar and turbulent coaxial jets have been simulated
for different velocity ratios, and the effect of the faster annular jet on the core of the inner jet is
analyzed and presented.
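For readers unfamiliar with the static Smagorinsky model mentioned above, the sketch below evaluates its standard eddy-viscosity formula, nu_t = (C_s * Delta)^2 * |S| with |S| = sqrt(2 * S_ij * S_ij), from a velocity-gradient tensor. The constant `c_s=0.17` and the filter width `delta` are illustrative assumptions; the abstract does not state which values the authors used, and the function name is hypothetical.

```python
import math


def smagorinsky_eddy_viscosity(grad_u, delta, c_s=0.17):
    """Static Smagorinsky eddy viscosity at one point.

    grad_u[i][j] is d(u_i)/d(x_j), a 3x3 nested list.
    delta is the filter width (often taken from the local cell size).
    c_s is the model constant; 0.17 is an assumed illustrative value.
    """
    s_contracted = 0.0
    for i in range(3):
        for j in range(3):
            # Resolved strain-rate tensor S_ij = (du_i/dx_j + du_j/dx_i) / 2.
            s_ij = 0.5 * (grad_u[i][j] + grad_u[j][i])
            s_contracted += s_ij * s_ij
    strain_magnitude = math.sqrt(2.0 * s_contracted)
    return (c_s * delta) ** 2 * strain_magnitude


# Example: simple shear with du/dy = 2 and filter width 0.1.
shear_gradient = [[0.0, 2.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
nu_t = smagorinsky_eddy_viscosity(shear_gradient, delta=0.1)
```

For simple shear the strain-rate magnitude equals the shear rate itself, so the eddy viscosity reduces to (C_s * Delta)^2 times du/dy.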
Parallel Smoothers for Matrix-based Multigrid Methods on Unstructured Meshes Using Multicore CPUs and GPUs
Multigrid methods are efficient and fast solvers for problems typically modeled by partial differential equations of elliptic type. For problems with complex geometries and local singularities, stencil-type discrete operators on equidistant Cartesian grids need to be replaced by more flexible concepts for unstructured meshes in order to properly resolve all problem-inherent specifics while maintaining a moderate number of unknowns. However, flexibility in the meshes comes with severe drawbacks for parallel execution, especially with respect to the definition of adequate smoothers. This point becomes particularly pronounced in the framework of fine-grained parallelism on GPUs with hundreds of execution units. We use the approach of matrix-based multigrid, which has high flexibility and adapts well to the exigencies of modern computing platforms. In this work we investigate multi-colored Gauß-Seidel type smoothers, the power(q)-pattern enhanced multi-colored ILU(p) smoothers with fill-ins
- …