1,709 research outputs found

A Full-Depth Amalgamated Parallel 3D Geometric Multigrid Solver for GPU Clusters

Author: Brandt A.
Brandvik T.
Corrigan A.
Cwire
Elsen E.
Fan Z.
Goodnight N.
Griebel M.
Gropp W. D.
Göddeke D.
Hempel R.
Kindratenko V.
Matsuoka S.
McBryan O. A.
Micikevicius P.
Owens J.D.
Press W. H.
Schive H.
Showerman M.
Thibault J. C.
Tokyo Institute
Wan D.C.
Publication venue: 'IUScholarWorks'
Publication date: 04/01/2011

Numerical computations of incompressible flow equations with pressure-based algorithms necessitate the solution of an elliptic Poisson equation, for which multigrid methods are known to be very efficient. In our previous work, we presented a dual-level (MPI-CUDA) parallel implementation of the Navier-Stokes equations to simulate buoyancy-driven incompressible fluid flows on GPU clusters with simple iterative methods, focusing on the scalability of the overall solver. In the present study, we describe the implementation and performance of a multigrid method to solve the pressure Poisson equation within our MPI-CUDA parallel incompressible flow solver. Various design decisions and algorithmic choices for multigrid methods are explored in light of NVIDIA’s recent Fermi architecture. We discuss how unique aspects of an MPI-CUDA implementation for GPU clusters are related to the software choices made to implement the multigrid method. We propose a new coarse grid solution method of embedded multigrid with amalgamation and show that the parallel implementation retains the numerical efficiency of the multigrid method. Performance measurements on the NCSA Lincoln and TACC Longhorn clusters are presented for up to 64 GPUs.

Crossref

Boise State University - ScholarWorks

Parallel semiconductor device simulation: from power to 'atomistic' devices

Author: Asenov A.
Brown A.R.
Roy S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1998

This paper discusses various aspects of the parallel simulation of semiconductor devices on mesh-connected MIMD platforms with distributed memory and a message-passing programming paradigm. We describe the spatial domain decomposition approach adopted in the simulation of various devices, the generation of structured, topologically rectangular 2D and 3D finite element grids, and the optimisation of their partitioning using simulated annealing techniques. The development of efficient and scalable parallel solvers is a central issue of parallel simulations, and the design of parallel SOR, conjugate gradient and multigrid solvers is discussed. The domain decomposition approach is illustrated in examples ranging from 'atomistic' simulation of decanano MOSFETs to simulation of power IGBTs rated for 1000 V.

Enlighten

Efficient Multigrid Preconditioners for Atmospheric Flow Simulations at High Aspect Ratio

Author: Adams
Ashby
Baker
Barros
Bastian
Bates
Bey
Bowman
Brandt
Buckeridge
Burri
Börm
Chen
Cotter
Crank
Davies
De Zeeuw
Dedner
Dendy
Falgout
Fringer
Hackbusch
Hess
Ippisch
Kwizak
Lacroix
MacDonald
Marshall
Mavriplis
Mulder
Müller
Oosterlee
Press
Qaddouri
Reisinger
Robert
Saad
Sadourny
Schaffer
Skamarock
Staniforth
Stüben
Thomas
Trottenberg
Vorst
Wesseling
Wood
Publication date: 10/02/2015

Many problems in fluid modelling require the efficient solution of highly anisotropic elliptic partial differential equations (PDEs) in "flat" domains. For example, in numerical weather and climate prediction, an elliptic PDE for the pressure correction has to be solved at every time step in a thin spherical shell representing the global atmosphere. This elliptic solve can be one of the most computationally demanding components in semi-implicit semi-Lagrangian time stepping methods, which are very popular as they allow for larger model time steps and better overall performance. With increasing model resolution, algorithmically efficient and scalable algorithms are essential to run the code under tight operational time constraints. We discuss the theory and practical application of bespoke geometric multigrid preconditioners for equations of this type. The algorithms deal with the strong anisotropy in the vertical direction by using the tensor-product approach originally analysed by Börm and Hiptmair [Numer. Algorithms, 26/3 (2001), pp. 219-234]. We extend the analysis to three dimensions under slightly weakened assumptions, and numerically demonstrate its efficiency for the solution of the elliptic PDE for the global pressure correction in atmospheric forecast models. For this we compare the performance of different multigrid preconditioners on a tensor-product grid with a semi-structured and quasi-uniform horizontal mesh and a one-dimensional vertical grid. The code is implemented in the Distributed and Unified Numerics Environment (DUNE), which provides an easy-to-use and scalable environment for algorithms operating on tensor-product grids. Parallel scalability of our solvers on up to 20,480 cores is demonstrated on the HECToR supercomputer. Comment: 22 pages, 6 figures, 2 tables.

arXiv.org e-Print Archive

OPUS

Crossref

Warwick Research Archives Portal Repository

Petascale turbulence simulation using a highly parallel fast multipole method on GPUs

Author: Barnes
Chatelain
Cheng
Cottet
Davidson
Dehnen
Gingold
Greengard
Hamada
Ishihara
Kenji Yasuoka
L.A. Barba
Lambert
Rahimian
Rio Yokota
Salmon
Sundar
Tetsu Narumi
Warren
Yokokawa
Yokota
Publication venue: 'Elsevier BV'
Publication date: 03/09/2012

This paper reports large-scale direct numerical simulations of homogeneous isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as numerical engine, and match the current record in mesh size for this application, a cube of 4096^3 computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method is able to achieve just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. The calculation time for one time step was 108 seconds for the vortex method and 154 seconds for the spectral method, under these conditions. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex method calculations to date.
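
The figures quoted in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the only inputs are numbers stated in the abstract. It shows that a cube of 4096^3 points is about 69 billion points, which is consistent with the particle count reported, and it computes the ratio of the two reported per-step times. That ratio compares a GPU run with a CPU-only run, so it describes these particular conditions rather than the methods in general.

```python
# Figures quoted in the abstract (variable names are illustrative).
grid_points_per_side = 4096
vortex_seconds_per_step = 108      # vortex method, one GPU per MPI process
spectral_seconds_per_step = 154    # spectral method, CPU cores only

# Total number of points in a 4096^3 cube.
total_points = grid_points_per_side ** 3

# Ratio of the reported per-step times (spectral / vortex).
step_time_ratio = spectral_seconds_per_step / vortex_seconds_per_step

print(f"total points: {total_points:,}")
print(f"step-time ratio: {step_time_ratio:.2f}")
```

The total comes to 68,719,476,736 points (2^36), which rounds to the 69 billion quoted.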

arXiv.org e-Print Archive

Crossref