Scaling of a Fast Fourier Transform and a pseudo-spectral fluid solver up to 196608 cores
In this paper we present scaling results of an FFT library, FFTK, and a
pseudospectral code, Tarang, on large grids using 65536 cores of a Blue
Gene/P and 196608 cores of a Cray XC40 supercomputer. We observe that
communication dominates computation, more so on the Cray XC40. The
computation time scales nearly ideally as $p^{-1}$ in the number of cores
$p$, while the communication time scales as $p^{-\gamma}$, with $\gamma$
ranging from 0.7 to 0.9 for the Blue Gene/P and from 0.43 to 0.73 for the
Cray XC40. FFTK and the fluid and convection solvers of Tarang exhibit weak
as well as strong scaling nearly up to 196608 cores of the Cray XC40. We
perform a comparative study of the performance on the Blue Gene/P and Cray
XC40 clusters.
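As a minimal illustration of how such an exponent is extracted, the sketch below fits $\gamma$ in $T_{\mathrm{comm}} \sim p^{-\gamma}$ from timing data; the timings, constants, and fitting code are synthetic stand-ins, not the paper's measurements:

```python
import math

# Synthetic communication timings generated from T_comm = C * p^(-gamma)
# with gamma = 0.8 and C = 10.0 (illustrative values, not measurements).
cores = [1024, 2048, 4096, 8192, 16384]
t_comm = [10.0 * p ** (-0.8) for p in cores]

# Least-squares slope of log T versus log p; -slope estimates gamma.
xs = [math.log(p) for p in cores]
ys = [math.log(t) for t in t_comm]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
gamma_est = -slope
print(round(gamma_est, 3))  # 0.8 for this exact power law
```

With real timings the fitted slope would fall in the reported ranges (0.7 to 0.9 on Blue Gene/P, 0.43 to 0.73 on Cray XC40) rather than recover a single exact value.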
An Efficient Particle Tracking Algorithm for Large-Scale Parallel Pseudo-Spectral Simulations of Turbulence
Particle tracking in large-scale numerical simulations of turbulent flows
presents one of the major bottlenecks in parallel performance and scaling
efficiency. Here, we describe a particle tracking algorithm for large-scale
parallel pseudo-spectral simulations of turbulence which scales well up to
billions of tracer particles on modern high-performance computing
architectures. We summarize the standard parallel methods used to solve the
fluid equations in our hybrid MPI/OpenMP implementation. As the main focus, we
describe the implementation of the particle tracking algorithm and document its
computational performance. To address the extensive inter-process communication
required by particle tracking, we introduce a task-based approach to overlap
point-to-point communications with computations, thereby enabling improved
resource utilization. We characterize the computational cost as a function of
the number of particles tracked and compare it with the flow field computation,
showing that the cost of particle tracking is very small for typical
applications.
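The overlap idea described above can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: Python threads take the place of non-blocking MPI point-to-point calls, and `send_particles` and `interpolate` are hypothetical helpers:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def send_particles(batch):
    """Stand-in for a non-blocking particle exchange (e.g. MPI_Isend/Irecv)."""
    time.sleep(0.05)  # simulated network latency
    return len(batch)

def interpolate(batch):
    """Stand-in for interpolating the velocity field at particle positions."""
    return [x * 0.5 for x in batch]

batches = [[float(i)] * 4 for i in range(8)]
results = []
with ThreadPoolExecutor(max_workers=2) as pool:
    pending = None
    for batch in batches:
        future = pool.submit(send_particles, batch)  # start "communication"
        results.append(interpolate(batch))           # overlap computation
        if pending is not None:
            pending.result()                         # complete earlier exchange
        pending = future
    pending.result()

print(len(results))  # 8
```

The point of the task-based structure is that the processor is never idle while a particle exchange is in flight; it always has an interpolation task to run in the meantime.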
GPU parallelization of a hybrid pseudospectral geophysical turbulence framework using CUDA
An existing hybrid MPI-OpenMP scheme is augmented with a CUDA-based fine-grain parallelization approach for multidimensional distributed Fourier transforms, in a well-characterized pseudospectral fluid turbulence code. Basics of the hybrid scheme are reviewed, and heuristics are provided to show a potential benefit of the CUDA implementation. The method draws heavily on the CUDA runtime library to handle memory management and on the cuFFT library for computing local FFTs. The manner in which the interfaces to these libraries are constructed, and ISO bindings are utilized to facilitate platform portability, is discussed. CUDA streams are implemented to overlap data transfer with cuFFT computation. Testing with a baseline solver demonstrated significant aggregate speed-up over the hybrid MPI-OpenMP solver by offloading to GPUs on an NVLink-based test system. While the batch-streamed approach provided little benefit with NVLink, we saw a performance gain of 30% when tuned for the optimal number of streams on a PCIe-based system. Strong GPU scaling was found to be nearly ideal in all cases. Profiling of the CUDA kernels shows that the transform computation achieves 15% of the attainable peak FLOP rate based on a roofline model for the system. In addition to speed-up measurements for the fiducial solver, we also considered several other solvers with different numbers of transform operations and found that aggregate speed-ups are nearly constant for all solvers.
Authors: Rosenberg, Duane (Colorado State University, Fort Collins, USA); Mininni, Pablo Daniel (Instituto de Física de Buenos Aires, CONICET, Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Argentina); Reddy, Raghu (Environmental Modeling Center, USA); Pouquet, Annick (University of Colorado at Boulder; National Center for Atmospheric Research, USA)
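The roofline comparison mentioned above has a simple structure: attainable performance is the lesser of the peak FLOP rate and the memory bandwidth times the kernel's arithmetic intensity. The numbers below are assumed for illustration and are not the test system's actual specifications:

```python
# Toy roofline estimate (all hardware numbers are illustrative assumptions).
peak_flops = 7.0e12   # assumed peak compute rate, FLOP/s
bandwidth = 9.0e11    # assumed memory bandwidth, bytes/s
intensity = 2.0       # assumed arithmetic intensity of the FFT kernel, FLOP/byte

# Attainable rate = min(compute ceiling, bandwidth ceiling).
attainable = min(peak_flops, bandwidth * intensity)

# The paper reports the transform achieving ~15% of the attainable peak.
measured = 0.15 * attainable
print(attainable, measured)
```

With these assumed numbers the kernel is bandwidth-bound (the memory ceiling of 1.8 TFLOP/s sits below the 7 TFLOP/s compute peak), which is the typical regime for FFTs.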
The inherent overlapping in the parallel calculation of the Laplacian
A new approach for the parallel computation of the Laplacian in the Fourier domain is presented. This numerical problem inherits the intrinsic sequencing involved in the calculation of any multidimensional Fast Fourier Transform (FFT), where blocking communications ensure that the computation is carried out strictly dimension by dimension. Such data dependency vanishes when one considers the Laplacian as the sum of n independent one-dimensional kernels, so that computation and communication can be naturally overlapped with non-blocking communications. Overlapping is demonstrated to be responsible for the speedup figures we obtain when our approach is compared to state-of-the-art parallel multidimensional FFTs.
Funding: Junta de Castilla y León (grant number VA296P18)
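The decomposition this speedup rests on, namely the Fourier-space Laplacian multiplier $-(k_x^2 + k_y^2)$ splitting into independent per-dimension kernels $-k_x^2$ and $-k_y^2$, can be checked directly. This pure-Python sketch (naive DFT on a small 2-D grid) is illustrative only, not the paper's code:

```python
import cmath
import math

n = 8

def dft2(f):
    """Naive 2-D DFT, O(n^4); fine for n = 8."""
    return [[sum(f[x][y] * cmath.exp(-2j * math.pi * (kx * x + ky * y) / n)
                 for x in range(n) for y in range(n))
             for ky in range(n)] for kx in range(n)]

def wave(k):
    """Signed integer wavenumber for DFT index k."""
    return k if k <= n // 2 else k - n

# A smooth periodic test function on the n x n grid.
f = [[math.sin(2 * math.pi * x / n) + math.cos(2 * math.pi * y / n)
      for y in range(n)] for x in range(n)]
F = dft2(f)

# Full multiplier -(kx^2 + ky^2) applied at once ...
full = [[-(wave(kx) ** 2 + wave(ky) ** 2) * F[kx][ky] for ky in range(n)]
        for kx in range(n)]
# ... versus the sum of the two independent 1-D kernels.
split = [[-wave(kx) ** 2 * F[kx][ky] + -wave(ky) ** 2 * F[kx][ky]
          for ky in range(n)] for kx in range(n)]

err = max(abs(full[i][j] - split[i][j]) for i in range(n) for j in range(n))
print(err < 1e-9)  # True: the two formulations agree
```

Because each one-dimensional kernel touches only one dimension of the data, its transform and communication can proceed independently of the others, which is what makes the non-blocking overlap possible.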
Turbulence in a stably stratified fluid: Onset of global anisotropy as a function of the Richardson number
It is necessary to introduce an external forcing to induce turbulence in a
stably stratified fluid. The Heisenberg eddy viscosity technique should in this
case suffice to calculate a space-time averaged quantity like the global
anisotropy parameter as a function of the Richardson number. We find
analytically that the anisotropy increases linearly with the Richardson number,
with a small quadratic correction. A numerical simulation of the complete
equations shows the linear behaviour.
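Schematically, the stated result can be written as follows, with $c_1$ and $c_2$ constants left undetermined here since this abstract does not report their values:

```latex
A(\mathrm{Ri}) \approx c_1\,\mathrm{Ri} + c_2\,\mathrm{Ri}^2,
\qquad |c_2|\,\mathrm{Ri} \ll c_1,
```

where $A$ is the global anisotropy parameter and $\mathrm{Ri}$ the Richardson number, so the quadratic term is a small correction in the regime studied.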