15 research outputs found

    Scaling of a Fast Fourier Transform and a pseudo-spectral fluid solver up to 196608 cores

    Get PDF
    In this paper we present scaling results of a FFT library, FFTK, and a pseudospectral code, Tarang, on grid resolutions up to 819238192^3 grid using 65536 cores of Blue Gene/P and 196608 cores of Cray XC40 supercomputers. We observe that communication dominates computation, more so on the Cray XC40. The computation time scales as Tcomp∼p−1T_\mathrm{comp} \sim p^{-1}, and the communication time as Tcomm∼n−γ2T_\mathrm{comm} \sim n^{-\gamma_2} with γ2\gamma_2 ranging from 0.7 to 0.9 for Blue Gene/P, and from 0.43 to 0.73 for Cray XC40. FFTK, and the fluid and convection solvers of Tarang exhibit weak as well as strong scaling nearly up to 196608 cores of Cray XC40. We perform a comparative study of the performance on the Blue Gene/P and Cray XC40 clusters

    An Efficient Particle Tracking Algorithm for Large-Scale Parallel Pseudo-Spectral Simulations of Turbulence

    Get PDF
    Particle tracking in large-scale numerical simulations of turbulent flows presents one of the major bottlenecks in parallel performance and scaling efficiency. Here, we describe a particle tracking algorithm for large-scale parallel pseudo-spectral simulations of turbulence which scales well up to billions of tracer particles on modern high-performance computing architectures. We summarize the standard parallel methods used to solve the fluid equations in our hybrid MPI/OpenMP implementation. As the main focus, we describe the implementation of the particle tracking algorithm and document its computational performance. To address the extensive inter-process communication required by particle tracking, we introduce a task-based approach to overlap point-to-point communications with computations, thereby enabling improved resource utilization. We characterize the computational cost as a function of the number of particles tracked and compare it with the flow field computation, showing that the cost of particle tracking is very small for typical applications

    GPU parallelization of a hybrid pseudospectral geophysical turbulence framework using CUDA

    Get PDF
    An existing hybrid MPI-OpenMP scheme is augmented with a CUDA-based fine grain parallelization approach for multidimensional distributed Fourier transforms, in a well-characterized pseudospectral fluid turbulence code. Basics of the hybrid scheme are reviewed, and heuristics provided to show a potential benefit of the CUDA implementation. The method draws heavily on the CUDA runtime library to handle memory management and on the cuFFT library for computing local FFTs. The manner in which the interfaces to these libraries are constructed, and ISO bindings utilized to facilitate platform portability, are discussed. CUDA streams are implemented to overlap data transfer with cuFFT computation. Testing with a baseline solver demonstrated significant aggregate speed-up over the hybrid MPI-OpenMP solver by offloading to GPUs on an NVLink-based test system. While the batch streamed approach provided little benefit with NVLink, we saw a performance gain of 30% when tuned for the optimal number of streams on a PCIe-based system. It was found that strong GPU scaling is nearly ideal, in all cases. Profiling of the CUDA kernels shows that the transform computation achieves 15% of the attainable peak FlOp-rate based on a roofline model for the system. In addition to speed-up measurements for the fiducial solver, we also considered several other solvers with different numbers of transform operations and found that aggregate speed-ups are nearly constant for all solvers.Fil: Rosenberg, Duane. State University of Colorado - Fort Collins; Estados UnidosFil: Mininni, Pablo Daniel. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Física de Buenos Aires. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Física de Buenos Aires; ArgentinaFil: Reddy, Raghu. Environmental Modeling Center; Estados UnidosFil: Pouquet, Annick. State University of Colorado at Boulder; Estados Unidos. National Center for Atmospheric Research; Estados Unido

    The inherent overlapping in the parallel calculation of the Laplacian

    Get PDF
    Producción CientíficaA new approach for the parallel computation of the Laplacian in the Fourier domain is presented. This numerical problem inherits the intrinsic sequencing involved in the calculation of any multidimensional Fast Fourier Transform (FFT) where blocking communications assure that its computation is strictly carried out dimension by dimension. Such data dependency vanishes when one considers the Laplacian as the sum of n independent one-dimensional kernels, so that computation and communication can be naturally overlapped with nonblocking communications. Overlapping is demonstrated to be responsible for the speedup figures we obtain when our approach is compared to state-of-the-art parallel multidimensional FFTs.Junta de Castilla León (grant number VA296P18

    Turbulence in a stably stratified fluid: Onset of global anisotropy as a function of the Richardson number

    Get PDF
    It is necessary to introduce an external forcing to induce turbulence in a stably stratified fluid. The Heisenberg eddy viscosity technique should in this case suffice to calculate a space-time averaged quantity like the global anisotropy parameter as a function of the Richardson number. We find analytically that the anisotropy increases linearly with the Richardson number, with a small quadratic correction. A numerical simulation of the complete equations shows the linear behaviour
    corecore