A hybrid MPI-OpenMP scheme for scalable parallel pseudospectral computations for fluid turbulence
A hybrid scheme that utilizes MPI for distributed memory parallelism and
OpenMP for shared memory parallelism is presented. The work is motivated by the
desire to achieve exceptionally high Reynolds numbers in pseudospectral
computations of fluid turbulence on emerging petascale, high core-count,
massively parallel processing systems. The hybrid implementation derives from
and augments a well-tested scalable MPI-parallelized pseudospectral code. The
hybrid paradigm leads to a new picture for the domain decomposition of the
pseudospectral grids, which is helpful in understanding, among other things,
the 3D transpose of the global data that is necessary for the parallel fast
Fourier transforms that are the central component of the numerical
discretizations. Details of the hybrid implementation are provided, and
performance tests illustrate the utility of the method. It is shown that the
hybrid scheme achieves near ideal scalability up to ~20000 compute cores with a
maximum mean efficiency of 83%. Data are presented that demonstrate how to
choose the optimal number of MPI processes and OpenMP threads in order to
optimize code performance on two different platforms. Comment: Submitted to Parallel Computing
FluidFFT: common API (C++ and Python) for Fast Fourier Transform HPC libraries
The Python package fluidfft provides a common Python API for performing Fast
Fourier Transforms (FFT) in sequential, in parallel and on GPU with different
FFT libraries (FFTW, P3DFFT, PFFT, cuFFT). fluidfft is a comprehensive FFT
framework which allows Python users to easily and efficiently perform FFT and
the associated tasks, such as computing linear operators and energy spectra.
We describe the architecture of the package composed of C++ and Cython FFT
classes, Python "operator" classes and Pythran functions. The package supplies
utilities to easily test itself and benchmark the different FFT solutions for a
particular case and on a particular machine. We present a performance scaling
analysis on three different computing clusters and a microbenchmark showing
that fluidfft is an attractive option for writing efficient Python applications
that use FFTs.
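To make the "associated tasks" concrete, the following is a minimal, library-agnostic sketch of a spectral linear operator (a derivative) and a per-mode energy spectrum on a periodic 1D grid. It uses plain NumPy, not fluidfft's own classes; the function names `spectral_derivative` and `energy_per_mode` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def spectral_derivative(f, length):
    """Differentiate a periodic 1D signal by multiplying by i*k in Fourier space."""
    n = f.size
    # Angular wavenumbers for a periodic domain of the given length.
    k = 2 * np.pi * np.fft.fftfreq(n, d=length / n)
    return np.fft.ifft(1j * k * np.fft.fft(f)).real


def energy_per_mode(f):
    """Return 0.5*|f_hat/n|^2 for each mode; the sum equals 0.5*mean(f**2)."""
    f_hat = np.fft.fft(f) / f.size
    return 0.5 * np.abs(f_hat) ** 2


# Illustrative input: sin(3x) sampled on [0, 2*pi).
x = np.linspace(0.0, 2 * np.pi, 64, endpoint=False)
f = np.sin(3 * x)
df = spectral_derivative(f, 2 * np.pi)   # approximates 3*cos(3x)
spectrum = energy_per_mode(f)            # energy sits in modes +3 and -3
```

A package such as fluidfft wraps this kind of operation behind operator classes so that the same code can run on different FFT back ends; the sketch only shows the underlying arithmetic.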
High performance Python for direct numerical simulations of turbulent flows
Direct Numerical Simulation (DNS) of the Navier-Stokes equations is an
invaluable research tool in fluid dynamics. Still, there are few publicly
available research codes and, due to the heavy number crunching implied,
available codes are usually written in low-level languages such as C/C++ or
Fortran. In this paper we describe a pure scientific Python pseudo-spectral DNS
code that nearly matches the performance of C++ for thousands of processors and
billions of unknowns. We also describe a version optimized through Cython, that
is found to match the speed of C++. The solvers are written from scratch in
Python, including the mesh, the MPI domain decomposition, and the temporal
integrators. The solvers have been verified and benchmarked on the Shaheen
supercomputer at the KAUST supercomputing laboratory, and we are able to show
very good scaling up to several thousand cores.
A very important part of the implementation is the mesh decomposition (we
implement both slab and pencil decompositions) and 3D parallel Fast Fourier
Transforms (FFT). The mesh decomposition and FFT routines have been implemented
in Python using serial FFT routines (either NumPy, pyFFTW or any other serial
FFT module), NumPy array manipulations and with MPI communications handled by
MPI for Python (mpi4py). We show how we are able to execute a 3D parallel FFT
in Python for a slab mesh decomposition using 4 lines of compact Python code,
for which the parallel performance on Shaheen is found to be slightly better
than similar routines provided through the FFTW library. For a pencil mesh
decomposition, 7 lines of code are required to execute a transform.
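The slab-decomposed transform described above can be sketched in a single process. This is not the authors' code: it is a serial emulation, assuming a complex-valued input and a hypothetical function name `slab_fft3d_emulated`, in which each "rank" is a list entry and the global transpose is done with array slicing. In an MPI program that transpose step would be an all-to-all exchange between processes.

```python
import numpy as np


def slab_fft3d_emulated(u, nprocs):
    """Emulate a slab-decomposed 3D FFT; returns one output block per emulated rank."""
    n0_global, n1_global, _ = u.shape
    assert n0_global % nprocs == 0 and n1_global % nprocs == 0
    n0 = n0_global // nprocs
    n1 = n1_global // nprocs

    # Each emulated rank owns a slab of n0 planes along axis 0.
    slabs = [u[r * n0:(r + 1) * n0] for r in range(nprocs)]

    # Step 1: local 2D FFT over the two axes that are complete on every rank.
    slabs = [np.fft.fft2(s, axes=(1, 2)) for s in slabs]

    # Step 2: global transpose (stand-in for an MPI all-to-all).
    # Rank r gathers block r of axis 1 from every sender, stacked along axis 0.
    transposed = [
        np.concatenate([s[:, r * n1:(r + 1) * n1, :] for s in slabs], axis=0)
        for r in range(nprocs)
    ]

    # Step 3: 1D FFT along axis 0, which is now complete on every rank.
    return [np.fft.fft(t, axis=0) for t in transposed]


# Illustrative check against NumPy's serial 3D FFT.
rng = np.random.default_rng(0)
u = rng.standard_normal((8, 8, 4)) + 0j
parts = slab_fft3d_emulated(u, nprocs=4)
matches_serial = np.allclose(np.concatenate(parts, axis=1), np.fft.fftn(u))
```

The output is distributed along axis 1 rather than axis 0, which is the usual consequence of a slab transpose: the transform is complete, but the data layout differs from the input layout.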
Evaluating Component Assembly Specialization for 3D FFT
The Fast Fourier Transform (FFT) is a widely-used building block for many high-performance scientific applications.
Efficient computing of FFT is paramount for the performance of these applications. This has led to many efforts to implement
machine and computation specific optimizations. However, no existing FFT library is capable of easily integrating and
automating the selection of new and/or unique optimizations.
To ease FFT specialization, this paper evaluates the use of component-based software engineering, a programming paradigm
which consists in building applications by assembling small software units. Component models are known to have many software
engineering benefits but usually have insufficient performance for high-performance scientific applications.
This paper uses the L2C model, a general purpose high-performance component model, and studies its performance and
adaptation capabilities on 3D FFTs. Experiments show that L2C, and components in general, enable easy handling of 3D FFT
specializations while obtaining performance comparable to that of well-known libraries. However, a higher-level component
model is needed to automatically generate an adequate L2C assembly.