3,568 research outputs found

A hybrid MPI-OpenMP scheme for scalable parallel pseudospectral computations for fluid turbulence

Authors: Mininni Pablo D., Pouquet Annick, Reddy Raghu, Rosenberg Duane L.
Publication date: 22/03/2010

A hybrid scheme that uses MPI for distributed-memory parallelism and OpenMP for shared-memory parallelism is presented. The work is motivated by the desire to achieve exceptionally high Reynolds numbers in pseudospectral computations of fluid turbulence on emerging petascale, high core-count, massively parallel processing systems. The hybrid implementation derives from and augments a well-tested scalable MPI-parallelized pseudospectral code. The hybrid paradigm leads to a new picture for the domain decomposition of the pseudospectral grids, which is helpful in understanding, among other things, the 3D transpose of the global data that is necessary for the parallel fast Fourier transforms that are the central component of the numerical discretizations. Details of the hybrid implementation are provided, and performance tests illustrate the utility of the method. It is shown that the hybrid scheme achieves near-ideal scalability up to ~20000 compute cores with a maximum mean efficiency of 83%. Data are presented that demonstrate how to choose the number of MPI processes and OpenMP threads that optimizes code performance on two different platforms. Comment: Submitted to Parallel Computing
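The abstract's question of how to split a core budget between MPI processes and OpenMP threads can be illustrated with a minimal Python sketch. It is not the authors' method. The function names `hybrid_layouts` and `parallel_efficiency` are hypothetical, the rule that threads must stay within one node is an assumption, and the efficiency formula is the usual strong-scaling definition, which may differ from the "mean efficiency" the paper reports.

```python
def hybrid_layouts(total_cores, cores_per_node):
    """Enumerate candidate (mpi_processes, omp_threads) splits of a core budget.

    Assumption: OpenMP threads share memory, so a thread team must fit inside
    one node and divide its core count evenly.
    """
    layouts = []
    for threads in range(1, cores_per_node + 1):
        fits_node = cores_per_node % threads == 0
        fits_budget = total_cores % threads == 0
        if fits_node and fits_budget:
            layouts.append((total_cores // threads, threads))
    return layouts


def parallel_efficiency(t_ref, cores_ref, t, cores):
    """Strong-scaling efficiency of a run relative to a reference run.

    A value of 1.0 means the wall-clock time dropped in exact proportion to
    the increase in core count.
    """
    return (t_ref * cores_ref) / (t * cores)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    for processes, threads in hybrid_layouts(48, 12):
        print(f"{processes:3d} MPI processes x {threads:2d} OpenMP threads")
    print(parallel_efficiency(t_ref=100.0, cores_ref=1, t=12.5, cores=10))
```

In practice each candidate layout would be timed on the target machine and the fastest one kept, since the best split depends on the node architecture and interconnect.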

arXiv.org e-Print Archive

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

FluidFFT: common API (C++ and Python) for Fast Fourier Transform HPC libraries

Authors: Augier Pierre, Bonamy Cyrille, Mohanan Ashwin Vishnu
Publication venue: Ubiquity Press, Ltd.
Publication date: 03/07/2018

The Python package fluidfft provides a common Python API for performing Fast Fourier Transforms (FFT) sequentially, in parallel, and on GPU with different FFT libraries (FFTW, P3DFFT, PFFT, cuFFT). fluidfft is a comprehensive FFT framework which allows Python users to easily and efficiently perform FFTs and the associated tasks, such as computing linear operators and energy spectra. We describe the architecture of the package, composed of C++ and Cython FFT classes, Python "operator" classes and Pythran functions. The package supplies utilities to easily test itself and benchmark the different FFT solutions for a particular case and on a particular machine. We present a performance scaling analysis on three different computing clusters and a microbenchmark showing that fluidfft is an interesting solution for writing efficient Python applications using FFT.

arXiv.org e-Print Archive

Publikationer från KTH

Hal - Université Grenoble Alpes

Directory of Open Access Journals

Digitala Vetenskapliga Arkivet - Academic Archive On-line

High performance Python for direct numerical simulations of turbulent flows

Authors: Langtangen Hans Petter, Mortensen Mikael
Publication venue: Elsevier BV
Publication date: 01/01/2016

Direct Numerical Simulation (DNS) of the Navier-Stokes equations is an invaluable research tool in fluid dynamics. Still, there are few publicly available research codes and, due to the heavy number crunching implied, available codes are usually written in low-level languages such as C/C++ or Fortran. In this paper we describe a pure scientific Python pseudo-spectral DNS code that nearly matches the performance of C++ for thousands of processors and billions of unknowns. We also describe a version optimized through Cython that is found to match the speed of C++. The solvers are written from scratch in Python, including the mesh, the MPI domain decomposition, and the temporal integrators. The solvers have been verified and benchmarked on the Shaheen supercomputer at the KAUST supercomputing laboratory, and we are able to show very good scaling up to several thousand cores. A very important part of the implementation is the mesh decomposition (we implement both slab and pencil decompositions) and 3D parallel Fast Fourier Transforms (FFT). The mesh decomposition and FFT routines have been implemented in Python using serial FFT routines (either NumPy, pyFFTW or any other serial FFT module), NumPy array manipulations and with MPI communications handled by MPI for Python (mpi4py). We show how we are able to execute a 3D parallel FFT in Python for a slab mesh decomposition using 4 lines of compact Python code, for which the parallel performance on Shaheen is found to be slightly better than similar routines provided through the FFTW library. For a pencil mesh decomposition, 7 lines of code are required to execute a transform.
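The slab decomposition mentioned in this abstract can be sketched in a few lines of NumPy. The sketch below is not the authors' code. It emulates the MPI ranks serially inside one process, replaces the all-to-all communication with array splits and concatenations, and uses a complex-to-complex transform for simplicity. The function name `slab_fft3d` and the parameter `nprocs` are hypothetical.

```python
import numpy as np


def slab_fft3d(u, nprocs):
    """Forward 3D FFT of a cubic array, emulating a slab decomposition.

    Assumption: the grid size n is divisible by nprocs. Each emulated rank
    starts with n // nprocs complete planes along axis 0.
    """
    n = u.shape[0]
    assert u.shape == (n, n, n) and n % nprocs == 0

    # Distribute: rank r owns a contiguous slab of planes along axis 0.
    slabs = np.split(u, nprocs, axis=0)

    # Step 1: every rank transforms the two axes it holds in full.
    slabs = [np.fft.fft2(s, axes=(1, 2)) for s in slabs]

    # Step 2: global transpose (stand-in for an MPI all-to-all). Rank r cuts
    # its slab into nprocs blocks along axis 1 and sends block j to rank j.
    blocks = [np.split(s, nprocs, axis=1) for s in slabs]
    transposed = [
        np.concatenate([blocks[r][j] for r in range(nprocs)], axis=0)
        for j in range(nprocs)
    ]

    # Step 3: each rank now holds all of axis 0 and transforms along it.
    transposed = [np.fft.fft(s, axis=0) for s in transposed]

    # Gather the distributed result (done here only to allow a check).
    return np.concatenate(transposed, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.standard_normal((8, 8, 8))
    print(np.allclose(slab_fft3d(u, nprocs=4), np.fft.fftn(u)))
```

In a real distributed run each rank would keep only its own slab, and step 2 would be a collective such as mpi4py's `Alltoall`. The serial emulation only shows which data each rank must hold before each one-dimensional or two-dimensional transform.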

arXiv.org e-Print Archive

NORA - Norwegian Open Research Archives

Evaluating Component Assembly Specialization for 3D FFT

Author: Christian Perez

The Fast Fourier Transform (FFT) is a widely-used building block for many high-performance scientific applications. Efficient computing of FFT is paramount for the performance of these applications. This has led to many efforts to implement machine and computation specific optimizations. However, no existing FFT library is capable of easily integrating and automating the selection of new and/or unique optimizations. To ease FFT specialization, this paper evaluates the use of component-based software engineering, a programming paradigm which consists in building applications by assembling small software units. Component models are known to have many software engineering benefits but usually have insufficient performance for high-performance scientific applications. This paper uses the L2C model, a general purpose high-performance component model, and studies its performance and adaptation capabilities on 3D FFTs. Experiments show that L2C, and components in general, enables easy handling of 3D FFT specializations while obtaining performance comparable to that of well-known libraries. However, a higher-level component model is needed to automatically generate an adequate L2C assembly.

ZENODO