3 research outputs found

    P3DFFT: a framework for parallel computations of Fourier transforms in three dimensions

    Fourier and related transforms are a family of algorithms widely employed in diverse areas of computational science, yet notoriously difficult to scale on high-performance parallel computers with a large number of processing elements (cores). This paper introduces a popular software package called P3DFFT, which implements Fast Fourier Transforms (FFT) in three dimensions (3D) in a highly efficient and scalable way. It overcomes a well-known scalability bottleneck of 3D FFT implementations by using two-dimensional domain decomposition. Designed for portable performance, P3DFFT achieves excellent timings for a number of systems and problem sizes. On a Cray XT5 system, P3DFFT attains 45% efficiency in weak scaling from 128 to 65,536 computational cores. Library features include Fourier and Chebyshev transforms, Fortran and C interfaces, in- and out-of-place transforms, uneven data grids, and single and double precision. P3DFFT is available as open source at http://code.google.com/p/p3dfft/. This paper discusses P3DFFT implementation and performance in a way that helps guide users in making optimal choices for the parameters of their runs.
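The two-dimensional ("pencil") decomposition mentioned in the abstract can be illustrated with a small sketch. The code below is not P3DFFT's API or its actual distribution rule; the names `split_extent` and `x_pencil` and the near-equal block split are assumptions made for illustration. It computes which part of an `nx × ny × nz` grid one process would own when the y and z dimensions are divided over a `p1 × p2` process grid and the x dimension is kept whole, so that 1D FFTs along x need no communication. A one-dimensional (slab) decomposition of an N³ grid can use at most N processes, whereas a pencil decomposition can use up to N².

```python
def split_extent(n, parts, index):
    """Return (start, size) of block `index` when n points are split
    into `parts` nearly equal contiguous blocks (hypothetical rule)."""
    base, rem = divmod(n, parts)
    size = base + (1 if index < rem else 0)   # first `rem` blocks get one extra point
    start = index * base + min(index, rem)
    return start, size


def x_pencil(shape, proc_grid, coords):
    """Offsets and local shape of the x-aligned pencil owned by the
    process at `coords` in a p1 x p2 process grid (illustrative only)."""
    nx, ny, nz = shape
    p1, p2 = proc_grid
    r1, r2 = coords
    y0, ly = split_extent(ny, p1, r1)   # y is divided over the first grid dimension
    z0, lz = split_extent(nz, p2, r2)   # z is divided over the second grid dimension
    return (0, y0, z0), (nx, ly, lz)    # x stays local and complete


# An uneven case: a 10^3 grid on a 3 x 4 process grid.
offsets, local_shape = x_pencil((10, 10, 10), (3, 4), (1, 2))
```

Summing the local volumes over all 12 process coordinates recovers the full 1000 grid points, which is a quick check that the blocks tile the grid without gaps or overlap.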

    A Flexible Framework for Parallel Multi-Dimensional DFTs

    Multi-dimensional discrete Fourier transforms (DFT) are typically decomposed into multiple 1D transforms. Hence, parallel implementations of any multi-dimensional DFT focus on parallelizing within or across the 1D DFTs. Existing DFT packages exploit the inherent parallelism across the 1D DFTs and offer rigid frameworks that cannot be extended to incorporate both forms of parallelism and the various data layouts that enable some of the parallelism. However, in the era of exascale, where systems have thousands of nodes and intricate network topologies, flexibility and parallel efficiency are key aspects that all multi-dimensional DFT frameworks need in order to map and scale the computation appropriately. In this work, we present a flexible framework, built on the Redistribution Operations and Tensor Expressions (ROTE) framework, that facilitates the development of a family of parallel multi-dimensional DFT algorithms by 1) unifying the two parallelization schemes within a single framework, 2) exploiting the two different parallelization schemes to different degrees and 3) using different data layouts to distribute the data across the compute nodes. We demonstrate the need for a versatile framework, and thus the need for a family of parallel multi-dimensional DFT algorithms, on the K-Computer, where we show almost linear strong scaling results for problem sizes of 1024^3 on 32k compute nodes.
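The decomposition of a multi-dimensional DFT into 1D transforms can be checked numerically. The following is a minimal serial sketch using NumPy, not the ROTE-based framework from the abstract; the function name `fft3d_by_axes` is an assumption made for illustration. It applies batches of 1D FFTs along one axis at a time and compares the result with NumPy's direct 3D transform. Each pass consists of many independent 1D transforms, which is the "across the 1D DFTs" parallelism the abstract refers to.

```python
import numpy as np


def fft3d_by_axes(x):
    """Compute a 3D DFT as three successive passes of 1D FFTs."""
    y = np.fft.fft(x, axis=2)   # independent 1D FFTs along the last axis
    y = np.fft.fft(y, axis=1)   # then along the middle axis
    y = np.fft.fft(y, axis=0)   # then along the first axis
    return y


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 4)) + 1j * rng.standard_normal((8, 6, 4))

# Largest absolute difference from NumPy's direct N-dimensional FFT.
max_err = np.max(np.abs(fft3d_by_axes(x) - np.fft.fftn(x)))
```

The two results should agree up to floating-point rounding, since the 3D DFT is separable along its axes.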

    FFT, FMM, or Multigrid? A comparative Study of State-Of-the-Art Poisson Solvers for Uniform and Nonuniform Grids in the Unit Cube

    In this work, we benchmark and discuss the performance of the scalable methods for the Poisson problem which are used widely in practice: the fast Fourier transform (FFT), the fast multipole method (FMM), the geometric multigrid (GMG), and algebraic multigrid (AMG). In total we compare five different codes, three of which are developed in our group. Our FFT, GMG, and FMM are parallel solvers that use high-order approximation schemes for Poisson problems with continuous forcing functions (the source or right-hand side). We examine and report results for weak scaling, strong scaling, and time to solution for uniform and highly refined grids. We present results on the Stampede system at the Texas Advanced Computing Center and on the Titan system at the Oak Ridge National Laboratory. In our largest test case, we solved a problem with 600 billion unknowns on 229,379 cores of Titan. Overall, all methods scale quite well to these problem sizes. We have tested all of the methods with different source functions (the right-hand side in the Poisson problem). Our results indicate that FFT is the method of choice for smooth source functions that require uniform resolution. However, FFT loses its performance advantage when the source function has highly localized features like internal sharp layers. FMM and GMG considerably outperform FFT for those cases. The distinction between FMM and GMG is less pronounced and is sensitive to the quality (from a performance point of view) of the underlying implementations. The high-order accurate versions of GMG and FMM significantly outperform their low-order accurate counterparts. Comment: 25 pages; accepted paper in SISC journal
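To make concrete why FFT suits smooth sources on uniform grids, here is a minimal serial sketch of a spectral Poisson solve on the unit cube. It is not one of the codes benchmarked in the paper. It assumes periodic boundary conditions, the sign convention -Δu = f, and a zero-mean solution; the function name `solve_poisson_periodic` and the test problem are hypothetical choices made for illustration.

```python
import numpy as np


def solve_poisson_periodic(f):
    """Solve -laplace(u) = f on the periodic unit cube with a zero-mean u.

    `f` is an (n, n, n) array sampled on a uniform grid (assumed setup).
    """
    n = f.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / n)   # angular wavenumbers
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                # avoid dividing by zero at the mean mode
    u_hat = np.fft.fftn(f) / k2      # in Fourier space, -laplace becomes |k|^2
    u_hat[0, 0, 0] = 0.0             # fix the free constant: zero-mean solution
    return np.real(np.fft.ifftn(u_hat))


# Smooth test problem with a known solution (hypothetical choice).
n = 16
grid = np.arange(n) / n
X, Y, Z = np.meshgrid(grid, grid, grid, indexing="ij")
u_exact = np.sin(2 * np.pi * X) * np.sin(2 * np.pi * Y) * np.sin(2 * np.pi * Z)
f = 12.0 * np.pi**2 * u_exact        # -laplace(u_exact)

max_err = np.max(np.abs(solve_poisson_periodic(f) - u_exact))
```

For this single-mode smooth source the grid resolves the solution exactly, so the error is at the level of floating-point rounding. A source with sharp localized layers would need a much finer uniform grid everywhere, which is consistent with the abstract's observation that FFT loses its advantage in that regime.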