35 research outputs found

    Petascale turbulence simulation using a highly parallel fast multipole method on GPUs

    This paper reports large-scale direct numerical simulations of homogeneous isotropic fluid turbulence, achieving a sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as the numerical engine, and match the current record in mesh size for this application: a cube of 4096^3 computational points solved with a spectral method. The standard numerical approach in this field is the pseudo-spectral method, which relies on the FFT algorithm as its numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, three GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method achieves just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. Under these conditions, the calculation time for one time step was 108 seconds for the vortex method and 154 seconds for the spectral method. Computing with 69 billion particles, this work exceeds the largest vortex method calculations to date by an order of magnitude.
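
    The expensive kernel that the FMM accelerates in such a vortex method is the regularized Biot-Savart summation, which recovers velocity from the particles' vector-valued strengths. A minimal direct O(N^2) sketch of that summation (NumPy; the algebraic smoothing and all names are illustrative assumptions, not the paper's kernel):

```python
import numpy as np

def biot_savart_direct(x, gamma, sigma):
    """Direct O(N^2) regularized Biot-Savart velocity evaluation.

    x     : (N, 3) vortex particle positions
    gamma : (N, 3) particle strengths (vorticity times volume)
    sigma : smoothing radius that regularizes the 1/r^2 singularity
    """
    u = np.zeros_like(x)
    for i in range(len(x)):
        r = x[i] - x                                  # (N, 3) separations
        r2 = np.sum(r * r, axis=1) + sigma**2         # regularized |r|^2
        # u_i = (1/4pi) sum_j gamma_j x r_ij / |r_ij|^3 (self term vanishes)
        u[i] = np.sum(np.cross(gamma, r) / (4.0 * np.pi * r2[:, None]**1.5),
                      axis=0)
    return u
```

    The FMM replaces this all-pairs loop with a hierarchical tree traversal, which is what makes 69 billion particles tractable.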

    FMM-based vortex method for simulation of isotropic turbulence on GPUs, compared with a spectral method

    The Lagrangian vortex method offers an alternative numerical approach for direct numerical simulation of turbulence. The fact that it uses the fast multipole method (FMM), a hierarchical algorithm for N-body problems with highly scalable parallel implementations, as its numerical engine makes it a potentially good candidate for exascale systems. However, there have been few validation studies of Lagrangian vortex simulations, and the insufficient comparison against standard DNS codes has left ample room for skepticism. This paper presents a comparison between a Lagrangian vortex method and a pseudo-spectral method for the simulation of decaying homogeneous isotropic turbulence. This flow is not the most favorable problem for particle methods (which shine in wake flows or wherever vorticity is compact), but it is chosen because it is ideal for the quantitative validation of DNS codes. We use a 256^3 grid with Re_lambda = 50 and 100 and examine the turbulence statistics, including high-order moments. The focus is on the effect of the various parameters of the vortex method, e.g., the order of the FMM series expansion, the frequency of reinitialization, the overlap ratio, and the time step. The vortex method uses an FMM code (exaFMM) that runs on GPU hardware using CUDA, while the spectral code (hit3d) runs on CPUs only. Results indicate that, for this application (and with the current code implementations), the spectral method is an order of magnitude faster than the vortex method when using a single GPU for the FMM and six CPU cores for the FFT.
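
    The quantitative comparison hinges on the shell-averaged kinetic energy spectrum E(k). A sketch of how such a spectrum is typically computed from a periodic velocity field (the integer-shell binning is a common convention and an assumption here, not necessarily what hit3d does):

```python
import numpy as np

def energy_spectrum(u, v, w):
    """Shell-averaged kinetic energy spectrum of a triply periodic field.

    u, v, w : (n, n, n) velocity components on a uniform grid.
    Returns wavenumbers k = 0..n/2 and the binned spectrum E(k).
    """
    n = u.shape[0]
    # Energy per Fourier mode, with the FFT normalized by the grid size
    e = 0.5 * sum(np.abs(np.fft.fftn(c) / n**3)**2 for c in (u, v, w))
    # Magnitude of the integer wavenumber vector for every mode
    f = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(f, f, f, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2)
    # Sum mode energies into integer shells |k| in [s - 1/2, s + 1/2)
    shells = np.rint(kmag).astype(int).ravel()
    E = np.bincount(shells, weights=e.ravel())
    return np.arange(n // 2 + 1), E[: n // 2 + 1]
```

    Agreement of this curve between the two codes, including the dissipation-range tail, is what the validation rests on.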

    Pipelining the Fast Multipole Method over a Runtime System

    Fast Multipole Methods (FMM) are a fundamental operation for the simulation of many physical problems. The high-performance design of such methods usually requires carefully tuning the algorithm for both the targeted physics and the hardware. In this paper, we propose a new approach that achieves high performance across architectures. Our method consists of expressing the FMM algorithm as a task flow and employing a state-of-the-art runtime system, StarPU, to process the tasks on the different processing units. We carefully design the task flow, the mathematical operators, their Central Processing Unit (CPU) and Graphics Processing Unit (GPU) implementations, as well as the scheduling schemes. We compute the potentials and forces of 200 million particles in 48.7 seconds on a homogeneous 160-core SGI Altix UV 100, and of 38 million particles in 13.34 seconds on a heterogeneous 12-core Intel Nehalem processor enhanced with 3 Nvidia M2090 Fermi GPUs. (No. RR-7981, 2012.)
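
    A conceptual sketch of the task-flow idea, with a toy insert-task runtime standing in for StarPU (none of the names below are StarPU's actual API; real codelets would also carry separate CPU and GPU implementations). Each FMM operator (P2M, M2M, M2L, L2L, L2P) becomes a task, and the runtime may schedule any task whose input data are ready:

```python
from concurrent.futures import ThreadPoolExecutor

class ToyRuntime:
    """Insert-task model: a task waits for the producers of the data it reads.

    A stand-in for a real runtime system; StarPU additionally handles data
    transfers, heterogeneous codelets, and scheduling policies.
    """
    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(workers)
        self.last_writer = {}                     # data handle -> producing future

    def insert_task(self, func, reads=(), writes=()):
        deps = [self.last_writer[h] for h in reads if h in self.last_writer]
        def run():
            for d in deps:                        # block until inputs exist
                d.result()
            return func()
        fut = self.pool.submit(run)
        for h in writes:
            self.last_writer[h] = fut
        return fut

# One tiny FMM sweep expressed as a task flow.
rt = ToyRuntime()
rt.insert_task(lambda: print("P2M cell A"), writes=["MA"])
rt.insert_task(lambda: print("P2M cell B"), writes=["MB"])
rt.insert_task(lambda: print("M2M parent"), reads=["MA", "MB"], writes=["Mp"])
rt.insert_task(lambda: print("M2L + L2P"), reads=["Mp"], writes=["phi"])
rt.pool.shutdown(wait=True)
```

    The point of the approach is that pipelining, such as overlapping M2L work with the upward pass, falls out of the dependency graph instead of being hand-coded.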

    Derivation and analysis of the analytical velocity and vortex stretching expressions for an O(N log N)-FMM

    In the current paper, a method is presented for deriving the analytical expressions for the velocity and vortex stretching terms as a function of the spherical multipole expansion approximation of the vector potential. These terms are essential in the context of 3D Lagrangian vortex particle methods combined with fast summation techniques. The convergence and computational efficiency of this approach are assessed in the framework of an O(N log N)-type fast multipole method (FMM), using vorticity particles to simulate a system of coaxial vortex rings for which exact results are also known. It is found that the current implementation converges rapidly to the exact solution with increasing expansion order and acceptance factor. An investigation into the computational efficiency demonstrates that the O(N log N)-type FMM is already viable for particle counts of only several thousand, and that its speedup increases significantly with the number of particles. Finally, it is shown that the implementation of the FMM with the current analytical expressions is at least twice as fast as using even the simplest finite-difference implementation instead.
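
    For orientation, the two terms in question are the standard ones below; the paper's contribution is their closed-form evaluation from the spherical multipole expansion of the vector potential, which this block only restates in generic form:

```latex
% Velocity recovered from the vector potential, and the inviscid
% vortex-stretching term that evolves the particle strengths:
\begin{align}
  \mathbf{u}(\mathbf{x}) &= \nabla \times \boldsymbol{\psi}(\mathbf{x}),
  \qquad \nabla^{2}\boldsymbol{\psi} = -\boldsymbol{\omega}, \\
  \frac{\mathrm{D}\boldsymbol{\omega}}{\mathrm{D}t}
  &= (\boldsymbol{\omega}\cdot\nabla)\,\mathbf{u}.
\end{align}
% Once psi is approximated by a truncated multipole expansion, both u and
% (omega . grad) u follow by analytical differentiation of the expansion,
% which is the alternative to finite differences compared in the paper.
```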

    3D Lagrangian VPM: simulations of the near-wake of an actuator disc and horizontal axis wind turbine

    The application of a three-dimensional Lagrangian vortex particle method has been assessed for modelling the near wake of an axisymmetric actuator disc and of a 3-bladed horizontal-axis wind turbine with circulation prescribed from the MEXICO (Model EXperiments In COntrolled conditions) experiment. The method was developed in the framework of the open-source Parallel Particle-Mesh library for handling efficient data parallelism on a CPU (Central Processing Unit) cluster, and utilizes an O(N log N)-type fast multipole method for computational acceleration. Simulations with the actuator disc resulted in a wake expansion, velocity-deficit profile, and induction factor in close agreement with theoretical, numerical, and experimental results from the literature. The shear-layer expansion was also present; the Kelvin-Helmholtz instability in the shear layer was triggered by the round-off limitations of the numerical method, but it was delayed to beyond one diameter downstream by the particle smoothing. Simulations with the 3-bladed turbine demonstrated that a purely three-dimensional flow representation is challenging to model with particles. Local, complex flow structures of highly stretched vortices made the simulation unstable, but this was successfully counteracted by the application of a particle strength exchange scheme. The axial and radial velocity profiles over the near wake were compared with those of the original MEXICO experiment, showing close agreement between the results.
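
    The particle strength exchange (PSE) scheme that stabilized the turbine runs approximates the viscous Laplacian by a conservative exchange of strength between neighboring particles. A minimal dense sketch (the Gaussian kernel and its normalization are assumptions; a production code would restrict the sum to near neighbors):

```python
import numpy as np

def pse_step(x, alpha, vol, nu, eps, dt):
    """One explicit particle strength exchange step for viscous diffusion.

    x     : (N, 3) particle positions
    alpha : (N, 3) particle strengths (vorticity times volume)
    vol   : (N,)   particle volumes
    eps   : kernel width, comparable to the inter-particle spacing
    """
    r2 = np.sum((x[:, None, :] - x[None, :, :])**2, axis=2)    # (N, N)
    eta = np.exp(-r2 / eps**2) / (np.pi**1.5 * eps**3)         # Gaussian eta_eps
    # d(alpha_i)/dt ~ (2 nu / eps^2) sum_j (V_i alpha_j - V_j alpha_i) eta_ij
    exch = vol[:, None] * (eta @ alpha) - (eta @ vol)[:, None] * alpha
    return alpha + dt * (2.0 * nu / eps**2) * exch
```

    Because the exchange term is antisymmetric in (i, j), total circulation is conserved to round-off, which is what keeps the strength redistribution from further destabilizing the simulation.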

    Scalable Fast Multipole Methods on Heterogeneous Architecture

    The N-body problem appears in many computational physics simulations. At each time step the computation involves an all-pairs sum whose complexity is quadratic, followed by an update of particle positions. This cost means that it is not practical to solve such dynamic N-body problems at large scale. To improve this situation, we use both algorithmic and hardware approaches. Our algorithmic approach is to use the Fast Multipole Method (FMM), a divide-and-conquer algorithm that performs a fast N-body sum using a spatial decomposition and is often used in a time-stepping or iterative loop, to reduce the quadratic complexity to linear with guaranteed accuracy. Our hardware approach is to use heterogeneous clusters, comprising nodes that contain multi-core CPUs tightly coupled with accelerators such as graphics processing units (GPUs), as the underlying parallel processing hardware, on which efficient implementations require highly non-trivial redesigned algorithms. In this dissertation, we fundamentally reconsider the FMM algorithms on heterogeneous architectures to achieve a significant improvement over previous implementations in the literature and to make the algorithm ready for use as a workhorse simulation tool both for time-dependent vortex flow problems and for boundary element methods. Our major contributions include:
    1. Novel FMM data structures using parallel construction algorithms for dynamic problems.
    2. A fast heterogeneous FMM algorithm for both single and multiple computing nodes.
    3. Efficient inter-node communication management using fast parallel data structures.
    4. A scalable FMM algorithm using a novel Helmholtz decomposition for Vortex Methods (VM).
    The proposed algorithms can handle non-uniform distributions with irregular partition shapes to achieve workload balance, and their MPI-CUDA implementations are highly tuned and demonstrate state-of-the-art performance.
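
    The parallel tree construction named in the first contribution typically rests on sorting particles by a space-filling-curve key, so that octree leaves become contiguous index ranges. A minimal Morton (Z-order) key sketch (bit depth and names are assumptions):

```python
import numpy as np

def morton_keys(x, depth=10):
    """Interleave the bits of quantized (x, y, z) coordinates into Z-order keys.

    Sorting particles by these keys groups them by octree leaf, which is the
    backbone of (re)building FMM data structures in parallel each time step.
    """
    lo, hi = x.min(axis=0), x.max(axis=0)
    q = ((x - lo) / (hi - lo + 1e-12) * ((1 << depth) - 1)).astype(np.uint64)
    keys = np.zeros(len(x), dtype=np.uint64)
    for bit in range(depth):
        for axis in range(3):
            keys |= ((q[:, axis] >> np.uint64(bit)) & np.uint64(1)) \
                    << np.uint64(3 * bit + axis)
    return keys

# order = np.argsort(morton_keys(positions))   # argsort stands in for the
#                                              # GPU radix sort a real code uses
```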

    Wingtip Vortex Preservation Using a Coupled Vortex Particle Method and Computational Fluid Dynamics Solver
