58 research outputs found
Petascale turbulence simulation using a highly parallel fast multipole method on GPUs
This paper reports large-scale direct numerical simulations of
homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08
petaflop/s on GPU hardware using single precision. The simulations use a vortex
particle method to solve the Navier-Stokes equations, with a highly parallel
fast multipole method (FMM) as the numerical engine, and match the current record
in mesh size for this application, a cube of 4096^3 computational points solved
with a spectral method. The standard numerical approach used in this field is
the pseudo-spectral method, relying on the FFT algorithm as its numerical engine.
The particle-based simulations presented in this paper quantitatively match the
kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted
code. In terms of parallel performance, weak scaling results show the FMM-based
vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per
MPI process, three GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral
method achieves just 14% parallel efficiency on the same number of
MPI processes (using only CPU cores), due to the all-to-all communication
pattern of the FFT algorithm. The calculation time for one time step was 108
seconds for the vortex method and 154 seconds for the spectral method, under
these conditions. Computing with 69 billion particles, this work exceeds by an
order of magnitude the largest vortex method calculations to date.
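The 74% weak-scaling figure follows directly from the ratio of per-step times at a fixed workload per process. A minimal sketch of that computation (the 80 s baseline below is a hypothetical value chosen for illustration, not a number reported in the abstract):

```python
def weak_scaling_efficiency(t_base, t_scaled):
    # In a weak-scaling study the work per process is fixed, so ideal
    # scaling keeps the time per step constant; efficiency is the ratio
    # of the baseline time to the time at the larger process count.
    return t_base / t_scaled

# Illustrative numbers only: a 108 s time step at 4096 processes against
# a hypothetical 80 s baseline gives roughly 74% efficiency.
print(round(weak_scaling_efficiency(80.0, 108.0), 2))  # → 0.74
```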
A Tuned and Scalable Fast Multipole Method as a Preeminent Algorithm for Exascale Systems
Among the algorithms that are likely to play a major role in future exascale
computing, the fast multipole method (FMM) appears as a rising star. Our
recent work demonstrated scaling of an FMM on GPU clusters, with problem
sizes in the order of billions of unknowns. That work led to an extremely
parallel FMM, scaling to thousands of GPUs or tens of thousands of CPUs. This
paper reports on a campaign of performance tuning and scalability studies
using multi-core CPUs, on the Kraken supercomputer. All kernels in the FMM were
parallelized using OpenMP, and a test using 10^7 particles randomly distributed
in a cube showed 78% efficiency on 8 threads. Tuning of the
particle-to-particle kernel using SIMD instructions resulted in 4x speed-up of
the overall algorithm on single-core tests with 10^3 - 10^7 particles. Parallel
scalability was studied in both strong and weak scaling. The strong scaling
test used 10^8 particles and resulted in 93% parallel efficiency on 2048
processes for the non-SIMD code and 54% for the SIMD-optimized code (which was
still 2x faster). The weak scaling test used 10^6 particles per process, and
resulted in 72% efficiency on 32,768 processes, with the largest calculation
taking about 40 seconds to evaluate more than 32 billion unknowns. This work
builds up evidence for our view that FMM is poised to play a leading role in
exascale computing, and we end the paper with a discussion of the features that
make it a particularly favorable algorithm for the emerging heterogeneous and
massively parallel architectural landscape.
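The particle-to-particle kernel that the paper tunes with SIMD instructions is a direct pairwise evaluation over nearby particles. A minimal sketch, assuming a softened 1/r potential (the function name and softening parameter are illustrative; numpy broadcasting stands in for the SIMD intrinsics used in the paper):

```python
import numpy as np

def p2p(targets, sources, charges, eps=1e-6):
    # Direct particle-to-particle (P2P) kernel: softened 1/r potential at
    # each target due to all sources. This O(N^2) near-field sweep is the
    # hot loop an FMM code vectorizes; here numpy broadcasting computes
    # all pairwise distances at once.
    d = targets[:, None, :] - sources[None, :, :]   # (n_t, n_s, 3) displacements
    r2 = (d * d).sum(axis=-1)                       # squared distances
    return (charges[None, :] / np.sqrt(r2 + eps * eps)).sum(axis=1)

# A source of charge 2 at unit distance contributes a potential of ~2.
phi = p2p(np.array([[1.0, 0.0, 0.0]]),
          np.array([[0.0, 0.0, 0.0]]),
          np.array([2.0]))
print(phi[0])  # → ~2.0
```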
Fast multipole networks
Two prerequisites for robotic multiagent systems are mobility and
communication. Fast multipole networks (FMNs) enable both within a unified
framework. FMNs can be organized very efficiently in a distributed way from
local information and are ideally suited for motion planning using artificial
potentials. We compare FMNs to conventional communication topologies, and find
that FMNs offer competitive communication performance (including higher network
efficiency per edge at marginal energy cost) in addition to advantages for
mobility.
FMM-based vortex method for simulation of isotropic turbulence on GPUs, compared with a spectral method
The Lagrangian vortex method offers an alternative numerical approach for
direct numerical simulation of turbulence. The fact that it uses the fast
multipole method (FMM)--a hierarchical algorithm for N-body problems with
highly scalable parallel implementations--as its numerical engine makes it a
potentially good candidate for exascale systems. However, there have been few
validation studies of Lagrangian vortex simulations, and the scarcity of
comparisons against standard DNS codes has left ample room for skepticism. This
paper presents a comparison between a Lagrangian vortex method and a
pseudo-spectral method for the simulation of decaying homogeneous isotropic
turbulence. This flow field is not the most favorable for particle methods
(which shine in wake flows or where vorticity is compact), but it is chosen
because it is ideal for the quantitative
validation of DNS codes. We use a 256^3 grid with Re_lambda=50 and 100 and look
at the turbulence statistics, including high-order moments. The focus is on the
effect of the various parameters in the vortex method, e.g., order of FMM
series expansion, frequency of reinitialization, overlap ratio and time step.
The vortex method uses an FMM code (exaFMM) that runs on GPU hardware using
CUDA, while the spectral code (hit3d) runs on CPU only. Results indicate that,
for this application (and with the current code implementations), the spectral
method is an order of magnitude faster than the vortex method when using a
single GPU for the FMM and six CPU cores for the FFT.
Computational Physics on Graphics Processing Units
The use of graphics processing units for scientific computations is an
emerging strategy that can significantly speed up a wide range of algorithms.
In this review, we discuss advances made in the field of computational physics,
focusing on classical molecular dynamics, and on quantum simulations for
electronic structure calculations using the density functional theory, wave
function techniques, and quantum field theory. Comment: Proceedings of the 11th
International Conference, PARA 2012, Helsinki, Finland, June 10-13, 2012.