Multipole-to-local operator in the Fast Multipole Method: comparison of FFT, rotations and BLAS improvements
In the Fast Multipole Method, most of the far field computation is due to the multipole-to-local (M2L) operator. In this report we distinguish two different expressions for this operator: while the first one is natural and efficient, and thus commonly used, the second one, unlike the first, respects a sharp error bound, which is proven here. Two schemes that reduce the operation count of the M2L operator are detailed: the (block) Fast Fourier Transform and the rotations. We then present a matrix approach that uses BLAS (Basic Linear Algebra Subprograms) routines to speed up the computation. In order to use the more efficient level 3 BLAS (for matrix products), we require recopies, but this additional cost can be avoided thanks to special data storage schemes. Finally, all these schemes are compared, theoretically and practically, with uniform distributions, which validates our BLAS version.
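The level 2 versus level 3 BLAS distinction in this abstract can be illustrated with a minimal sketch. It is not the report's implementation: it assumes a single dense transfer matrix `T` shared by several cells, and the names `m2l_one_by_one`, `m2l_batched`, `n_coeffs` and `n_cells` are hypothetical. NumPy's `@` operator dispatches to the BLAS library it is linked against for these array types. The `np.column_stack` call plays the role of the "recopy": it gathers separate expansion vectors into one matrix so that a single matrix-matrix product replaces many matrix-vector products.

```python
import numpy as np

def m2l_one_by_one(T, multipoles):
    """Apply the transfer matrix to each expansion separately.

    Each product is a matrix-vector operation (level 2 BLAS style).
    """
    return [T @ m for m in multipoles]

def m2l_batched(T, multipoles):
    """Apply the transfer matrix to all expansions at once.

    The expansions are first copied into the columns of one matrix
    (the "recopy"), then a single matrix-matrix product is performed
    (level 3 BLAS style).
    """
    M = np.column_stack(multipoles)  # shape: (n_coeffs, n_cells)
    return T @ M

# Hypothetical sizes, chosen only for illustration.
n_coeffs = 16  # coefficients per expansion
n_cells = 8    # cells that share the same transfer matrix

rng = np.random.default_rng(0)
T = (rng.standard_normal((n_coeffs, n_coeffs))
     + 1j * rng.standard_normal((n_coeffs, n_coeffs)))
multipoles = [
    rng.standard_normal(n_coeffs) + 1j * rng.standard_normal(n_coeffs)
    for _ in range(n_cells)
]

locals_loop = m2l_one_by_one(T, multipoles)
locals_batched = m2l_batched(T, multipoles)

# Both variants compute the same local expansions, column by column.
max_diff = max(
    np.abs(locals_batched[:, k] - locals_loop[k]).max()
    for k in range(n_cells)
)
```

If the expansions were stored from the start as columns of one contiguous two-dimensional array, the `np.column_stack` copy would be unnecessary, which is one plausible reading of the storage remark in the abstract.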
Pipelining the Fast Multipole Method over a Runtime System
Fast Multipole Methods (FMM) are a fundamental operation for the simulation
of many physical problems. The high performance design of such methods usually
requires carefully tuning the algorithm for both the targeted physics and the
hardware. In this paper, we propose a new approach that achieves high
performance across architectures. Our method consists of expressing the FMM
algorithm as a task flow and employing a state-of-the-art runtime system,
StarPU, in order to process the tasks on the different processing units. We
carefully design the task flow, the mathematical operators, their Central
Processing Unit (CPU) and Graphics Processing Unit (GPU) implementations, as
well as scheduling schemes. We compute potentials and forces of 200 million
particles in 48.7 seconds on a homogeneous 160-core SGI Altix UV 100 and of 38
million particles in 13.34 seconds on a heterogeneous 12-core Intel Nehalem
processor enhanced with 3 Nvidia M2090 Fermi GPUs. Comment: No. RR-7981 (2012
- …