
    Multipole-to-local operator in the Fast Multipole Method: comparison of FFT, rotations and BLAS improvements

    In the Fast Multipole Method, most of the far-field computation is due to the multipole-to-local (M2L) operator. In this report we distinguish two different expressions for this operator: the first is natural and efficient, and thus commonly used, while the second, unlike the first, respects a sharp error bound, which is proven here. Two schemes that reduce the operation count of the M2L operator are detailed: the (block) Fast Fourier Transform and the rotations. We then present a matrix approach that uses BLAS (Basic Linear Algebra Subprograms) routines to speed up the M2L computation. In order to use the more efficient level 3 BLAS (for matrix products), we require recopies, but this additional cost can be avoided thanks to special data storages. Finally, all these schemes are compared, theoretically and practically with uniform distributions, which validates our BLAS version.
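
    The move from many matrix-vector products to one matrix-matrix product can be sketched as follows. This is only an illustration of the general idea, not the report's implementation: the dense random matrix `T` stands in for one M2L translation operator, and the names `m2l_level2`, `m2l_level3`, `p` and `n_cells` are hypothetical. It assumes NumPy, whose `@` operator typically dispatches to BLAS routines for arrays like these.

```python
import numpy as np

def m2l_level2(T, multipoles):
    """Apply T to each multipole vector separately (matrix-vector products)."""
    return np.stack([T @ m for m in multipoles], axis=1)

def m2l_level3(T, multipoles):
    """Copy the vectors into the columns of one matrix, then apply T once.

    The np.stack call plays the role of the "recopy" mentioned in the
    abstract; storing expansions contiguously from the start would avoid it.
    """
    M = np.stack(multipoles, axis=1)   # shape (p, n_cells)
    return T @ M                       # single matrix-matrix product

# Hypothetical sizes: p expansion terms, n_cells cells sharing the same operator.
rng = np.random.default_rng(0)
p, n_cells = 16, 8
T = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
multipoles = [rng.standard_normal(p) + 1j * rng.standard_normal(p)
              for _ in range(n_cells)]

locals_l2 = m2l_level2(T, multipoles)
locals_l3 = m2l_level3(T, multipoles)
```

    Both functions return the same local expansions, one column per cell. The batched form pays off only when many cell pairs share the same translation operator, which is plausibly why the abstract restricts the comparison to uniform distributions.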

    Pipelining the Fast Multipole Method over a Runtime System

    Fast Multipole Methods (FMM) are fundamental operations in the simulation of many physical problems. The high-performance design of such methods usually requires carefully tuning the algorithm for both the targeted physics and the hardware. In this paper, we propose a new approach that achieves high performance across architectures. Our method consists of expressing the FMM algorithm as a task flow and employing a state-of-the-art runtime system, StarPU, to process the tasks on the different processing units. We carefully design the task flow, the mathematical operators, their Central Processing Unit (CPU) and Graphics Processing Unit (GPU) implementations, as well as scheduling schemes. We compute potentials and forces of 200 million particles in 48.7 seconds on a homogeneous 160-core SGI Altix UV 100 and of 38 million particles in 13.34 seconds on a heterogeneous 12-core Intel Nehalem processor enhanced with 3 Nvidia M2090 Fermi GPUs. Comment: No. RR-7981 (2012)