Pipelining the Fast Multipole Method over a Runtime System
Fast Multipole Methods (FMM) are a fundamental operation for the simulation
of many physical problems. The high-performance design of such methods usually
requires carefully tuning the algorithm for both the targeted physics and the
hardware. In this paper, we propose a new approach that achieves high
performance across architectures. Our method consists of expressing the FMM
algorithm as a task flow and employing a state-of-the-art runtime system,
StarPU, in order to process the tasks on the different processing units. We
carefully design the task flow, the mathematical operators, their Central
Processing Unit (CPU) and Graphics Processing Unit (GPU) implementations, as
well as scheduling schemes. We compute potentials and forces of 200 million
particles in 48.7 seconds on a homogeneous 160-core SGI Altix UV 100, and of 38
million particles in 13.34 seconds on a heterogeneous 12-core Intel Nehalem
processor enhanced with 3 Nvidia M2090 Fermi GPUs.
Comment: No. RR-7981 (2012)
Complexity Analysis of a Fast Directional Matrix-Vector Multiplication
We consider a fast, data-sparse directional method to realize matrix-vector
products related to point evaluations of the Helmholtz kernel. The method is
based on a hierarchical partitioning of the point sets and the matrix. The
considered directional multi-level approximation of the Helmholtz kernel can be
applied efficiently even on high-frequency levels. We provide a detailed
analysis of the almost linear asymptotic complexity of the presented method.
Our numerical experiments are in good agreement with the provided theory.
Comment: 20 pages, 2 figures, 1 table
Theory and implementation of ℋ-matrix based iterative and direct solvers for Helmholtz and elastodynamic oscillatory kernels
In this work, we study the accuracy and efficiency of hierarchical matrix
(ℋ-matrix) based fast methods for solving dense linear systems
arising from the discretization of the 3D elastodynamic Green's tensors. It is
well known in the literature that standard ℋ-matrix based methods,
although very efficient tools for asymptotically smooth kernels, are not
optimal for oscillatory kernels. ℋ²-matrix and directional
approaches have been proposed to overcome this problem. However, the
implementation of such methods is much more involved than the standard
ℋ-matrix representation. The central questions we address are
twofold. (i) What is the frequency range in which the ℋ-matrix
format is an efficient representation for 3D elastodynamic problems? (ii) What
can be expected of such an approach to model problems in mechanical
engineering? We show that even though the method is not optimal (in the sense
that more involved representations can lead to faster algorithms), an efficient
solver can be easily developed. The capabilities of the method are illustrated
on numerical examples using the Boundary Element Method.
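The core mechanism behind ℋ-matrix methods for asymptotically smooth kernels can be shown in a few lines. The sketch below, with illustrative cluster coordinates and tolerance, forms one admissible (well-separated) block of the 1/r kernel, compresses it with a truncated SVD, and checks that the factored matrix-vector product matches the dense one; an ℋ-matrix applies this blockwise over a cluster tree.

```python
# Minimal sketch of the H-matrix idea: an admissible block of a smooth
# kernel is numerically low rank, so it can be stored and applied as a
# product of two thin factors. Geometry and tolerance are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 3))            # source cluster near the origin
Y = rng.random((200, 3)) + 5.0      # well-separated target cluster

# Dense kernel block K[i, j] = 1 / |y_i - x_j|
diff = Y[:, None, :] - X[None, :, :]
K = 1.0 / np.linalg.norm(diff, axis=2)

# Truncated SVD: keep singular values above a relative tolerance.
U, s, Vt = np.linalg.svd(K, full_matrices=False)
rank = int(np.sum(s > 1e-8 * s[0]))
A, B = U[:, :rank] * s[:rank], Vt[:rank, :]   # K ~ A @ B, thin factors

# The compressed matvec agrees with the dense one to the tolerance.
q = rng.random(200)
err = np.linalg.norm(K @ q - A @ (B @ q)) / np.linalg.norm(K @ q)
```

For an oscillatory kernel such as the elastodynamic Green's tensor at high frequency, the rank of such blocks grows with the wavenumber, which is exactly the non-optimality the abstract discusses.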
Optimized M2L Kernels for the Chebyshev Interpolation based Fast Multipole Method
A fast multipole method (FMM) for asymptotically smooth kernel functions (1/r, 1/r^4, Gauss and Stokes kernels, radial basis functions, etc.) based on a Chebyshev interpolation scheme has been introduced in [Fong et al., 2009]. The method has been extended to oscillatory kernels (e.g., the Helmholtz kernel) in [Messner et al., 2012]. Besides its generality, this FMM turns out to be favorable due to its easy implementation and its high performance based on intensive use of highly optimized BLAS libraries. However, one of its bottlenecks is the precomputation of the multipole-to-local (M2L) operator, and its higher number of floating point operations (flops) compared to other FMM formulations. Here, we present several optimizations for that operator, which is known to be the costliest FMM operator. The most efficient ones not only reduce the precomputation time by a factor of up to 340 but also speed up the matrix-vector product. We conclude with comparisons and numerical validations of all presented optimizations.
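The role of the M2L operator in an interpolation-based FMM can be made concrete with a 1D toy version. In the sketch below (intervals, node count, and the 1/(x - y) kernel are all illustrative choices, not the paper's setup), the far-field interaction between two well-separated intervals is computed by anterpolating sources onto Chebyshev nodes, applying a small node-to-node kernel matrix — the M2L operator — and interpolating back to the targets.

```python
# 1D sketch of a Chebyshev interpolation-based far-field evaluation:
# sources -> multipole coefficients (P2M), small M2L transfer, local
# coefficients -> targets (L2P). Setup is illustrative, not the paper's.
import numpy as np

def cheb_nodes(a, b, p):
    """p Chebyshev nodes mapped to the interval [a, b]."""
    k = np.arange(p)
    t = np.cos((2 * k + 1) * np.pi / (2 * p))
    return 0.5 * (a + b) + 0.5 * (b - a) * t

def lagrange_matrix(nodes, x):
    """L[i, j] = j-th Lagrange basis on `nodes`, evaluated at x[i]."""
    L = np.ones((len(x), len(nodes)))
    for j, nj in enumerate(nodes):
        for m, nm in enumerate(nodes):
            if m != j:
                L[:, j] *= (x - nm) / (nj - nm)
    return L

p = 10
xs = np.linspace(0.0, 1.0, 50)          # targets in [0, 1]
ys = np.linspace(3.0, 4.0, 50)          # well-separated sources in [3, 4]
w = np.random.default_rng(1).random(50)  # source weights

xn, yn = cheb_nodes(0.0, 1.0, p), cheb_nodes(3.0, 4.0, p)
M2L = 1.0 / (xn[:, None] - yn[None, :])  # p x p node-to-node kernel

multipole = lagrange_matrix(yn, ys).T @ w   # P2M
local = M2L @ multipole                     # M2L transfer
approx = lagrange_matrix(xn, xs) @ local    # L2P

exact = (1.0 / (xs[:, None] - ys[None, :])) @ w
err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
```

In 3D the M2L matrices are tensor products of such node sets over all interaction-list cell pairs, which is why their precomputation and application dominate the cost and are worth optimizing.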