2 research outputs found

A new parallel kernel-independent fast multipole method

Authors: George Biros, Harper Langston, Lexing Ying, Denis Zorin
Publication venue: ScholarlyCommons
Publication date: 01/01/2003

We present a new adaptive fast multipole algorithm and its parallel implementation. The algorithm is kernel-independent in the sense that the evaluation of pairwise interactions does not rely on any analytic expansions, but only utilizes kernel evaluations. The new method provides the enabling technology for many important problems in computational science and engineering. Examples include viscous flows, fracture mechanics, and screened Coulombic interactions. Our MPI-based parallel implementation logically separates the computation and communication phases to avoid synchronization in the upward and downward computation passes, and thus allows us to fully exploit the overlapping of computation and communication. We measure isogranular and fixed-size scalability for a variety of kernels on the Pittsburgh Supercomputing Center's TCS-1 Alphaserver on up to 3000 processors. We have solved viscous flow problems with up to 2.1 billion unknowns, and we have achieved 1.6 Tflop/s peak performance and 1.13 Tflop/s sustained performance.
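
As a rough illustration of what "only utilizes kernel evaluations" can mean at the interface level, the sketch below treats the kernel as a black-box function and sums pairwise interactions directly. It is not the authors' fast multipole algorithm: it is the brute-force O(N^2) reference that such a method would accelerate. The names `direct_sum`, `laplace_kernel`, and `yukawa_kernel` are hypothetical, and the two kernels are standard textbook forms chosen here as stand-ins for the Coulombic and screened Coulombic cases the abstract mentions.

```python
import math

def laplace_kernel(x, y):
    """3D Laplace (Coulomb) kernel 1 / (4*pi*r); 0 for coincident points."""
    r = math.dist(x, y)
    return 0.0 if r == 0.0 else 1.0 / (4.0 * math.pi * r)

def yukawa_kernel(x, y, kappa=1.0):
    """Screened Coulomb kernel exp(-kappa*r) / (4*pi*r); kappa is an assumed parameter."""
    r = math.dist(x, y)
    return 0.0 if r == 0.0 else math.exp(-kappa * r) / (4.0 * math.pi * r)

def direct_sum(kernel, targets, sources, densities):
    """Evaluate u(x_i) = sum_j K(x_i, y_j) * q_j using kernel calls only.

    Nothing here depends on an analytic expansion of the kernel, so any
    callable with the signature kernel(x, y) can be swapped in.
    """
    return [
        sum(kernel(x, y) * q for y, q in zip(sources, densities))
        for x in targets
    ]

# Usage: the same driver works unchanged for both kernels.
sources = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
densities = [1.0, -1.0]
targets = [(1.0, 0.0, 0.0)]
u_laplace = direct_sum(laplace_kernel, targets, sources, densities)
u_yukawa = direct_sum(yukawa_kernel, targets, sources, densities)
```

Passing the kernel as a plain function is the design point: swapping the physics requires no change to the summation code, which is the property the abstract calls kernel independence.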

A Data-Parallel Implementation of O(N) Hierarchical N-Body Methods

Authors: S. Lennart Johnsson, Yu Hu
Publication date: 01/01/1996

The O(N) hierarchical N-body algorithms and massively parallel processors allow particle systems of 100 million particles or more to be simulated in acceptable time. We present a data-parallel implementation of Anderson's method and demonstrate both the efficiency and the scalability of the implementation on the Connection Machine CM-5/5E systems. The communication time for large particle systems amounts to about 10-25%, and the overall efficiency is about 35%. The evaluation of the potential field of a system of 100 million particles takes 3 minutes and 15 minutes on a 256-node CM-5E, with an expected four and seven digits of accuracy, respectively. The speed of the code scales linearly with the number of processors and the number of particles.
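
The timing figures above can be turned into a per-node throughput with a short calculation. The sketch below assumes that "3 minutes" and "15 minutes" are wall-clock times for a single potential-field evaluation on all 256 nodes; the function name `particles_per_node_second` is hypothetical.

```python
def particles_per_node_second(n_particles, seconds, n_nodes):
    """Particles processed per node per second for one potential evaluation."""
    return n_particles / (seconds * n_nodes)

N = 100_000_000   # particles, from the abstract
NODES = 256       # CM-5E nodes, from the abstract

# Assumed wall-clock times: 3 min (four digits) and 15 min (seven digits).
four_digit = particles_per_node_second(N, 3 * 60, NODES)
seven_digit = particles_per_node_second(N, 15 * 60, NODES)

print(f"four digits:  {four_digit:.2f} particles/node/s")
print(f"seven digits: {seven_digit:.2f} particles/node/s")
print(f"cost ratio:   {four_digit / seven_digit:.1f}x")
```

Under these assumptions, the three extra digits of accuracy cost a factor of five in run time.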
