Optimizing the adaptive fast multipole method for fractal sets

Authors: Darve Eric; Pouransari Hadi
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 11/08/2015

We have performed a detailed analysis of the fast multipole method (FMM) in the adaptive case, in which the depth of the FMM tree is non-uniform. Previous works in this area have focused mostly on special types of adaptive distributions, for example when points accumulate on a 2D manifold or accumulate around a few points in space. Instead, we considered a more general situation in which fractal sets, e.g., Cantor sets and generalizations, are used to create adaptive sets of points. Such sets are characterized by their dimension, a number between 0 and 3. We introduced a mathematical framework to define a converging sequence of octrees, and based on that, demonstrated how to increase $N \to \infty$. A new complexity analysis for the adaptive FMM is introduced. It is shown that the $\mathcal{O}(N)$ complexity is achievable for any distribution of particles, when a modified adaptive FMM is exploited. We analyzed how the FMM performs for fractal point distributions, and how optimal parameters can be picked, e.g., the criterion used to stop the subdivision of an FMM cell. A new subdividing double-threshold method is introduced, and better performance demonstrated. Parameters in the FMM are modeled as a function of particle distribution dimension, and the optimal values are obtained. A three dimensional kernel independent black box adaptive FMM is implemented and used for all calculations
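
To make the idea of a subdivision stopping criterion concrete, here is a minimal sketch of an adaptive octree built over a Cantor dust (the product of three middle-thirds Cantor sets, whose dimension is log 8 / log 3, about 1.89). It uses a plain single-threshold rule (subdivide a cell while it holds more than `max_points` points), not the double-threshold method of the paper, which the abstract does not specify. All names (`cantor_dust`, `build_octree`, `collect_leaves`, `max_points`, `max_depth`) are hypothetical.

```python
import random


def cantor_dust(n, seed=0, digits=12):
    """Sample n points whose coordinates lie in the middle-thirds Cantor set,
    truncated to `digits` ternary digits (each digit is 0 or 2)."""
    rng = random.Random(seed)

    def coord():
        return sum(rng.choice((0, 2)) * 3.0 ** -(k + 1) for k in range(digits))

    return [(coord(), coord(), coord()) for _ in range(n)]


def build_octree(points, center, half, max_points, depth=0, max_depth=20):
    """Adaptive octree as nested dicts.

    Assumed stopping criterion: a cell becomes a leaf once it holds at most
    max_points points (or max_depth is reached, which guards duplicates).
    """
    if len(points) <= max_points or depth == max_depth:
        return {"depth": depth, "points": points, "children": []}
    buckets = {}
    for p in points:
        # Octant index: one bit per axis, set when the point is on the upper side.
        octant = sum((p[a] >= center[a]) << a for a in range(3))
        buckets.setdefault(octant, []).append(p)
    children = []
    for octant, pts in buckets.items():  # empty octants are never created
        child_center = tuple(
            center[a] + (half / 2 if (octant >> a) & 1 else -half / 2)
            for a in range(3)
        )
        children.append(
            build_octree(pts, child_center, half / 2, max_points, depth + 1, max_depth)
        )
    return {"depth": depth, "points": [], "children": children}


def collect_leaves(node):
    """Return all leaf cells of the tree."""
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in collect_leaves(child)]
```

Calling `build_octree(cantor_dust(500), (0.5, 0.5, 0.5), 0.5, max_points=8)` and inspecting the `depth` of each leaf shows how the tree depth follows the local point density rather than being fixed globally.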

arXiv.org e-Print Archive

On well-separated sets and fast multipole methods

Author: Engblom Stefan
Publication venue: 'Elsevier BV'
Publication date: 10/08/2011

The notion of well-separated sets is crucial in fast multipole methods as the main idea is to approximate the interaction between such sets via cluster expansions. We revisit the one-parameter multipole acceptance criterion in a general setting and derive a relative error estimate. This analysis benefits asymmetric versions of the method, where the division of the multipole boxes is more liberal than in conventional codes. Such variants offer a particularly elegant implementation with a balanced multipole tree, a feature which might be very favorable on modern computer architectures
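
As an illustration of a one-parameter multipole acceptance criterion, the sketch below tests whether two bounding spheres are well separated using the rule R + θ·r ≤ θ·d, where R and r are the larger and smaller radius and d is the distance between centers. Treat this exact form as an assumption: it is one common way to write a θ-criterion that allows boxes of different sizes, and the abstract itself does not state the formula. The function name and the default `theta` are hypothetical.

```python
import math


def well_separated(center_a, radius_a, center_b, radius_b, theta=0.5):
    """Assumed theta-criterion: R + theta * r <= theta * d.

    R = larger radius, r = smaller radius, d = distance between centers.
    A smaller theta demands more separation before a cluster expansion is used.
    """
    d = math.dist(center_a, center_b)
    big, small = max(radius_a, radius_b), min(radius_a, radius_b)
    return big + theta * small <= theta * d
```

For two unit spheres and θ = 0.5 the test requires d ≥ 3, so centers 4 apart pass and centers 2.5 apart fail. Because the two radii enter asymmetrically, a large box can be accepted against a much smaller one at a shorter distance than two large boxes would need.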

arXiv.org e-Print Archive

Data-Driven Execution of Fast Multipole Methods

Authors: Ltaief Hatem; Yokota Rio
Publication date: 05/03/2012

Fast multipole methods have O(N) complexity, are compute bound, and require very little synchronization, which makes them a favorable algorithm on next-generation supercomputers. Their most common application is to accelerate N-body problems, but they can also be used to solve boundary integral equations. When the particle distribution is irregular and the tree structure is adaptive, load-balancing becomes a non-trivial question. A common strategy for load-balancing FMMs is to use the work load from the previous step as weights to statically repartition the next step. The authors discuss in the paper another approach based on data-driven execution to efficiently tackle this challenging load-balancing problem. The core idea consists of breaking the most time-consuming stages of the FMMs into smaller tasks. The algorithm can then be represented as a Directed Acyclic Graph (DAG) where nodes represent tasks, and edges represent dependencies among them. The execution of the algorithm is performed by asynchronously scheduling the tasks using the QUARK runtime environment, in a way such that data dependencies are not violated for numerical correctness purposes. This asynchronous scheduling results in an out-of-order execution. The performance results of the data-driven FMM execution outperform the previous strategy and show linear speedup on a quad-socket quad-core Intel Xeon system
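
The data-driven execution described above can be sketched with the Python standard library. This is not QUARK and not the authors' code: `graphlib.TopologicalSorter` and a thread pool stand in for the runtime, and the tiny task graph (two parent cells X and Y, each with two leaves) is a hypothetical example whose task names follow the usual FMM stage labels (P2M, M2M, M2L, L2L, L2P, P2P).

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def fmm_task_graph():
    """Map each task to the set of tasks it depends on (hypothetical toy tree)."""
    leaves = {"X": ["X0", "X1"], "Y": ["Y0", "Y1"]}
    deps = {}
    for parent, kids in leaves.items():
        other = "Y" if parent == "X" else "X"
        for k in kids:
            deps[f"P2M({k})"] = set()  # leaf multipoles need only particle data
            deps[f"P2P({k})"] = set()  # near field has no dependencies
        deps[f"M2M({parent})"] = {f"P2M({k})" for k in kids}
        deps[f"M2L({parent}<-{other})"] = {f"M2M({other})"}
        for k in kids:
            deps[f"L2L({parent}->{k})"] = {f"M2L({parent}<-{other})"}
            deps[f"L2P({k})"] = {f"L2L({parent}->{k})"}
    return deps


def run_dag(dependencies, action, workers=4):
    """Run every task asynchronously; a task is submitted only after all of
    its predecessors have finished. Returns the completion order."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    completed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        running = {}
        while sorter.is_active():
            for task in sorter.get_ready():
                running[pool.submit(action, task)] = task
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any exception from the task
                task = running.pop(future)
                completed.append(task)
                sorter.done(task)  # may unlock successor tasks
    return completed
```

Running `run_dag(fmm_task_graph(), print)` shows the out-of-order behavior: independent tasks such as the P2P near-field work can complete at any point, while each M2L task always follows the M2M task it depends on.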

arXiv.org e-Print Archive

An FMM Based on Dual Tree Traversal for Many-core Architectures

Author: Yokota Rio
Publication date: 19/09/2012

The present work attempts to integrate the independent efforts in the fast N-body community to create the fastest N-body library for many-core and heterogeneous architectures. Focus is placed on low accuracy optimizations, in response to the recent interest to use FMM as a preconditioner for sparse linear solvers. A direct comparison with other state-of-the-art fast N-body codes demonstrates that orders of magnitude increase in performance can be achieved by careful selection of the optimal algorithm and low-level optimization of the code. The current N-body solver uses a fast multipole method with an efficient strategy for finding the list of cell-cell interactions by a dual tree traversal. A task-based threading model is used to maximize thread-level parallelism and intra-node load-balancing. In order to extract the full potential of the SIMD units on the latest CPUs, the inner kernels are optimized using AVX instructions. Our code -- exaFMM -- is an order of magnitude faster than the current state-of-the-art FMM codes, which are themselves an order of magnitude faster than the average FMM code
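
A dual tree traversal can be sketched in a few lines. The version below is a simplified illustration, not exaFMM's implementation: it uses a 1D binary tree instead of an octree, and it assumes the acceptance rule (r_target + r_source) ≤ θ·d. Every name (`Cell`, `build`, `dual_traverse`, `count_leaves`) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    lo: float
    hi: float
    children: list = field(default_factory=list)

    @property
    def center(self):
        return 0.5 * (self.lo + self.hi)

    @property
    def radius(self):
        return 0.5 * (self.hi - self.lo)


def build(lo, hi, depth):
    """Uniform binary tree over [lo, hi] with `depth` levels of refinement."""
    cell = Cell(lo, hi)
    if depth > 0:
        mid = 0.5 * (lo + hi)
        cell.children = [build(lo, mid, depth - 1), build(mid, hi, depth - 1)]
    return cell


def count_leaves(cell):
    if not cell.children:
        return 1
    return sum(count_leaves(c) for c in cell.children)


def dual_traverse(target, source, theta, far, near):
    """Walk target and source trees together, collecting cell-cell pairs.

    far  : well-separated pairs (would use an M2L-style approximation)
    near : leaf-leaf pairs that must be evaluated directly (P2P)
    """
    d = abs(target.center - source.center)
    if target.radius + source.radius <= theta * d:  # assumed acceptance rule
        far.append((target, source))
    elif not target.children and not source.children:
        near.append((target, source))
    elif source.children and (not target.children or source.radius >= target.radius):
        for child in source.children:  # split the larger (or only splittable) cell
            dual_traverse(target, child, theta, far, near)
    else:
        for child in target.children:
            dual_traverse(child, source, theta, far, near)
```

Starting from `dual_traverse(root, root, theta, far, near)`, every pair of leaves ends up covered exactly once, either directly in `near` or through an ancestor pair in `far`, so no explicit interaction lists need to be precomputed per cell.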

arXiv.org e-Print Archive

FFT, FMM, or Multigrid? A comparative Study of State-Of-the-Art Poisson Solvers for Uniform and Nonuniform Grids in the Unit Cube

Authors: Biros George; Gholami Amir; Malhotra Dhairya; Sundar Hari
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 10/07/2016

In this work, we benchmark and discuss the performance of the scalable methods for the Poisson problem which are used widely in practice: the fast Fourier transform (FFT), the fast multipole method (FMM), the geometric multigrid (GMG), and algebraic multigrid (AMG). In total we compare five different codes, three of which are developed in our group. Our FFT, GMG, and FMM are parallel solvers that use high-order approximation schemes for Poisson problems with continuous forcing functions (the source or right-hand side). We examine and report results for weak scaling, strong scaling, and time to solution for uniform and highly refined grids. We present results on the Stampede system at the Texas Advanced Computing Center and on the Titan system at the Oak Ridge National Laboratory. In our largest test case, we solved a problem with 600 billion unknowns on 229,379 cores of Titan. Overall, all methods scale quite well to these problem sizes. We have tested all of the methods with different source functions (the right-hand side in the Poisson problem). Our results indicate that FFT is the method of choice for smooth source functions that require uniform resolution. However, FFT loses its performance advantage when the source function has highly localized features like internal sharp layers. FMM and GMG considerably outperform FFT for those cases. The distinction between FMM and GMG is less pronounced and is sensitive to the quality (from a performance point of view) of the underlying implementations. The high-order accurate versions of GMG and FMM significantly outperform their low-order accurate counterparts. Comment: 25 pages; accepted paper in SISC journal

arXiv.org e-Print Archive

Dynamic autotuning of adaptive fast multipole methods on hybrid multicore CPU & GPU systems

Authors: Engblom Stefan; Goude Anders; Holm Marcus; Holmgren Sverker
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 17/03/2014

We discuss an implementation of adaptive fast multipole methods targeting hybrid multicore CPU- and GPU-systems. From previous experiences with the computational profile of our version of the fast multipole algorithm, suitable parts are off-loaded to the GPU, while the remaining parts are threaded and executed concurrently by the CPU. The parameters defining the algorithm affect the performance, and by measuring this effect we are able to dynamically balance the algorithm towards optimal performance. Our setup uses the dynamic nature of the computations and is therefore of general character

arXiv.org e-Print Archive

Fast Multipole Method as a Matrix-Free Hierarchical Low-Rank Approximation

Authors: Ibeid Huda; Keyes David; Yokota Rio
Publication date: 06/02/2016

There has been a large increase in the amount of work on hierarchical low-rank approximation methods, where the interest is shared by multiple communities that previously did not intersect. The objective of this article is twofold: to provide a thorough review of the recent advancements in this field from both analytical and algebraic perspectives, and to present a comparative benchmark of two highly optimized implementations of contrasting methods for some simple yet representative test cases. We categorize the recent advances in this field from the perspective of compute-memory tradeoff, which has not been considered in much detail in this area. Benchmark tests reveal that there is a large difference in the memory consumption and performance between the different methods. Comment: 19 pages, 6 figures

arXiv.org e-Print Archive

Fast Multipole Preconditioners for Sparse Matrices Arising from Elliptic Equations

Authors: Ibeid Huda; Keyes David; Pestana Jennifer; Yokota Rio
Publication date: 19/01/2016

Among optimal hierarchical algorithms for the computational solution of elliptic problems, the Fast Multipole Method (FMM) stands out for its adaptability to emerging architectures, having high arithmetic intensity, tunable accuracy, and relaxable global synchronization requirements. We demonstrate that, beyond its traditional use as a solver in problems for which explicit free-space kernel representations are available, the FMM has applicability as a preconditioner in finite domain elliptic boundary value problems, by equipping it with boundary integral capability for satisfying conditions at finite boundaries and by wrapping it in a Krylov method for extensibility to more general operators. Here, we do not discuss the well developed applications of FMM to implement matrix-vector multiplications within Krylov solvers of boundary element methods. Instead, we propose using FMM for the volume-to-volume contribution of inhomogeneous Poisson-like problems, where the boundary integral is a small part of the overall computation. Our method may be used to precondition sparse matrices arising from finite difference/element discretizations, and can handle a broader range of scientific applications. Compared with multigrid methods, it is capable of comparable algebraic convergence rates down to the truncation error of the discretized PDE, and it offers potentially superior multicore and distributed memory scalability properties on commodity architecture supercomputers. Compared with other methods exploiting the low rank character of off-diagonal blocks of the dense resolvent operator, FMM-preconditioned Krylov iteration may reduce the amount of communication because it is matrix-free and exploits the tree structure of FMM. We describe our tests in reproducible detail with freely available codes and outline directions for further extensibility. Comment: 17 pages, 9 figures

arXiv.org e-Print Archive

A parallel directional Fast Multipole Method

Authors: Benson Austin R.; Engquist Björn; Poulson Jack; Tran Kenneth; Ying Lexing
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 17/11/2013

This paper introduces a parallel directional fast multipole method (FMM) for solving N-body problems with highly oscillatory kernels, with a focus on the Helmholtz kernel in three dimensions. This class of oscillatory kernels requires a more restrictive low-rank criterion than that of the low-frequency regime, and thus effective parallelizations must adapt to the modified data dependencies. We propose a simple partition at a fixed level of the octree and show that, if the partitions are properly balanced between p processes, the overall runtime is essentially O(N log N/p + p). By the structure of the low-rank criterion, we are able to avoid communication at the top of the octree. We demonstrate the effectiveness of our parallelization on several challenging models

arXiv.org e-Print Archive

DASHMM Accelerated Adaptive Fast Multipole Poisson-Boltzmann Solver on Distributed Memory Architecture

Authors: DeBuhr J.; Lu B.; Mayolo S.; Niedzielski D.; Sterling T.; Zhang B.
Publication date: 17/10/2017

We present an updated version of the AFMPB package for fast calculation of molecular solvation free energy. The main feature of the new version is the successful adoption of the DASHMM library, which enables AFMPB to operate on distributed memory computers. As a result, the new version can easily handle larger molecules or situations with higher accuracy requirements. To demonstrate the updated code, we applied the new version to a dengue virus system with more than one million atoms and a mesh with approximately 20 million triangles, and were able to reduce the time-to-solution from 10 hours reported in the previous release on a shared memory computer to less than 30 seconds on a Cray XC30 cluster using 12,288 cores

arXiv.org e-Print Archive