272 research outputs found

    Fast Multipole Method as a Matrix-Free Hierarchical Low-Rank Approximation

    There has been a large increase in the amount of work on hierarchical low-rank approximation methods, where the interest is shared by multiple communities that previously did not intersect. The objective of this article is twofold: to provide a thorough review of the recent advancements in this field from both analytical and algebraic perspectives, and to present a comparative benchmark of two highly optimized implementations of contrasting methods for some simple yet representative test cases. We categorize the recent advances in this field from the perspective of compute-memory tradeoff, which has not been considered in much detail in this area. Benchmark tests reveal that there is a large difference in the memory consumption and performance between the different methods. Comment: 19 pages, 6 figures

    FFT, FMM, or Multigrid? A comparative Study of State-Of-the-Art Poisson Solvers for Uniform and Nonuniform Grids in the Unit Cube

    In this work, we benchmark and discuss the performance of the scalable methods for the Poisson problem which are used widely in practice: the fast Fourier transform (FFT), the fast multipole method (FMM), the geometric multigrid (GMG), and algebraic multigrid (AMG). In total we compare five different codes, three of which are developed in our group. Our FFT, GMG, and FMM are parallel solvers that use high-order approximation schemes for Poisson problems with continuous forcing functions (the source or right-hand side). We examine and report results for weak scaling, strong scaling, and time to solution for uniform and highly refined grids. We present results on the Stampede system at the Texas Advanced Computing Center and on the Titan system at the Oak Ridge National Laboratory. In our largest test case, we solved a problem with 600 billion unknowns on 229,379 cores of Titan. Overall, all methods scale quite well to these problem sizes. We have tested all of the methods with different source functions (the right-hand side in the Poisson problem). Our results indicate that FFT is the method of choice for smooth source functions that require uniform resolution. However, FFT loses its performance advantage when the source function has highly localized features like internal sharp layers. FMM and GMG considerably outperform FFT for those cases. The distinction between FMM and GMG is less pronounced and is sensitive to the quality (from a performance point of view) of the underlying implementations. The high-order accurate versions of GMG and FMM significantly outperform their low-order accurate counterparts. Comment: 25 pages; accepted paper in SISC journal

    Fast Multipole Preconditioners for Sparse Matrices Arising from Elliptic Equations

    Among optimal hierarchical algorithms for the computational solution of elliptic problems, the Fast Multipole Method (FMM) stands out for its adaptability to emerging architectures, having high arithmetic intensity, tunable accuracy, and relaxable global synchronization requirements. We demonstrate that, beyond its traditional use as a solver in problems for which explicit free-space kernel representations are available, the FMM has applicability as a preconditioner in finite domain elliptic boundary value problems, by equipping it with boundary integral capability for satisfying conditions at finite boundaries and by wrapping it in a Krylov method for extensibility to more general operators. Here, we do not discuss the well developed applications of FMM to implement matrix-vector multiplications within Krylov solvers of boundary element methods. Instead, we propose using FMM for the volume-to-volume contribution of inhomogeneous Poisson-like problems, where the boundary integral is a small part of the overall computation. Our method may be used to precondition sparse matrices arising from finite difference/element discretizations, and can handle a broader range of scientific applications. Compared with multigrid methods, it is capable of comparable algebraic convergence rates down to the truncation error of the discretized PDE, and it offers potentially superior multicore and distributed memory scalability properties on commodity architecture supercomputers. Compared with other methods exploiting the low rank character of off-diagonal blocks of the dense resolvent operator, FMM-preconditioned Krylov iteration may reduce the amount of communication because it is matrix-free and exploits the tree structure of FMM. We describe our tests in reproducible detail with freely available codes and outline directions for further extensibility. Comment: 17 pages, 9 figures

    A Finite Element Based P3M Method for N-body Problems

    We introduce a fast mesh-based method for computing N-body interactions that is both scalable and accurate. The method is founded on a particle-particle--particle-mesh (P3M) approach, which decomposes a potential into rapidly decaying short-range interactions and smooth, mesh-resolvable long-range interactions. However, in contrast to the traditional approach of using Gaussian screen functions to accomplish this decomposition, our method employs specially designed polynomial bases to construct the screened potentials. Because of this form of the screen, the long-range component of the potential is then solved exactly with a finite element method, leading ultimately to a sparse matrix problem that is solved efficiently with standard multigrid methods. Moreover, since this system represents an exact discretization, the optimal resolution properties of the FFT are unnecessary, though the short-range calculation is now more involved than P3M/PME methods. We introduce the method, analyze its key properties, and demonstrate the accuracy of the algorithm. Comment: 20 pages, submitted to SISC
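
The short-range/long-range decomposition mentioned in this abstract can be illustrated with the traditional Gaussian (Ewald-style) screen that the authors contrast their method against. The sketch below is not the paper's polynomial-basis construction; the function names, the splitting parameter `alpha`, and the sample radii are assumptions chosen for illustration. It only checks that the two pieces sum back to the Coulomb-like potential 1/r.

```python
import math

def short_range(r, alpha):
    """Rapidly decaying part of 1/r under a Gaussian screen: erfc(alpha*r)/r."""
    return math.erfc(alpha * r) / r

def long_range(r, alpha):
    """Smooth part of 1/r under a Gaussian screen: erf(alpha*r)/r."""
    return math.erf(alpha * r) / r

# Hypothetical splitting parameter and sample distances (illustration only).
alpha = 2.0
radii = [0.1, 0.5, 1.0, 3.0]

# The two parts must add up to the unscreened potential 1/r at every distance.
max_split_error = max(
    abs(short_range(r, alpha) + long_range(r, alpha) - 1.0 / r) for r in radii
)

# The long-range part stays finite as r -> 0, with limit 2*alpha/sqrt(pi),
# which is what makes it resolvable on a mesh.
long_range_limit = 2.0 * alpha / math.sqrt(math.pi)
```

The short-range term becomes negligible after a few multiples of 1/alpha, so it can be summed directly over near neighbors, while the smooth term is the part a mesh-based solver would handle.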

    Learning with Analytical Models

    To understand and predict the performance of scientific applications, several analytical and machine learning approaches have been proposed, each having its advantages and disadvantages. In this paper, we propose and validate a hybrid approach for performance modeling and prediction, which combines analytical and machine learning models. The proposed hybrid model aims to minimize prediction cost while providing reasonable prediction accuracy. Our validation results show that the hybrid model is able to learn and correct the analytical models to better match the actual performance. Furthermore, the proposed hybrid model improves the prediction accuracy in comparison to pure machine learning techniques while using small training datasets, thus making it suitable for hardware and workload changes.

    Optimal, scalable forward models for computing gravity anomalies

    We describe three approaches for computing a gravity signal from a density anomaly. The first approach consists of the classical "summation" technique, whilst the remaining two methods solve the Poisson problem for the gravitational potential using either a Finite Element (FE) discretization employing a multilevel preconditioner, or a Green's function evaluated with the Fast Multipole Method (FMM). The methods utilizing the PDE formulation described here differ from previously published approaches used in gravity modeling in that they are optimal, implying that both the memory and computational time required scale linearly with respect to the number of unknowns in the potential field. Additionally, all of the implementations presented here are developed such that the computations can be performed in a massively parallel, distributed memory computing environment. Through numerical experiments, we compare the methods on the basis of their discretization error, CPU time and parallel scalability. We demonstrate the parallel scalability of all these techniques by running forward models with up to $10^8$ voxels on thousands of cores. Comment: 38 pages, 13 figures; accepted by Geophysical Journal International

    Flexibly imposing periodicity in kernel independent FMM: A Multipole-To-Local operator approach

    An important but missing component in the application of the kernel independent fast multipole method (KIFMM) is the capability for flexibly and efficiently imposing singly, doubly, and triply periodic boundary conditions. In most popular packages such periodicities are imposed with the hierarchical repetition of periodic boxes, which may give an incorrect answer due to the conditional convergence of some kernel sums. Here we present an efficient method to properly impose periodic boundary conditions using a near-far splitting scheme. The near-field contribution is directly calculated with the KIFMM method, while the far-field contribution is calculated with a multipole-to-local (M2L) operator which is independent of the source and target point distribution. The M2L operator is constructed with the far-field portion of the kernel function to generate the far-field contribution with the downward equivalent source points in KIFMM. This method guarantees that the sum of the near-field and far-field contributions converges pointwise to results satisfying periodicity and compatibility conditions. The computational cost of the far-field calculation has the same $\mathcal{O}(N)$ complexity as FMM and is designed to be small by reusing the data computed by KIFMM for the near-field. The far-field calculations require no additional control parameters and obey the same theoretical error bound as KIFMM. We present accuracy and timing test results for the Laplace kernel in singly periodic domains and the Stokes velocity kernel in doubly and triply periodic domains.

    BAGEL: Brilliantly Advanced General Electronic-structure Library

    On behalf of the development team, I review the capabilities of the BAGEL program package in this article. BAGEL is a newly-developed full-fledged program package for electronic-structure computation in quantum chemistry, which is released under the GNU General Public License with many contributions from the developers. The unique features include analytical CASPT2 nuclear energy gradients and derivative couplings, relativistic multireference wave functions based on the Dirac equation, and implementations of novel electronic structure theories. All of the programs are efficiently parallelized using both threads and MPI processes. We also discuss the code generator SMITH3, which has been used to implement some of the programs in BAGEL. The developers' contributions are listed at the end of the main text. Comment: Software Focus article, WIREs: Computational Molecular Science

    Hierarchical Matrix Operations on GPUs: Matrix-Vector Multiplication and Compression

    Hierarchical matrices are space and time efficient representations of dense matrices that exploit the low rank structure of matrix blocks at different levels of granularity. The hierarchically low rank block partitioning produces representations that can be stored and operated on in near-linear complexity instead of the usual polynomial complexity of dense matrices. In this paper, we present high performance implementations of matrix vector multiplication and compression operations for the $\mathcal{H}^2$ variant of hierarchical matrices on GPUs. This variant exploits, in addition to the hierarchical block partitioning, hierarchical bases for the block representations and results in a scheme that requires only $O(n)$ storage and $O(n)$ complexity for the mat-vec and compression kernels. These two operations are at the core of algebraic operations for hierarchical matrices, the mat-vec being a ubiquitous operation in numerical algorithms while compression/recompression represents a key building block for other algebraic operations, which require periodic recompression during execution. The difficulties in developing efficient GPU algorithms come primarily from the irregular tree data structures that underlie the hierarchical representations, and the key to performance is to recast the computations on flattened trees in ways that allow batched linear algebra operations to be performed. This requires marshaling the irregularly laid out data in a way that allows them to be used by the batched routines. Marshaling operations only involve pointer arithmetic with no data movement and as a result have minimal overhead. Our numerical results on covariance matrices from 2D and 3D problems from spatial statistics show the high efficiency our routines achieve: over 550 GB/s for the bandwidth-limited mat-vec and over 850 GFLOPS/s in sustained performance for the compression on the P100 Pascal GPU.
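
The "low rank structure of matrix blocks" that hierarchical matrices exploit can be seen in a small experiment. The sketch below is not the paper's GPU implementation; the 1/|x - y| kernel, the two 1D point clusters, the tolerance, and all function names are assumptions picked for illustration. It builds the dense interaction block between two well-separated clusters and counts the singular values needed to reach a relative tolerance.

```python
import numpy as np

def kernel_block(x, y):
    """Dense block K[i, j] = 1 / |x_i - y_j| for 1D point sets x and y."""
    return 1.0 / np.abs(x[:, None] - y[None, :])

def numerical_rank(block, tol=1e-8):
    """Count singular values larger than tol times the largest one."""
    s = np.linalg.svd(block, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# Two well-separated clusters (hypothetical geometry): sources in [0, 1],
# targets in [3, 4], 200 points each.
sources = np.linspace(0.0, 1.0, 200)
targets = np.linspace(3.0, 4.0, 200)

block = kernel_block(targets, sources)  # 200 x 200 dense block
rank = numerical_rank(block)            # far smaller than 200 for this geometry
```

Storing such a block as a product of two thin factors of width `rank` instead of the full 200 x 200 array is the basic saving that hierarchical formats apply recursively across levels.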

    A Study of Three Dimensional Edge and Corner Problems using the neBEM Solver

    The previously reported neBEM solver has been used to solve electrostatic problems having three-dimensional edges and corners in the physical domain. Both rectangular and triangular elements have been used to discretize the geometries under study. In order to maintain a very high level of precision, a library of C functions yielding exact values of potential and flux influences due to uniform surface distribution of singularities on flat triangular and rectangular elements has been developed and used. Here we present the exact expressions proposed for computing the influence of uniform singularity distributions on triangular elements and illustrate their accuracy. We then consider several problems of electrostatics containing edges and singularities of various orders including plates and cubes, and L-shaped conductors. We have tried to show that using the approach proposed in the earlier paper on neBEM and its present enhanced (through the inclusion of triangular elements) form, it is possible to obtain accurate estimates of integral features such as the capacitance of a given conductor and detailed ones such as the charge density distribution at the edges and corners without resorting to any new or special formulation. Results obtained using neBEM have been compared extensively with both existing analytical and numerical results. The comparisons illustrate the accuracy, flexibility and robustness of the new approach quite comprehensively. Comment: Submitted to Elsevier