An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling
We present a sparse linear system solver that is based on a multifrontal
variant of Gaussian elimination, and exploits low-rank approximation of the
resulting dense frontal matrices. We use hierarchically semiseparable (HSS)
matrices, which have low-rank off-diagonal blocks, to approximate the frontal
matrices. For HSS matrix construction, a randomized sampling algorithm is used
together with interpolative decompositions. The combination of the randomized
compression with a fast ULV HSS factorization leads to a solver with lower
computational complexity than the standard multifrontal method for many
applications, resulting in speedups of up to 7-fold for problems in our test
suite. The implementation targets many-core systems by using task parallelism
with dynamic runtime scheduling. Numerical experiments show performance
improvements over state-of-the-art sparse direct solvers. The implementation
achieves high performance and good scalability on a range of modern shared
memory parallel systems, including the Intel Xeon Phi (MIC). The code is part
of a software package called STRUMPACK -- STRUctured Matrices PACKage, which
also has a distributed memory component for dense rank-structured matrices.
Parallel accelerated cyclic reduction preconditioner for three-dimensional elliptic PDEs with variable coefficients
We present a robust and scalable preconditioner for the solution of
large-scale linear systems that arise from the discretization of elliptic PDEs
amenable to rank compression. The preconditioner is based on hierarchical
low-rank approximations and the cyclic reduction method. The setup and
application phases of the preconditioner achieve log-linear complexity in
memory footprint and number of operations, and numerical experiments exhibit
good weak and strong scalability at large processor counts in a distributed
memory environment. Numerical experiments with linear systems that feature
symmetry and nonsymmetry, definiteness and indefiniteness, constant and
variable coefficients demonstrate the preconditioner applicability and
robustness. Furthermore, it is possible to control the number of iterations via
the accuracy threshold of the hierarchical matrix approximations and their
arithmetic operations, and the tuning of the admissibility condition parameter.
Together, these parameters allow for optimization of the memory requirements
and performance of the preconditioner. Comment: 24 pages, Elsevier Journal of Computational and Applied Mathematics,
Dec 201
A Direct Elliptic Solver Based on Hierarchically Low-rank Schur Complements
A parallel fast direct solver for rank-compressible block tridiagonal linear
systems is presented. Algorithmic synergies between Cyclic Reduction and
Hierarchical matrix arithmetic operations result in a solver with $O(N \log^2 N)$
arithmetic complexity and $O(N \log N)$ memory footprint. We provide a
baseline for performance and applicability by comparing with well known
implementations of the $\mathcal{H}$-LU factorization and algebraic multigrid
with a parallel implementation that leverages the concurrency features of the
method. Numerical experiments reveal that this method is comparable with other
fast direct solvers based on Hierarchical Matrices such as $\mathcal{H}$-LU and
that it can tackle problems where algebraic multigrid fails to converge.
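
The cyclic reduction idea named in this abstract can be illustrated on a much simpler case. The sketch below is not the authors' solver: it assumes a scalar tridiagonal system rather than a block tridiagonal one, uses ordinary floating-point arithmetic instead of hierarchical matrix arithmetic, and runs serially. The function name `cyclic_reduction` and its argument layout are hypothetical. Each level eliminates the even-indexed unknowns, recurses on the remaining half-size tridiagonal system, and then back-substitutes.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for x.

    a[0] and c[-1] are ignored. Assumes no pivoting is needed
    (for example, a diagonally dominant system).
    """
    n = len(b)
    if n == 0:
        return []
    if n == 1:
        return [d[0] / b[0]]

    # Reduction: build a tridiagonal system for the odd-indexed unknowns
    # by eliminating their even-indexed neighbours.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        has_right = i + 1 < n
        alpha = a[i] / b[i - 1]
        gamma = c[i] / b[i + 1] if has_right else 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1]
                  - (gamma * a[i + 1] if has_right else 0.0))
        rc.append(-gamma * c[i + 1] if has_right else 0.0)
        rd.append(d[i] - alpha * d[i - 1]
                  - (gamma * d[i + 1] if has_right else 0.0))

    # Recurse on the half-size system.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back-substitution: recover the even-indexed unknowns.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

In this scalar sketch the eliminations within one level are independent of each other, which is the kind of concurrency a parallel implementation could exploit.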
A simple multigrid scheme for solving the Poisson equation with arbitrary domain boundaries
We present a new multigrid scheme for solving the Poisson equation with
Dirichlet boundary conditions on a Cartesian grid with irregular domain
boundaries. This scheme was developed in the context of the Adaptive Mesh
Refinement (AMR) schemes based on a graded-octree data structure. The Poisson
equation is solved on a level-by-level basis, using a "one-way interface"
scheme in which boundary conditions are interpolated from the previous coarser
level solution. Such a scheme is particularly well suited for self-gravitating
astrophysical flows requiring an adaptive time stepping strategy. By
constructing a multigrid hierarchy covering the active cells of each AMR level,
we have designed a memory-efficient algorithm that can benefit fully from the
multigrid acceleration. We present a simple method for capturing the boundary
conditions across the multigrid hierarchy, based on a second-order accurate
reconstruction of the boundaries of the multigrid levels. In case of very
complex boundaries, small scale features become smaller than the discretization
cell size of coarse multigrid levels and convergence problems arise. We propose
a simple solution to address these issues. Using our scheme, the convergence
rate usually depends on the grid size for complex grids, but good linear
convergence is maintained. The proposed method was successfully implemented on
distributed memory architectures in the RAMSES code, for which we present and
discuss convergence and accuracy properties as well as timing performances. Comment: 33 pages, 15 figures, accepted for publication in Journal of
Computational Physics
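
For readers unfamiliar with multigrid acceleration, the following is a minimal sketch of a textbook V-cycle, not the scheme from this paper. It assumes a 1D Poisson problem -u'' = f on (0, 1) with homogeneous Dirichlet boundaries on a uniform grid of 2^k - 1 interior points, with weighted Jacobi smoothing, full-weighting restriction, and linear interpolation. It does not handle AMR levels or irregular boundaries, and all function names are hypothetical.

```python
def residual(u, f, h):
    """Residual r = f - A u for the 3-point Laplacian with zero boundaries."""
    n = len(u)
    r = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h)
    return r


def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps; returns a new list."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [u[i] + omega * (h * h / 2.0) * r[i] for i in range(len(u))]
    return u


def v_cycle(u, f, h):
    """One V-cycle; len(u) is assumed to be 2**k - 1."""
    n = len(u)
    if n == 1:
        return [f[0] * h * h / 2.0]  # exact solve on the coarsest grid
    u = smooth(u, f, h)
    r = residual(u, f, h)
    nc = (n - 1) // 2
    # Full-weighting restriction of the residual to the coarse grid.
    rc = [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
          for j in range(nc)]
    ec = v_cycle([0.0] * nc, rc, 2.0 * h)
    # Linear interpolation of the coarse correction back to the fine grid.
    for j in range(nc):
        u[2 * j + 1] += ec[j]
    for j in range(nc + 1):
        left = ec[j - 1] if j > 0 else 0.0
        right = ec[j] if j < nc else 0.0
        u[2 * j] += 0.5 * (left + right)
    return smooth(u, f, h)
```

Calling `v_cycle` repeatedly from a zero initial guess, for example with `f = [2.0] * 63` and `h = 1.0 / 64`, drives the iterate toward the discrete solution, which for this right-hand side coincides with x(1 - x) at the grid points.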