24 research outputs found
A distributed-memory package for dense Hierarchically Semi-Separable matrix computations using randomization
We present a distributed-memory library for computations with dense
structured matrices. A matrix is considered structured if its off-diagonal
blocks can be approximated by a rank-deficient matrix with low numerical rank.
Here, we use Hierarchically Semi-Separable representations (HSS). Such matrices
appear in many applications, e.g., finite element methods, boundary element
methods, etc. Exploiting this structure allows for fast solution of linear
systems and/or fast computation of matrix-vector products, which are the two
main building blocks of matrix computations. The compression algorithm that we
use, which computes the HSS form of an input dense matrix, relies on randomized
sampling with a novel adaptive sampling mechanism. We discuss the
parallelization of this algorithm and also present the parallelization of
structured matrix-vector product, structured factorization and solution
routines. The efficiency of the approach is demonstrated on large problems from
different academic and industrial applications, on up to 8,000 cores.
This work is part of a more global effort, the STRUMPACK (STRUctured Matrices
PACKage) software package for computations with sparse and dense structured
matrices. Hence, although useful in their own right, the routines also
represent a step in the direction of a distributed-memory sparse solver.
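
To illustrate the idea of compressing an off-diagonal block by randomized sampling, here is a minimal sketch. It is not the STRUMPACK algorithm: it uses a fixed target rank rather than the adaptive sampling mechanism described in the abstract, and the function name `randomized_low_rank`, the oversampling parameter, and the test kernel 1/|x - y| are all assumptions chosen for illustration.

```python
import numpy as np

def randomized_low_rank(block, rank, oversample=10, seed=0):
    """Return (Q, B) with block ~= Q @ B, using a Gaussian random sketch.

    Hypothetical helper for illustration; fixed rank, no adaptivity.
    """
    rng = np.random.default_rng(seed)
    n_cols = block.shape[1]
    # Sample the column space of the block with random vectors.
    omega = rng.standard_normal((n_cols, rank + oversample))
    sample = block @ omega
    # Orthonormal basis for the sampled column space.
    q, _ = np.linalg.qr(sample)
    # Project the block onto that basis.
    return q, q.T @ block

# Off-diagonal block of a kernel matrix for two well-separated point sets;
# such blocks typically have low numerical rank.
x = np.linspace(0.0, 1.0, 200)
y = np.linspace(2.0, 3.0, 200)
block = 1.0 / np.abs(x[:, None] - y[None, :])

q, b = randomized_low_rank(block, rank=10)
rel_err = np.linalg.norm(block - q @ b) / np.linalg.norm(block)
```

Here the 200 x 200 block is stored as two factors with 20 columns and rows respectively, and `rel_err` measures how much accuracy that compression costs.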
A short note on a pipelined polarized-trace algorithm for 3D Helmholtz
We present a fast solver for the 3D high-frequency Helmholtz equation in heterogeneous, constant density, acoustic media. The solver is based on the method of polarized traces, coupled with distributed linear algebra libraries and pipelining to obtain a solver with online runtime O(max(1, R/n) N log N), where N = n^3 is the total number of degrees of freedom and R is the number of right-hand sides.
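
As a rough illustration of what the stated runtime bound implies, the sketch below evaluates the expression max(1, R/n) N log N with N = n^3. The function name `online_cost_model` is hypothetical, and the hidden constant of the O-notation is ignored, so only ratios between values are meaningful.

```python
import math

def online_cost_model(n, num_rhs):
    """Evaluate max(1, R/n) * N * log(N) with N = n**3 (constant factor omitted)."""
    total_dofs = n ** 3
    return max(1.0, num_rhs / n) * total_dofs * math.log(total_dofs)

# Up to R = n right-hand sides, the modeled cost does not grow with R.
cost_single = online_cost_model(100, 1)
cost_n_rhs = online_cost_model(100, 100)
# Beyond R = n, the modeled cost grows linearly in R.
cost_2n_rhs = online_cost_model(100, 200)
```

Under this model, pipelining lets up to n right-hand sides be processed for the modeled cost of one.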
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
Training very large scale nonlinear SVMs using Alternating Direction Method of Multipliers coupled with the Hierarchically Semi-Separable kernel approximations
Typically, nonlinear Support Vector Machines (SVMs) produce significantly
higher classification quality when compared to linear ones but, at the same
time, their computational complexity is prohibitive for large-scale datasets:
this drawback is essentially related to the necessity to store and manipulate
large, dense and unstructured kernel matrices. Despite the fact that at the
core of training an SVM there is a simple convex optimization problem,
the presence of kernel matrices is responsible for dramatic performance
reduction, making SVMs unworkably slow for large problems. Aiming at an
efficient solution of large-scale nonlinear SVM problems, we propose the use of
the Alternating Direction Method of Multipliers coupled with
Hierarchically Semi-Separable (HSS) kernel approximations. As shown in
this work, the detailed analysis of the interaction among their algorithmic
components unveils a particularly efficient framework and indeed, the presented
experimental results demonstrate a significant speed-up when compared to the
state-of-the-art nonlinear SVM libraries (without significantly
affecting the classification accuracy).
A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters
In this work, we consider the solution of boundary integral equations by
means of a scalable hierarchical matrix approach on clusters equipped with
graphics hardware, i.e. graphics processing units (GPUs). To this end, we
extend our existing single-GPU hierarchical matrix library hmglib such that it
is able to scale on many GPUs and such that it can be coupled to arbitrary
application codes. Using a model GPU implementation of a boundary element
method (BEM) solver, we are able to achieve more than 67 percent relative
parallel speed-up going from 128 to 1024 GPUs for a model geometry test case
with 1.5 million unknowns and a real-world geometry test case with almost 1.2
million unknowns. On 1024 GPUs of the cluster Titan, it takes less than 6
minutes to solve the 1.5 million unknowns problem, with 5.7 minutes for the
setup phase and 20 seconds for the iterative solver. To the best of the
authors' knowledge, this is the first fully GPU-based
distributed-memory parallel hierarchical matrix open-source library using the
traditional H-matrix format and adaptive cross approximation with an
application to BEM problems.