443 research outputs found

    An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling

    We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups of up to 7-fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared-memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK -- STRUctured Matrices PACKage, which also has a distributed-memory component for dense rank-structured matrices.
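
    As a rough illustration of the randomized sampling idea the abstract builds on (not STRUMPACK's actual code; all names and parameters below are made up), a Gaussian test matrix can be used to find an orthonormal basis for the range of a numerically low-rank block:

```python
import numpy as np

def randomized_lowrank(B, rank, oversample=10):
    """Randomized range finder: approximate B ~= Q @ (Q.T @ B).

    Illustrative only; the paper combines such sampling with
    interpolative decompositions inside the HSS construction."""
    n = B.shape[1]
    rng = np.random.default_rng(0)
    omega = rng.standard_normal((n, rank + oversample))  # Gaussian test matrix
    Y = B @ omega                       # sample the column space of B
    Q, _ = np.linalg.qr(Y)              # orthonormal basis for the samples
    return Q, Q.T @ B

# Toy usage: compress a numerically low-rank off-diagonal block.
rng = np.random.default_rng(1)
B = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 150))
Q, C = randomized_lowrank(B, rank=5)
print(np.linalg.norm(B - Q @ C) / np.linalg.norm(B))   # tiny relative residual
```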

    A domain decomposing parallel sparse linear system solver

    The solution of large sparse linear systems is often the most time-consuming part of many science and engineering applications. Computational fluid dynamics, circuit simulation, power network analysis, and material science are just a few examples of the application areas in which large sparse linear systems need to be solved effectively. In this paper we introduce a new parallel hybrid sparse linear system solver for distributed memory architectures that contains both direct and iterative components. We show that by using our solver one can alleviate the drawbacks of direct and iterative solvers, achieving better scalability than with direct solvers and more robustness than with classical preconditioned iterative solvers. Comparisons to well-known direct and iterative solvers on a parallel architecture are provided. Comment: To appear in the Journal of Computational and Applied Mathematics.
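
    A minimal sketch of the direct-plus-iterative idea, using SciPy rather than the paper's solver: a sparse LU factorization supplies a robust "direct component" that preconditions GMRES. The Laplacian test problem and all parameters are illustrative:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy system: 2-D Laplacian on an n-by-n grid.
n = 50
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsc()
b = np.ones(A.shape[0])

lu = spla.splu(A)                                   # direct component
M = spla.LinearOperator(A.shape, matvec=lu.solve)   # ... used as a preconditioner
x, info = spla.gmres(A, b, M=M)                     # iterative component
print(info, np.linalg.norm(A @ x - b))              # 0 means converged
```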

    A distributed-memory package for dense Hierarchically Semi-Separable matrix computations using randomization

    We present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by matrices of low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, e.g., finite element methods and boundary element methods. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of the structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. This work is part of a more global effort, the STRUMPACK (STRUctured Matrices PACKage) software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step toward a distributed-memory sparse solver.
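
    The adaptive sampling mechanism can be sketched as follows (an illustrative stand-in, not the library's algorithm; the stopping tolerance and block size are made up): keep drawing random samples and enlarging the basis until fresh samples are already captured by it:

```python
import numpy as np

def adaptive_basis(A, tol=1e-8, block=8, max_rank=128):
    """Grow a sampled orthonormal basis Q for range(A) block by block,
    stopping once fresh random samples are numerically contained in
    span(Q). Illustrative sketch of adaptive randomized sampling."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    Q = np.empty((m, 0))
    while Q.shape[1] < max_rank:
        Y = A @ rng.standard_normal((n, block))   # fresh random samples
        Y -= Q @ (Q.T @ Y)                        # remove what Q already captures
        if np.linalg.norm(Y) <= tol * max(1.0, np.linalg.norm(A, "fro")):
            break                                 # nothing new: rank found
        Qnew, _ = np.linalg.qr(Y)
        Q = np.hstack([Q, Qnew])
    return Q

# Toy usage: rank is detected adaptively, rounded up to whole blocks.
rng = np.random.default_rng(1)
A = rng.standard_normal((300, 20)) @ rng.standard_normal((20, 300))
print(adaptive_basis(A).shape[1])   # 24: true rank 20, in blocks of 8
```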

    Improving multifrontal methods by means of block low-rank representations

    Submitted for publication to SIAM. Matrices coming from elliptic Partial Differential Equations (PDEs) have been shown to have a low-rank property: well-defined off-diagonal blocks of their Schur complements can be approximated by low-rank products. Given a suitable ordering of the matrix, which gives the blocks a geometrical meaning, such approximations can be computed using an SVD or a rank-revealing QR factorization. The resulting representation offers a substantial reduction of the memory requirement and gives efficient ways to perform many of the basic dense algebra operations. Several strategies have been proposed to exploit this property. We propose a low-rank format called Block Low-Rank (BLR), and explain how it can be used to reduce the memory footprint and the complexity of direct solvers for sparse matrices based on the multifrontal method. We present experimental results showing that the BLR format delivers gains comparable to those obtained with hierarchical formats such as hierarchical (H) matrices and Hierarchically Semi-Separable (HSS) matrices, but provides much greater flexibility and ease of use, which are essential in the context of a general-purpose, algebraic solver.
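
    A simplified, illustrative rendering of the BLR idea (not the authors' code; tile size and tolerance are made up): tile the matrix, keep diagonal tiles dense, and store an off-diagonal tile in low-rank form whenever a truncated SVD at the requested accuracy is smaller than the dense tile:

```python
import numpy as np

def blr_compress(A, tile=64, tol=1e-8):
    """Illustrative flat Block Low-Rank (BLR) compression of a square
    dense matrix: diagonal tiles stay dense; each off-diagonal tile is
    stored as U @ V when its numerical rank at accuracy tol makes the
    low-rank form cheaper than the dense tile."""
    n = A.shape[0]
    tiles = {}
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            blk = A[i:i+tile, j:j+tile]
            if i == j:
                tiles[i, j] = ("dense", blk)
                continue
            U, s, Vt = np.linalg.svd(blk, full_matrices=False)
            r = int(np.sum(s > tol * s[0]))          # numerical rank at tol
            if r * (blk.shape[0] + blk.shape[1]) < blk.size:
                tiles[i, j] = ("lowrank", U[:, :r] * s[:r], Vt[:r])
            else:
                tiles[i, j] = ("dense", blk)
    return tiles

# Toy usage: a smooth kernel matrix has numerically low-rank off-diagonal tiles.
x = np.linspace(0, 1, 256)
K = 1.0 / (1.0 + np.abs(x[:, None] - x[None, :]))
tiles = blr_compress(K)
print(sum(1 for t in tiles.values() if t[0] == "lowrank"), "low-rank tiles")
```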

    High-performance direct solution of finite element problems on multi-core processors

    A direct solution procedure is proposed and developed which exploits the parallelism that exists in current symmetric multiprocessing (SMP) multi-core processors. Several algorithms are proposed and developed to improve the performance of the direct solution of FE problems. A high-performance sparse direct solver is developed which allows experimentation with the newly developed and existing algorithms. The performance of the algorithms is investigated using a large set of FE problems. Furthermore, operation count estimations are developed to further assess various algorithms. An out-of-core version of the solver is developed to reduce the memory requirements for the solution. I/O is performed asynchronously without blocking the thread that makes the I/O request. Asynchronous I/O allows overlapping factorization and triangular solution computations with I/O. The performance of the developed solver is demonstrated on a large number of test problems. A problem with nearly 10 million degrees of freedom is solved on a low-cost desktop computer using the out-of-core version of the direct solver. Furthermore, the developed solver usually outperforms a commonly used shared-memory solver.
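
    The asynchronous-I/O overlap can be sketched with a background writer thread (illustrative only; the file layout, panel sizes, and the use of a dense Cholesky as the stand-in "factorization" are made up):

```python
import os, tempfile
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def factor_panels_async(panels, outdir):
    """Overlap computation with I/O: while the current panel is being
    factored, the previous result is written to disk by a background
    thread, so the compute thread never blocks on the write."""
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = None
        for k, panel in enumerate(panels):
            L = np.linalg.cholesky(panel)            # compute
            if pending is not None:
                pending.result()                     # previous write finished?
            pending = io.submit(np.save,
                                os.path.join(outdir, f"panel_{k}.npy"), L)
        if pending is not None:
            pending.result()                         # flush the last write

# Toy usage with symmetric positive definite panels.
rng = np.random.default_rng(0)
panels = []
for _ in range(4):
    M = rng.standard_normal((100, 100))
    panels.append(M @ M.T + 100 * np.eye(100))
outdir = tempfile.mkdtemp()
factor_panels_async(panels, outdir)
print(sorted(os.listdir(outdir)))
```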

    Combinatorial problems in solving linear systems

    42 pages, available as LIP research report RR-2009-15. Numerical linear algebra and combinatorial optimization are vast subjects, as is their interaction. In virtually all cases there should be a notion of sparsity for a combinatorial problem to arise. Sparse matrices therefore form the basis of the interaction of these two seemingly disparate subjects. As the core of many of today's numerical linear algebra computations consists of the solution of sparse linear systems by direct or iterative methods, we survey some combinatorial problems, ideas, and algorithms relating to these computations. On the direct methods side, we discuss issues such as matrix ordering; bipartite matching and matrix scaling for better pivoting; and task assignment and scheduling for parallel multifrontal solvers. On the iterative methods side, we discuss preconditioning techniques, including incomplete factorization preconditioners, support graph preconditioners, and algebraic multigrid. In a separate part, we discuss the block triangular form of sparse matrices.
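
    As a small, concrete example of one surveyed kernel, matrix ordering, SciPy's reverse Cuthill-McKee routine (a simple stand-in for the fill-reducing orderings the survey discusses) symmetrically permutes a sparse matrix to shrink its bandwidth:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(M):
    coo = M.tocoo()
    return int(np.max(np.abs(coo.row - coo.col)))

# Toy sparse matrix with a random symmetric pattern and full diagonal.
rng = np.random.default_rng(0)
A = sp.random(200, 200, density=0.02, random_state=rng)
A = (A + A.T + sp.eye(200)).tocsr()

perm = reverse_cuthill_mckee(A, symmetric_mode=True)
B = A[perm][:, perm]                       # symmetric permutation P A P^T
print(bandwidth(A), "->", bandwidth(B))    # bandwidth shrinks after reordering
```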

    Robust memory-aware mappings for parallel multifrontal factorizations

    We study the memory scalability of the parallel multifrontal factorization of sparse matrices. In particular, we are interested in controlling the active memory specific to the multifrontal factorization. We illustrate why commonly used mapping strategies (e.g., the proportional mapping) cannot provide a high memory efficiency, meaning that they tend to let the memory usage of the factorization grow when the number of processes increases. We propose “memory-aware” algorithms that aim at maximizing the granularity of parallelism while respecting memory constraints. These algorithms provide accurate memory estimates prior to the factorization and can significantly enhance the robustness of a multifrontal code. We illustrate our approach with experiments performed on large matrices.
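
    The proportional mapping the paper starts from can be sketched in a few lines (the tree encoding is illustrative; the paper's memory-aware algorithms add memory constraints on top of this recursion):

```python
def proportional_mapping(tree, node, procs, mapping):
    """Illustrative proportional mapping on an elimination tree: assign
    all `procs` to `node`, then split them among the children in
    proportion to subtree work. `tree[node]` is a list of
    (child, subtree_work) pairs; leaves map to an empty list.
    Note: rounded shares may not sum exactly to `procs`, and a real
    memory-aware variant would also check a per-process memory bound."""
    mapping[node] = procs
    children = tree[node]
    if not children:
        return
    total = sum(w for _, w in children)
    for child, w in children:
        share = max(1, round(procs * w / total))
        proportional_mapping(tree, child, share, mapping)

# Toy elimination tree: root 0 with two children of unequal work.
tree = {0: [(1, 300.0), (2, 100.0)], 1: [], 2: []}
mapping = {}
proportional_mapping(tree, 0, procs=8, mapping=mapping)
print(mapping)   # {0: 8, 1: 6, 2: 2}
```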

    External Memory Algorithms for Factoring Sparse Matrices

    We consider the factorization of sparse symmetric matrices in the context of a two-layer storage system: disk/core. When the core is sufficiently large the factorization can be performed in-core. In this case we must read the input, compute, and write the output, in this sequence. On the other hand, when the core is not large enough, the factorization becomes out-of-core, which means that data movement and computation must be interleaved. We identify two major out-of-core factorization scenarios: read-once/write-once (R1/W1) and read-many/write-many (RM/WM). The former requires minimum traffic, exactly as much as the in-core factorization: reading the input and writing the output. More traffic is required for the latter. We investigate three issues: the size of the core that determines the boundary between the two out-of-core scenarios, the in-core data-structure reorganizations required by the R1/W1 factorization, and the traffic required by the RM/WM factorization. We use three common factorization algorithms: left-looking, right-looking, and multifrontal. In the R1/W1 scenario, our results indicate that for problems with good separators, such as those coming from the discretization of partial differential equations, ordered with nested dissection, right-looking and multifrontal factorization perform slightly better than left-looking factorization. There are, however, applications for which multifrontal is a bad choice, requiring too much temporary storage. On the other hand, right-looking factorization should be avoided in the RM/WM scenario. Left-looking is a good choice, but only if data is blocked along one dimension. Multifrontal performs well for both one- and two-dimensional blocks as long as not too much storage is required. We also explore a framework for a software implementation. We have implemented an in-core solver that relies on some object-oriented constructs. Most of the code is written in C++, except for some kernels written in Fortran 77. We intend to add out-of-core functionality to the code, and data movement is a major concern. Implicit data movement represents the easy way, but, as some of our experiments show, good performance can be achieved only with explicit data movement. This complicates the code, and we expect a substantial effort in order to implement an efficient out-of-core solver.
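
    A toy rendering of the RM/WM scenario with one-dimensional blocking (illustrative, not the thesis implementation): a left-looking blocked Cholesky that re-reads every previously factored panel from disk to update the current one, and writes each finished panel once:

```python
import os, tempfile
import numpy as np

def left_looking_ooc_cholesky(A, nb, workdir):
    """Illustrative out-of-core left-looking blocked Cholesky: panel k
    (columns k..k+nb, rows k..n) is updated by re-reading each previously
    factored panel from disk (read-many), then factored and written once."""
    n = A.shape[0]
    for k in range(0, n, nb):
        b = min(nb, n - k)
        panel = A[k:, k:k+b].copy()
        for j in range(0, k, nb):                         # re-read old panels
            Lj = np.load(os.path.join(workdir, f"L{j}.npy"))
            panel -= Lj[k - j:, :] @ Lj[k - j:k - j + b, :].T
        Lkk = np.linalg.cholesky(panel[:b, :b])           # factor diagonal block
        panel[:b, :b] = Lkk
        if b < panel.shape[0]:                            # triangular solve below
            panel[b:, :] = np.linalg.solve(Lkk, panel[b:, :].T).T
        np.save(os.path.join(workdir, f"L{k}.npy"), panel)

# Toy usage: verify L @ L.T reproduces A.
rng = np.random.default_rng(0)
M = rng.standard_normal((12, 12))
A = M @ M.T + 12 * np.eye(12)
workdir = tempfile.mkdtemp()
left_looking_ooc_cholesky(A, nb=4, workdir=workdir)
L = np.zeros_like(A)
for k in range(0, 12, 4):
    L[k:, k:k+4] = np.load(os.path.join(workdir, f"L{k}.npy"))
print(np.allclose(L @ L.T, A))   # True
```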