
611 research outputs found

PT-Scotch: A tool for efficient parallel graph ordering

Author: Amestoy
Barnard
C. Chevalier
F. Pellegrini
Fiduccia
George
Hendrickson
Hénon
Karypis
Kernighan
Lipton
Liu
Pellegrini
Tinney
Publication venue: 'Elsevier BV'
Publication date: 01/07/2008

The parallel ordering of large graphs is a difficult problem: on the one hand, minimum degree algorithms do not parallelize well; on the other hand, obtaining high-quality orderings with the nested dissection algorithm requires efficient graph bipartitioning heuristics, the best sequential implementations of which are also hard to parallelize. This paper presents a set of algorithms, implemented in the PT-Scotch software package, which allows one to order large graphs in parallel, yielding orderings whose quality is only slightly worse than that of state-of-the-art sequential algorithms. Our implementation uses the classical nested dissection approach but relies on several novel features to solve the parallel graph bipartitioning problem. Thanks to these improvements, PT-Scotch produces consistently better orderings than ParMeTiS on large numbers of processors.
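The classical nested dissection idea mentioned in the abstract above can be illustrated with a minimal sequential sketch. It is not PT-Scotch's algorithm: it assumes a regular grid graph and picks a geometric middle line as the separator instead of using graph bipartitioning heuristics. The function names (`grid_graph`, `nested_dissection`, `fill_in`) and the `leaf_size` parameter are hypothetical, introduced only for illustration.

```python
def grid_graph(rows, cols):
    """Adjacency lists of a rows x cols grid graph; vertices are (row, col)."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            adj[(r, c)] = [
                (r + dr, c + dc)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
    return adj


def nested_dissection(vertices, leaf_size=4):
    """Order grid vertices: both halves first (recursively), separator last.

    Assumption: `vertices` forms a rectangular block of a grid, so the middle
    row or column is a valid vertex separator.
    """
    vertices = list(vertices)
    if len(vertices) <= leaf_size:
        return sorted(vertices)
    rows = [v[0] for v in vertices]
    cols = [v[1] for v in vertices]
    # Cut across the longer side of the block.
    axis = 0 if max(rows) - min(rows) >= max(cols) - min(cols) else 1
    coords = sorted(v[axis] for v in vertices)
    mid = coords[len(coords) // 2]
    left = [v for v in vertices if v[axis] < mid]
    right = [v for v in vertices if v[axis] > mid]
    separator = sorted(v for v in vertices if v[axis] == mid)
    return (nested_dissection(left, leaf_size)
            + nested_dissection(right, leaf_size)
            + separator)


def fill_in(adj, order):
    """Count fill edges created by symbolic elimination in the given order."""
    pos = {v: i for i, v in enumerate(order)}
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    fill = 0
    for v in order:
        later = [u for u in g[v] if pos[u] > pos[v]]
        # Eliminating v makes its not-yet-eliminated neighbours a clique.
        for i, a in enumerate(later):
            for b in later[i + 1:]:
                if b not in g[a]:
                    g[a].add(b)
                    g[b].add(a)
                    fill += 1
    return fill


if __name__ == "__main__":
    adj = grid_graph(31, 31)
    print("fill, natural order:    ", fill_in(adj, sorted(adj)))
    print("fill, nested dissection:", fill_in(adj, nested_dissection(adj)))
```

The fill count is one common proxy for ordering quality: fewer fill edges means a sparser factor. A production tool would replace the geometric cut with a graph bipartitioning step, which is the part the abstract identifies as hard to parallelize.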

arXiv.org e-Print Archive

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Oskar Bordeaux

A class of multilevel parallel preconditioning strategies

Author: Grigori Laura
Kumar Pawan
Nataf Frédéric
Wang Ke
Publication venue: HAL CCSD
Publication date: 06/10/2010

In this paper, we introduce a class of recursive multilevel preconditioning strategies suited for solving large sparse linear systems of equations on modern-day architectures. They are based on a reordering of the input matrix into a nested bordered block diagonal form, which allows a nested formulation of the preconditioners. The first one, which we refer to as nested SSOR (NSSOR), requires only the factorization of diagonal blocks at the innermost level of the recursive formulation. Hence, its construction is embarrassingly parallel, and the memory requirements are very limited. The next two are nested versions of the Modified ILU preconditioner with the row sum (NMILUR) and column sum (NMILUC) property. We compare these methods in terms of iteration number, memory requirements, and overall solve time, with ILU(0) with natural ordering and nested dissection ordering, and MILU. We find that NSSOR compares favorably with ILU(0) with nested dissection ordering, while NMILUR and NMILUC outperform the other methods for certain matrices in our test set. It is proved that the NSSOR method is convergent when the input matrix is SPD. The preconditioners are designed to be suitable for parallel computing. In this paper we describe a class of parallel multilevel preconditioners for solving large linear systems. They are based on a reordering of the input matrix into a nested bordered block diagonal form, which allows a nested definition of the preconditioners. We prove that one of the preconditioners, NSSOR, converges when the input matrix is symmetric positive definite. The preconditioners are suited to parallel computing.
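For orientation, here is a minimal sketch of the classical (non-nested) SSOR preconditioner used inside conjugate gradients. It is not the NSSOR method of the paper, whose nested block formulation is not given in the abstract. It assumes a small dense SPD matrix stored as a list of lists, and the names `ssor_apply` and `pcg` are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def ssor_apply(A, r, omega=1.0):
    """Return z = M^{-1} r for the classical SSOR preconditioner
    M = (D + w L) D^{-1} (D + w U) / (w (2 - w)), where A = L + D + U."""
    n = len(r)
    # Forward sweep: solve (D + w L) y = r.
    y = [0.0] * n
    for i in range(n):
        s = sum(A[i][j] * y[j] for j in range(i))
        y[i] = (r[i] - omega * s) / A[i][i]
    # Backward sweep: solve (D + w U) z = D y.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (A[i][i] * y[i] - omega * s) / A[i][i]
    scale = omega * (2.0 - omega)
    return [scale * zi for zi in z]


def pcg(A, b, omega=1.0, tol=1e-12, max_iter=200):
    """Preconditioned conjugate gradients for SPD A; returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)
    if dot(r, r) ** 0.5 <= tol:
        return x, 0
    z = ssor_apply(A, r, omega)
    p = list(z)
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol:
            return x, k
        z = ssor_apply(A, r, omega)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def laplacian_1d(n):
    """Tridiagonal (-1, 2, -1) matrix, a standard SPD test case."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    A = laplacian_1d(8)
    b = matvec(A, [1.0] * 8)
    x, iters = pcg(A, b)
    print(iters, x)
```

The two triangular sweeps are inherently sequential in this plain form. Reordering the matrix into a bordered block diagonal form, as the abstract describes, is what exposes independent diagonal blocks that can be processed in parallel.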

HAL-CentraleSupelec

HAL - Lille 3

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Hierarchical interpolative factorization for elliptic operators: differential equations

Author: Ho Kenneth L.
Ying Lexing
Publication date: 20/04/2015

This paper introduces the hierarchical interpolative factorization for elliptic partial differential equations (HIF-DE) in two (2D) and three dimensions (3D). This factorization takes the form of an approximate generalized LU/LDL decomposition that facilitates the efficient inversion of the discretized operator. HIF-DE is based on the multifrontal method but uses skeletonization on the separator fronts to sparsify the dense frontal matrices and thus reduce the cost. We conjecture that this strategy yields linear complexity in 2D and quasilinear complexity in 3D. Estimated linear complexity in 3D can be achieved by skeletonizing the compressed fronts themselves, which amounts geometrically to a recursive dimensional reduction scheme. Numerical experiments support our claims and further demonstrate the performance of our algorithm as a fast direct solver and preconditioner. MATLAB codes are freely available. Comment: 37 pages, 13 figures, 12 tables; to appear, Comm. Pure Appl. Math. arXiv admin note: substantial text overlap with arXiv:1307.266

arXiv.org e-Print Archive

CiteSeerX

High performance interior point methods for three-dimensional finite element limit analysis

Author: Lyamin Andrei V.
Podlich Nathan
Sloan Scott W.
Publication venue: CIMNE
Publication date: 01/01/2019

The ability to obtain rigorous upper and lower bounds on collapse loads of various structures makes finite element limit analysis an attractive design tool. The increasingly high cost of computing those bounds, however, has limited its application on problems in three dimensions. This work reports on a high-performance homogeneous self-dual primal-dual interior point method developed for three-dimensional finite element limit analysis. This implementation achieves convergence times over 4.5× faster than the leading commercial solver across a set of three-dimensional finite element limit analysis test problems, making investigation of three-dimensional limit loads viable. A comparison between a range of iterative linear solvers and direct methods used to determine the search direction is also provided, demonstrating the superiority of direct methods for this application. The components of the interior point solver considered include the elimination of and options for handling remaining free variables, multifrontal and supernodal Cholesky comparison for computing the search direction, differences between approximate minimum degree [1] and nested dissection [13] orderings, dealing with dense columns and fixed variables, and accelerating the linear system solver through parallelization. Each of these areas resulted in an improvement on at least one of the problems in the test set, with many achieving gains across the whole set. The serial implementation achieved runtime performance 1.7× faster than the commercial solver Mosek [5]. Compared with the parallel version of Mosek, the use of parallel BLAS routines in the supernodal solver saw a 1.9× speedup, and with a modified version of the GPU-enabled CHOLMOD [11] and a single NVIDIA Tesla K20c this speedup increased to 4.65×.

UPCommons. Portal del coneixement obert de la UPC