611 research outputs found

    PT-Scotch: A tool for efficient parallel graph ordering

    The parallel ordering of large graphs is a difficult problem: on the one hand, minimum degree algorithms do not parallelize well; on the other hand, obtaining high-quality orderings with the nested dissection algorithm requires efficient graph bipartitioning heuristics, the best sequential implementations of which are also hard to parallelize. This paper presents a set of algorithms, implemented in the PT-Scotch software package, which allows one to order large graphs in parallel, yielding orderings whose quality is only slightly worse than that of state-of-the-art sequential algorithms. Our implementation uses the classical nested dissection approach but relies on several novel features to solve the parallel graph bipartitioning problem. Thanks to these improvements, PT-Scotch produces consistently better orderings than ParMeTiS on large numbers of processors.
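
    The nested dissection idea mentioned in this abstract can be illustrated with a minimal sequential sketch. It is not PT-Scotch's algorithm: it assumes a regular grid graph and uses the middle row or column as a geometric separator, whereas the paper relies on graph bipartitioning heuristics. The function name `nested_dissection_order` is hypothetical.

```python
def nested_dissection_order(rows, cols):
    """Return an elimination order for a rows x cols grid graph.

    Assumption for illustration: the separator is simply the middle
    column (or row) of the current block, which is only valid for
    regular grids with nearest-neighbour coupling.
    """
    def order(r0, r1, c0, c1):
        # Half-open index ranges [r0, r1) x [c0, c1).
        h, w = r1 - r0, c1 - c0
        if h <= 0 or w <= 0:
            return []
        if h <= 2 and w <= 2:
            # Small block: order its vertices directly.
            return [(r, c) for r in range(r0, r1) for c in range(c0, c1)]
        if w >= h:
            # Split along the longer side with a column separator.
            mid = c0 + w // 2
            first = order(r0, r1, c0, mid)
            second = order(r0, r1, mid + 1, c1)
            separator = [(r, mid) for r in range(r0, r1)]
        else:
            mid = r0 + h // 2
            first = order(r0, mid, c0, c1)
            second = order(mid + 1, r1, c0, c1)
            separator = [(mid, c) for c in range(c0, c1)]
        # Both halves are numbered before the separator that joins them.
        return first + second + separator

    return order(0, rows, 0, cols)
```

    On a 3 x 3 grid, the last three vertices in the returned order are the middle column, which is the top-level separator. The two halves on either side of a separator are independent of each other, which is what makes the approach attractive for parallel ordering.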

    A class of multilevel parallel preconditioning strategies

    In this paper, we introduce a class of recursive multilevel preconditioning strategies suited for solving large sparse linear systems of equations on modern-day architectures. They are based on a reordering of the input matrix into a nested bordered block diagonal form, which allows a nested formulation of the preconditioners. The first one, which we refer to as nested SSOR (NSSOR), requires only the factorization of diagonal blocks at the innermost level of the recursive formulation. Hence, its construction is embarrassingly parallel, and the memory requirements are very limited. The next two are nested versions of the Modified ILU preconditioner with the row-sum (NMILUR) and column-sum (NMILUC) properties. We compare these methods in terms of iteration number, memory requirements, and overall solve time, with ILU(0) with natural ordering and nested dissection ordering, and MILU. We find that NSSOR compares favorably with ILU(0) with nested dissection ordering, while NMILUR and NMILUC outperform the other methods for certain matrices in our test set. It is proved that the NSSOR method is convergent when the input matrix is SPD. The preconditioners are designed to be suitable for parallel computing.
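
    For orientation, the classical single-level SSOR preconditioner that NSSOR builds on can be sketched as follows. This is not the paper's nested formulation: it assumes a small dense matrix stored as nested lists, relaxation parameter 1, and the standard form M = (D + L) D^{-1} (D + U), where D, L, and U are the diagonal, strictly lower, and strictly upper parts of A. The function name `ssor_apply` is hypothetical.

```python
def ssor_apply(A, r):
    """Solve M z = r with M = (D + L) D^{-1} (D + U), the SSOR
    preconditioner for relaxation parameter 1 (symmetric Gauss-Seidel).

    A is a square dense matrix (list of lists) with nonzero diagonal.
    """
    n = len(A)
    # Forward sweep: solve (D + L) y = r.
    y = [0.0] * n
    for i in range(n):
        s = sum(A[i][j] * y[j] for j in range(i))
        y[i] = (r[i] - s) / A[i][i]
    # Backward sweep: solve (D + U) z = D y.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (A[i][i] * y[i] - s) / A[i][i]
    return z
```

    For A = [[2, 1], [1, 2]] the preconditioner is M = [[2, 1], [1, 2.5]], so `ssor_apply(A, [3, 3.5])` returns a vector close to [1, 1]. Each application costs two triangular sweeps and needs no storage beyond A itself.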

    Hierarchical interpolative factorization for elliptic operators: differential equations

    This paper introduces the hierarchical interpolative factorization for elliptic partial differential equations (HIF-DE) in two (2D) and three dimensions (3D). This factorization takes the form of an approximate generalized LU/LDL decomposition that facilitates the efficient inversion of the discretized operator. HIF-DE is based on the multifrontal method but uses skeletonization on the separator fronts to sparsify the dense frontal matrices and thus reduce the cost. We conjecture that this strategy yields linear complexity in 2D and quasilinear complexity in 3D. Estimated linear complexity in 3D can be achieved by skeletonizing the compressed fronts themselves, which amounts geometrically to a recursive dimensional reduction scheme. Numerical experiments support our claims and further demonstrate the performance of our algorithm as a fast direct solver and preconditioner. MATLAB codes are freely available.
    Comment: 37 pages, 13 figures, 12 tables; to appear, Comm. Pure Appl. Math. arXiv admin note: substantial text overlap with arXiv:1307.266

    High performance interior point methods for three-dimensional finite element limit analysis

    The ability to obtain rigorous upper and lower bounds on collapse loads of various structures makes finite element limit analysis an attractive design tool. The increasingly high cost of computing those bounds, however, has limited its application to problems in three dimensions. This work reports on a high-performance homogeneous self-dual primal-dual interior point method developed for three-dimensional finite element limit analysis. This implementation achieves convergence times over 4.5× faster than the leading commercial solver across a set of three-dimensional finite element limit analysis test problems, making investigation of three-dimensional limit loads viable. A comparison between a range of iterative linear solvers and direct methods used to determine the search direction is also provided, demonstrating the superiority of direct methods for this application. The components of the interior point solver considered include the elimination of free variables and options for handling those that remain, a comparison of multifrontal and supernodal Cholesky factorization for computing the search direction, differences between approximate minimum degree [1] and nested dissection [13] orderings, the handling of dense columns and fixed variables, and acceleration of the linear system solver through parallelization. Each of these areas resulted in an improvement on at least one of the problems in the test set, with many achieving gains across the whole set. The serial implementation achieved runtime performance 1.7× faster than the commercial solver Mosek [5]. Compared with the parallel version of Mosek, the use of parallel BLAS routines in the supernodal solver saw a 1.9× speedup, and with a modified version of the GPU-enabled CHOLMOD [11] and a single NVIDIA Tesla K20c this speedup increased to 4.65×.
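
    Why the choice of ordering matters for a sparse Cholesky factorization can be shown with a small symbolic-elimination sketch that counts fill-in for a given elimination order. It is an illustration only and does not reproduce the approximate minimum degree or nested dissection codes compared in the abstract; the function name `count_fill` and the star-graph example are assumptions made here.

```python
def count_fill(adjacency, order):
    """Count the fill edges created by eliminating vertices in `order`.

    `adjacency` maps each vertex to the set of its neighbours in the
    graph of a symmetric sparse matrix. Eliminating a vertex connects
    all of its remaining neighbours to each other; every edge that did
    not exist before is one fill entry in the Cholesky factor.
    """
    graph = {v: set(nbrs) for v, nbrs in adjacency.items()}
    fill = 0
    for v in order:
        neighbours = list(graph.pop(v))
        for u in neighbours:
            graph[u].discard(v)
        # Make the remaining neighbours of v pairwise adjacent.
        for i, a in enumerate(neighbours):
            for b in neighbours[i + 1:]:
                if b not in graph[a]:
                    graph[a].add(b)
                    graph[b].add(a)
                    fill += 1
    return fill


# Star graph: hub 0 connected to leaves 1..4 (an "arrow" matrix).
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
hub_first = count_fill(star, [0, 1, 2, 3, 4])
hub_last = count_fill(star, [1, 2, 3, 4, 0])
```

    Eliminating the hub first makes the four leaves pairwise adjacent and creates 6 fill edges, while eliminating it last creates none. Fill-reducing orderings such as minimum degree and nested dissection aim for the second kind of outcome on much larger graphs.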