62,070 research outputs found

    Applications and accuracy of the parallel diagonal dominant algorithm

    Get PDF
    The Parallel Diagonal Dominant (PDD) algorithm is a highly efficient, ideally scalable tridiagonal solver. In this paper, a detailed study of the PDD algorithm is given. First the PDD algorithm is introduced. Then the algorithm is extended to solve periodic tridiagonal systems. A variant, the reduced PDD algorithm, is also proposed. Accuracy analysis is provided for a class of tridiagonal systems, the symmetric, and anti-symmetric Toeplitz tridiagonal systems. Implementation results show that the analysis gives a good bound on the relative error, and the algorithm is a good candidate for the emerging massively parallel machines

    A communication-less parallel algorithm for tridiagonal Toeplitz systems

    Get PDF
    AbstractDiagonally dominant tridiagonal Toeplitz systems of linear equations arise in many application areas and have been well studied in the past. Modern interest in numerical linear algebra is often focusing on solving classic problems in parallel. In McNally [Fast parallel algorithms for tri-diagonal symmetric Toeplitz systems, MCS Thesis, University of New Brunswick, Saint John, 1999], an m processor Split & Correct algorithm was presented for approximating the solution to a symmetric tridiagonal Toeplitz linear system of equations. Nemani [Perturbation methods for circulant-banded systems and their parallel implementation, Ph.D. Thesis, University of New Brunswick, Saint John, 2001] and McNally (2003) adapted the works of Rojo [A new method for solving symmetric circulant tri-diagonal system of linear equations, Comput. Math. Appl. 20 (1990) 61–67], Yan and Chung [A fast algorithm for solving special tri-diagonal systems, Computing 52 (1994) 203–211] and McNally et al. [A split-correct parallel algorithm for solving tri-diagonal symmetric Toeplitz systems, Internat. J. Comput. Math. 75 (2000) 303–313] to the non-symmetric case. In this paper we present relevant background from these methods and then introduce an m processor scalable communication-less approximation algorithm for solving a diagonally dominant tridiagonal Toeplitz system of linear equations

    Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

    Get PDF
    Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2OO, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is 1) developing highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, 3) incorporate newly developed algorithms into actual simulation packages. The work plan has well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) Adopting a mathematical geometry which has a better capacity to describe the fluid, (2) Using compact scheme to gain high order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm and Reduced Parallel Diagonal Dominant (RPDD) algorithm have been carefully studied on different parallel platforms for different applications, and a NASA simulation code developed by Man M. Rai and his colleagues has been parallelized and implemented based on data dependency analysis. These achievements are addressed in detail in the paper

    An Efficient Parallel Solver for SDD Linear Systems

    Full text link
    We present the first parallel algorithm for solving systems of linear equations in symmetric, diagonally dominant (SDD) matrices that runs in polylogarithmic time and nearly-linear work. The heart of our algorithm is a construction of a sparse approximate inverse chain for the input matrix: a sequence of sparse matrices whose product approximates its inverse. Whereas other fast algorithms for solving systems of equations in SDD matrices exploit low-stretch spanning trees, our algorithm only requires spectral graph sparsifiers

    An Efficient Parallel Algorithm for Spectral Sparsification of Laplacian and SDDM Matrix Polynomials

    Full text link
    For "large" class C\mathcal{C} of continuous probability density functions (p.d.f.), we demonstrate that for every wCw\in\mathcal{C} there is mixture of discrete Binomial distributions (MDBD) with TNϕw/δT\geq N\sqrt{\phi_{w}/\delta} distinct Binomial distributions B(,N)B(\cdot,N) that δ\delta-approximates a discretized p.d.f. w^(i/N)w(i/N)/[=0Nw(/N)]\widehat{w}(i/N)\triangleq w(i/N)/[\sum_{\ell=0}^{N}w(\ell/N)] for all i[3:N3]i\in[3:N-3], where ϕwmaxx[0,1]w(x)\phi_{w}\geq\max_{x\in[0,1]}|w(x)|. Also, we give two efficient parallel algorithms to find such MDBD. Moreover, we propose a sequential algorithm that on input MDBD with N=2kN=2^k for kN+k\in\mathbb{N}_{+} that induces a discretized p.d.f. β\beta, B=DMB=D-M that is either Laplacian or SDDM matrix and parameter ϵ(0,1)\epsilon\in(0,1), outputs in O^(ϵ2m+ϵ4nT)\widehat{O}(\epsilon^{-2}m + \epsilon^{-4}nT) time a spectral sparsifier DM^NϵDDi=0Nβi(D1M)iD-\widehat{M}_{N} \approx_{\epsilon} D-D\sum_{i=0}^{N}\beta_{i}(D^{-1} M)^i of a matrix-polynomial, where O^()\widehat{O}(\cdot) notation hides poly(logn,logN)\mathrm{poly}(\log n,\log N) factors. This improves the Cheng et al.'s [CCLPT15] algorithm whose run time is O^(ϵ2mN2+NT)\widehat{O}(\epsilon^{-2} m N^2 + NT). Furthermore, our algorithm is parallelizable and runs in work O^(ϵ2m+ϵ4nT)\widehat{O}(\epsilon^{-2}m + \epsilon^{-4}nT) and depth O(logNpoly(logn)+logT)O(\log N\cdot\mathrm{poly}(\log n)+\log T). Our main algorithmic contribution is to propose the first efficient parallel algorithm that on input continuous p.d.f. wCw\in\mathcal{C}, matrix B=DMB=D-M as above, outputs a spectral sparsifier of matrix-polynomial whose coefficients approximate component-wise the discretized p.d.f. w^\widehat{w}. Our results yield the first efficient and parallel algorithm that runs in nearly linear work and poly-logarithmic depth and analyzes the long term behaviour of Markov chains in non-trivial settings. In addition, we strengthen the Spielman and Peng's [PS14] parallel SDD solver

    Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards

    Full text link
    We discuss an approach for solving sparse or dense banded linear systems Ax=b{\bf A} {\bf x} = {\bf b} on a Graphics Processing Unit (GPU) card. The matrix ARN×N{\bf A} \in {\mathbb{R}}^{N \times N} is possibly nonsymmetric and moderately large; i.e., 10000N50000010000 \leq N \leq 500000. The ${\it split\ and\ parallelize}( ({\tt SaP})approachseekstopartitionthematrix) approach seeks to partition the matrix {\bf A}intodiagonalsubblocks into diagonal sub-blocks {\bf A}_i,, i=1,\ldots,P,whichareindependentlyfactoredinparallel.Thesolutionmaychoosetoconsiderortoignorethematricesthatcouplethediagonalsubblocks, which are independently factored in parallel. The solution may choose to consider or to ignore the matrices that couple the diagonal sub-blocks {\bf A}_i.Thisapproach,alongwiththeKrylovsubspacebasediterativemethodthatitpreconditions,areimplementedinasolvercalled. This approach, along with the Krylov subspace-based iterative method that it preconditions, are implemented in a solver called {\tt SaP::GPU},whichiscomparedintermsofefficiencywiththreecommonlyusedsparsedirectsolvers:, which is compared in terms of efficiency with three commonly used sparse direct solvers: {\tt PARDISO},, {\tt SuperLU},and, and {\tt MUMPS}.. {\tt SaP::GPU},whichrunsentirelyontheGPUexceptseveralstagesinvolvedinpreliminaryrowcolumnpermutations,isrobustandcompareswellintermsofefficiencywiththeaforementioneddirectsolvers.InacomparisonagainstIntels, which runs entirely on the GPU except several stages involved in preliminary row-column permutations, is robust and compares well in terms of efficiency with the aforementioned direct solvers. In a comparison against Intel's {\tt MKL},, {\tt SaP::GPU}alsofareswellwhenusedtosolvedensebandedsystemsthatareclosetobeingdiagonallydominant. also fares well when used to solve dense banded systems that are close to being diagonally dominant. {\tt SaP::GPU}$ is publicly available and distributed as open source under a permissive BSD3 license.Comment: 38 page
    corecore