770 research outputs found

    Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards

    Full text link
    We discuss an approach for solving sparse or dense banded linear systems Ax=b{\bf A} {\bf x} = {\bf b} on a Graphics Processing Unit (GPU) card. The matrix A∈RN×N{\bf A} \in {\mathbb{R}}^{N \times N} is possibly nonsymmetric and moderately large; i.e., 10000≤N≤50000010000 \leq N \leq 500000. The ${\it split\ and\ parallelize}( ({\tt SaP})approachseekstopartitionthematrix) approach seeks to partition the matrix {\bf A}intodiagonalsub−blocks into diagonal sub-blocks {\bf A}_i,, i=1,\ldots,P,whichareindependentlyfactoredinparallel.Thesolutionmaychoosetoconsiderortoignorethematricesthatcouplethediagonalsub−blocks, which are independently factored in parallel. The solution may choose to consider or to ignore the matrices that couple the diagonal sub-blocks {\bf A}_i.Thisapproach,alongwiththeKrylovsubspace−basediterativemethodthatitpreconditions,areimplementedinasolvercalled. This approach, along with the Krylov subspace-based iterative method that it preconditions, are implemented in a solver called {\tt SaP::GPU},whichiscomparedintermsofefficiencywiththreecommonlyusedsparsedirectsolvers:, which is compared in terms of efficiency with three commonly used sparse direct solvers: {\tt PARDISO},, {\tt SuperLU},and, and {\tt MUMPS}.. {\tt SaP::GPU},whichrunsentirelyontheGPUexceptseveralstagesinvolvedinpreliminaryrow−columnpermutations,isrobustandcompareswellintermsofefficiencywiththeaforementioneddirectsolvers.InacomparisonagainstIntel′s, which runs entirely on the GPU except several stages involved in preliminary row-column permutations, is robust and compares well in terms of efficiency with the aforementioned direct solvers. In a comparison against Intel's {\tt MKL},, {\tt SaP::GPU}alsofareswellwhenusedtosolvedensebandedsystemsthatareclosetobeingdiagonallydominant. also fares well when used to solve dense banded systems that are close to being diagonally dominant. {\tt SaP::GPU}$ is publicly available and distributed as open source under a permissive BSD3 license.Comment: 38 page

    Dendrogram seriation in data visualisation: algorithms and applications

    Get PDF
    Seriation is a data analytic tool for obtaining a permutation of a set of objects with the goal of revealing structural information within the set of objects. The purpose of this thesis is to investigate and develop tools for seriation with the goal of using these tools to enhance data visualisation. The particular focus of this thesis is on dendrogram seriation algorithms. A dendrogram is a tree-like structure used for visualising the results of a hierarchical clustering and the order of the leaves in a dendrogram provides a permutation of a set of objects. Dendrogram seriation algorithms rearrange the leaves of a dendrogram in order to find a permutation that optimises a given criterion. Dendrogram seriation algorithms are widely used, however, the research in this area is often confusing because of inconsistent or inadequate terminology. This thesis proposes new notation and terminology with the goal of better understanding and comparing dendrogram seriation algorithms. Seriation criteria measure the goodness of a permutation of a set of objects. Popular seriation criteria include the path length of a permutation and measuring anti-Robinson form in a symmetric matrix. This thesis proposes two new seriation criteria, lazy path length and banded anti-Robinson form, and demonstrates their effectiveness in improving a variety of visualisations. The main contribution of this thesis is a new dendrogram seriation algorithm. This algorithm improves on other dendrogram seriation algorithms and is also flexible because it allows the user to either choose from a variety of seriation criteria, including the new criteria mentioned above, or to input their own criteria. Finally, this thesis performs a comparison of several seriation algorithms, the results of which show that the proposed algorithm performs competitively against other algorithms. This leads to a set of general guidelines for choosing the most appropriate seriation algorithm for different seriation interests and visualisation settings

    A New Flexible Dendrogram Seriation Algorithm for Data Visualisation

    Get PDF
    Seriation is a data analytic tool for obtaining a permutation of a set of objects with the goal of revealing structural information within the set of objects. Seriating variables, cases or categories generally improves visualisations of statistical data, for example, by revealing hidden patterns in data or by making large datasets easier to understand. In this paper we present a new algorithm for seriation based on dendrograms. Dendrogram seriation algorithms rearrange the nodes in a dendrogram in order to obtain a permutation of the leaves (i.e. objects) that optimises a given criterion. Our algorithm is more flexible than currently available seriation algorithms because it allows the user to either choose from a variety of seriation criteria or to input their own criteria. This choice of seriation criteria is an important feature because different criteria are suitable for different visualisation settings. Common seriation criteria include measurements of the path length through a set of objects and measurements of anti-Robinson form in a symmetric matrix. We propose new seriation criteria called lazy path length and banded anti-Robinson form, and demonstrate their effectiveness in a variety of visualisation settings

    A New Flexible Dendrogram Seriation Algorithm for Data Visualisation

    Get PDF
    Seriation is a data analytic tool for obtaining a permutation of a set of objects with the goal of revealing structural information within the set of objects. Seriating variables, cases or categories generally improves visualisations of statistical data, for example, by revealing hidden patterns in data or by making large datasets easier to understand. In this paper we present a new algorithm for seriation based on dendrograms. Dendrogram seriation algorithms rearrange the nodes in a dendrogram in order to obtain a permutation of the leaves (i.e. objects) that optimises a given criterion. Our algorithm is more flexible than currently available seriation algorithms because it allows the user to either choose from a variety of seriation criteria or to input their own criteria. This choice of seriation criteria is an important feature because different criteria are suitable for different visualisation settings. Common seriation criteria include measurements of the path length through a set of objects and measurements of anti-Robinson form in a symmetric matrix. We propose new seriation criteria called lazy path length and banded anti-Robinson form, and demonstrate their effectiveness in a variety of visualisation settings

    Communication aspects of parallel processing

    Get PDF
    Cover title.Includes bibliographical references.Supported in part by the Air Force Office of Scientific Research. AFOSR-88-0032Cüneyt Özveren

    Algebraic, Block and Multiplicative Preconditioners based on Fast Tridiagonal Solves on GPUs

    Get PDF
    This thesis contributes to the field of sparse linear algebra, graph applications, and preconditioners for Krylov iterative solvers of sparse linear equation systems, by providing a (block) tridiagonal solver library, a generalized sparse matrix-vector implementation, a linear forest extraction, and a multiplicative preconditioner based on tridiagonal solves. The tridiagonal library, which supports (scaled) partial pivoting, outperforms cuSPARSE's tridiagonal solver by factor five while completely utilizing the available GPU memory bandwidth. For the performance optimized solving of multiple right-hand sides, the explicit factorization of the tridiagonal matrix can be computed. The extraction of a weighted linear forest (union of disjoint paths) from a general graph is used to build algebraic (block) tridiagonal preconditioners and deploys the generalized sparse-matrix vector implementation of this thesis for preconditioner construction. During linear forest extraction, a new parallel bidirectional scan pattern, which can operate on double-linked list structures, identifies the path ID and the position of a vertex. The algebraic preconditioner construction is also used to build more advanced preconditioners, which contain multiple tridiagonal factors, based on generalized ILU factorizations. Additionally, other preconditioners based on tridiagonal factors are presented and evaluated in comparison to ILU and ILU incomplete sparse approximate inverse preconditioners (ILU-ISAI) for the solution of large sparse linear equation systems from the Sparse Matrix Collection. For all presented problems of this thesis, an efficient parallel algorithm and its CUDA implementation for single GPU systems is provided

    Decay properties of spectral projectors with applications to electronic structure

    Full text link
    Motivated by applications in quantum chemistry and solid state physics, we apply general results from approximation theory and matrix analysis to the study of the decay properties of spectral projectors associated with large and sparse Hermitian matrices. Our theory leads to a rigorous proof of the exponential off-diagonal decay ("nearsightedness") for the density matrix of gapped systems at zero electronic temperature in both orthogonal and non-orthogonal representations, thus providing a firm theoretical basis for the possibility of linear scaling methods in electronic structure calculations for non-metallic systems. We further discuss the case of density matrices for metallic systems at positive electronic temperature. A few other possible applications are also discussed.Comment: 63 pages, 13 figure

    Sparse Matrix Multiplication on a Field-Programmable Gate Array

    Get PDF
    To extract data from highly sophisticated sensor networks, algorithms derived from graph theory are often applied to raw sensor data. Embedded digital systems are used to apply these algorithms. A common computation performed in these algorithms is finding the product of two sparsely populated matrices. When processing a sparse matrix, certain optimizations can be made by taking advantage of the large percentage of zero entries. This project proposes an optimized algorithm for performing sparse matrix multiplications in an embedded system and investigates how a parallel architecture constructed of multiple processors on a single Field-Programmable Gate Array (FPGA) can be used to speed up computations
    • …
    corecore