471 research outputs found

    A GPU-based hyperbolic SVD algorithm

    A one-sided Jacobi hyperbolic singular value decomposition (HSVD) algorithm for a massively parallel graphics processing unit (GPU) is developed. The algorithm also serves as the final stage of solving a symmetric indefinite eigenvalue problem. Numerical testing demonstrates gains in speed and accuracy over sequential and MPI-parallelized variants of similar Jacobi-type HSVD algorithms. Finally, possibilities for hybrid CPU-GPU parallelism are discussed. Comment: Accepted for publication in BIT Numerical Mathematics
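
    To make the "one-sided Jacobi" idea in the abstract above concrete, here is a minimal sequential sketch of the ordinary (non-hyperbolic) one-sided Jacobi SVD, in which plane rotations are applied to pairs of columns until all columns are mutually orthogonal. It is not the paper's algorithm: the hyperbolic variant also involves a signature matrix and hyperbolic rotations, and the GPU parallelization is omitted entirely. The function name and parameters are hypothetical.

```python
import math

def one_sided_jacobi_singular_values(a, tol=1e-12, max_sweeps=50):
    """Singular values of `a` (a list of rows), in descending order.

    Illustrative only: ordinary one-sided Jacobi, not the hyperbolic
    variant, and with no parallel pivot strategy.
    """
    m, n = len(a), len(a[0])
    g = [row[:] for row in a]  # work on a copy of the input
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                # Squared column norms and the inner product of columns p, q.
                alpha = sum(g[i][p] * g[i][p] for i in range(m))
                beta = sum(g[i][q] * g[i][q] for i in range(m))
                gamma = sum(g[i][p] * g[i][q] for i in range(m))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns already numerically orthogonal
                rotated = True
                # Rotation angle that zeroes the inner product.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):
                    gp, gq = g[i][p], g[i][q]
                    g[i][p] = c * gp - s * gq
                    g[i][q] = s * gp + c * gq
        if not rotated:
            break  # a full sweep without rotations: converged
    # After convergence the column norms are the singular values.
    norms = [math.sqrt(sum(g[i][j] ** 2 for i in range(m))) for j in range(n)]
    return sorted(norms, reverse=True)
```

    Because each rotation touches only two columns, rotations on disjoint column pairs are independent of each other, which is what makes Jacobi-type methods attractive for parallel hardware.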

    Mixed-Precision Numerical Linear Algebra Algorithms: Integer Arithmetic Based LU Factorization and Iterative Refinement for Hermitian Eigenvalue Problem

    Mixed-precision algorithms are a class of algorithms that use low precision in part of the computation in order to save time and energy through less accurate computation and communication. These algorithms usually use iterative refinement to improve the approximate solution obtained in low precision to the accuracy expected from doing all of the computation in high precision. Driven by the demand from deep learning applications, hardware now offers several low-precision formats, including half precision (FP16), Bfloat16, and integer operations on quantized integers, which use integers with a shared scalar to represent a set of equally spaced numbers. As new hardware architectures focus on performance in these formats, mixed-precision algorithms have more potential to leverage them and to outperform traditional fixed-precision algorithms. This dissertation consists of two articles. In the first article, we adapt one of the most fundamental algorithms in numerical linear algebra, LU factorization with partial pivoting, to use integer arithmetic. With the goal of obtaining a low-accuracy factorization as the preconditioner of the generalized minimal residual method (GMRES) for solving systems of linear equations, the LU factorization is adapted to use two different fixed-point formats for the matrices L and U. A left-looking variant is also proposed for matrices with unbounded column growth. Finally, GMRES iterative refinement is shown to work on matrices with condition numbers up to 10000 with the algorithm that uses int16 as input and an int32 accumulator for the update step. The second article targets symmetric and Hermitian eigenvalue problems. In it, we revisit the SICE algorithm from Dongarra et al. By applying the Sherman-Morrison formula to the diagonally shifted tridiagonal systems, we propose an updated SICE-SM algorithm. By incorporating the latest two-stage algorithms from the PLASMA and MAGMA software libraries for numerical linear algebra, we achieve up to 3.6x speedup using the mixed-precision eigensolver with the blocked SICE-SM algorithm for iterative refinement, compared with full double-complex precision solvers, for cases in which only a portion of the eigenvalues and eigenvectors is requested.
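
    The general pattern described in the abstract above, a cheap low-precision factorization followed by high-precision iterative refinement, can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's method: it uses classical (non-GMRES) refinement, and it simulates the low precision by rounding to IEEE single precision rather than using integer or fixed-point arithmetic. All function names are hypothetical.

```python
import struct

def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def lu_factor_low(a):
    """LU with partial pivoting; every stored entry is rounded to float32."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))  # perm[k] = original index of the row now at k
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[piv] = lu[piv], lu[k]
        perm[k], perm[piv] = perm[piv], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - lu[i][k] * lu[k][j])
    return lu, perm

def lu_solve_low(lu, perm, b):
    """Forward and back substitution, rounding each update to float32."""
    n = len(b)
    y = [f32(b[p]) for p in perm]
    for i in range(n):
        for j in range(i):
            y[i] = f32(y[i] - lu[i][j] * y[j])
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] = f32(y[i] - lu[i][j] * y[j])
        y[i] = f32(y[i] / lu[i][i])
    return y

def refine(a, b, iters=10):
    """Solve a x = b: low-precision LU, double-precision residuals."""
    lu, perm = lu_factor_low(a)
    x = lu_solve_low(lu, perm, b)
    for _ in range(iters):
        # Residual in full double precision.
        r = [bi - sum(aij * xj for aij, xj in zip(row, x))
             for row, bi in zip(a, b)]
        # Correction from the cheap low-precision factors.
        d = lu_solve_low(lu, perm, r)
        x = [xi + di for xi, di in zip(x, d)]
    return x
```

    The factorization is the expensive step and is done once in low precision; each refinement step only needs a matrix-vector product and two triangular solves. Refinement of this kind can only be expected to converge when the matrix is not too ill-conditioned relative to the low precision, which is consistent with the condition-number limit the abstract reports for its own algorithm.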

    Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond

    In this and a set of companion whitepapers, the USQCD Collaboration lays out a program of science and computing for lattice gauge theory. These whitepapers describe how calculations using lattice QCD (and other gauge theories) can aid the interpretation of ongoing and upcoming experiments in particle and nuclear physics, as well as inspire new ones. Comment: 44 pages. 1 of USQCD whitepapers

    ChASE: Chebyshev Accelerated Subspace iteration Eigensolver for sequences of Hermitian eigenvalue problems

    Solving dense Hermitian eigenproblems arranged in a sequence with direct solvers fails to take advantage of those spectral properties that are pertinent to the entire sequence and not just to a single problem. When such features take the form of correlations between the eigenvectors of consecutive problems, as is the case in many real-world applications, the potential benefit of exploiting them can be substantial. We present ChASE, a modern algorithm and library based on subspace iteration with polynomial acceleration. Novel to ChASE are the computation of the spectral estimates that enter the filter and an optimization of the polynomial degree that further reduces the necessary FLOPs. ChASE is written in C++ using modern software engineering concepts that favor simple integration into application codes and straightforward portability across heterogeneous platforms. When solving sequences of Hermitian eigenproblems for a portion of their extremal spectrum, ChASE greatly benefits from the sequence's spectral properties and outperforms direct solvers in many scenarios. The library ships with two distinct parallelization schemes, supports execution on distributed GPUs, and is easily extensible to other parallel computing architectures. Comment: 33 pages. Submitted to ACM TOM
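
    The polynomial acceleration mentioned in the abstract above can be illustrated with a Chebyshev filter applied to a single vector. This is a simplified sketch, not ChASE itself: the library is written in C++, filters a whole block of vectors, computes its own spectral estimates, and optimizes the polynomial degree, whereas the sketch below assumes that bounds `lower` and `upper` enclosing the unwanted part of the spectrum are supplied by the caller and uses a fixed degree. All names are hypothetical.

```python
import math

def matvec(a, v):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def chebyshev_filter(a, v, degree, lower, upper):
    """Apply T_degree((A - c I) / e) to v (degree >= 1).

    The interval [lower, upper] is mapped to [-1, 1], where Chebyshev
    polynomials stay bounded by 1; eigenvalues outside it are amplified.
    """
    c = (upper + lower) / 2.0
    e = (upper - lower) / 2.0
    prev = v
    cur = [(w - c * x) / e for w, x in zip(matvec(a, v), v)]
    for _ in range(degree - 1):
        # Three-term recurrence T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x).
        nxt = [2.0 * (w - c * x) / e - p
               for w, x, p in zip(matvec(a, cur), cur, prev)]
        prev, cur = cur, nxt
    return cur

def lowest_eigenpair(a, lower, upper, degree=20, iters=30):
    """Filtered iteration for the smallest eigenvalue of symmetric `a`.

    Assumes `upper` bounds the largest eigenvalue and `lower` lies above
    the smallest eigenvalue.
    """
    n = len(a)
    v = [1.0 + 0.01 * i for i in range(n)]  # arbitrary starting vector
    for _ in range(iters):
        v = chebyshev_filter(a, v, degree, lower, upper)
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]  # renormalize to avoid overflow
    rayleigh = sum(x * w for x, w in zip(v, matvec(a, v)))
    return rayleigh, v
```

    With a block of vectors instead of one, the normalization step becomes an orthonormalization and the Rayleigh quotient becomes a Rayleigh-Ritz projection, which is the usual structure of filtered subspace iteration. A good starting block, such as the eigenvectors of the previous problem in a sequence, reduces the number of filter applications needed.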

    Improved Accuracy and Parallelism for MRRR-based Eigensolvers -- A Mixed Precision Approach

    The real symmetric tridiagonal eigenproblem is of outstanding importance in numerical computations; it arises frequently as part of eigensolvers for standard and generalized dense Hermitian eigenproblems that are based on a reduction to tridiagonal form. For its solution, the algorithm of Multiple Relatively Robust Representations (MRRR) is among the fastest methods. Although fast, the solvers based on MRRR do not deliver the same accuracy as competing methods such as Divide & Conquer or the QR algorithm. In this paper, we demonstrate that the use of mixed precisions leads to improved accuracy of MRRR-based eigensolvers with limited or no performance penalty. As a result, we obtain eigensolvers that are not only as accurate as or more accurate than the best available methods, but also, in most circumstances, faster and more scalable than the competition.