6 research outputs found

    Householder orthogonalization with a non-standard inner product

    Full text link
    Householder orthogonalization plays an important role in numerical linear algebra. It attains perfect orthogonality regardless of the conditioning of the input. However, in the context of a non-standard inner product, it becomes difficult to apply Householder orthogonalization due to the lack of an initial orthogonal basis. We propose strategies to overcome this obstacle and discuss algorithms and variants of Householder orthogonalization with a non-standard inner product. Rounding error analysis and numerical experiments demonstrate that our approach is numerically stable.
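The obstacle is easiest to see against the classical alternative: Gram-Schmidt generalizes directly to an inner product ⟨x, y⟩_B = xᵀBy (B symmetric positive definite), since it never needs a pre-existing orthogonal basis. A minimal sketch of that baseline for contrast (this is not the paper's Householder scheme; `mgs_oblique` is an illustrative name):

```python
import numpy as np

def mgs_oblique(X, B):
    """Modified Gram-Schmidt w.r.t. the inner product <x, y>_B = x.T @ B @ y.
    B must be symmetric positive definite.  Returns Q with Q.T @ B @ Q = I
    and upper-triangular R such that X = Q @ R."""
    m, n = X.shape
    Q = X.astype(float).copy()
    R = np.zeros((n, n))
    for j in range(n):
        R[j, j] = np.sqrt(Q[:, j] @ B @ Q[:, j])  # B-norm of current column
        Q[:, j] /= R[j, j]
        for k in range(j + 1, n):
            R[j, k] = Q[:, j] @ B @ Q[:, k]       # B-inner product
            Q[:, k] -= R[j, k] * Q[:, j]
    return Q, R
```

Unlike a Householder approach, this variant's orthogonality degrades with the conditioning of the input, which is precisely the gap the paper addresses.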

    Two-Stage Block Orthogonalization to Improve Performance of ss-step GMRES

    Full text link
    On current computer architectures, GMRES' performance can be limited by its communication cost to generate orthonormal basis vectors of the Krylov subspace. To address this performance bottleneck, its s-step variant orthogonalizes a block of s basis vectors at a time, potentially reducing the communication cost by a factor of s. Unfortunately, for a large step size s, the solver can generate extremely ill-conditioned basis vectors, and to maintain stability in practice, a conservatively small step size is used, which limits the performance of the s-step solver. To enhance the performance using a small step size, in this paper we introduce a two-stage block orthogonalization scheme. Similar to the original scheme, the first stage of the proposed method operates on a block of s basis vectors at a time, but its objective is to maintain the well-conditioning of the generated basis vectors at a lower cost. The orthogonalization of the basis vectors is delayed until the second stage, when enough basis vectors have been generated to obtain higher performance. Our analysis shows the stability of the proposed two-stage scheme. The performance is improved because, while the same amount of computation as in the original scheme is required, most of the communication is done at the second stage, reducing the overall communication requirements. Our performance results with up to 192 NVIDIA V100 GPUs on the Summit supercomputer demonstrate that when solving a 2D Laplace problem, the two-stage approach can reduce the orthogonalization time and the total time-to-solution by factors of up to 2.6× and 1.6×, respectively, over the original s-step GMRES, which had already obtained respective speedups of 2.1× and 1.8× over standard GMRES. Similar speedups were obtained for 3D problems and for matrices from the SuiteSparse Matrix Collection.
    Comment: Accepted for publication in IPDPS'2
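The core idea, generating s vectors with only cheap local work and deferring the expensive orthogonalization to one block operation, can be sketched as follows (a conceptual illustration only, not the paper's scheme; `s_step_basis` and its arguments are hypothetical names):

```python
import numpy as np

def s_step_basis(A, q0, s, Q_prev=None):
    """Generate s new Krylov vectors from q0, then orthogonalize them as one
    block (a single reduction) instead of vector-by-vector."""
    V = [q0]
    for _ in range(s):
        v = A @ V[-1]
        v /= np.linalg.norm(v)           # cheap local scaling for conditioning
        V.append(v)
    V = np.column_stack(V[1:])
    if Q_prev is not None:
        V -= Q_prev @ (Q_prev.T @ V)     # one block projection (two GEMMs)
    Q, _ = np.linalg.qr(V)               # intra-block orthogonalization
    return Q
```

The communication saving comes from replacing s separate inner-product reductions with the single block projection and QR above; the stability risk (ill-conditioned V for large s) is what the two-stage scheme is designed to control.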

    A robust, open-source implementation of the locally optimal block preconditioned conjugate gradient for large eigenvalue problems in quantum chemistry

    Get PDF
    We present two open-source implementations of the locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm to find a few eigenvalues and eigenvectors of large, possibly sparse matrices. We then test LOBPCG on various quantum chemistry problems, encompassing medium to large, dense to sparse, and well-behaved to ill-conditioned ones, where the standard method typically used is Davidson’s diagonalization. Numerical tests show that while Davidson’s method remains the best choice for most applications in quantum chemistry, LOBPCG represents a competitive alternative, especially when memory is an issue, and can even outperform Davidson for ill-conditioned, non-diagonally dominant problems.
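For readers who want to try LOBPCG without building either implementation, SciPy also ships one. A minimal usage sketch on a toy diagonal matrix (a real quantum-chemistry Hamiltonian would replace `A`, and the inverse-diagonal preconditioner here is only illustrative):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import lobpcg

n = 100
A = diags([np.arange(1.0, n + 1)], [0], format="csr")  # eigenvalues 1..100
M = diags([1.0 / np.arange(1.0, n + 1)], [0])          # diagonal preconditioner
rng = np.random.default_rng(0)
X = rng.standard_normal((n, 4))                        # random initial block

# Find the 4 smallest eigenpairs
vals, vecs = lobpcg(A, X, M=M, largest=False, tol=1e-8, maxiter=200)
```

The block size (here 4) is the number of eigenpairs sought at once; memory use scales with the block, which is why the abstract highlights LOBPCG when memory is a constraint.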

    Shifted Cholesky QR for Computing the QR Factorization of Ill-Conditioned Matrices

    No full text
    The Cholesky QR algorithm is an efficient communication-minimizing algorithm for computing the QR factorization of a tall-skinny matrix X ∈ R^(m×n), where m >> n. Unfortunately, it is inherently unstable and often breaks down when the matrix is ill-conditioned. A recent work [Yamamoto et al., ETNA, 44, pp. 306--326 (2015)] establishes that the instability can be cured by repeating the algorithm twice (called CholeskyQR2). However, the applicability of CholeskyQR2 is still limited by the requirement that the Cholesky factorization of the Gram matrix X^T X runs to completion, which means that it does not always work for matrices X with 2-norm condition number κ₂(X) roughly greater than u^(-1/2), where u is the unit roundoff. In this work we extend the applicability to κ₂(X) = O(u^(-1)) by introducing a shift to the computed Gram matrix so as to guarantee that the Cholesky factorization R^T R = X^T X + sI succeeds numerically. We show that the computed XR^(-1) has a reduced condition number that is roughly bounded by u^(-1/2), for which CholeskyQR2 safely computes the QR factorization, yielding a computed Q with orthogonality ‖Q^T Q − I‖₂ and residual ‖X − QR‖_F / ‖X‖_F both of the order of u. Thus we obtain the required QR factorization by essentially running Cholesky QR three times. We extensively analyze the resulting algorithm, shiftedCholeskyQR3, to reveal its excellent numerical stability. The shiftedCholeskyQR3 algorithm is also highly parallelizable, and applicable and effective also when working with an oblique inner product. We illustrate our findings through experiments, in which we achieve significant speedups over alternative methods.
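The three-step structure (one shifted Cholesky QR to tame the conditioning, then two plain steps) can be sketched in a few lines. The shift below follows the spirit of the formula in the abstract's setting; consult the paper for the exact constant, and treat this as an illustration rather than the authors' implementation:

```python
import numpy as np

def shifted_cholesky_qr3(X):
    """shiftedCholeskyQR3 sketch: one shifted Cholesky QR step, then two
    plain Cholesky QR steps.  Returns Q (orthonormal columns) and R with
    X ≈ Q @ R."""
    m, n = X.shape
    u = np.finfo(float).eps / 2                 # unit roundoff
    shift = 11.0 * (m * n + n * (n + 1)) * u * np.linalg.norm(X, 2) ** 2
    Q, R = X.astype(float), np.eye(n)
    for s in (shift, 0.0, 0.0):
        G = Q.T @ Q + s * np.eye(n)             # (shifted) Gram matrix
        L = np.linalg.cholesky(G)               # G = L @ L.T, so R_i = L.T
        Q = np.linalg.solve(L, Q.T).T           # Q <- Q @ inv(L.T)
        R = L.T @ R                             # accumulate triangular factor
    return Q, R
```

Only Gram-matrix products, a small Cholesky, and triangular solves appear, which is what makes the method communication-friendly on tall-skinny inputs.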

    An overview of block Gram-Schmidt methods and their stability properties

    Full text link
    Block Gram-Schmidt algorithms serve as essential kernels in many scientific computing applications, but for many commonly used variants, a rigorous treatment of their stability properties remains open. This survey provides a comprehensive categorization of block Gram-Schmidt algorithms, particularly those used in Krylov subspace methods to build orthonormal bases one block vector at a time. All known stability results are assembled, and new results are summarized or conjectured for important communication-reducing variants. Additionally, new block versions of low-synchronization variants are derived, and their efficacy and stability are demonstrated for a wide range of challenging examples. Low-synchronization variants appear remarkably stable for s-step-like matrices built with Newton polynomials, pointing towards a new stable and efficient backbone for Krylov subspace methods. Numerical examples are computed with a versatile MATLAB package hosted at https://github.com/katlund/BlockStab, and scripts for reproducing all results in the paper are provided. Block Gram-Schmidt implementations in popular software packages are discussed, along with a number of open problems. An appendix containing all algorithms type-set in a uniform fashion is provided.
    Comment: 42 pages, 5 tables, 17 figures, 20 algorithms
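As a point of reference for the taxonomy surveyed, the simplest template, block classical Gram-Schmidt with Householder QR as the intra-block step, can be sketched as follows (`bcgs` is an illustrative name; this is not one of the paper's new low-synchronization variants, whose stability properties differ):

```python
import numpy as np

def bcgs(blocks):
    """Block classical Gram-Schmidt: project each incoming block against all
    previously orthogonalized columns at once, then orthogonalize within the
    block (the "IntraOrtho" step, here via np.linalg.qr)."""
    Q = None
    for X in blocks:
        W = X if Q is None else X - Q @ (Q.T @ X)   # one block projection
        Qk, _ = np.linalg.qr(W)                     # intra-block orthogonalization
        Q = Qk if Q is None else np.hstack([Q, Qk])
    return Q
```

Each loop iteration needs only two tall-skinny matrix products plus one small QR, which is the communication profile that makes block variants attractive; the survey's central question is how much orthogonality such variants lose relative to their columnwise counterparts.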