    Numerically Stable Recurrence Relations for the Communication Hiding Pipelined Conjugate Gradient Method

    Pipelined Krylov subspace methods (also referred to as communication-hiding methods) have been proposed in the literature as a scalable alternative to classic Krylov subspace algorithms for iteratively computing the solution to a large linear system in parallel. For symmetric and positive definite system matrices the pipelined Conjugate Gradient method outperforms its classic Conjugate Gradient counterpart on large-scale distributed memory hardware by overlapping global communication with essential computations like the matrix-vector product, thus hiding global communication. A well-known drawback of the pipelining technique is the (possibly significant) loss of numerical stability. In this work a numerically stable variant of the pipelined Conjugate Gradient algorithm is presented that avoids the propagation of local rounding errors in the finite precision recurrence relations that construct the Krylov subspace basis. The multi-term recurrence relation for the basis vector is replaced by two-term recurrences, improving stability without increasing the overall computational cost of the algorithm. The proposed modification ensures that the pipelined Conjugate Gradient method is able to attain a highly accurate solution independently of the pipeline length. Numerical experiments demonstrate a combination of excellent parallel performance and improved maximal attainable accuracy for the new pipelined Conjugate Gradient algorithm. This work thus resolves one of the major practical restrictions for the usability of pipelined Krylov subspace methods. Comment: 15 pages, 5 figures, 1 table, 2 algorithms

    Error estimators and their analysis for CG, Bi-CG and GMRES

    We present an analysis of the uncertainty in the convergence of iterative linear solvers when using relative residue as a stopping criterion, and the resulting over/under computation for a given tolerance in error. This shows that error estimation is indispensable for efficient and accurate solution of moderate to high conditioned linear systems ($\kappa > 100$), where $\kappa$ is the condition number of the matrix. An $\mathcal{O}(1)$ error estimator for iterations of the CG (Conjugate Gradient) algorithm was proposed more than two decades ago. Recently, an $\mathcal{O}(k^2)$ error estimator was described for the GMRES (Generalized Minimal Residual) algorithm which allows for non-symmetric linear systems as well, where $k$ is the iteration number. We suggest a minor modification in this GMRES error estimation for increased stability. In this work, we also propose an $\mathcal{O}(n)$ error estimator for the A-norm and $l_2$ norm of the error vector in the Bi-CG (Bi-Conjugate Gradient) algorithm. The robust performance of these estimates as a stopping criterion results in increased savings and accuracy in computation, as condition number and size of problems increase.
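
    To make the point about residual-based stopping concrete, here is a minimal sketch (not taken from the paper) of why a small relative residual need not mean a small relative error. It relies on the standard bound: relative error <= kappa * relative residual. The 2x2 diagonal system and the iterate `x_approx` are hypothetical values chosen only for illustration.

```python
import math

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))

# Hypothetical diagonal SPD system A = diag(1, 1e-4); values are assumptions.
diag = [1.0, 1e-4]
x_true = [1.0, 1.0]
b = [d * x for d, x in zip(diag, x_true)]

# An iterate that gets the first component right and misses the second.
x_approx = [1.0, 0.0]
r = [bi - d * xi for bi, d, xi in zip(b, diag, x_approx)]   # residual b - A x
e = [xt - xa for xt, xa in zip(x_true, x_approx)]           # error x_true - x

kappa = max(diag) / min(diag)        # 2-norm condition number of a diagonal SPD matrix
rel_res = norm(r) / norm(b)          # what a residual-based stopping test sees
rel_err = norm(e) / norm(x_true)     # what the user actually cares about

print(f"kappa = {kappa:.1e}, relative residual = {rel_res:.1e}, relative error = {rel_err:.2f}")
```

    With these assumed values, a residual tolerance of about 1e-4 would be met while the relative error is still of order one, which is the kind of under-computation an error estimator is meant to catch.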

    New Algebraic Formulation of Density Functional Calculation

    This article addresses a fundamental problem faced by the ab initio community: the lack of an effective formalism for the rapid exploration and exchange of new methods. To rectify this, we introduce a novel, basis-set independent, matrix-based formulation of generalized density functional theories which reduces the development, implementation, and dissemination of new ab initio techniques to the derivation and transcription of a few lines of algebra. This new framework enables us to concisely demystify the inner workings of fully functional, highly efficient modern ab initio codes and to give complete instructions for the construction of such for calculations employing arbitrary basis sets. Within this framework, we also discuss in full detail a variety of leading-edge ab initio techniques, minimization algorithms, and highly efficient computational kernels for use with scalar as well as shared and distributed-memory supercomputer architectures.

    A framework for deflated and augmented Krylov subspace methods

    We consider deflation and augmentation techniques for accelerating the convergence of Krylov subspace methods for the solution of nonsingular linear algebraic systems. Despite some formal similarity, the two techniques are conceptually different from preconditioning. Deflation (in the sense the term is used here) "removes" certain parts from the operator making it singular, while augmentation adds a subspace to the Krylov subspace (often the one that is generated by the singular operator); in contrast, preconditioning changes the spectrum of the operator without making it singular. Deflation and augmentation have been used in a variety of methods and settings. Typically, deflation is combined with augmentation to compensate for the singularity of the operator, but both techniques can be applied separately. We introduce a framework of Krylov subspace methods that satisfy a Galerkin condition. It includes the families of orthogonal residual (OR) and minimal residual (MR) methods. We show that in this framework augmentation can be achieved either explicitly or, equivalently, implicitly by projecting the residuals appropriately and correcting the approximate solutions in a final step. We study conditions for a breakdown of the deflated methods, and we show several possibilities to avoid such breakdowns for the deflated MINRES method. Numerical experiments illustrate properties of different variants of deflated MINRES analyzed in this paper. Comment: 24 pages, 3 figures

    Analyzing the effect of local rounding error propagation on the maximal attainable accuracy of the pipelined Conjugate Gradient method

    Pipelined Krylov subspace methods typically offer improved strong scaling on parallel HPC hardware compared to standard Krylov subspace methods for large and sparse linear systems. In pipelined methods the traditional synchronization bottleneck is mitigated by overlapping time-consuming global communications with useful computations. However, to achieve this communication hiding strategy, pipelined methods introduce additional recurrence relations for a number of auxiliary variables that are required to update the approximate solution. This paper aims at studying the influence of local rounding errors that are introduced by the additional recurrences in the pipelined Conjugate Gradient method. Specifically, we analyze the impact of local round-off effects on the attainable accuracy of the pipelined CG algorithm and compare it to the traditional CG method. Furthermore, we estimate the gap between the true residual and the recursively computed residual used in the algorithm. Based on this estimate we suggest an automated residual replacement strategy to reduce the loss of attainable accuracy on the final iterative solution. The resulting pipelined CG method with residual replacement improves the maximal attainable accuracy of pipelined CG, while maintaining the efficient parallel performance of the pipelined method. This conclusion is substantiated by numerical results for a variety of benchmark problems. Comment: 26 pages, 6 figures, 2 tables, 4 algorithms
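
    The "gap between the true residual and the recursively computed residual" can be observed directly in code. The sketch below is classic (non-pipelined) CG, not the pipelined variant or the residual replacement strategy from the paper. It only records, at each iteration, the norm of the difference between b - A x and the residual r maintained by recurrence. The function name `cg_with_gap` and the 3x3 test matrix are assumptions made for illustration.

```python
import math

def matvec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg_with_gap(A, b, tol=1e-12, max_iter=100):
    """Classic CG from x0 = 0; returns (x, gaps), where gaps[i] is
    ||(b - A x) - r|| after iteration i (true vs. recursive residual)."""
    x = [0.0] * len(b)
    r = list(b)                      # recursive residual, r0 = b - A*0
    p = list(r)
    rr = dot(r, r)
    b_norm = math.sqrt(dot(b, b))
    gaps = []
    for _ in range(max_iter):
        if math.sqrt(rr) <= tol * b_norm:
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]    # recurrence update
        true_r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        gaps.append(math.sqrt(sum((t - ri) ** 2 for t, ri in zip(true_r, r))))
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, gaps

# Small symmetric positive definite example (hypothetical values).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, gaps = cg_with_gap(A, b)
print("solution:", x)
print("residual gap per iteration:", gaps)
```

    On a small, well-conditioned system like this one the gap stays near machine precision. The abstract's concern is that the extra recurrences in pipelined CG can make this gap grow, which is what limits the attainable accuracy.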

    Accelerating Cosmic Microwave Background map-making procedure through preconditioning

    Estimation of the sky signal from sequences of time ordered data is one of the key steps in Cosmic Microwave Background (CMB) data analysis, commonly referred to as the map-making problem. Some of the most popular and general methods proposed for this problem involve solving generalised least squares (GLS) equations with non-diagonal noise weights given by a block-diagonal matrix with Toeplitz blocks. In this work we study new map-making solvers potentially suitable for applications to the largest anticipated data sets. They are based on iterative conjugate gradient (CG) approaches enhanced with novel, parallel, two-level preconditioners. We apply the proposed solvers to examples of simulated non-polarised and polarised CMB observations, and a set of idealised scanning strategies with sky coverage ranging from nearly a full sky down to small sky patches. We discuss in detail their implementation for massively parallel computational platforms and their performance for a broad range of parameters characterising the simulated data sets. We find that our best new solver can outperform carefully-optimised standard solvers used today by a factor of as much as 5 in terms of the convergence rate and a factor of up to 44 in terms of the time to solution, and does so without significantly increasing the memory consumption and the volume of inter-processor communication. The performance of the new algorithms is also found to be more stable and robust, and less dependent on specific characteristics of the analysed data set. We therefore conclude that the proposed approaches are well suited to address successfully challenges posed by new and forthcoming CMB data sets. Comment: 19 pages // Final version submitted to A&
