8,592 research outputs found

    Inner product computation for sparse iterative solvers on\ud distributed supercomputer

    Get PDF
    Recent years have witnessed that iterative Krylov methods without re-designing are not suitable for distribute supercomputers because of intensive global communications. It is well accepted that re-engineering Krylov methods for prescribed computer architecture is necessary and important to achieve higher performance and scalability. The paper focuses on simple and practical ways to re-organize Krylov methods and improve their performance for current heterogeneous distributed supercomputers. In construct with most of current software development of Krylov methods which usually focuses on efficient matrix vector multiplications, the paper focuses on the way to compute inner products on supercomputers and explains why inner product computation on current heterogeneous distributed supercomputers is crucial for scalable Krylov methods. Communication complexity analysis shows that how the inner product computation can be the bottleneck of performance of (inner) product-type iterative solvers on distributed supercomputers due to global communications. Principles of reducing such global communications are discussed. The importance of minimizing communications is demonstrated by experiments using up to 900 processors. The experiments were carried on a Dawning 5000A, one of the fastest and earliest heterogeneous supercomputers in the world. Both the analysis and experiments indicates that inner product computation is very likely to be the most challenging kernel for inner product-based iterative solvers to achieve exascale

    Minimizing synchronizations in sparse iterative solvers for distributed supercomputers

    Get PDF
    Eliminating synchronizations is one of the important techniques related to minimizing communications for modern high performance computing. This paper discusses principles of reducing communications due to global synchronizations in sparse iterative solvers on distributed supercomputers. We demonstrates how to minimizing global synchronizations by rescheduling a typical Krylov subspace method. The benefit of minimizing synchronizations is shown in theoretical analysis and is verified by numerical experiments using up to 900 processors. The experiments also show the communication complexity for some structured sparse matrix vector multiplications and global communications in the underlying supercomputers are in the order P1/2.5 and P4/5 respectively, where P is the number of processors and the experiments were carried on a Dawning 5000A

    Numerically Stable Recurrence Relations for the Communication Hiding Pipelined Conjugate Gradient Method

    Full text link
    Pipelined Krylov subspace methods (also referred to as communication-hiding methods) have been proposed in the literature as a scalable alternative to classic Krylov subspace algorithms for iteratively computing the solution to a large linear system in parallel. For symmetric and positive definite system matrices the pipelined Conjugate Gradient method outperforms its classic Conjugate Gradient counterpart on large scale distributed memory hardware by overlapping global communication with essential computations like the matrix-vector product, thus hiding global communication. A well-known drawback of the pipelining technique is the (possibly significant) loss of numerical stability. In this work a numerically stable variant of the pipelined Conjugate Gradient algorithm is presented that avoids the propagation of local rounding errors in the finite precision recurrence relations that construct the Krylov subspace basis. The multi-term recurrence relation for the basis vector is replaced by two-term recurrences, improving stability without increasing the overall computational cost of the algorithm. The proposed modification ensures that the pipelined Conjugate Gradient method is able to attain a highly accurate solution independently of the pipeline length. Numerical experiments demonstrate a combination of excellent parallel performance and improved maximal attainable accuracy for the new pipelined Conjugate Gradient algorithm. This work thus resolves one of the major practical restrictions for the useability of pipelined Krylov subspace methods.Comment: 15 pages, 5 figures, 1 table, 2 algorithm

    Parallel unstructured solvers for linear partial differential equations

    Get PDF
    This thesis presents the development of a parallel algorithm to solve symmetric systems of linear equations and the computational implementation of a parallel partial differential equations solver for unstructured meshes. The proposed method, called distributive conjugate gradient - DCG, is based on a single-level domain decomposition method and the conjugate gradient method to obtain a highly scalable parallel algorithm. An overview on methods for the discretization of domains and partial differential equations is given. The partition and refinement of meshes is discussed and the formulation of the weighted residual method for two- and three-dimensions presented. Some of the methods to solve systems of linear equations are introduced, highlighting the conjugate gradient method and domain decomposition methods. A parallel unstructured PDE solver is proposed and its actual implementation presented. Emphasis is given to the data partition adopted and the scheme used for communication among adjacent subdomains is explained. A series of experiments in processor scalability is also reported. The derivation and parallelization of DCG are presented and the method validated throughout numerical experiments. The method capabilities and limitations were investigated by the solution of the Poisson equation with various source terms. The experimental results obtained using the parallel solver developed as part of this work show that the algorithm presented is accurate and highly scalable, achieving roughly linear parallel speed-up in many of the cases tested

    Parallel Iterative Solution Methods for Linear Systems arising from Discretized PDE's

    Get PDF
    In these notes we will present an overview of a number of related iterative methods for the solution of linear systems of equations. These methods are so-called Krylov projection type methods and the include popular methods as Conjugate Gradients, Bi-Conjugate Gradients, CGST Bi-CGSTAB, QMR, LSQR and GMRES. We will show how these methods can be derived from simple basic iteration formulas. We will not give convergence proofs, but we will refer for these, as far as available, to litterature. Iterative methods are often used in combination with so-called preconditioning operators (approximations for the inverses of the operator of the system to be solved). Since these preconditions are not essential in the derivation of the iterative methods, we will not give much attention to them in these notes. However, in most of the actual iteration schemes, we have included them in order to facilitate the use of these schemes in actual computations. For the application of the iterative schemes one usually thinks of linear sparse systems, e.g., like those arising in the finite element or finite difference approximatious of (systems of) partial differential equations. However, the structure of the operators plays no explicit role in any of these schemes, and these schemes might also successfully be used to solve certain large dense linear systems. Depending on the situation that might be attractive in terms of numbers of floating point operations. It will turn out that all of the iterative are parallelizable in a straight forward manner. However, especially for computers with a memory hierarchy (i.e. like cache or vector registers), and for distributed memory computers, the performance can often be improved significantly through rescheduling of the operations. We will discuss parallel implementations, and occasionally we will report on experimental findings

    Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond

    Full text link
    In this and a set of companion whitepapers, the USQCD Collaboration lays out a program of science and computing for lattice gauge theory. These whitepapers describe how calculation using lattice QCD (and other gauge theories) can aid the interpretation of ongoing and upcoming experiments in particle and nuclear physics, as well as inspire new ones.Comment: 44 pages. 1 of USQCD whitepapers

    Robust Dropping Criteria for F-norm Minimization Based Sparse Approximate Inverse Preconditioning

    Full text link
    Dropping tolerance criteria play a central role in Sparse Approximate Inverse preconditioning. Such criteria have received, however, little attention and have been treated heuristically in the following manner: If the size of an entry is below some empirically small positive quantity, then it is set to zero. The meaning of "small" is vague and has not been considered rigorously. It has not been clear how dropping tolerances affect the quality and effectiveness of a preconditioner MM. In this paper, we focus on the adaptive Power Sparse Approximate Inverse algorithm and establish a mathematical theory on robust selection criteria for dropping tolerances. Using the theory, we derive an adaptive dropping criterion that is used to drop entries of small magnitude dynamically during the setup process of MM. The proposed criterion enables us to make MM both as sparse as possible as well as to be of comparable quality to the potentially denser matrix which is obtained without dropping. As a byproduct, the theory applies to static F-norm minimization based preconditioning procedures, and a similar dropping criterion is given that can be used to sparsify a matrix after it has been computed by a static sparse approximate inverse procedure. In contrast to the adaptive procedure, dropping in the static procedure does not reduce the setup time of the matrix but makes the application of the sparser MM for Krylov iterations cheaper. Numerical experiments reported confirm the theory and illustrate the robustness and effectiveness of the dropping criteria.Comment: 27 pages, 2 figure
    • …
    corecore