6 research outputs found

Minimizing synchronizations in sparse iterative solvers for distributed supercomputers

Author: Gu T.-X.
Liu X.-P.
Zhu S.-X.
Publication date: 01/01/2013

Eliminating synchronizations is one of the important techniques for minimizing communications in modern high performance computing. This paper discusses principles of reducing communications due to global synchronizations in sparse iterative solvers on distributed supercomputers. We demonstrate how to minimize global synchronizations by rescheduling a typical Krylov subspace method. The benefit of minimizing synchronizations is shown in theoretical analysis and is verified by numerical experiments using up to 900 processors. The experiments also show that the communication complexities of some structured sparse matrix-vector multiplications and of global communications on the underlying supercomputer are of order P^(1/2.5) and P^(4/5), respectively, where P is the number of processors. The experiments were carried out on a Dawning 5000A.

Oxford University Research Archive
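
The scaling claim in the abstract above can be made concrete with a little arithmetic. The sketch below is illustrative only: it reads the reported orders as the exponents P^(1/2.5) and P^(4/5) (an assumption about the extracted text), drops all constants, and uses hypothetical function names that do not come from the paper. Only the ratios between values are meaningful.

```python
# Illustrative growth model only. Constants are omitted, so absolute values
# carry no meaning; the point is how the two terms diverge as P grows.

def spmv_comm_growth(p: int) -> float:
    """Assumed growth of structured SpMV communication: P^(1/2.5)."""
    return p ** (1 / 2.5)


def global_comm_growth(p: int) -> float:
    """Assumed growth of global communication (e.g. reductions): P^(4/5)."""
    return p ** (4 / 5)


def growth_ratio(p: int) -> float:
    """How much faster global communication grows than SpMV communication."""
    return global_comm_growth(p) / spmv_comm_growth(p)


if __name__ == "__main__":
    # 900 is the largest processor count mentioned in the abstract.
    for p in (1, 100, 900):
        print(p, spmv_comm_growth(p), global_comm_growth(p), growth_ratio(p))
```

Under these assumptions the ratio itself grows like P^(2/5), which is consistent with the abstract's emphasis on global synchronizations rather than on matrix-vector products.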

Inner product computation for sparse iterative solvers on distributed supercomputer

Author: Gu T.-X.
Liu X.-P.
Zhu S.-X.
Publication date: 01/01/2012

Recent years have shown that iterative Krylov methods, without redesign, are not suitable for distributed supercomputers because of intensive global communications. It is well accepted that re-engineering Krylov methods for a prescribed computer architecture is necessary and important to achieve higher performance and scalability. The paper focuses on simple and practical ways to reorganize Krylov methods and improve their performance on current heterogeneous distributed supercomputers. In contrast with most current software development for Krylov methods, which usually focuses on efficient matrix-vector multiplications, the paper focuses on how to compute inner products on supercomputers and explains why inner product computation on current heterogeneous distributed supercomputers is crucial for scalable Krylov methods. Communication complexity analysis shows how inner product computation can become the performance bottleneck of (inner) product-type iterative solvers on distributed supercomputers because of global communications. Principles of reducing such global communications are discussed. The importance of minimizing communications is demonstrated by experiments using up to 900 processors. The experiments were carried out on a Dawning 5000A, one of the fastest and earliest heterogeneous supercomputers in the world. Both the analysis and the experiments indicate that inner product computation is very likely to be the most challenging kernel for inner product-based iterative solvers to achieve exascale.

Oxford University Research Archive
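
One widely used way to reduce the global communications that inner products require is to fuse several independent inner products into a single reduction. The sketch below simulates this in plain Python. It is a generic illustration, not necessarily the scheme used in the paper above: `SimulatedComm` is a hypothetical stand-in for a real communicator, and each "rank" is just a list holding its local slice of the vector.

```python
from typing import List, Sequence, Tuple

Chunks = Sequence[Sequence[float]]  # one local slice per simulated rank


class SimulatedComm:
    """Toy stand-in for a communicator; counts global reductions."""

    def __init__(self) -> None:
        self.reductions = 0

    def allreduce_sum(self, per_rank: Sequence[Sequence[float]]) -> List[float]:
        # One global reduction: element-wise sum over every rank's contribution.
        self.reductions += 1
        return [sum(values) for values in zip(*per_rank)]


def local_dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Inner product of two local slices; needs no communication."""
    return sum(a * b for a, b in zip(x, y))


def dots_separate(comm: SimulatedComm, xs: Chunks, ys: Chunks, zs: Chunks) -> Tuple[float, float]:
    """Compute (x, y) and (x, z) with one global reduction each."""
    xy = comm.allreduce_sum([[local_dot(x, y)] for x, y in zip(xs, ys)])[0]
    xz = comm.allreduce_sum([[local_dot(x, z)] for x, z in zip(xs, zs)])[0]
    return xy, xz


def dots_fused(comm: SimulatedComm, xs: Chunks, ys: Chunks, zs: Chunks) -> Tuple[float, float]:
    """Compute both inner products with a single global reduction."""
    partials = [[local_dot(x, y), local_dot(x, z)] for x, y, z in zip(xs, ys, zs)]
    xy, xz = comm.allreduce_sum(partials)
    return xy, xz
```

Fusing is only possible when both inner products are available at the same point in the iteration, which is why reorganizing the solver so that its inner products become independent matters.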

The Improved Quasi-Minimal Residual Method on Massively Distributed Memory Computers

Author: Hai-xiang Lin
Tian-ruo Yang
Publication date: 01/01/1997

For the solution of linear systems of equations with unsymmetric coefficient matrices, we propose an improved version of the quasi-minimal residual (IQMR) method that uses the Lanczos process as a major component and combines elements of numerical stability and parallel algorithm design. For the Lanczos process, stability is obtained by a coupled two-term procedure that generates Lanczos vectors normalized to unit length. The algorithm is derived in such a way that all inner products and matrix-vector multiplications of a single iteration step are independent; consequently, the communication time required for inner products can be overlapped efficiently with computation time. Therefore, the cost of global communication on parallel distributed memory computers is significantly reduced. The resulting IQMR algorithm preserves the favorable properties of the Lanczos process without increasing computational costs. The efficiency of this method is demonstrated by numerical experimental r..

CiteSeerX

Efficient implementation of the improved quasi-minimal residual method on massively distributed memory computers

Author: A. T. Ogielski
B. Hendrickson
E. de Sturler
H. M. Bucker
J. J. Dongarra
R. W. Freund
V. Kumar
Publication venue: 'Springer Science and Business Media LLC'

Crossref