Search CORE


Deflated GMRES for Systems with Multiple Shifts and Multiple Right-Hand Sides

Author: Dean Darnell
Ronald B. Morgan
Walter Wilcox
Publication venue: Elsevier BV
Publication date: 01/01/2007

We consider the solution of multiply shifted systems of nonsymmetric linear equations, possibly also with multiple right-hand sides. First, for a single right-hand side, the matrix is shifted by several multiples of the identity. Such problems arise in a number of applications, including lattice quantum chromodynamics, where the matrices are complex and non-Hermitian. Some Krylov iterative methods, such as GMRES and BiCGStab, have been used to solve multiply shifted systems for about the cost of solving just one system. Restarted GMRES can be improved by deflating eigenvalues for matrices that have a few small eigenvalues. We show that a particular deflated method, GMRES-DR, can be applied to multiply shifted systems. In quantum chromodynamics, it is common to have multiple right-hand sides with multiple shifts for each right-hand side. We develop a method that efficiently solves the multiple right-hand sides by using a deflated version of GMRES and yet keeps costs for all of the multiply shifted systems close to those for one shift. An example is given showing that this can be extremely effective with a quantum chromodynamics matrix. Comment: 19 pages, 9 figures
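Why several shifts can cost about as much as one: with a zero initial guess, every shifted matrix $A - \sigma I$ generates the same Krylov subspace from $b$ as $A$ does, and the Arnoldi relation $A V_m = V_{m+1} \bar H_m$ turns into $(A - \sigma I) V_m = V_{m+1} (\bar H_m - \sigma \bar I)$. A minimal sketch, assuming real arithmetic, $x_0 = 0$, and plain unrestarted GMRES (not the deflated GMRES-DR of the paper); all function names are hypothetical. One Arnoldi run is shared by every shift, and each extra shift only costs a small Hessenberg least-squares solve.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def arnoldi(A, b, m):
    """Orthonormal Krylov basis V and (k+1) x k Hessenberg H with A V_k = V_{k+1} H."""
    beta = math.sqrt(dot(b, b))
    V = [[bi / beta for bi in b]]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = matvec(A, V[j])
        for i in range(j + 1):  # modified Gram-Schmidt
            H[i][j] = dot(w, V[i])
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        h = math.sqrt(dot(w, w))
        H[j + 1][j] = h
        if h < 1e-12:  # invariant subspace reached: truncate to k = j + 1 columns
            return V, [row[: j + 1] for row in H[: j + 2]], beta
        V.append([wk / h for wk in w])
    return V, H, beta


def hessenberg_lstsq(Hbar, rhs0):
    """Solve min ||rhs0 * e1 - Hbar y|| for upper Hessenberg Hbar using Givens rotations."""
    R = [row[:] for row in Hbar]
    k = len(R[0])
    g = [0.0] * (k + 1)
    g[0] = rhs0
    for j in range(k):
        r = math.hypot(R[j][j], R[j + 1][j])
        c, s = R[j][j] / r, R[j + 1][j] / r
        for col in range(j, k):
            top, bot = R[j][col], R[j + 1][col]
            R[j][col] = c * top + s * bot
            R[j + 1][col] = -s * top + c * bot
        g[j], g[j + 1] = c * g[j] + s * g[j + 1], -s * g[j] + c * g[j + 1]
    y = [0.0] * k  # back substitution on the triangular part
    for i in range(k - 1, -1, -1):
        y[i] = (g[i] - sum(R[i][t] * y[t] for t in range(i + 1, k))) / R[i][i]
    return y, abs(g[k])  # abs(g[k]) is the residual norm of this shift


def shifted_gmres(A, b, shifts, m):
    """Approximate solutions of (A - sigma I) x = b for every sigma from ONE Arnoldi run."""
    V, H, beta = arnoldi(A, b, m)
    k = len(H[0])
    sols = {}
    for sigma in shifts:
        Hs = [row[:] for row in H]
        for i in range(k):  # Hbar - sigma * Ibar: only the square part is shifted
            Hs[i][i] -= sigma
        y, _ = hessenberg_lstsq(Hs, beta)
        sols[sigma] = [sum(V[j][r] * y[j] for j in range(k)) for r in range(len(b))]
    return sols


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0, 0.0, 0.0], [-1.0, 4.0, 1.0, 0.0, 0.0], [0.0, -1.0, 4.0, 1.0, 0.0],
         [0.0, 0.0, -1.0, 4.0, 1.0], [0.0, 0.0, 0.0, -1.0, 4.0]]
    b = [1.0, 2.0, 3.0, 4.0, 5.0]
    for sigma, x in shifted_gmres(A, b, [0.0, 0.5, 1.5], m=5).items():
        res = [bi - (axi - sigma * xi) for bi, axi, xi in zip(b, matvec(A, x), x)]
        print(sigma, math.sqrt(dot(res, res)))
```

With restarting, sharing one basis across cycles additionally requires the residuals of all shifts to remain collinear; the sketch avoids that issue by not restarting.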

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Crossref

Numerically Stable Recurrence Relations for the Communication Hiding Pipelined Conjugate Gradient Method

Author: Cools Siegfried
Cornelis Jeffrey
Vanroose Wim
Publication date: 01/01/2019

Pipelined Krylov subspace methods (also referred to as communication-hiding methods) have been proposed in the literature as a scalable alternative to classic Krylov subspace algorithms for iteratively computing the solution to a large linear system in parallel. For symmetric and positive definite system matrices, the pipelined Conjugate Gradient method outperforms its classic Conjugate Gradient counterpart on large-scale distributed memory hardware by overlapping global communication with essential computations like the matrix-vector product, thus hiding global communication. A well-known drawback of the pipelining technique is the (possibly significant) loss of numerical stability. In this work a numerically stable variant of the pipelined Conjugate Gradient algorithm is presented that avoids the propagation of local rounding errors in the finite precision recurrence relations that construct the Krylov subspace basis. The multi-term recurrence relation for the basis vector is replaced by two-term recurrences, improving stability without increasing the overall computational cost of the algorithm. The proposed modification ensures that the pipelined Conjugate Gradient method is able to attain a highly accurate solution independently of the pipeline length. Numerical experiments demonstrate a combination of excellent parallel performance and improved maximal attainable accuracy for the new pipelined Conjugate Gradient algorithm. This work thus resolves one of the major practical restrictions on the usability of pipelined Krylov subspace methods. Comment: 15 pages, 5 figures, 1 table, 2 algorithms
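To make "overlapping global communication with the matrix-vector product" concrete, here is a sketch of the classic unpreconditioned pipelined CG recurrences (Ghysels and Vanroose), which is the baseline these papers start from, not the stabilized variant proposed above. It assumes dense Python lists and a serial run, so the overlap is only marked in comments; the auxiliary vectors satisfy $w = Ar$, $s = Ap$, $z = As$ in exact arithmetic, and the names are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def pipelined_cg(A, b, tol=1e-10, maxiter=200):
    """Unpreconditioned pipelined CG for SPD A with x0 = 0; returns (x, iterations)."""
    n = len(b)
    x = [0.0] * n
    r = b[:]          # r_0 = b - A x_0
    w = matvec(A, r)  # w_i = A r_i
    z, s, p = [0.0] * n, [0.0] * n, [0.0] * n
    gamma_old = alpha_old = None
    bnorm = math.sqrt(dot(b, b))
    i = 0
    for i in range(maxiter):
        # In a distributed code both dot products form ONE global reduction,
        # started non-blocking and overlapped with the matvec q = A w below.
        gamma = dot(r, r)
        delta = dot(w, r)
        if math.sqrt(gamma) <= tol * bnorm:
            break
        q = matvec(A, w)  # the computation that hides the reduction latency
        if i == 0:
            beta, alpha = 0.0, gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)  # = gamma / (p, A p)
        z = [qk + beta * zk for qk, zk in zip(q, z)]  # z_i = A s_i
        s = [wk + beta * sk for wk, sk in zip(w, s)]  # s_i = A p_i
        p = [rk + beta * pk for rk, pk in zip(r, p)]
        x = [xk + alpha * pk for xk, pk in zip(x, p)]
        r = [rk - alpha * sk for rk, sk in zip(r, s)]  # recursive residual update
        w = [wk - alpha * zk for wk, zk in zip(w, z)]  # keeps w = A r without a new matvec
        gamma_old, alpha_old = gamma, alpha
    return x, i
```

The extra recurrences for `w`, `s` and `z` are what allow only one matrix-vector product per iteration to run alongside the reduction; they are also the recurrences through which local rounding errors propagate.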

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen

Analyzing the effect of local rounding error propagation on the maximal attainable accuracy of the pipelined Conjugate Gradient method

Author: Agullo Emmanuel
Cools Siegfried
Giraud Luc
Vanroose Wim
Yetkin Emrullah Fatih
Publication date: 29/11/2017

Pipelined Krylov subspace methods typically offer improved strong scaling on parallel HPC hardware compared to standard Krylov subspace methods for large and sparse linear systems. In pipelined methods the traditional synchronization bottleneck is mitigated by overlapping time-consuming global communications with useful computations. However, to achieve this communication-hiding strategy, pipelined methods introduce additional recurrence relations for a number of auxiliary variables that are required to update the approximate solution. This paper studies the influence of local rounding errors that are introduced by the additional recurrences in the pipelined Conjugate Gradient method. Specifically, we analyze the impact of local round-off effects on the attainable accuracy of the pipelined CG algorithm and compare it with the traditional CG method. Furthermore, we estimate the gap between the true residual and the recursively computed residual used in the algorithm. Based on this estimate we suggest an automated residual replacement strategy to reduce the loss of attainable accuracy on the final iterative solution. The resulting pipelined CG method with residual replacement improves the maximal attainable accuracy of pipelined CG while maintaining the efficient parallel performance of the pipelined method. This conclusion is substantiated by numerical results for a variety of benchmark problems. Comment: 26 pages, 6 figures, 2 tables, 4 algorithms
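The residual gap is $\lVert (b - A x_i) - r_i \rVert$, and "residual replacement" means resetting the recursively updated vectors to explicitly computed ones. The sketch below shows only those two ingredients for pipelined CG (recompute $r = b - Ax$, $w = Ar$, $s = Ap$, $z = As$). It assumes a simple fixed threshold on the exactly computed gap, which costs an extra matrix-vector product; the paper instead estimates the gap and derives an automated criterion, which is not reproduced here. All names are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


def residual_gap(A, b, x, r):
    """Distance between the true residual b - A x and the recursive residual r."""
    return norm([bk - axk - rk for bk, axk, rk in zip(b, matvec(A, x), r)])


def replace_residual(A, b, x, p):
    """Recompute the pipelined-CG auxiliary vectors explicitly from x and p."""
    r = [bk - axk for bk, axk in zip(b, matvec(A, x))]
    w = matvec(A, r)  # w = A r
    s = matvec(A, p)  # s = A p
    z = matvec(A, s)  # z = A s
    return r, w, s, z


def maybe_replace(A, b, x, p, r, w, s, z, threshold):
    """Hypothetical fixed-threshold rule: replace only if the measured gap is too large."""
    if residual_gap(A, b, x, r) > threshold:
        return replace_residual(A, b, x, p), True
    return (r, w, s, z), False
```

Replacing too often would reintroduce the extra matrix-vector products and synchronizations that pipelining tries to avoid, which is why a cheap estimate of the gap matters in practice.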

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Institutional Repository Universiteit Antwerpen

Parallel scheduling of recursively defined arrays

Author: Gokhale M. B.
Myers T. J.

A new method is described for automatically generating concurrent programs that construct arrays defined by sets of recursive equations. It is assumed that the computation time of an array element is a linear combination of its indices, and integer programming is used to seek a succession of hyperplanes along which array elements can be computed concurrently. The method can be used to schedule equations involving variable-length dependency vectors and mutually recursive arrays. Portions of the work reported here have been implemented in the PS automatic program generation system.
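A small illustration of the hyperplane idea: if element $i$ is computed at time $t(i) = a \cdot i$ and element $i$ reads element $i - d$, the dependency is respected when $a \cdot d \ge 1$; all elements with the same $t$ lie on one hyperplane and can run concurrently. This sketch replaces the integer program with a brute-force search over small integer vectors $a$ and uses $\sum_k |a_k|$ as a crude, assumed objective; it handles only fixed dependency vectors of a single two-dimensional array, and the function names are hypothetical.

```python
from itertools import product


def find_schedule(deps, bound=3):
    """Brute-force stand-in for the integer program: smallest a with a.d >= 1 for all d."""
    dim = len(deps[0])
    best = None
    for a in product(range(-bound, bound + 1), repeat=dim):
        if all(sum(ak * dk for ak, dk in zip(a, d)) >= 1 for d in deps):
            cost = sum(abs(ak) for ak in a)  # assumed proxy for schedule length
            if best is None or cost < best[0]:
                best = (cost, a)
    return None if best is None else best[1]


def wavefronts(n, m, a):
    """Group the elements of an n x m array by time step t = a0*i + a1*j."""
    fronts = {}
    for i in range(n):
        for j in range(m):
            fronts.setdefault(a[0] * i + a[1] * j, []).append((i, j))
    return [fronts[t] for t in sorted(fronts)]


if __name__ == "__main__":
    # A[i][j] = f(A[i-1][j], A[i][j-1]): dependency vectors (1, 0) and (0, 1)
    a = find_schedule([(1, 0), (0, 1)])
    print(a)  # (1, 1): the anti-diagonal wavefront t = i + j
    for t, front in enumerate(wavefronts(3, 3, a)):
        print(t, front)
```

For the stencil `A[i][j] = f(A[i-1][j-1], A[i-1][j], A[i-1][j+1])` the search returns `(1, 0)`, so each whole row can be computed in parallel.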

NASA Technical Reports Server

Shortest, Fastest, and Foremost Broadcast in Dynamic Networks

Author: Casteigts Arnaud
Flocchini Paola
Mans Bernard
Santoro Nicola
Publication date: 27/08/2014

Highly dynamic networks rarely offer end-to-end connectivity at a given time. Yet, connectivity in these networks can be established over time and space, based on temporal analogues of multi-hop paths (also called journeys). Attempting to optimize the selection of the journeys in these networks naturally leads to the study of three cases: shortest (minimum hop), fastest (minimum duration), and foremost (earliest arrival) journeys. Efficient centralized algorithms exist for all three cases when full knowledge of the network evolution is given. In this paper, we study the distributed counterparts of these problems, i.e. shortest, fastest, and foremost broadcast with termination detection (TDB), with minimal knowledge of the topology. We show that the feasibility of each of these problems requires distinct features of the evolution, by identifying three classes of dynamic graphs in which the problems become gradually feasible: graphs in which the re-appearance of edges is recurrent (class $\mathcal{R}$), bounded-recurrent ($\mathcal{B}$), or periodic ($\mathcal{P}$), together with specific knowledge of, respectively, $n$ (the number of nodes), $\Delta$ (a bound on the recurrence time), and $p$ (the period). In these classes it is not required that all pairs of nodes get in contact; only that the overall footprint of the graph is connected over time. Our results, together with the strict inclusion between $\mathcal{P}$, $\mathcal{B}$, and $\mathcal{R}$, imply a feasibility order among the three variants of the problem, i.e. TDB[foremost] requires weaker assumptions on the topology dynamics than TDB[shortest], which itself requires less than TDB[fastest]. Conversely, these differences in feasibility imply that the computational powers of $\mathcal{R}_n$, $\mathcal{B}_\Delta$, and $\mathcal{P}_p$ also form a strict hierarchy.
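For contrast with the distributed setting studied in the paper, the centralized foremost case is simple when the whole evolution is known: flooding over the sequence of snapshots yields each node's earliest arrival time. The sketch assumes discrete time steps, undirected edges, and strict journeys (a message crosses at most one edge per step); the snapshot representation and function name are hypothetical, and there is no termination detection.

```python
def foremost_broadcast(n, snapshots, source):
    """Earliest arrival time of a broadcast from `source`, or None if never reached.

    snapshots[t] is the set of undirected edges (u, v) present during step t.
    A node informed by the end of step t can forward during step t + 1 onward.
    """
    arrival = [None] * n
    arrival[source] = 0
    for t, edges in enumerate(snapshots):
        newly = []
        for u, v in edges:
            for sender, receiver in ((u, v), (v, u)):
                # only nodes informed BEFORE this step may forward (one hop per step)
                if arrival[sender] is not None and arrival[receiver] is None:
                    newly.append(receiver)
        for node in newly:
            if arrival[node] is None:
                arrival[node] = t + 1
    return arrival


if __name__ == "__main__":
    # Edge (2, 3) at step 1 is useless because node 2 is not yet informed.
    snapshots = [{(0, 1)}, {(2, 3)}, {(1, 2)}, {(2, 3)}]
    print(foremost_broadcast(4, snapshots, source=0))
```

The example also shows why connectivity "over time" matters: the edge that finally informs node 3 must reappear after node 2 has been reached.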

arXiv.org e-Print Archive

CiteSeerX

Macquarie University ResearchOnline