Deflated GMRES for Systems with Multiple Shifts and Multiple Right-Hand Sides
We consider solution of multiply shifted systems of nonsymmetric linear
equations, possibly also with multiple right-hand sides. First, for a single
right-hand side, the matrix is shifted by several multiples of the identity.
Such problems arise in a number of applications, including lattice quantum
chromodynamics where the matrices are complex and non-Hermitian. Some Krylov
iterative methods such as GMRES and BiCGStab have been used to solve multiply
shifted systems for about the cost of solving just one system. Restarted GMRES
can be improved by deflating eigenvalues for matrices that have a few small
eigenvalues. We show that a particular deflated method, GMRES-DR, can be
applied to multiply shifted systems. In quantum chromodynamics, it is common to
have multiple right-hand sides with multiple shifts for each right-hand side.
We develop a method that efficiently solves the multiple right-hand sides by
using a deflated version of GMRES and yet keeps costs for all of the multiply
shifted systems close to those for one shift. An example is given showing this
can be extremely effective with a quantum chromodynamics matrix.
Comment: 19 pages, 9 figures
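The property that lets one Krylov basis serve several shifts can be checked numerically. The sketch below is not GMRES-DR and not the method of the paper; it only illustrates, on a small made-up matrix, that the Arnoldi basis for A and for A + sigma*I coincide and that the Hessenberg matrix changes only by sigma on its diagonal. The matrix, right-hand side, shift, and all function names are assumptions chosen for illustration.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def arnoldi(A, b, m):
    """Return m+1 orthonormal basis vectors V and the (m+1) x m Hessenberg H."""
    beta = dot(b, b) ** 0.5
    V = [[bi / beta for bi in b]]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = matvec(A, V[j])
        for i in range(j + 1):  # modified Gram-Schmidt orthogonalization
            H[i][j] = dot(V[i], w)
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        H[j + 1][j] = dot(w, w) ** 0.5
        V.append([wk / H[j + 1][j] for wk in w])
    return V, H

def shifted(A, sigma):
    """Return A + sigma * I."""
    return [[a + (sigma if i == j else 0.0) for j, a in enumerate(row)]
            for i, row in enumerate(A)]

# Hypothetical nonsymmetric test matrix and right-hand side.
A = [[4.0, 1.0, 0.0, 0.0],
     [2.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 5.0, 2.0],
     [1.0, 0.0, 1.0, 6.0]]
b = [1.0, 2.0, 0.0, 1.0]
sigma, m = 0.7, 2

V, H = arnoldi(A, b, m)
V_s, H_s = arnoldi(shifted(A, sigma), b, m)

# Largest difference between the two bases (should be at rounding level).
basis_gap = max(abs(x - y) for v, vs in zip(V, V_s) for x, y in zip(v, vs))
# Largest deviation of H_s from H + sigma on the diagonal.
hess_gap = max(abs(H_s[i][j] - H[i][j] - (sigma if i == j else 0.0))
               for i in range(m + 1) for j in range(m))
print(basis_gap, hess_gap)
```

Because the basis is shared, the expensive matrix-vector products are done once, and each extra shift only needs a small Hessenberg problem to be solved.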
Numerically Stable Recurrence Relations for the Communication Hiding Pipelined Conjugate Gradient Method
Pipelined Krylov subspace methods (also referred to as communication-hiding
methods) have been proposed in the literature as a scalable alternative to
classic Krylov subspace algorithms for iteratively computing the solution to a
large linear system in parallel. For symmetric and positive definite system
matrices the pipelined Conjugate Gradient method outperforms its classic
Conjugate Gradient counterpart on large scale distributed memory hardware by
overlapping global communication with essential computations like the
matrix-vector product, thus hiding global communication. A well-known drawback
of the pipelining technique is the (possibly significant) loss of numerical
stability. In this work a numerically stable variant of the pipelined Conjugate
Gradient algorithm is presented that avoids the propagation of local rounding
errors in the finite precision recurrence relations that construct the Krylov
subspace basis. The multi-term recurrence relation for the basis vector is
replaced by two-term recurrences, improving stability without increasing the
overall computational cost of the algorithm. The proposed modification ensures
that the pipelined Conjugate Gradient method is able to attain a highly
accurate solution independently of the pipeline length. Numerical experiments
demonstrate a combination of excellent parallel performance and improved
maximal attainable accuracy for the new pipelined Conjugate Gradient algorithm.
This work thus resolves one of the major practical restrictions for the
usability of pipelined Krylov subspace methods.
Comment: 15 pages, 5 figures, 1 table, 2 algorithms
Analyzing the effect of local rounding error propagation on the maximal attainable accuracy of the pipelined Conjugate Gradient method
Pipelined Krylov subspace methods typically offer improved strong scaling on
parallel HPC hardware compared to standard Krylov subspace methods for large
and sparse linear systems. In pipelined methods the traditional synchronization
bottleneck is mitigated by overlapping time-consuming global communications
with useful computations. However, to achieve this communication hiding
strategy, pipelined methods introduce additional recurrence relations for a
number of auxiliary variables that are required to update the approximate
solution. This paper aims at studying the influence of local rounding errors
that are introduced by the additional recurrences in the pipelined Conjugate
Gradient method. Specifically, we analyze the impact of local round-off effects
on the attainable accuracy of the pipelined CG algorithm and compare to the
traditional CG method. Furthermore, we estimate the gap between the true
residual and the recursively computed residual used in the algorithm. Based on
this estimate we suggest an automated residual replacement strategy to reduce
the loss of attainable accuracy on the final iterative solution. The resulting
pipelined CG method with residual replacement improves the maximal attainable
accuracy of pipelined CG, while maintaining the efficient parallel performance
of the pipelined method. This conclusion is substantiated by numerical results
for a variety of benchmark problems.
Comment: 26 pages, 6 figures, 2 tables, 4 algorithms
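The residual gap discussed in this abstract can be made concrete with a small sketch. The code below is classic (not pipelined) CG in pure Python, tracking the difference between the true residual b - Ax and the recursively updated residual. The fixed-interval `replace_every` parameter is a hypothetical simplification; the paper's automated replacement criterion is not reproduced here, and the test matrix is an assumption.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def true_residual(A, b, x):
    return [bi - axi for bi, axi in zip(b, matvec(A, x))]

def cg(A, b, iters, replace_every=None, tol=1e-14):
    """Classic CG; returns the iterate and the final residual gap."""
    x = [0.0] * len(b)
    r = b[:]                      # recursively updated residual
    p = r[:]
    rr = dot(r, r)
    bb = dot(b, b)
    for k in range(1, iters + 1):
        if rr <= tol * tol * bb:  # converged: avoid dividing by ~0
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if replace_every and k % replace_every == 0:
            # Residual replacement: overwrite the recursive residual
            # with the explicitly computed one.
            r = true_residual(A, b, x)
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    diff = [t - ri for t, ri in zip(true_residual(A, b, x), r)]
    return x, dot(diff, diff) ** 0.5

# Hypothetical SPD test problem: 1D Laplacian, right-hand side of ones.
n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n

x, gap = cg(A, b, iters=n)
x_rep, gap_rep = cg(A, b, iters=n, replace_every=2)
print(x, gap, gap_rep)
```

On a tiny, well-conditioned problem like this the gap stays near machine precision for both variants; the effect studied in the paper appears with the additional recurrences of pipelined CG on larger problems.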
Parallel scheduling of recursively defined arrays
A new method is described for automatically generating concurrent programs that construct arrays defined by sets of recursive equations. It is assumed that the time of computation of an array element is a linear combination of its indices, and integer programming is used to seek a succession of hyperplanes along which array elements can be computed concurrently. The method can be used to schedule equations involving variable length dependency vectors and mutually recursive arrays. Portions of the work reported here have been implemented in the PS automatic program generation system.
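The hyperplane idea can be sketched on a toy recurrence. In the example below the schedule time of element (i, j) is t = i + j; the coefficients (1, 1) are chosen by hand for this recurrence rather than found by integer programming as in the abstract, and the recurrence, function names, and array size are all illustrative assumptions.

```python
from collections import defaultdict

def wavefronts(n, a=1, b=1):
    """Group the cells of an n x n array by schedule time t = a*i + b*j."""
    fronts = defaultdict(list)
    for i in range(n):
        for j in range(n):
            fronts[a * i + b * j].append((i, j))
    return [fronts[t] for t in sorted(fronts)]

def fill_by_wavefront(n):
    """Evaluate A[i][j] = A[i-1][j] + A[i][j-1] (boundary cells = 1)."""
    A = [[None] * n for _ in range(n)]
    for front in wavefronts(n):
        # Cells on one hyperplane depend only on earlier hyperplanes, so
        # this inner loop could run concurrently; it is sequential here.
        for i, j in front:
            A[i][j] = 1 if i == 0 or j == 0 else A[i - 1][j] + A[i][j - 1]
    return A

n = 4
fronts = wavefronts(n)
A = fill_by_wavefront(n)
print(len(fronts), A[n - 1][n - 1])
```

The schedule is valid because both dependencies of (i, j), namely (i-1, j) and (i, j-1), lie on the hyperplane t - 1, so every value is available before it is read.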
Shortest, Fastest, and Foremost Broadcast in Dynamic Networks
Highly dynamic networks rarely offer end-to-end connectivity at a given time.
Yet, connectivity in these networks can be established over time and space,
based on temporal analogues of multi-hop paths (also called journeys).
Attempting to optimize the selection of the journeys in these networks
naturally leads to the study of three cases: shortest (minimum hop), fastest
(minimum duration), and foremost (earliest arrival) journeys. Efficient
centralized algorithms exist to compute all three cases when full knowledge of
the network evolution is given.
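As an illustration of the centralized setting, foremost (earliest-arrival) journeys can be computed in one pass over the contacts when the whole schedule is known. The sketch below assumes directed contacts of the form (u, v, depart, arrive) with arrive strictly greater than depart; this representation, the function name, and the sample schedule are assumptions, not taken from the paper.

```python
import math

def foremost_arrivals(contacts, source, start=0):
    """Earliest arrival time at every node reachable from `source`.

    `contacts` is a list of (u, v, depart, arrive) with arrive > depart.
    Scanning contacts in order of departure time is enough because a
    contact can only be used after its tail has already been reached.
    """
    arrival = {source: start}
    for u, v, depart, arrive in sorted(contacts, key=lambda c: c[2]):
        if arrival.get(u, math.inf) <= depart and arrive < arrival.get(v, math.inf):
            arrival[v] = arrive
    return arrival

# Hypothetical contact schedule.
contacts = [
    ("a", "b", 1, 2),
    ("b", "c", 3, 4),
    ("a", "c", 5, 6),
    ("c", "d", 4, 5),
    ("b", "d", 1, 2),  # unusable from "a": it departs before "b" is reached
]
result = foremost_arrivals(contacts, "a")
print(result)
```

Shortest and fastest journeys need different bookkeeping (hop counts, or departure time together with arrival time), which this sketch does not cover.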
In this paper, we study the distributed counterparts of these problems,
i.e. shortest, fastest, and foremost broadcast with termination detection
(TDB), with minimal knowledge on the topology.
We show that the feasibility of each of these problems requires distinct
features on the evolution, through identifying three classes of dynamic graphs
wherein the problems become gradually feasible: graphs in which the
re-appearance of edges is recurrent (class R), bounded-recurrent
(B), or periodic (P), together with specific knowledge that is,
respectively, n (the number of nodes), Δ (a bound on the recurrence
time), and p (the period). In these classes it is not required that all pairs
of nodes get in contact, only that the overall footprint of the graph
is connected over time.
Our results, together with the strict inclusion between P, B, and R,
imply a feasibility order among the three variants of the problem, i.e.
TDB[foremost] requires weaker assumptions on the topology dynamics than
TDB[shortest], which itself requires less than TDB[fastest]. Conversely, these
differences in feasibility imply that the computational powers of R_n,
B_Δ, and P_p also form a strict hierarchy.
- …