Numerically Stable Recurrence Relations for the Communication Hiding Pipelined Conjugate Gradient Method
Pipelined Krylov subspace methods (also referred to as communication-hiding
methods) have been proposed in the literature as a scalable alternative to
classic Krylov subspace algorithms for iteratively computing the solution to a
large linear system in parallel. For symmetric and positive definite system
matrices the pipelined Conjugate Gradient method outperforms its classic
Conjugate Gradient counterpart on large scale distributed memory hardware by
overlapping global communication with essential computations like the
matrix-vector product, thus hiding global communication. A well-known drawback
of the pipelining technique is the (possibly significant) loss of numerical
stability. In this work a numerically stable variant of the pipelined Conjugate
Gradient algorithm is presented that avoids the propagation of local rounding
errors in the finite precision recurrence relations that construct the Krylov
subspace basis. The multi-term recurrence relation for the basis vector is
replaced by two-term recurrences, improving stability without increasing the
overall computational cost of the algorithm. The proposed modification ensures
that the pipelined Conjugate Gradient method is able to attain a highly
accurate solution independently of the pipeline length. Numerical experiments
demonstrate a combination of excellent parallel performance and improved
maximal attainable accuracy for the new pipelined Conjugate Gradient algorithm.
This work thus resolves one of the major practical restrictions for the
usability of pipelined Krylov subspace methods.
Comment: 15 pages, 5 figures, 1 table, 2 algorithms
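The two-term recurrence structure that the abstract refers to can be illustrated with the classic Conjugate Gradient method. The following is a minimal pure-Python sketch, not the paper's pipelined or stabilized variant; the function name and interface are hypothetical. It shows the coupled two-term recurrences for the iterate `x`, the recursive residual `r`, and the search direction `p` whose pipelined analogue the paper stabilizes:

```python
def cg(A, b, tol=1e-10, max_iter=200):
    """Classic Conjugate Gradient for a symmetric positive definite
    matrix A (list of lists), using the standard two-term recurrences."""
    n = len(b)
    matvec = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = b[:]              # residual r_0 = b - A x_0, with x_0 = 0
    p = r[:]              # initial search direction p_0 = r_0
    rr = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        # Two-term recurrences: each vector is updated from one
        # previous vector plus one correction term.
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]  # recursive residual
        rr_new = dot(r, r)
        if rr_new ** 0.5 < tol:
            break
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x
```

In the pipelined variant, additional auxiliary recurrences are introduced so that the global reductions (the `dot` calls) can overlap with the matrix-vector product; the paper's contribution is keeping those extra recurrences from amplifying local rounding errors.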
Analyzing the effect of local rounding error propagation on the maximal attainable accuracy of the pipelined Conjugate Gradient method
Pipelined Krylov subspace methods typically offer improved strong scaling on
parallel HPC hardware compared to standard Krylov subspace methods for large
and sparse linear systems. In pipelined methods the traditional synchronization
bottleneck is mitigated by overlapping time-consuming global communications
with useful computations. However, to achieve this communication hiding
strategy, pipelined methods introduce additional recurrence relations for a
number of auxiliary variables that are required to update the approximate
solution. This paper aims at studying the influence of local rounding errors
that are introduced by the additional recurrences in the pipelined Conjugate
Gradient method. Specifically, we analyze the impact of local round-off effects
on the attainable accuracy of the pipelined CG algorithm and compare to the
traditional CG method. Furthermore, we estimate the gap between the true
residual and the recursively computed residual used in the algorithm. Based on
this estimate we suggest an automated residual replacement strategy to reduce
the loss of attainable accuracy on the final iterative solution. The resulting
pipelined CG method with residual replacement improves the maximal attainable
accuracy of pipelined CG, while maintaining the efficient parallel performance
of the pipelined method. This conclusion is substantiated by numerical results
for a variety of benchmark problems.
Comment: 26 pages, 6 figures, 2 tables, 4 algorithms
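The residual replacement idea can be sketched in a few lines. This is a hypothetical simplification, not the paper's automated strategy: the paper triggers replacement from an estimate of the gap between the true and recursive residuals, whereas the sketch below replaces at a fixed period for illustration. All names are illustrative:

```python
def cg_rr(A, b, replace_every=5, tol=1e-12, max_iter=500):
    """Conjugate Gradient with periodic residual replacement: every
    `replace_every` iterations the recursive residual is discarded and
    the true residual b - A x is recomputed, resetting accumulated
    local rounding-error drift."""
    n = len(b)
    matvec = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = b[:]
    p = r[:]
    rr = dot(r, r)
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        if k % replace_every == 0:
            # Residual replacement: recompute the true residual b - A x
            # instead of updating the recursive residual.
            r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        else:
            r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rr_new = dot(r, r)
        if rr_new ** 0.5 < tol:
            break
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x
```

In exact arithmetic the replaced and recursive residuals coincide, so replacement costs one extra matrix-vector product per trigger but does not alter the iteration; in finite precision it closes the gap that limits the attainable accuracy.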
Improved Accuracy and Parallelism for MRRR-based Eigensolvers -- A Mixed Precision Approach
The real symmetric tridiagonal eigenproblem is of outstanding importance in
numerical computations; it arises frequently as part of eigensolvers for
standard and generalized dense Hermitian eigenproblems that are based on a
reduction to tridiagonal form. For its solution, the algorithm of Multiple
Relatively Robust Representations (MRRR) is among the fastest methods. Although
fast, the solvers based on MRRR do not deliver the same accuracy as competing
methods like Divide & Conquer or the QR algorithm. In this paper, we
demonstrate that the use of mixed precisions leads to improved accuracy of
MRRR-based eigensolvers with limited or no performance penalty. As a result, we
obtain eigensolvers that are not only equally or more accurate than the best
available methods, but also, in most circumstances, faster and more scalable
than the competition.
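The mixed-precision idea can be illustrated on the same problem class, the symmetric tridiagonal eigenproblem, without reproducing MRRR itself. The sketch below uses bisection with Sturm-sequence counts as a stand-in algorithm: a cheap phase brackets an eigenvalue in simulated single precision, and a short phase polishes the bracket in double precision. All function names are hypothetical, and single precision is mimicked by rounding through the `struct` module:

```python
import struct

def _f32(v):
    """Round a Python float through IEEE single precision (mimics float32)."""
    return struct.unpack('f', struct.pack('f', v))[0]

def sturm_count(d, e, x, rnd=lambda v: v):
    """Number of eigenvalues of the symmetric tridiagonal matrix with
    diagonal d and off-diagonal e that lie below x, counted as the
    negative pivots of the LDL^T factorization of T - x I.
    `rnd` injects the working precision."""
    count, q = 0, rnd(d[0] - x)
    for i in range(1, len(d)):
        if q < 0:
            count += 1
        if q == 0.0:
            q = 1e-30  # guard against division by zero
        q = rnd(d[i] - x - rnd(e[i - 1] * e[i - 1]) / q)
    if q < 0:
        count += 1
    return count

def kth_eigenvalue(d, e, k):
    """Mixed-precision bisection for the k-th smallest eigenvalue:
    bracket in simulated single precision, refine in double."""
    # Gershgorin bounds enclose the whole spectrum.
    radius = [abs(e[i - 1]) if i > 0 else 0.0 for i in range(len(d))]
    for i in range(len(d) - 1):
        radius[i] += abs(e[i])
    lo = min(di - ri for di, ri in zip(d, radius))
    hi = max(di + ri for di, ri in zip(d, radius))
    # Phase 1: cheap single-precision bisection down to a coarse bracket.
    while hi - lo > 1e-5:
        mid = 0.5 * (lo + hi)
        if sturm_count(d, e, mid, rnd=_f32) > k:
            hi = mid
        else:
            lo = mid
    # Phase 2: refine the narrow bracket in full double precision.
    while hi - lo > 1e-13 * max(1.0, abs(hi)):
        mid = 0.5 * (lo + hi)
        if sturm_count(d, e, mid) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

The design point this illustrates is the one the abstract makes: most of the work can be done in the cheaper precision, with a small amount of higher-precision computation recovering (or here, exceeding) the accuracy of an all-double computation.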