
    Deflated GMRES for Systems with Multiple Shifts and Multiple Right-Hand Sides

    We consider the solution of multiply shifted systems of nonsymmetric linear equations, possibly also with multiple right-hand sides. First, for a single right-hand side, the matrix is shifted by several multiples of the identity. Such problems arise in a number of applications, including lattice quantum chromodynamics where the matrices are complex and non-Hermitian. Some Krylov iterative methods such as GMRES and BiCGStab have been used to solve multiply shifted systems for about the cost of solving just one system. Restarted GMRES can be improved by deflating eigenvalues for matrices that have a few small eigenvalues. We show that a particular deflated method, GMRES-DR, can be applied to multiply shifted systems. In quantum chromodynamics, it is common to have multiple right-hand sides with multiple shifts for each right-hand side. We develop a method that efficiently solves the multiple right-hand sides by using a deflated version of GMRES and yet keeps costs for all of the multiply shifted systems close to those for one shift. An example is given showing that this can be extremely effective with a quantum chromodynamics matrix.
    Comment: 19 pages, 9 figures
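
    The property all multi-shift Krylov methods exploit is that the Krylov subspace is invariant under shifts by the identity, so one Arnoldi basis can serve every shifted system and each shift only changes a small projected problem. The NumPy sketch below illustrates this for a single unrestarted GMRES cycle; it is not the deflated GMRES-DR algorithm of the paper, and the matrix, shifts, and cycle length are invented for illustration.

```python
import numpy as np

def arnoldi(A, b, m):
    """Build an orthonormal Krylov basis V and Hessenberg H with A V[:, :m] = V H."""
    n = len(b)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):              # modified Gram-Schmidt orthogonalization
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H, beta

def shifted_gmres_cycle(A, b, shifts, m=40):
    """One unrestarted GMRES cycle for (A - sigma*I) x = b: all shifts share a
    single Arnoldi basis, and each shift only changes the small least-squares problem."""
    V, H, beta = arnoldi(A, b, m)
    e1 = np.zeros(m + 1)
    e1[0] = beta
    Ibar = np.eye(m + 1, m)                 # (m+1) x m identity padded with a zero row
    solutions = {}
    for sigma in shifts:
        y, *_ = np.linalg.lstsq(H - sigma * Ibar, e1, rcond=None)
        solutions[sigma] = V[:, :m] @ y     # zero initial guess, so x_sigma = V_m y_sigma
    return solutions

# toy nonsymmetric example (the paper's matrices are complex lattice-QCD operators)
rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)) / np.sqrt(n) + 2.0 * np.eye(n)
b = rng.standard_normal(n)
for sigma, x in shifted_gmres_cycle(A, b, shifts=[0.0, 0.25, 0.5]).items():
    print(sigma, np.linalg.norm(b - (A - sigma * np.eye(n)) @ x))
```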

    Numerically Stable Recurrence Relations for the Communication Hiding Pipelined Conjugate Gradient Method

    Pipelined Krylov subspace methods (also referred to as communication-hiding methods) have been proposed in the literature as a scalable alternative to classic Krylov subspace algorithms for iteratively computing the solution to a large linear system in parallel. For symmetric and positive definite system matrices the pipelined Conjugate Gradient method outperforms its classic Conjugate Gradient counterpart on large scale distributed memory hardware by overlapping global communication with essential computations like the matrix-vector product, thus hiding global communication. A well-known drawback of the pipelining technique is the (possibly significant) loss of numerical stability. In this work a numerically stable variant of the pipelined Conjugate Gradient algorithm is presented that avoids the propagation of local rounding errors in the finite precision recurrence relations that construct the Krylov subspace basis. The multi-term recurrence relation for the basis vector is replaced by two-term recurrences, improving stability without increasing the overall computational cost of the algorithm. The proposed modification ensures that the pipelined Conjugate Gradient method is able to attain a highly accurate solution independently of the pipeline length. Numerical experiments demonstrate a combination of excellent parallel performance and improved maximal attainable accuracy for the new pipelined Conjugate Gradient algorithm. This work thus resolves one of the major practical restrictions on the usability of pipelined Krylov subspace methods.
    Comment: 15 pages, 5 figures, 1 table, 2 algorithms
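
    For reference, here is a serial NumPy sketch of the baseline pipelined CG recurrences whose rounding-error propagation motivates this work (not the stabilized two-term variant proposed in the paper). The comments mark where, on distributed hardware, the single fused global reduction would be overlapped with the matrix-vector product; the tolerance, iteration cap, and test problem are placeholder choices.

```python
import numpy as np

def pipelined_cg(A, b, tol=1e-8, maxiter=500):
    """Unpreconditioned pipelined CG with the classic multi-term recurrences.
    On parallel hardware the two dot products (gamma, delta) form one global
    reduction that is overlapped with the matrix-vector product q = A @ w;
    in this serial sketch the ordering only marks where that overlap occurs."""
    x = np.zeros_like(b)
    r = b - A @ x
    w = A @ r
    bnorm = np.linalg.norm(b)
    for i in range(maxiter):
        gamma = r @ r                       # fused global reduction ...
        delta = w @ r                       # ... (one synchronization per iteration)
        q = A @ w                           # matvec overlapped with the reduction
        if i > 0:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha)
            z = q + beta * z                # recursively updated auxiliary vectors
            s = w + beta * s                # s_i = A p_i, z_i = A s_i
            p = r + beta * p
        else:
            alpha = gamma / delta
            z, s, p = q, w, r
        x = x + alpha * p
        r = r - alpha * s                   # recursively computed residual
        w = w - alpha * z
        gamma_old = gamma
        if np.linalg.norm(r) < tol * bnorm:
            break
    return x, i + 1

# quick check on a small SPD system (1-D Laplacian)
n = 100
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, iters = pipelined_cg(A, b)
print(iters, np.linalg.norm(b - A @ x))
```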

    Analyzing the effect of local rounding error propagation on the maximal attainable accuracy of the pipelined Conjugate Gradient method

    Pipelined Krylov subspace methods typically offer improved strong scaling on parallel HPC hardware compared to standard Krylov subspace methods for large and sparse linear systems. In pipelined methods the traditional synchronization bottleneck is mitigated by overlapping time-consuming global communications with useful computations. However, to achieve this communication-hiding strategy, pipelined methods introduce additional recurrence relations for a number of auxiliary variables that are required to update the approximate solution. This paper studies the influence of the local rounding errors that are introduced by these additional recurrences in the pipelined Conjugate Gradient method. Specifically, we analyze the impact of local round-off effects on the attainable accuracy of the pipelined CG algorithm and compare it to the traditional CG method. Furthermore, we estimate the gap between the true residual and the recursively computed residual used in the algorithm. Based on this estimate we suggest an automated residual replacement strategy to reduce the loss of attainable accuracy of the final iterative solution. The resulting pipelined CG method with residual replacement improves the maximal attainable accuracy of pipelined CG, while maintaining the efficient parallel performance of the pipelined method. This conclusion is substantiated by numerical results for a variety of benchmark problems.
    Comment: 26 pages, 6 figures, 2 tables, 4 algorithms
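
    The replacement idea itself is simple to state in code. The sketch below adds a naive, fixed-period replacement of the recursively updated residual to a plain CG loop; the paper's contribution is an automated criterion derived from rounding-error estimates and applied to the pipelined recurrences, so the `replace_every` period used here is purely illustrative.

```python
import numpy as np

def cg_with_residual_replacement(A, b, tol=1e-12, maxiter=1000, replace_every=50):
    """Plain CG in which the recursively updated residual is periodically
    overwritten by the true residual b - A x, discarding accumulated local
    rounding errors.  The fixed period is a placeholder: an automated strategy
    would trigger the replacement from an estimate of the gap between the true
    and the recursively computed residual."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rho = r @ r
    bnorm = np.linalg.norm(b)
    for i in range(maxiter):
        q = A @ p
        alpha = rho / (p @ q)
        x = x + alpha * p
        r = r - alpha * q                   # recursively computed residual
        if (i + 1) % replace_every == 0:
            r = b - A @ x                   # residual replacement step
        rho_new = r @ r
        if np.sqrt(rho_new) < tol * bnorm:
            break
        p = r + (rho_new / rho) * p
        rho = rho_new
    return x, i + 1

# usage: x, iters = cg_with_residual_replacement(A, b) for an SPD matrix A
```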

    Parallel scheduling of recursively defined arrays

    A new method of automatic generation of concurrent programs that construct arrays defined by sets of recursive equations is described. It is assumed that the time of computation of an array element is a linear combination of its indices, and integer programming is used to seek a succession of hyperplanes along which array elements can be computed concurrently. The method can be used to schedule equations involving variable-length dependency vectors and mutually recursive arrays. Portions of the work reported here have been implemented in the PS automatic program generation system.
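
    As a small illustration of the hyperplane idea (not the paper's integer-programming formulation or the PS system), the sketch below brute-forces an integer scheduling vector for constant dependency vectors; the cost surrogate and coefficient bound are arbitrary choices.

```python
import itertools
import numpy as np

def find_hyperplane_schedule(dependences, max_coeff=3):
    """Search for an integer scheduling vector pi with pi . d >= 1 for every
    dependence vector d: all array elements on a common hyperplane pi . i = const
    can then be computed concurrently.  A real formulation would minimize the
    schedule length via integer programming; this exhaustive search over small
    coefficients is only an illustration."""
    D = np.array(dependences)
    dim = D.shape[1]
    best = None
    for pi in itertools.product(range(-max_coeff, max_coeff + 1), repeat=dim):
        pi = np.array(pi)
        if np.all(D @ pi >= 1):
            cost = np.abs(pi).sum()         # crude surrogate for the schedule length
            if best is None or cost < best[1]:
                best = (pi, cost)
    return None if best is None else best[0]

# dependences of a 2-D recurrence a[i, j] = f(a[i-1, j], a[i, j-1])
print(find_hyperplane_schedule([(1, 0), (0, 1)]))   # pi = (1, 1): anti-diagonals run in parallel
```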

    Shortest, Fastest, and Foremost Broadcast in Dynamic Networks

    Highly dynamic networks rarely offer end-to-end connectivity at a given time. Yet, connectivity in these networks can be established over time and space, based on temporal analogues of multi-hop paths (also called journeys). Attempting to optimize the selection of journeys in these networks naturally leads to the study of three cases: shortest (minimum hop), fastest (minimum duration), and foremost (earliest arrival) journeys. Efficient centralized algorithms exist to compute all three cases when full knowledge of the network evolution is given. In this paper, we study the distributed counterparts of these problems, i.e. shortest, fastest, and foremost broadcast with termination detection (TDB), with minimal knowledge of the topology. We show that the feasibility of each of these problems requires distinct features of the evolution, by identifying three classes of dynamic graphs in which the problems become gradually feasible: graphs in which the re-appearance of edges is recurrent (class R), bounded-recurrent (B), or periodic (P), together with specific knowledge, namely n (the number of nodes), Δ (a bound on the recurrence time), and p (the period), respectively. In these classes it is not required that all pairs of nodes get in contact -- only that the overall footprint of the graph is connected over time. Our results, together with the strict inclusion between P, B, and R, imply a feasibility order among the three variants of the problem, i.e. TDB[foremost] requires weaker assumptions on the topology dynamics than TDB[shortest], which itself requires weaker assumptions than TDB[fastest]. Conversely, these differences in feasibility imply that the computational powers of R_n, B_Δ, and P_p also form a strict hierarchy.
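
    As a point of reference for the centralized setting mentioned above, the sketch below computes foremost (earliest-arrival) journey times when the whole evolution of the graph is known in advance, assuming discrete time steps and a one-step latency per hop; the schedule encoding and the example graph are invented for illustration.

```python
def foremost_arrival_times(schedule, source):
    """Earliest-arrival ("foremost") journey times in a dynamic graph with full
    knowledge of its evolution: `schedule` maps each discrete time step to the
    set of undirected edges present at that step.  This is the centralized
    computation against which the distributed broadcast problems are contrasted."""
    arrival = {source: 0}                   # node -> earliest time it can be reached
    for t in sorted(schedule):
        for u, v in schedule[t]:
            # an edge present at time t can forward information that arrived by time t
            if arrival.get(u, float("inf")) <= t and t + 1 < arrival.get(v, float("inf")):
                arrival[v] = t + 1
            if arrival.get(v, float("inf")) <= t and t + 1 < arrival.get(u, float("inf")):
                arrival[u] = t + 1
    return arrival

# a small evolving graph: edge (a, b) exists at t=0, (b, c) at t=1, (a, c) at t=3
print(foremost_arrival_times({0: {("a", "b")}, 1: {("b", "c")}, 3: {("a", "c")}}, "a"))
# -> {'a': 0, 'b': 1, 'c': 2}
```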