Numerically Stable Recurrence Relations for the Communication Hiding Pipelined Conjugate Gradient Method
Pipelined Krylov subspace methods (also referred to as communication-hiding
methods) have been proposed in the literature as a scalable alternative to
classic Krylov subspace algorithms for iteratively computing the solution to a
large linear system in parallel. For symmetric and positive definite system
matrices the pipelined Conjugate Gradient method outperforms its classic
Conjugate Gradient counterpart on large scale distributed memory hardware by
overlapping global communication with essential computations like the
matrix-vector product, thus hiding global communication. A well-known drawback
of the pipelining technique is the (possibly significant) loss of numerical
stability. In this work a numerically stable variant of the pipelined Conjugate
Gradient algorithm is presented that avoids the propagation of local rounding
errors in the finite precision recurrence relations that construct the Krylov
subspace basis. The multi-term recurrence relation for the basis vector is
replaced by two-term recurrences, improving stability without increasing the
overall computational cost of the algorithm. The proposed modification ensures
that the pipelined Conjugate Gradient method is able to attain a highly
accurate solution independently of the pipeline length. Numerical experiments
demonstrate a combination of excellent parallel performance and improved
maximal attainable accuracy for the new pipelined Conjugate Gradient algorithm.
This work thus resolves one of the major practical restrictions for the
usability of pipelined Krylov subspace methods.
Comment: 15 pages, 5 figures, 1 table, 2 algorithms
Analyzing the effect of local rounding error propagation on the maximal attainable accuracy of the pipelined Conjugate Gradient method
Pipelined Krylov subspace methods typically offer improved strong scaling on
parallel HPC hardware compared to standard Krylov subspace methods for large
and sparse linear systems. In pipelined methods the traditional synchronization
bottleneck is mitigated by overlapping time-consuming global communications
with useful computations. However, to achieve this communication hiding
strategy, pipelined methods introduce additional recurrence relations for a
number of auxiliary variables that are required to update the approximate
solution. This paper aims at studying the influence of local rounding errors
that are introduced by the additional recurrences in the pipelined Conjugate
Gradient method. Specifically, we analyze the impact of local round-off effects
on the attainable accuracy of the pipelined CG algorithm and compare it to that
of the traditional CG method. Furthermore, we estimate the gap between the true
residual and the recursively computed residual used in the algorithm. Based on
this estimate we suggest an automated residual replacement strategy to reduce
the loss of attainable accuracy on the final iterative solution. The resulting
pipelined CG method with residual replacement improves the maximal attainable
accuracy of pipelined CG, while maintaining the efficient parallel performance
of the pipelined method. This conclusion is substantiated by numerical results
for a variety of benchmark problems.
Comment: 26 pages, 6 figures, 2 tables, 4 algorithms
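
The residual gap mentioned in the abstract above can be illustrated with a minimal sketch. It uses the classic (not pipelined) CG method in pure Python and is not the authors' algorithm or their residual replacement strategy. The function name `cg_with_residual_gap`, the helpers `matvec` and `dot`, and the 3x3 test system are assumptions made for illustration only. At each iteration the sketch compares the recursively updated residual with the true residual b - Ax.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix (illustrative helper)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product of two lists."""
    return sum(a * b for a, b in zip(u, v))


def cg_with_residual_gap(A, b, tol=1e-12, max_iter=100):
    """Classic CG for an SPD matrix A, recording ||(b - A x) - r|| per iteration.

    r is the recursively computed residual; the recorded value is the norm of
    its gap to the true residual b - A x.
    """
    x = [0.0] * len(b)
    r = list(b)          # recursive residual; equals b because x0 = 0
    p = list(r)          # initial search direction
    rr = dot(r, r)
    gaps = []
    for _ in range(max_iter):
        if rr ** 0.5 <= tol:
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]   # recursive update
        true_r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        gap = [t - ri for t, ri in zip(true_r, r)]
        gaps.append(dot(gap, gap) ** 0.5)
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, gaps


# Small symmetric positive definite test system (hypothetical data).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, gaps = cg_with_residual_gap(A, b)
```

For a small, well-conditioned system like this one the gap stays near machine precision. The abstract's point is that the extra recurrences in pipelined CG can make this gap grow, which is what a residual replacement strategy is meant to counteract.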
A Lanczos Method for Approximating Composite Functions
We seek to approximate a composite function h(x) = g(f(x)) with a global
polynomial. The standard approach chooses points x in the domain of f and
computes h(x) at each point, which requires an evaluation of f and an
evaluation of g. We present a Lanczos-based procedure that implicitly
approximates g with a polynomial of f. By constructing a quadrature rule for
the density function of f, we can approximate h(x) using many fewer evaluations
of g. The savings are particularly dramatic when g is much more expensive than f
or the dimension of x is large. We demonstrate this procedure with two
numerical examples: (i) an exponential function composed with a rational
function and (ii) a Navier-Stokes model of fluid flow with a scalar input
parameter that depends on multiple physical quantities.
Improved Accuracy and Parallelism for MRRR-based Eigensolvers -- A Mixed Precision Approach
The real symmetric tridiagonal eigenproblem is of outstanding importance in
numerical computations; it arises frequently as part of eigensolvers for
standard and generalized dense Hermitian eigenproblems that are based on a
reduction to tridiagonal form. For its solution, the algorithm of Multiple
Relatively Robust Representations (MRRR) is among the fastest methods. Although
fast, the solvers based on MRRR do not deliver the same accuracy as competing
methods like Divide & Conquer or the QR algorithm. In this paper, we
demonstrate that the use of mixed precisions leads to improved accuracy of
MRRR-based eigensolvers with limited or no performance penalty. As a result, we
obtain eigensolvers that are not only equally or more accurate than the best
available methods, but also, in most circumstances, faster and more scalable
than the competition.
Raman scattering from fractals. Simulation on large structures by the method of moments
We have employed the method of spectral moments to study the density of
vibrational states and the Raman coupling coefficient of large 2- and
3-dimensional percolators at threshold and at higher concentration. We first
discuss the overflow and underflow problems of the procedure, which arise when,
as in the present case, it is necessary to calculate a few thousand moments.
Then we report on the numerical results; these show that different scattering
mechanisms, all a priori equally probable in real systems, produce
largely different coupling coefficients with different frequency dependence.
Our results are compared with existing scaling theories of Raman scattering.
The situation that emerges is complex; on the one hand, there are indications
that the existing theory is not satisfactory; on the other hand, the
simulations above threshold show that in this case the coupling coefficients
bear very little resemblance, if any, to the same quantities at threshold.
Comment: 26 pages, RevTex, 8 figures available on request
Preconditioned Locally Harmonic Residual Method for Computing Interior Eigenpairs of Certain Classes of Hermitian Matrices
We propose a Preconditioned Locally Harmonic Residual (PLHR) method for
computing several interior eigenpairs of a generalized Hermitian eigenvalue
problem, without traditional spectral transformations, matrix factorizations,
or inversions. PLHR is based on a short-term recurrence, easily extended to a
block form, computing eigenpairs simultaneously. PLHR can take advantage of
Hermitian positive definite preconditioning, e.g., based on an approximate
inverse of an absolute value of a shifted matrix, introduced in [SISC, 35
(2013), pp. A696-A718]. Our numerical experiments demonstrate that PLHR is
efficient and robust for certain classes of large-scale interior eigenvalue
problems, involving Laplacian and Hamiltonian operators, especially if memory
requirements are tight.