The solution of linear systems of equations with a structural analysis code on the NAS CRAY-2
Two methods for solving linear systems of equations on the NAS Cray-2 are described. One is a direct method; the other is an iterative method. Both methods exploit the architecture of the Cray-2, particularly its vectorization, and are aimed at structural analysis applications. To demonstrate and evaluate the methods, they were installed in a finite element structural analysis code denoted the Computational Structural Mechanics (CSM) Testbed. A description of the techniques used to integrate the two solvers into the Testbed is given. Storage schemes, memory requirements, operation counts, and reformatting procedures are discussed. Finally, results from the new methods are compared with results from the initial Testbed sparse Choleski equation solver for three structural analysis problems. The new direct solvers achieve the highest computational rates of the methods compared. The new iterative methods do not achieve computational rates as high as the vectorized direct solvers, but they are best for well-conditioned problems, which require fewer iterations to converge to the solution.
Numerically Stable Recurrence Relations for the Communication Hiding Pipelined Conjugate Gradient Method
Pipelined Krylov subspace methods (also referred to as communication-hiding
methods) have been proposed in the literature as a scalable alternative to
classic Krylov subspace algorithms for iteratively computing the solution to a
large linear system in parallel. For symmetric and positive definite system
matrices the pipelined Conjugate Gradient method outperforms its classic
Conjugate Gradient counterpart on large scale distributed memory hardware by
overlapping global communication with essential computations like the
matrix-vector product, thus hiding global communication. A well-known drawback
of the pipelining technique is the (possibly significant) loss of numerical
stability. In this work a numerically stable variant of the pipelined Conjugate
Gradient algorithm is presented that avoids the propagation of local rounding
errors in the finite precision recurrence relations that construct the Krylov
subspace basis. The multi-term recurrence relation for the basis vector is
replaced by two-term recurrences, improving stability without increasing the
overall computational cost of the algorithm. The proposed modification ensures
that the pipelined Conjugate Gradient method is able to attain a highly
accurate solution independently of the pipeline length. Numerical experiments
demonstrate a combination of excellent parallel performance and improved
maximal attainable accuracy for the new pipelined Conjugate Gradient algorithm.
This work thus resolves one of the major practical restrictions for the
usability of pipelined Krylov subspace methods.
Comment: 15 pages, 5 figures, 1 table, 2 algorithms
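For orientation, here is a minimal sketch of the classic (non-pipelined) Conjugate Gradient iteration that the abstract uses as its baseline. It is not the pipelined or stabilized variant proposed in the paper. The function name `conjugate_gradient`, the dense list-of-rows matrix format, and the small test system are assumptions made for illustration. The two dot products in each iteration correspond to the global reductions that a pipelined method would overlap with the matrix-vector product on distributed hardware.

```python
def conjugate_gradient(A, b, tol=1e-12, max_iter=100):
    """Classic CG for a symmetric positive definite matrix A (list of rows)."""

    def matvec(v):
        return [sum(a * vj for a, vj in zip(row, v)) for row in A]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * len(b)      # initial guess x0 = 0
    r = list(b)             # residual r0 = b - A x0 = b
    p = list(r)             # first search direction
    rr = dot(r, r)          # reduction: squared residual norm
    for _ in range(max_iter):
        if rr ** 0.5 <= tol:
            break
        Ap = matvec(p)                  # matrix-vector product
        alpha = rr / dot(p, Ap)         # reduction: step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)              # reduction: new residual norm
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]  # two-term direction update
        rr = rr_new
    return x


# Hypothetical 2x2 SPD system; its exact solution is (1/11, 7/11).
A_cg = [[4.0, 1.0], [1.0, 3.0]]
b_cg = [1.0, 2.0]
x_cg = conjugate_gradient(A_cg, b_cg)
```

In this form each iteration must wait for its dot products before continuing, which is the synchronization cost the abstract says pipelining hides.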
A randomized Kaczmarz algorithm with exponential convergence
The Kaczmarz method for solving linear systems of equations is an iterative
algorithm that has found many applications ranging from computer tomography to
digital signal processing. Despite the popularity of this method, useful
theoretical estimates for its rate of convergence are still scarce. We
introduce a randomized version of the Kaczmarz method for consistent,
overdetermined linear systems and we prove that it converges with expected
exponential rate. Furthermore, this is the first solver whose rate does not
depend on the number of equations in the system. The solver does not even need
to know the whole system, but only a small random part of it. It thus
outperforms all previously known methods on general extremely overdetermined
systems. Even for moderately overdetermined systems, numerical simulations as
well as theoretical analysis reveal that our algorithm can converge faster than
the celebrated conjugate gradient algorithm. Furthermore, our theory and
numerical simulations confirm a prediction of Feichtinger et al. in the context
of reconstructing bandlimited functions from nonuniform sampling.
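The randomized iteration summarized above can be sketched as follows, under the common formulation in which each row is sampled with probability proportional to its squared Euclidean norm and the current iterate is projected onto that row's hyperplane. The abstract does not spell out the sampling rule, so treat that rule, the function name `randomized_kaczmarz`, the fixed iteration count, and the small test system as assumptions made for illustration.

```python
import random


def randomized_kaczmarz(A, b, iterations=2000, seed=0):
    """Approximate the solution of a consistent system A x = b.

    A is a list of rows (lists of floats); b is a list of floats.
    """
    rng = random.Random(seed)
    # Squared row norms serve as (unnormalized) sampling weights.
    row_norms_sq = [sum(v * v for v in row) for row in A]
    indices = range(len(A))
    x = [0.0] * len(A[0])
    for _ in range(iterations):
        # Only one randomly chosen equation is touched per step.
        i = rng.choices(indices, weights=row_norms_sq, k=1)[0]
        row = A[i]
        residual = b[i] - sum(r * xj for r, xj in zip(row, x))
        step = residual / row_norms_sq[i]
        # Project x onto the hyperplane <row, x> = b[i].
        x = [xj + step * r for xj, r in zip(x, row)]
    return x


# Hypothetical consistent, overdetermined system (4 equations, 2 unknowns).
A_rk = [[1.0, 2.0], [3.0, 1.0], [1.0, -1.0], [2.0, 2.0]]
x_true = [1.0, -2.0]
b_rk = [sum(r * t for r, t in zip(row, x_true)) for row in A_rk]
x_rk = randomized_kaczmarz(A_rk, b_rk)
```

Each step reads a single row of the system, which illustrates the abstract's remark that the solver needs only a small random part of it.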