Minimizing Communication for Eigenproblems and the Singular Value Decomposition
Algorithms have two costs: arithmetic and communication. The latter
represents the cost of moving data, either between levels of a memory
hierarchy, or between processors over a network. Communication often dominates
arithmetic and represents a rapidly increasing proportion of the total cost, so
we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds
were presented on the amount of communication required for essentially all
$O(n^3)$-like algorithms for linear algebra, including eigenvalue problems and
the SVD. Conventional algorithms, including those currently implemented in
(Sca)LAPACK, perform asymptotically more communication than these lower bounds
require. In this paper we present parallel and sequential eigenvalue algorithms
(for pencils, nonsymmetric matrices, and symmetric matrices) and SVD algorithms
that do attain these lower bounds, and analyze their convergence and
communication costs. Comment: 43 pages, 11 figures
Fast computation of spectral projectors of banded matrices
We consider the approximate computation of spectral projectors for symmetric
banded matrices. While this problem has received considerable attention,
especially in the context of linear scaling electronic structure methods, the
presence of small relative spectral gaps challenges existing methods based on
approximate sparsity. In this work, we show how a data-sparse approximation
based on hierarchical matrices can be used to overcome this problem. We prove a
priori bounds on the approximation error and propose a fast algorithm based
on the QDWH algorithm, following the work of Nakatsukasa et al. Numerical
experiments demonstrate that the performance of our algorithm is robust with
respect to the spectral gap. A preliminary Matlab implementation becomes faster
than eig already for matrix sizes of a few thousand. Comment: 27 pages, 10 figures
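As a rough illustration of the idea behind this abstract, the spectral projector onto the eigenvalues of a symmetric matrix A below a shift mu can be written as P = (I - sign(A - mu I)) / 2, and for a symmetric nonsingular matrix the sign function coincides with the unitary polar factor, which is the quantity QDWH computes. The sketch below is a dense stand-in only: it uses SciPy's SVD-based `scipy.linalg.polar` instead of QDWH and uses no hierarchical matrices. The function name `spectral_projector`, the test matrix, and the shift are assumptions made for illustration.

```python
import numpy as np
from scipy.linalg import eigh, polar


def spectral_projector(A, mu):
    """Projector onto eigenvalues of symmetric A below mu (assumes mu is not an eigenvalue)."""
    n = A.shape[0]
    # For symmetric nonsingular A - mu*I, the unitary polar factor equals sign(A - mu*I).
    U, _ = polar(A - mu * np.eye(n))
    return 0.5 * (np.eye(n) - U)


# Hypothetical banded test case: the tridiagonal (-1, 2, -1) matrix.
n = 100
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
mu = 1.0
P = spectral_projector(A, mu)

# Reference projector built from a full eigendecomposition.
w, V = eigh(A)
P_ref = V[:, w < mu] @ V[:, w < mu].T
```

Here `P` should agree with `P_ref` up to rounding, and its trace counts the eigenvalues below `mu`.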
Lanczos eigensolution method for high-performance computers
The theory, computational analysis, and applications of a Lanczos algorithm on high-performance computers are presented. The computationally intensive steps of the algorithm are identified as the matrix factorization, the forward/backward equation solution, and the matrix-vector multiplications. These computational steps are optimized to exploit the vector and parallel capabilities of high-performance computers. The savings in computational time from applying optimization techniques such as variable-band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large-scale structural analysis applications are described: the buckling of a composite blade-stiffened panel with a cutout, and the vibration analysis of a high-speed civil transport. The sequential computational time for the panel problem executed on a CONVEX computer, 181.6 seconds, was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time for the transport problem with 17,000 degrees of freedom, 23 seconds, was obtained on the Cray-YMP using an average of 3.63 processors.
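For orientation, a minimal sketch of the basic Lanczos recurrence is given here. It is the plain symmetric variant with full reorthogonalization, not the optimized vectorized solver the abstract describes, and it omits the factorization and forward/backward solves used there. The function name `lanczos`, the diagonal test operator, and the step count are illustrative assumptions.

```python
import numpy as np


def lanczos(matvec, v0, m):
    """Run up to m Lanczos steps; return (alpha, beta, Q) defining the tridiagonal T."""
    n = v0.size
    Q = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m - 1)
    q = v0 / np.linalg.norm(v0)
    for j in range(m):
        Q[:, j] = q
        w = matvec(q)
        alpha[j] = q @ w
        # Full reorthogonalization against all Lanczos vectors computed so far.
        w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j == m - 1:
            break
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-12:
            # Invariant subspace found: stop early.
            return alpha[: j + 1], beta[:j], Q[:, : j + 1]
        q = w / beta[j]
    return alpha, beta, Q


# Hypothetical test operator: diagonal matrix with one well-separated eigenvalue.
rng = np.random.default_rng(0)
d = np.concatenate([np.arange(1.0, 101.0), [200.0]])
alpha, beta, Q = lanczos(lambda x: d * x, rng.standard_normal(d.size), m=30)

# Ritz values are the eigenvalues of the small tridiagonal matrix T.
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
ritz = np.linalg.eigvalsh(T)
```

The largest entry of `ritz` approximates the isolated eigenvalue 200 after only a few dozen matrix-vector products, which is why that kernel is one of the steps worth optimizing.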
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
Fast Hessenberg reduction of some rank structured matrices
We develop two fast algorithms for Hessenberg reduction of a structured
matrix $A = D + UV^H$ where $D$ is a real or unitary $n \times n$ diagonal
matrix and $U, V \in \mathbb{C}^{n \times k}$. The proposed algorithm for the
real case exploits a two-stage approach by first reducing the matrix to a
generalized Hessenberg form and then completing the reduction by annihilation
of the unwanted sub-diagonals. It is shown that the novel method requires
$O(n^2 k)$ arithmetic operations and it is significantly faster than other
reduction algorithms for rank structured matrices. The method is then extended
to the unitary plus low rank case by using a block analogue of the CMV form of
unitary matrices. It is shown that a block Lanczos-type procedure for the block
tridiagonalization of $\Re(D)$ induces a structured reduction on $D$ in a block
staircase CMV-type shape. Then, we present a numerically stable method for
performing this reduction using unitary transformations and we show how to
generalize the sub-diagonal elimination to this shape, while still being able
to provide a condensed representation for the reduced matrix. In this way the
complexity still remains linear in $k$ and, moreover, the resulting algorithm
can be adapted to deal efficiently with block companion matrices. Comment: 25 pages
Solving Dense Generalized Eigenproblems on Multi-threaded Architectures
We compare two approaches to computing a fraction of the spectrum of dense symmetric definite generalized eigenproblems: one is based on reduction to tridiagonal form, and the other on Krylov-subspace iteration. Two large-scale applications, arising in molecular dynamics and materials science, are employed to investigate how the application, the architecture, and the parallelism of the method contribute to the performance of the solvers. The experimental results on a state-of-the-art 8-core platform equipped with a graphics processing unit (GPU) reveal that, in realistic applications, iterative Krylov-subspace methods can also be competitive for the solution of dense problems.
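The two solver families compared in this abstract can be tried side by side with SciPy, as a small sketch only. `scipy.linalg.eigh` calls a LAPACK driver based on tridiagonal reduction, and `scipy.sparse.linalg.eigsh` wraps ARPACK's Krylov-subspace (implicitly restarted Lanczos) iteration, here in shift-invert mode around zero. The random symmetric positive definite pencil, its size, and the number of requested eigenvalues are assumptions for illustration; this does not reproduce the paper's multi-threaded or GPU codes.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.linalg import eigsh

# Hypothetical dense symmetric positive definite pencil (A, B).
rng = np.random.default_rng(0)
n, k = 60, 4
G = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)
B = H @ H.T + n * np.eye(n)

# Approach 1: direct solver (reduction to tridiagonal form), k smallest eigenvalues.
w_direct = eigh(A, B, eigvals_only=True, subset_by_index=[0, k - 1])

# Approach 2: Krylov-subspace iteration (ARPACK), shift-invert around sigma = 0,
# which targets the eigenvalues of A x = lambda B x closest to zero.
w_krylov = np.sort(
    eigsh(A, k=k, M=B, sigma=0.0, which="LM", return_eigenvectors=False)
)
```

Both calls should return the same `k` smallest eigenvalues up to rounding. Which one is faster depends on the problem size, the fraction of the spectrum requested, and the hardware, which is the trade-off the abstract investigates.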