26,652 research outputs found
Differential qd algorithm with shifts for rank-structured matrices
Although QR iterations dominate in eigenvalue computations, there are several
important cases when alternative LR-type algorithms may be preferable. In
particular, in the symmetric tridiagonal case where differential qd algorithm
with shifts (dqds) proposed by Fernando and Parlett enjoys often faster
convergence while preserving high relative accuracy (that is not guaranteed in
QR algorithm). In eigenvalue computations for rank-structured matrices QR
algorithm is also a popular choice since, in the symmetric case, the rank
structure is preserved. In the unsymmetric case, however, QR algorithm destroys
the rank structure and, hence, LR-type algorithms come to play once again. In
the current paper we discover several variants of qd algorithms for
quasiseparable matrices. Remarkably, one of them, when applied to Hessenberg
matrices becomes a direct generalization of dqds algorithm for tridiagonal
matrices. Therefore, it can be applied to such important matrices as companion
and confederate, and provides an alternative algorithm for finding roots of a
polynomial represented in the basis of orthogonal polynomials. Results of
preliminary numerical experiments are presented
Analysis of a Classical Matrix Preconditioning Algorithm
We study a classical iterative algorithm for balancing matrices in the
norm via a scaling transformation. This algorithm, which goes back
to Osborne and Parlett \& Reinsch in the 1960s, is implemented as a standard
preconditioner in many numerical linear algebra packages. Surprisingly, despite
its widespread use over several decades, no bounds were known on its rate of
convergence. In this paper we prove that, for any irreducible (real
or complex) input matrix~, a natural variant of the algorithm converges in
elementary balancing operations, where
measures the initial imbalance of~ and is the target imbalance
of the output matrix. (The imbalance of~ is , where
are the maximum entries in magnitude in the
th row and column respectively.) This bound is tight up to the
factor. A balancing operation scales the th row and column so that their
maximum entries are equal, and requires arithmetic operations on
average, where is the number of non-zero elements in~. Thus the running
time of the iterative algorithm is . This is the first time
bound of any kind on any variant of the Osborne-Parlett-Reinsch algorithm. We
also prove a conjecture of Chen that characterizes those matrices for which the
limit of the balancing process is independent of the order in which balancing
operations are performed.Comment: The previous version (1) (see also STOC'15) handled UB ("unique
balance") input matrices. In this version (2) we extend the work to handle
all input matrice
Improved Accuracy and Parallelism for MRRR-based Eigensolvers -- A Mixed Precision Approach
The real symmetric tridiagonal eigenproblem is of outstanding importance in
numerical computations; it arises frequently as part of eigensolvers for
standard and generalized dense Hermitian eigenproblems that are based on a
reduction to tridiagonal form. For its solution, the algorithm of Multiple
Relatively Robust Representations (MRRR) is among the fastest methods. Although
fast, the solvers based on MRRR do not deliver the same accuracy as competing
methods like Divide & Conquer or the QR algorithm. In this paper, we
demonstrate that the use of mixed precisions leads to improved accuracy of
MRRR-based eigensolvers with limited or no performance penalty. As a result, we
obtain eigensolvers that are not only equally or more accurate than the best
available methods, but also -in most circumstances- faster and more scalable
than the competition
Achieving Efficient Strong Scaling with PETSc using Hybrid MPI/OpenMP Optimisation
The increasing number of processing elements and decreas- ing memory to core
ratio in modern high-performance platforms makes efficient strong scaling a key
requirement for numerical algorithms. In order to achieve efficient scalability
on massively parallel systems scientific software must evolve across the entire
stack to exploit the multiple levels of parallelism exposed in modern
architectures. In this paper we demonstrate the use of hybrid MPI/OpenMP
parallelisation to optimise parallel sparse matrix-vector multiplication in
PETSc, a widely used scientific library for the scalable solution of partial
differential equations. Using large matrices generated by Fluidity, an open
source CFD application code which uses PETSc as its linear solver engine, we
evaluate the effect of explicit communication overlap using task-based
parallelism and show how to further improve performance by explicitly load
balancing threads within MPI processes. We demonstrate a significant speedup
over the pure-MPI mode and efficient strong scaling of sparse matrix-vector
multiplication on Fujitsu PRIMEHPC FX10 and Cray XE6 systems
- …