26,652 research outputs found

    Differential qd algorithm with shifts for rank-structured matrices

    Full text link
    Although QR iterations dominate in eigenvalue computations, there are several important cases when alternative LR-type algorithms may be preferable. In particular, in the symmetric tridiagonal case where differential qd algorithm with shifts (dqds) proposed by Fernando and Parlett enjoys often faster convergence while preserving high relative accuracy (that is not guaranteed in QR algorithm). In eigenvalue computations for rank-structured matrices QR algorithm is also a popular choice since, in the symmetric case, the rank structure is preserved. In the unsymmetric case, however, QR algorithm destroys the rank structure and, hence, LR-type algorithms come to play once again. In the current paper we discover several variants of qd algorithms for quasiseparable matrices. Remarkably, one of them, when applied to Hessenberg matrices becomes a direct generalization of dqds algorithm for tridiagonal matrices. Therefore, it can be applied to such important matrices as companion and confederate, and provides an alternative algorithm for finding roots of a polynomial represented in the basis of orthogonal polynomials. Results of preliminary numerical experiments are presented

    Analysis of a Classical Matrix Preconditioning Algorithm

    Get PDF
    We study a classical iterative algorithm for balancing matrices in the LL_\infty norm via a scaling transformation. This algorithm, which goes back to Osborne and Parlett \& Reinsch in the 1960s, is implemented as a standard preconditioner in many numerical linear algebra packages. Surprisingly, despite its widespread use over several decades, no bounds were known on its rate of convergence. In this paper we prove that, for any irreducible n×nn\times n (real or complex) input matrix~AA, a natural variant of the algorithm converges in O(n3log(nρ/ε))O(n^3\log(n\rho/\varepsilon)) elementary balancing operations, where ρ\rho measures the initial imbalance of~AA and ε\varepsilon is the target imbalance of the output matrix. (The imbalance of~AA is maxilog(aiout/aiin)\max_i |\log(a_i^{\text{out}}/a_i^{\text{in}})|, where aiout,aiina_i^{\text{out}},a_i^{\text{in}} are the maximum entries in magnitude in the iith row and column respectively.) This bound is tight up to the logn\log n factor. A balancing operation scales the iith row and column so that their maximum entries are equal, and requires O(m/n)O(m/n) arithmetic operations on average, where mm is the number of non-zero elements in~AA. Thus the running time of the iterative algorithm is O~(n2m)\tilde{O}(n^2m). This is the first time bound of any kind on any variant of the Osborne-Parlett-Reinsch algorithm. We also prove a conjecture of Chen that characterizes those matrices for which the limit of the balancing process is independent of the order in which balancing operations are performed.Comment: The previous version (1) (see also STOC'15) handled UB ("unique balance") input matrices. In this version (2) we extend the work to handle all input matrice

    Improved Accuracy and Parallelism for MRRR-based Eigensolvers -- A Mixed Precision Approach

    Get PDF
    The real symmetric tridiagonal eigenproblem is of outstanding importance in numerical computations; it arises frequently as part of eigensolvers for standard and generalized dense Hermitian eigenproblems that are based on a reduction to tridiagonal form. For its solution, the algorithm of Multiple Relatively Robust Representations (MRRR) is among the fastest methods. Although fast, the solvers based on MRRR do not deliver the same accuracy as competing methods like Divide & Conquer or the QR algorithm. In this paper, we demonstrate that the use of mixed precisions leads to improved accuracy of MRRR-based eigensolvers with limited or no performance penalty. As a result, we obtain eigensolvers that are not only equally or more accurate than the best available methods, but also -in most circumstances- faster and more scalable than the competition

    Achieving Efficient Strong Scaling with PETSc using Hybrid MPI/OpenMP Optimisation

    Full text link
    The increasing number of processing elements and decreas- ing memory to core ratio in modern high-performance platforms makes efficient strong scaling a key requirement for numerical algorithms. In order to achieve efficient scalability on massively parallel systems scientific software must evolve across the entire stack to exploit the multiple levels of parallelism exposed in modern architectures. In this paper we demonstrate the use of hybrid MPI/OpenMP parallelisation to optimise parallel sparse matrix-vector multiplication in PETSc, a widely used scientific library for the scalable solution of partial differential equations. Using large matrices generated by Fluidity, an open source CFD application code which uses PETSc as its linear solver engine, we evaluate the effect of explicit communication overlap using task-based parallelism and show how to further improve performance by explicitly load balancing threads within MPI processes. We demonstrate a significant speedup over the pure-MPI mode and efficient strong scaling of sparse matrix-vector multiplication on Fujitsu PRIMEHPC FX10 and Cray XE6 systems
    corecore