207 research outputs found

    A Householder-based algorithm for Hessenberg-triangular reduction

    The QZ algorithm for computing eigenvalues and eigenvectors of a matrix pencil A − λB requires that the matrices first be reduced to Hessenberg-triangular (HT) form. The current method of choice for HT reduction relies entirely on Givens rotations regrouped and accumulated into small dense matrices which are subsequently applied using matrix multiplication routines. A non-vanishing fraction of the total flop count must nevertheless still be performed as sequences of overlapping Givens rotations alternately applied from the left and from the right. The many data dependencies associated with this computational pattern lead to inefficient use of the processor and poor scalability. In this paper, we therefore introduce a fundamentally different approach that relies entirely on (large) Householder reflectors partially accumulated into block reflectors, by using (compact) WY representations. Even though the new algorithm requires more floating-point operations than the state-of-the-art algorithm, extensive experiments on both real and synthetic data indicate that it is still competitive, even in a sequential setting. The new algorithm is conjectured to have better parallel scalability, an idea which is partially supported by early small-scale experiments using multi-threaded BLAS. The design and evaluation of a parallel formulation is future work.
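The compact WY representation mentioned in this abstract can be illustrated on a simpler problem. The sketch below is not the paper's HT reduction: it runs a plain Householder QR factorization and accumulates the reflectors as Q = I − V T Vᵀ, using the standard forward column-wise recurrence for T. The function names `householder_vector` and `qr_compact_wy` are hypothetical, and the input is assumed to have at least as many rows as columns.

```python
import numpy as np

def householder_vector(x):
    """Return (v, tau) such that (I - tau * v v^T) x = alpha * e_1."""
    norm_x = np.linalg.norm(x)
    alpha = -np.copysign(norm_x, x[0])  # sign chosen to avoid cancellation
    v = np.array(x, dtype=float)
    v[0] -= alpha
    vtv = v @ v
    tau = 0.0 if vtv == 0.0 else 2.0 / vtv
    return v, tau

def qr_compact_wy(A):
    """Householder QR of an m-by-n matrix (m >= n) with Q = I - V T V^T."""
    m, n = A.shape
    R = np.array(A, dtype=float)
    V = np.zeros((m, n))  # Householder vectors, zero above row j in column j
    T = np.zeros((n, n))  # upper triangular accumulation factor
    for j in range(n):
        v, tau = householder_vector(R[j:, j])
        V[j:, j] = v
        # Apply the reflector to the trailing part of R from the left.
        R[j:, j:] -= tau * np.outer(v, v @ R[j:, j:])
        # Extend T so that I - V T V^T equals H_1 H_2 ... H_j.
        T[:j, j] = -tau * (T[:j, :j] @ (V[:, :j].T @ V[:, j]))
        T[j, j] = tau
    return V, T, R

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
V, T, R = qr_compact_wy(A)
Q = np.eye(6) - V @ T @ V.T  # block reflector built from all four reflectors
```

Because Q is stored as the pair (V, T), applying it to a block of columns costs a few matrix-matrix products, which is what makes the representation attractive for level 3 BLAS.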

    Blocked algorithms for the reduction to Hessenberg-triangular form revisited

    We present two variants of Moler and Stewart's algorithm for reducing a matrix pair to Hessenberg-triangular (HT) form with increased data locality in the access to the matrices. In one of these variants, a careful reorganization and accumulation of Givens rotations enables the use of efficient level 3 BLAS. Experimental results on four different architectures, representative of current high performance processors, compare the performance of the new variants with that of the implementation of Moler and Stewart's algorithm in subroutine DGGHRD from LAPACK, Dackland and Kågström's two-stage algorithm for the HT form, and a modified version of the latter which requires considerably fewer flops.

    A multishift, multipole rational QZ method with aggressive early deflation

    The rational QZ method generalizes the QZ method by implicitly supporting rational subspace iteration. In this paper we extend the rational QZ method by introducing shifts and poles of higher multiplicity in the Hessenberg pencil, which is a pencil consisting of two Hessenberg matrices. The result is a multishift, multipole iteration on block Hessenberg pencils which allows one to stick to real arithmetic for a real input pencil. In combination with optimally packed shifts and aggressive early deflation as an advanced deflation technique, we obtain an efficient method for the dense generalized eigenvalue problem. In the numerical experiments we compare the results with state-of-the-art routines for the generalized eigenvalue problem and show that we are competitive in terms of speed and accuracy.

    Algorithm-Based Fault Tolerance for Two-Sided Dense Matrix Factorizations

    The mean time between failures (MTBF) of large supercomputers is decreasing, and future exascale computers are expected to have an MTBF of around 30 minutes. Therefore, it is urgent to prepare important algorithms for future machines with such a short MTBF. Eigenvalue problems (EVP) and singular value problems (SVP) are common in engineering and scientific research. Solving EVP and SVP numerically involves two-sided matrix factorizations: the Hessenberg reduction, the tridiagonal reduction, and the bidiagonal reduction. These three factorizations are computation-intensive and have long running times, so they are prone to suffer from computer failures. We designed algorithm-based fault-tolerant (ABFT) algorithms for the parallel Hessenberg reduction and the parallel tridiagonal reduction. The ABFT algorithms target fail-stop errors. These two fault-tolerant algorithms use a combination of ABFT and diskless checkpointing. ABFT is used to protect frequently modified data. We carefully design the ABFT algorithm so the checksums are valid at the end of each iterative cycle. Diskless checkpointing is used for rarely modified data. These checkpoints are in the form of checksums, which are small in size, so the time and storage cost to store them in main memory is small. Also, there are intermediate results which need to be protected for a short time window. We store a copy of this data on the neighboring process in the process grid. We also designed algorithm-based fault-tolerant algorithms for the CPU-GPU hybrid Hessenberg reduction algorithm and the CPU-GPU hybrid bidiagonal reduction algorithm. These two fault-tolerant algorithms target silent errors. Our design employs both ABFT and diskless checkpointing to provide data redundancy. The low-cost error detection uses two dot products and an equality test. The recovery protocol uses reverse computation to roll back the state of the matrix to a point where it is easy to locate and correct errors. We provided theoretical analysis and experimental verification of the correctness and efficiency of our fault-tolerant algorithm design. We also provided a mathematical proof of the numerical stability of the factorization results after fault recovery. Experimental results corroborate the mathematical proof that the impact is mild.
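The checksum idea behind ABFT can be sketched on the simplest case, a matrix product. This is the classic checksum scheme for matrix multiplication and is not the scheme this abstract describes for two-sided factorizations. The function names and the tolerance `tol` are assumptions made for illustration.

```python
import numpy as np

def checksum_matmul(A, B):
    """Compute A @ B with an extra last row holding the column sums of the product."""
    A_checked = np.vstack([A, A.sum(axis=0)])  # append the checksum row e^T A
    return A_checked @ B  # last row equals e^T (A @ B) up to rounding

def find_corrupted_columns(C_checked, tol=1e-8):
    """Return indices of columns whose data no longer matches the checksum row."""
    data, check = C_checked[:-1], C_checked[-1]
    mismatch = np.abs(data.sum(axis=0) - check)
    return np.flatnonzero(mismatch > tol)

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
C_checked = checksum_matmul(A, B)
clean = find_corrupted_columns(C_checked)      # no column flagged

C_checked[1, 2] += 1.0                         # simulate a silent error
flagged = find_corrupted_columns(C_checked)    # column 2 is flagged
```

A row checksum maintained in the same way would pin down the row of the corrupted entry as well, so a single error could be located and corrected rather than only detected.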

    A parallel Schur method for solving continuous-time algebraic Riccati equations

    Numerical algorithms for solving the continuous-time algebraic Riccati matrix equation on a distributed memory parallel computer are considered. In particular, it is shown that the Schur method, based on computing the stable invariant subspace of a Hamiltonian matrix, can be parallelized in an efficient and scalable way. Our implementation employs the state-of-the-art library ScaLAPACK as well as recently developed parallel methods for reordering the eigenvalues in a real Schur form. Some experimental results are presented, confirming the scalability of our implementation and comparing it with an existing implementation of the matrix sign iteration from the PLiCOC library.
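A serial sketch of the Schur method, written with SciPy rather than ScaLAPACK, may make the idea concrete. It assumes the equation has the form AᵀX + XA − XBR⁻¹BᵀX + Q = 0 and that the Hamiltonian matrix has no eigenvalues on the imaginary axis. The function name `care_schur` and the double-integrator test data are chosen here for illustration only.

```python
import numpy as np
from scipy.linalg import schur, solve

def care_schur(A, B, Q, R):
    """Solve A^T X + X A - X B R^{-1} B^T X + Q = 0 via an ordered real Schur form."""
    n = A.shape[0]
    G = B @ solve(R, B.T)
    H = np.block([[A, -G], [-Q, -A.T]])  # Hamiltonian matrix
    # Reorder so that eigenvalues in the open left half-plane come first.
    _, Z, sdim = schur(H, output="real", sort="lhp")
    if sdim != n:
        raise ValueError("expected exactly n stable eigenvalues")
    U1, U2 = Z[:n, :n], Z[n:, :n]  # basis of the stable invariant subspace
    X = solve(U1.T, U2.T).T        # X = U2 @ inv(U1)
    return (X + X.T) / 2           # symmetrize against rounding errors

# Double integrator with unit weights (illustrative data).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
X = care_schur(A, B, Q, R)
residual = A.T @ X + X @ A - X @ B @ solve(R, B.T) @ X + Q
```

The eigenvalue reordering step (`sort="lhp"`) is the serial counterpart of the parallel reordering methods the abstract refers to.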

    Effizientes Lösen von großskaligen Riccati-Gleichungen und ein ODE-Framework für lineare Matrixgleichungen

    This work considers the iterative solution of large-scale matrix equations. Due to the size of the system matrices in large-scale Riccati equations, the solution cannot be calculated directly but is approximated by a low-rank matrix ZYZ^*. Here Z is a basis of a low-dimensional rational Krylov subspace. The inner matrix Y is a small square matrix. Two ways to choose this inner matrix are examined: by imposing a rank condition on the Riccati residual, and by projecting the Riccati residual onto the Krylov subspace generated by Z. The rank condition is motivated by the well-known ADI iteration. The ADI solutions span a rational Krylov subspace and yield a rank-p residual. It is proven that the rank-p condition guarantees existence and uniqueness of such an approximate solution. From this result, efficient iterative methods are derived that generate such an approximate solution. Known projection methods are generalized to oblique projections, and a new formulation of the Riccati residual is derived which allows for an efficient evaluation of the residual norm. Furthermore, a truncated approximate solution is characterized as the solution of a Riccati equation which is projected onto a subspace of the Krylov subspace generated by Z. For the approximate solution of Lyapunov equations, a system of ordinary differential equations (ODEs) is solved via Runge-Kutta methods. It is shown that the space spanned by the approximate solution is a rational Krylov subspace with poles determined by the time step sizes and the eigenvalues of the matrices of the Butcher tableau of the Runge-Kutta method used. The method is applied to a model order reduction problem. The analytical solution of the system of ODEs satisfies an algebraic invariant. Those Runge-Kutta methods which preserve this algebraic invariant are characterized by a simple condition on the corresponding Butcher tableau. It is proven that these methods are equivalent to the ADI iteration. The invariance approach is transferred to Sylvester equations.

    An Algorithm for Simultaneous Band Reduction of Two Dense Symmetric Matrices (Fusion of theory and practice in applied mathematics and computational science)

    In this paper, we propose an algorithm for simultaneously reducing two dense symmetric matrices to band form with the same bandwidth by congruence transformations. The simultaneous band reduction can be considered as an extension of the simultaneous tridiagonalization of two dense symmetric matrices. In contrast to algorithms for simultaneous tridiagonalization that are based on Level-2 BLAS (Basic Linear Algebra Subprograms) operations, our band reduction algorithm is devised to take full advantage of Level-3 BLAS operations for better performance. Numerical results are presented to illustrate the effectiveness of our algorithm.

    High-Performance Solvers for Dense Hermitian Eigenproblems

    We introduce a new collection of solvers - subsequently called EleMRRR - for large-scale dense Hermitian eigenproblems. EleMRRR solves various types of problems: generalized, standard, and tridiagonal eigenproblems. Among these, the last is of particular importance as it is a solver in its own right, as well as the computational kernel for the first two; we present a fast and scalable tridiagonal solver based on the Algorithm of Multiple Relatively Robust Representations - referred to as PMRRR. Like the other EleMRRR solvers, PMRRR is part of the freely available Elemental library, and is designed to fully support both message-passing (MPI) and multithreading parallelism (SMP). As a result, the solvers can equally be used in pure MPI or in hybrid MPI-SMP fashion. We conducted a thorough performance study of EleMRRR and ScaLAPACK's solvers on two supercomputers. Such a study, performed with up to 8,192 cores, provides precise guidelines to assemble the fastest solver within the ScaLAPACK framework; it also indicates that EleMRRR outperforms even the fastest solvers built from ScaLAPACK's components.