
A Householder-based algorithm for Hessenberg-triangular reduction

Authors: Zvonimir Bujanović, Lars Karlsson, Daniel Kressner
Publication date: 29/05/2018

The QZ algorithm for computing eigenvalues and eigenvectors of a matrix pencil $A - \lambda B$ requires that the matrices first be reduced to Hessenberg-triangular (HT) form. The current method of choice for HT reduction relies entirely on Givens rotations regrouped and accumulated into small dense matrices which are subsequently applied using matrix multiplication routines. A non-vanishing fraction of the total flop count must nevertheless still be performed as sequences of overlapping Givens rotations alternately applied from the left and from the right. The many data dependencies associated with this computational pattern lead to inefficient use of the processor and poor scalability. In this paper, we therefore introduce a fundamentally different approach that relies entirely on (large) Householder reflectors partially accumulated into block reflectors, by using (compact) WY representations. Even though the new algorithm requires more floating-point operations than the state-of-the-art algorithm, extensive experiments on both real and synthetic data indicate that it is still competitive, even in a sequential setting. The new algorithm is conjectured to have better parallel scalability, an idea which is partially supported by early small-scale experiments using multi-threaded BLAS. The design and evaluation of a parallel formulation is future work.
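The compact WY representation mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is a generic textbook construction, not the paper's HT reduction: it accumulates the Householder reflectors of a small QR panel into a single block reflector $Q = I - V T V^T$ with $T$ upper triangular, so that the reflectors can later be applied with matrix-matrix products. The function names `householder_vector` and `panel_qr_wy` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def householder_vector(x):
    """Return (v, tau) with v[0] == 1 such that (I - tau v v^T) x = alpha e_1."""
    x = np.asarray(x, dtype=float)
    normx = np.linalg.norm(x)
    if normx == 0.0:
        # Nothing to annihilate: tau = 0 makes the reflector the identity.
        return np.concatenate(([1.0], np.zeros(len(x) - 1))), 0.0
    alpha = -np.copysign(normx, x[0])   # sign chosen to avoid cancellation
    v = x.copy()
    v[0] -= alpha
    v /= v[0]                           # normalize so that v[0] == 1
    tau = 2.0 / (v @ v)
    return v, tau

def panel_qr_wy(A):
    """QR-factor a tall panel and accumulate the reflectors in compact WY form.

    Returns V, T, R with A = (I - V T V^T) R, T upper triangular.
    """
    R = np.array(A, dtype=float)
    m, k = R.shape
    V = np.zeros((m, k))
    T = np.zeros((k, k))
    for j in range(k):
        v_sub, tau = householder_vector(R[j:, j])
        v = np.zeros(m)
        v[j:] = v_sub
        # Apply H_j = I - tau v v^T from the left.
        R -= tau * np.outer(v, v @ R)
        # Extend T so that H_1 ... H_j = I - V[:, :j+1] T[:j+1, :j+1] V[:, :j+1]^T.
        T[:j, j] = -tau * (T[:j, :j] @ (V[:, :j].T @ v))
        T[j, j] = tau
        V[:, j] = v
    return V, T, R

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))
V, T, R = panel_qr_wy(A)
Q = np.eye(8) - V @ T @ V.T             # block reflector, formed here only to check
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(8)))
```

In practice the block reflector is never formed explicitly as above; it is applied as `C - V @ (T @ (V.T @ C))`, which is where the matrix-matrix (BLAS level 3) work comes from.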

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations

Authors: B. Kågström, B. Lang, B.N. Parlett, C. Vomel, C.H. Bischof, D.C. Sorensen, D.J. Singh, E. Anderson, F. Tisseur, G.H. Golub, J.J. Dongarra, J.J.M. Cuppen, J.O. Aasen, L.C.F. Ipsen, L.S. Blackford, P. Bientinesi, P. Kent, R.G. Grimes, S. Tomov, T. Auckenthaler
Publication date: 01/01/2013

Today’s high computational demands from engineering fields and complex hardware development make it necessary to develop and optimize new algorithms toward achieving high performance and good scalability on the next generation of computers. The enormous gap between the high-performance capabilities of GPUs and the slow interconnect between them has made the development of numerical software that is scalable across multiple GPUs extremely challenging. We describe and analyze a successful methodology to address the challenges (from algorithm design, through kernel optimization and tuning, to the programming model) in the development of a scalable high-performance generalized eigenvalue solver in the context of electronic structure calculations in materials science applications. We developed a set of leading-edge dense linear algebra algorithms, as part of a generalized eigensolver, featuring fine-grained memory-aware kernels, a task-based approach and hybrid execution/scheduling. The goal of the new design is to increase the computational intensity of the major compute kernels and to reduce synchronization and data transfers between GPUs. We report the performance impact on the generalized eigensolver when different fractions of eigenvectors are needed. The algorithm described provides an enormous performance boost compared to current GPU-based solutions, and performance comparable to state-of-the-art distributed solutions, using a single node with multiple GPUs.
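For orientation, here is a minimal CPU-only NumPy sketch of a generalized symmetric-definite eigensolver that returns only a fraction of the eigenvectors. The abstract does not spell out the solver's stages, so the sketch assumes the standard textbook route: Cholesky-factor $B = L L^T$, reduce $A x = \lambda B x$ to the standard problem $C z = \lambda z$ with $C = L^{-1} A L^{-T}$, solve it, and back-transform $x = L^{-T} z$. The function name `generalized_eigh_fraction` and its `fraction` parameter are hypothetical, and none of the multi-GPU techniques from the paper are represented.

```python
import numpy as np

def generalized_eigh_fraction(A, B, fraction=1.0):
    """Solve A x = lambda B x (A symmetric, B symmetric positive definite).

    Returns the k smallest eigenvalues and their eigenvectors, where
    k = round(fraction * n), at least 1.
    """
    n = A.shape[0]
    k = max(1, int(round(fraction * n)))

    L = np.linalg.cholesky(B)            # B = L L^T, L lower triangular
    Y = np.linalg.solve(L, A)            # Y = L^{-1} A
    C = np.linalg.solve(L, Y.T).T        # C = Y L^{-T} = L^{-1} A L^{-T}
    C = 0.5 * (C + C.T)                  # remove rounding asymmetry

    # np.linalg.eigh always computes the full decomposition; in this sketch
    # only the back-transformation cost depends on the requested fraction.
    w, Z = np.linalg.eigh(C)

    X = np.linalg.solve(L.T, Z[:, :k])   # back-transform: x = L^{-T} z
    return w[:k], X

rng = np.random.default_rng(0)
n = 8
M = rng.standard_normal((n, n))
A = 0.5 * (M + M.T)                      # symmetric test matrix
N = rng.standard_normal((n, n))
B = N @ N.T + n * np.eye(n)              # symmetric positive definite test matrix

w, X = generalized_eigh_fraction(A, B, fraction=0.25)
print(w.shape, X.shape)
print(np.allclose(A @ X, B @ X * w))
```

The returned eigenvectors are B-orthonormal, that is, `X.T @ B @ X` is the identity up to rounding. A solver that supports partial spectra natively would also skip work in the eigendecomposition itself, which this sketch does not attempt.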

CiteSeerX

Crossref