13 research outputs found

suCAQR: A Simplified Communication-Avoiding QR Factorization Solver Using the TBLAS Framework

Authors: Chen Zizhong, Lin Lan, Song Fengguang, Zheng Weijian
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/12/2016

The scope of this paper is to design and implement a scalable QR factorization solver that can deliver the fastest performance for tall and skinny matrices and square matrices on modern supercomputers. The new solver, named scalable universal communication-avoiding QR factorization (suCAQR), introduces a simplified and tuning-less way to realize the communication-avoiding QR factorization algorithm to support matrices of any shape. The software design includes a mixed usage of physical and logical data layouts, a simplified method of dynamic-root binary-tree reduction, and a dynamic dataflow implementation. Compared with the existing communication-avoiding QR factorization implementations, suCAQR has the benefits of being simpler, more general, and more efficient. By balancing the degree of parallelism and the proportion of faster computational kernels, it is able to achieve scalable performance on clusters of multicore nodes. The software essentially combines the strengths of both the synchronization-reducing approach and the communication-avoiding approach to achieve high performance. Based on the experimental results using 1,024 CPU cores, suCAQR is faster than DPLASMA by up to 30%, and faster than ScaLAPACK by up to 30 times.

Crossref

IUPUIScholarWorks

A 3D Parallel Algorithm for QR Decomposition

Authors: Ballard Grey, Demmel James, Grigori Laura, Jacquelin Mathias, Knight Nicholas
Publication date: 14/05/2018

Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing its latency cost (number of messages). By varying a parameter to navigate the bandwidth/latency tradeoff, we can tune this algorithm for machines with different communication costs.

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

eScholarship - University of California

QR factorization over tunable processor grids

Author: Hutter Edward
Publication date: 01/05/2017

The increasing complexity of modern computer architectures has greatly influenced algorithm design. Algorithm performance on these architectures is now determined by the movement of data. Therefore, modern algorithms should prioritize minimizing communication. In this work, we present a new parallel QR factorization algorithm solved over a tunable processor grid in a distributed memory environment. The processor grid can be tuned between one and three dimensions, resulting in tradeoffs in the asymptotic costs of synchronization, horizontal bandwidth, flop count, and memory footprint. This parallel algorithm is the first to efficiently extend the Cholesky-QR2 algorithm to matrices with an arbitrary number of rows and columns. Along its critical path of execution on P processors, our tunable algorithm improves upon the horizontal bandwidth cost of the existing Cholesky-QR2 algorithm by up to a factor of c^2 when solved over a c × d × c processor grid subject to P = c^2 d and c ∈ [1, P^(1/3)]. The costs attained by our algorithm are asymptotically equivalent to state-of-the-art QR factorization algorithms that have yet to be implemented. We argue that ours achieves better practicality and flexibility while still attaining minimal communication.
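
The grid constraint in this abstract can be made concrete with a little arithmetic. The sketch below is not taken from the paper: it only enumerates the integer pairs (c, d) that satisfy P = c^2 d with c between 1 and P^(1/3), and the function name tunable_grids is a hypothetical helper introduced for illustration. It assumes c and d must both be whole numbers.

```python
def tunable_grids(p):
    """Return every integer pair (c, d) with p == c*c*d and 1 <= c <= p**(1/3).

    Hypothetical helper for illustration; not part of the paper's software.
    """
    grids = []
    c = 1
    while c * c * c <= p:         # integer form of the bound c <= P^(1/3)
        if p % (c * c) == 0:      # d = P / c^2 must be a whole number
            grids.append((c, p // (c * c)))
        c += 1
    return grids


if __name__ == "__main__":
    # List the c x d x c grid shapes available on 64 processors.
    for c, d in tunable_grids(64):
        print(f"c={c}, d={d}: grid {c} x {d} x {c}")
```

For P = 64 the valid choices are c = 1 (a 1 x 64 x 1 grid, the one-dimensional end of the range), c = 2 (2 x 16 x 2), and c = 4 (4 x 4 x 4, the fully three-dimensional end). Since the abstract states a bandwidth improvement of up to a factor of c^2, the larger values of c are the ones that trade extra memory and flops for less communication.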

Illinois Digital Environment for Access to Learning and Scholarship Repository

Reconstructing Householder Vectors from Tall-Skinny QR

Authors: Ballard Grey, Demmel James W., Grigori Laura, Jacquelin Mathias, Nguyen Hong Diep, Solomonik Edgar
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2013


CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot