5 research outputs found

    On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties

    Get PDF
    Gaussian elimination is a canonical linear algebra procedure for solving linear systems of equations. In recent years, the algorithm has received a lot of attention in efforts to improve its parallel performance. This article surveys recent developments in parallel implementations of Gaussian elimination. Five different flavors are investigated. Three of them are based on different strategies for pivoting: partial pivoting, incremental pivoting, and tournament pivoting. The fourth replaces pivoting with the Random Butterfly Transformation, and finally, an implementation without pivoting is used as a performance baseline. The technique of iterative refinement is applied to recover numerical accuracy when necessary. All parallel implementations are produced using dynamic, superscalar, runtime scheduling and a tile matrix layout. Results on two multi-socket multicore systems are presented. Performance and numerical accuracy are analyzed.
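    As a point of reference for the partial pivoting variant, here is a minimal serial sketch of LU factorization with partial pivoting in Python/NumPy. The function names and driver are illustrative only, not the article's code; the surveyed implementations are tiled and runtime-scheduled, which this textbook version makes no attempt to reproduce.

```python
import numpy as np

def lu_partial_pivot(A):
    """Factor PA = LU in place: multipliers below the diagonal, U on and above.

    A textbook serial sketch; the surveyed codes run tiled, dynamically
    scheduled variants of this same elimination.
    """
    A = A.astype(np.float64, copy=True)
    n = A.shape[0]
    piv = np.arange(n)  # piv[i] = original index of the row now in position i
    for k in range(n - 1):
        # Partial pivoting: swap the largest |entry| of column k into the pivot slot.
        p = k + np.argmax(np.abs(A[k:, k]))
        if p != k:
            A[[k, p]] = A[[p, k]]
            piv[[k, p]] = piv[[p, k]]
        # Eliminate below the pivot, storing the multipliers in column k.
        A[k + 1:, k] /= A[k, k]
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])
    return A, piv

def lu_solve(LU, piv, b):
    """Solve Ax = b given the factorization PA = LU."""
    x = b[piv].astype(np.float64)             # apply the row permutation to b
    n = LU.shape[0]
    for i in range(1, n):                     # forward solve with unit-lower L
        x[i] -= LU[i, :i] @ x[:i]
    for i in range(n - 1, -1, -1):            # back solve with upper U
        x[i] = (x[i] - LU[i, i + 1:] @ x[i + 1:]) / LU[i, i]
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 6))
    b = rng.standard_normal(6)
    LU, piv = lu_partial_pivot(A)
    x = lu_solve(LU, piv, b)
    print(np.allclose(A @ x, b))  # True: partial pivoting keeps the solve stable
```

    Incremental and tournament pivoting trade some of this numerical safety for parallelism by restricting which rows each panel may search, which is exactly the performance/accuracy trade-off the article measures.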

    Large Dense Numerical Linear Algebra in 1993: The Parallel Computing Influence

    No full text
    This article surveys the current state of applications of large dense numerical linear algebra and the influence of parallel computing. Furthermore, it attempts to crystallize many important ideas that are sometimes misunderstood in the rush to write fast programs.

    1 Introduction

    This article represents my continuing efforts to track the status of large dense linear algebra problems. The goal is to shatter the barriers that separate the various interested communities while commenting on the influence of parallel computing. A secondary goal is to crystallize the most important ideas that have all too often been obscured by the details of machines and algorithms. Parallel supercomputing is in the spotlight. In the race toward the proliferation of papers on person X's experiences with machine Y (and why X's algorithm runs faster than person Z's), sometimes we have lost sight of the applications for which these algorithms are meant to be useful. This article concentrates on large dense nu..

    AIR: Adaptive Dynamic Precision Iterative Refinement

    Get PDF
    In high performance computing, applications often require very accurate solutions while minimizing runtimes and power consumption. Improving the ratio of logic gates implementing floating point arithmetic operations to the total number of logic gates enables greater efficiency, potentially with higher performance and lower power consumption. Software executing on the fixed hardware of von Neumann architectures faces limits on improving this ratio, since processors require extensive supporting logic to fetch and decode instructions while employing arithmetic units with statically defined precision. This dissertation explores novel approaches to improving computing architectures for linear system applications, not only by designing application-specific hardware but also by optimizing precision through adaptive dynamic precision iterative refinement (AIR). The dissertation shows that AIR is numerically stable and well behaved. Theoretically, AIR can produce up to a 3x speedup over mixed precision iterative refinement on FPGAs. An AIR prototype of the refinement procedure on a Xilinx XC6VSX475T FPGA yields estimated improvements of roughly 0.5x, 8x, and 2x in time-, clock-, and energy-based performance per iteration compared with mixed precision iterative refinement on the Nvidia Tesla C2075 GPU, when a user requires a prescribed accuracy between single and double precision. AIR on FPGAs can effectively deliver beyond-double-precision accuracy, whereas CPUs and GPUs need software assistance that incurs substantial overhead.
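    AIR varies the operand width in hardware, which fixed-width software dtypes cannot express; what software can illustrate is the fixed mixed precision iterative refinement that AIR is benchmarked against. Below is a minimal sketch of that baseline, factoring once in float32 and refining residuals in float64. The function name and driver are illustrative assumptions, and the routine relies on SciPy's lu_factor/lu_solve.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_refinement(A, b, tol=1e-12, max_iter=50):
    """Mixed precision iterative refinement (the baseline AIR is compared with).

    The costly factorization runs once in low precision (float32); each
    iteration computes the residual in high precision (float64) and reuses
    the cheap factors for the correction. AIR would instead widen the
    working precision adaptively per iteration, which fixed NumPy dtypes
    cannot imitate.
    """
    factors = lu_factor(A.astype(np.float32))              # float32 LU of A, done once
    x = lu_solve(factors, b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        r = b - A @ x                                      # residual in float64
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break                                          # reached ~double precision
        d = lu_solve(factors, r.astype(np.float32))        # correction from float32 factors
        x += d.astype(np.float64)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 200))
    b = rng.standard_normal(200)
    x = mixed_precision_refinement(A, b)
    # Relative residual near 1e-13: double precision accuracy from float32 factors.
    print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

    The economics the dissertation targets are visible even here: the O(n^3) factorization runs entirely in the cheap precision, and only the O(n^2) residual and update steps pay for high precision, which is why tuning the refinement precision per iteration is where AIR finds its speedup.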