
    Hard and Soft Error Resilience for One-sided Dense Linear Algebra Algorithms

    Dense matrix factorizations, such as LU, Cholesky and QR, are widely used by scientific applications that require solving systems of linear equations, eigenvalue problems, and linear least squares problems. Such computations are normally carried out on supercomputers, whose ever-growing scale induces a fast decline of the Mean Time To Failure (MTTF). This dissertation develops fault tolerance algorithms for one-sided dense matrix factorizations that handle both hard and soft errors. For hard errors, we propose methods based on diskless checkpointing and Algorithm-Based Fault Tolerance (ABFT) to provide full matrix protection, covering both the left and right factors normally produced by dense matrix factorizations. A horizontal parallel diskless checkpointing scheme is devised to maintain the checkpoint data with scalable performance and low space overhead, while an ABFT checksum generated before the factorization is continuously updated by the factorization operations to protect the right factor. In addition, when a fault-tolerant MPI environment is not available, we integrate the Checkpoint-on-Failure (CoF) mechanism into one-sided dense linear algebra operations such as QR factorization to recover the running stack of the failed MPI process. Soft errors are more challenging because of silent data corruption, which can propagate and corrupt a large area of data. Full matrix protection is developed in which the left factor is protected by column-wise local diskless checkpointing, and the right factor is protected by a combination of a floating-point weighted checksum scheme and a soft error modeling technique. To allow practical use on large-scale systems, we also develop a complexity reduction scheme such that correct results can be recovered with low performance overhead. Experimental results on a large-scale cluster system and a multicore+GPGPU hybrid system confirm that our hard and soft error fault tolerance algorithms exhibit the expected error correcting capability, low space and performance overhead, and compatibility with double-precision floating-point operations.
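
    The ABFT protection of the right factor can be illustrated with a small sketch. This is a hypothetical, in-memory Python illustration, not the dissertation's distributed scheme: a row-sum checksum column appended before the factorization is updated by the same elimination operations as the matrix, so it remains equal to the row sums of the right factor U and can be used to detect and repair a corrupted entry.

        import numpy as np

        def lu_with_checksum(A):
            """Unpivoted LU on [A | A@1]; the extra column stays the row sum of U."""
            n = A.shape[0]
            Ac = np.hstack([A, A @ np.ones((n, 1))])      # append checksum column
            L = np.eye(n)
            for k in range(n - 1):
                m = Ac[k + 1:, k] / Ac[k, k]              # elimination multipliers
                L[k + 1:, k] = m
                Ac[k + 1:, k:] -= np.outer(m, Ac[k, k:])  # update trailing matrix and checksum
            return L, np.triu(Ac[:, :n]), Ac[:, n]

        A = np.random.rand(6, 6) + 6 * np.eye(6)          # diagonally dominant: safe without pivoting
        L, U, cs = lu_with_checksum(A)
        assert np.allclose(cs, U.sum(axis=1))             # checksum invariant survives the factorization

        U[2, 4] += 0.5                                    # simulate a soft error in U
        row = int(np.argmax(np.abs(cs - U.sum(axis=1))))  # checksum locates the corrupted row
        U[row, 4] -= (U.sum(axis=1) - cs)[row]            # repair it (column index assumed known here)
        assert np.allclose(cs, U.sum(axis=1))

    In the full scheme a second, weighted checksum would also locate the column of the error, and the left factor would be covered by the checkpointing schemes described above.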

    Analysis and Design of Communication Avoiding Algorithms for Out of Memory (OOM) SVD

    Many applications, including big data analytics, information retrieval, gene expression analysis, and numerical weather prediction, require the solution of large, dense singular value decomposition (SVD) problems. The matrices used in many of these applications are becoming too large to fit into a computer's main memory at one time, and traditional SVD algorithms, which require all the matrix components to be loaded into memory before computation starts, cannot be used directly. Moving data (communication) between levels of the memory hierarchy and the disk poses extra challenges in designing SVD algorithms for such big matrices because of the exponential growth in the gap between floating-point arithmetic rate and bandwidth for many different storage devices on modern high performance computers. In this dissertation, we have analyzed communication overhead on hierarchical memory systems and disks for SVD algorithms and designed communication-avoiding (CA) Out of Memory (OOM) SVD algorithms. By Out of Memory we mean that the matrix is too big to fit in the main memory and therefore must reside in external or internal storage. We have studied the communication overhead of the classical one-stage blocked SVD and two-stage tiled SVD algorithms and proposed an OOM SVD algorithm that reduces the communication cost. We have presented theoretical analysis and strategies for designing CA OOM SVD algorithms, developed an optimized implementation of CA OOM SVD for multicore architectures, and presented its performance results. When matrices are tall, the performance of OOM SVD can be improved significantly by first carrying out a QR decomposition of the original matrix. The upper triangular factor produced by the QR decomposition may fit in the main memory, in which case an in-core SVD can be used efficiently; even if it does not fit, the OOM SVD then works on a smaller matrix. We have therefore analyzed communication reduction for the OOM QR algorithm, implemented an optimized OOM tiled QR for multicore systems, and shown the resulting performance improvement of OOM SVD algorithms for tall matrices.
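
    The QR-first strategy for tall matrices is easy to sketch. The version below is a purely in-memory illustration rather than the OOM implementation (out of memory, A and Q would reside on disk and be processed tile by tile), but the algebra is the same: factor A = QR, take the SVD of the small triangular factor R, and recombine.

        import numpy as np

        def tall_svd_via_qr(A):
            Q, R = np.linalg.qr(A)             # A is m x n with m >> n; R is only n x n
            Ur, s, Vt = np.linalg.svd(R)       # dense SVD of the small factor
            return Q @ Ur, s, Vt               # A = (Q @ Ur) diag(s) Vt

        A = np.random.rand(10000, 50)          # tall-and-skinny matrix
        U, s, Vt = tall_svd_via_qr(A)
        print(np.allclose(A, (U * s) @ Vt))    # reconstruction check: True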

    Designing LU-QR hybrid solvers for performance and stability

    This paper introduces hybrid LU-QR algorithms for solving dense linear systems of the form Ax = b. Throughout the matrix factorization, these algorithms dynamically alternate between LU steps with local pivoting and QR elimination steps, based upon a robustness criterion. LU elimination steps can be parallelized very efficiently and require half as many floating-point operations as QR steps. However, LU steps are not necessarily stable, while QR steps are always stable. The hybrid algorithms execute a QR step when the robustness criterion detects a risk of instability, and an LU step otherwise. Ideally, the choice between LU and QR steps should incur a small computational overhead and provide a satisfactory level of stability with as few QR steps as possible. In this paper, we introduce several robustness criteria and establish upper bounds on the growth factor of the norm of the updated matrix incurred by each of these criteria. In addition, we describe an implementation of the hybrid algorithms through an extension of the PaRSEC software that allows dynamic choices during execution. Finally, we analyze both stability and performance results compared to state-of-the-art linear solvers on parallel distributed multicore platforms.
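
    The decision logic can be shown with a toy, unblocked sketch. It is not the paper's algorithm, which works on tiles inside PaRSEC and uses the robustness criteria analyzed there; here a hypothetical threshold on the size of the LU multipliers stands in for the criterion: if the multipliers for the current column are bounded, take the cheap LU step, otherwise apply a Householder (QR) step.

        import numpy as np

        def hybrid_lu_qr(A, threshold=8.0):
            """Reduce A to upper triangular form, choosing an LU or QR step per column."""
            A = A.astype(float)
            n = A.shape[0]
            steps = []
            for k in range(n - 1):
                col = A[k:, k]
                # Robustness proxy: multipliers bounded by `threshold` => LU step is safe enough.
                if abs(col[0]) > 0 and np.abs(col).max() <= threshold * abs(col[0]):
                    m = A[k + 1:, k] / A[k, k]                      # LU elimination step
                    A[k + 1:, k:] -= np.outer(m, A[k, k:])
                    A[k + 1:, k] = 0.0
                    steps.append("LU")
                else:
                    v = col.copy()                                  # Householder (QR) step
                    v[0] += (1.0 if col[0] >= 0 else -1.0) * np.linalg.norm(col)
                    v /= np.linalg.norm(v)
                    A[k:, k:] -= 2.0 * np.outer(v, v @ A[k:, k:])
                    steps.append("QR")
            return A, steps

        R, steps = hybrid_lu_qr(np.random.rand(8, 8))
        print(steps)                                                # e.g. ['QR', 'LU', 'LU', ...]

    Lowering the threshold trades more (stable but costlier) QR steps for a smaller growth factor, which is the stability/performance trade-off the paper quantifies.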

    Preconditioning for standard and two-sided Krylov subspace methods

    This thesis is concerned with the solution of large nonsymmetric sparse linear systems. The main focus is on iterative solution methods and preconditioning. Assuming the linear system has a special structure, a minimal residual method called TSMRES, based on a generalization of a Krylov subspace, is presented and its convergence properties are studied. Numerical experiments show that there are cases where the convergence of TSMRES is faster than that of GMRES, and vice versa. The numerical implementation of TSMRES is studied and a new numerically stable formulation is presented. In addition, it is shown that preconditioning general linear systems for TSMRES by splittings is feasible in some cases. The direct solution of sparse linear systems of Hessenberg type is also studied. Finally, a new approach to computing a factorized approximate inverse of a matrix suitable for preconditioning is presented.
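
    TSMRES and the factorized approximate inverse constructed in the thesis are not reproduced here, but the general shape of a preconditioned Krylov solve that such a preconditioner plugs into can be shown with standard SciPy components; the sketch below uses GMRES with an incomplete-LU preconditioner purely as a stand-in.

        import numpy as np
        import scipy.sparse as sp
        import scipy.sparse.linalg as spla

        n = 500
        A = sp.diags([-1.0, 2.5, -1.2], [-1, 0, 1], shape=(n, n), format="csc")  # nonsymmetric test matrix
        b = np.ones(n)

        ilu = spla.spilu(A, drop_tol=1e-4)                    # incomplete LU of A
        M = spla.LinearOperator((n, n), matvec=ilu.solve)     # apply the preconditioner via the ILU factors
        x, info = spla.gmres(A, b, M=M)
        print(info, np.linalg.norm(A @ x - b))                # info == 0 means converged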

    Proceedings of the 3rd Annual Conference on Aerospace Computational Control, volume 1

    Conference topics included definition of tool requirements, advanced multibody component representation descriptions, model reduction, parallel computation, real time simulation, control design and analysis software, user interface issues, testing and verification, and applications to spacecraft, robotics, and aircraft

    Full State History Cooperative Localisation with Complete Information Sharing

    This thesis presents a decentralised localisation method for multiple robots. It reduces bandwidth requirements whilst using local solutions that fuse information from other robots. The method does not prescribe a communication topology or require complex tracking of information, and the mechanisms for incorporating shared data match standard elements of nonlinear optimisation algorithms. The thesis makes four contributions. The first is a method to split the multi-vehicle problem into sections that can be transmitted iteratively in packets with bounded bandwidth. This is done through delayed elimination of external states, the states involved in inter-vehicle observations. Observations are placed in subgraphs that accumulate between external states. Internal states, the states not involved in inter-vehicle observations, can then be eliminated from each subgraph, and the joint probability of the start and end states is shared between vehicles and combined to yield the solution to the entire graph. The second contribution is the use of variable reordering within these packets to handle delayed observations that target an existing state, such as visual loop closures. We identify the calculations required to give the conditional probability of the delayed historical state given the existing external states before and after it; this reduces the recalculation to updating the factorisation of a single subgraph and is independent of the time since the observation was made. The third contribution is a method, and the conditions required, for inserting states into existing packets without invalidating previously transmitted data. We derive the conditions that enable this method, and our fourth contribution is two motion models that conform to them. Together this permits handling of the general out-of-sequence case.
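
    The elimination behind the first contribution can be illustrated with a small, generic example: in information (Lambda, eta) form, the internal states of a subgraph are removed with a Schur complement, leaving a compact joint over the external start and end states that can be transmitted and combined without changing the solution. This is only a marginalisation sketch, not the thesis's packet structure or transmission protocol.

        import numpy as np

        def marginalise_internal(Lambda, eta, external_idx):
            """Schur-complement away every state not listed in external_idx."""
            internal_idx = np.setdiff1d(np.arange(Lambda.shape[0]), external_idx)
            Lee = Lambda[np.ix_(external_idx, external_idx)]
            Lei = Lambda[np.ix_(external_idx, internal_idx)]
            Lii = Lambda[np.ix_(internal_idx, internal_idx)]
            Lambda_marg = Lee - Lei @ np.linalg.solve(Lii, Lei.T)
            eta_marg = eta[external_idx] - Lei @ np.linalg.solve(Lii, eta[internal_idx])
            return Lambda_marg, eta_marg

        # Toy subgraph: 6 scalar states, with states 0 and 5 as the external start/end.
        rng = np.random.default_rng(0)
        J = rng.standard_normal((8, 6))
        Lambda = J.T @ J + np.eye(6)                          # well-conditioned information matrix
        eta = rng.standard_normal(6)
        L_m, e_m = marginalise_internal(Lambda, eta, np.array([0, 5]))

        # The shared marginal reproduces the full solution for the external states.
        x_full = np.linalg.solve(Lambda, eta)
        x_marg = np.linalg.solve(L_m, e_m)
        print(np.allclose(x_full[[0, 5]], x_marg))            # True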

    The numerical solution of sparse matrix equations by fast methods and associated computational techniques

    The numerical solution of sparse matrix equations by fast methods and associated computational techniques.