12 research outputs found

    Architecture-Aware Algorithms for Scalable Performance and Resilience on Heterogeneous Architectures

    The goal of the Extreme-scale Algorithms & Software Institute (EASI) is to close the "application-architecture performance gap" by exploring algorithms and runtime improvements that will enable key science applications to better exploit the architectural features of DOE extreme-scale systems. Over the past year of the project, our efforts at the University of Tennessee have concentrated on, and made significant progress toward, the following high-level EASI goals: (1) develop multi-precision and architecture-aware implementations of Krylov, Poisson, and Helmholtz solvers and of dense factorizations for heterogeneous multi-core systems (a short sketch of the multi-precision idea follows this abstract); (2) explore new methods of algorithm resilience and develop new algorithms with these capabilities; (3) develop runtime support for adaptable algorithms dealing with resilience and scalability; (4) distribute the new algorithms and runtime support through widely used software packages; and (5) establish a strong outreach program to disseminate results, interact with colleagues, and train students and junior members of our community.
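    One technique commonly used for the multi-precision goal above is mixed-precision iterative refinement: factor the matrix once in fast low precision, then recover full accuracy with cheap residual corrections carried out in higher precision. The sketch below illustrates only that general pattern; it is a minimal NumPy/SciPy example under assumed names (mixed_precision_solve, the test matrix, the tolerances), not the institute's actual code.

        # Minimal sketch of mixed-precision iterative refinement (illustrative,
        # not the EASI implementation): the O(n^3) LU factorization is done in
        # float32, while cheap residual corrections restore float64 accuracy.
        import numpy as np
        from scipy.linalg import lu_factor, lu_solve

        def mixed_precision_solve(A, b, max_iters=10, tol=1e-12):
            """Solve A x = b with a single-precision LU and double-precision refinement."""
            lu, piv = lu_factor(A.astype(np.float32))           # expensive step, low precision
            x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
            for _ in range(max_iters):
                r = b - A @ x                                   # residual in float64
                if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                    break
                d = lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
                x += d                                          # apply the correction
            return x

        rng = np.random.default_rng(0)
        n = 500
        A = rng.standard_normal((n, n)) + n * np.eye(n)         # well-conditioned test matrix
        b = rng.standard_normal(n)
        x = mixed_precision_solve(A, b)
        print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))    # small relative residual at float64 accuracy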

    A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines

    We study several solvers for general dense linear systems, with the main objective of reducing the communication overhead due to pivoting. We first describe two existing algorithms for LU factorization on hybrid CPU/GPU architectures: the first is based on partial pivoting, and the second uses random preconditioning of the original matrix to avoid pivoting altogether. We then introduce a solver in which the panel factorization is performed using a communication-avoiding pivoting heuristic while the update of the trailing submatrix is performed by the GPU. We provide performance comparisons of these solvers on current hybrid multicore-GPU parallel machines.
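    To make the structure of such a hybrid solver concrete, the sketch below shows a blocked right-looking LU factorization: the narrow panel factorization is the CPU-side work, and the large trailing-submatrix GEMM is the part a GPU would handle. It is only an illustrative NumPy example; plain partial pivoting stands in for the paper's communication-avoiding pivoting heuristic, and the function name blocked_lu and the block size are assumptions.

        # Illustrative blocked right-looking LU with row pivoting (P A = L U).
        # In a hybrid CPU/GPU scheme, the panel loop stays on the CPU while the
        # trailing-submatrix update (the GEMM below) is offloaded to the GPU.
        import numpy as np

        def blocked_lu(A, nb=64):
            """Return (LU, piv): L and U packed in one array, piv the row permutation."""
            A = A.astype(np.float64).copy()
            n = A.shape[0]
            piv = np.arange(n)
            for k in range(0, n, nb):
                kb = min(nb, n - k)
                # --- panel factorization (CPU): unblocked LU on columns k .. k+kb-1 ---
                for j in range(k, k + kb):
                    p = j + np.argmax(np.abs(A[j:, j]))         # partial-pivot row
                    if p != j:
                        A[[j, p], :] = A[[p, j], :]             # swap full rows
                        piv[[j, p]] = piv[[p, j]]
                    A[j + 1:, j] /= A[j, j]                     # column of L
                    A[j + 1:, j + 1:k + kb] -= np.outer(A[j + 1:, j], A[j, j + 1:k + kb])
                if k + kb < n:
                    # --- block row of U: solve L11 * U12 = A12 ---
                    L11 = np.tril(A[k:k + kb, k:k + kb], -1) + np.eye(kb)
                    A[k:k + kb, k + kb:] = np.linalg.solve(L11, A[k:k + kb, k + kb:])
                    # --- trailing-submatrix update: the GEMM a GPU would perform ---
                    A[k + kb:, k + kb:] -= A[k + kb:, k:k + kb] @ A[k:k + kb, k + kb:]
            return A, piv

        rng = np.random.default_rng(1)
        M = rng.standard_normal((300, 300))
        LU, piv = blocked_lu(M, nb=32)
        L = np.tril(LU, -1) + np.eye(300)
        U = np.triu(LU)
        print(np.allclose(L @ U, M[piv]))                       # True: P M == L U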

    Dynamically Balanced Synchronization-Avoiding LU Factorization with Multicore and GPUs

    Collective Mind: Towards Practical and Collaborative Auto-Tuning
