An improved fault mitigation strategy for CUDA Fermi GPUs

Di Carlo, S.; Gambardella, G.; Martella, I.; Prinetto, P.; Rolfo, D.; Trotta, P.

Authors: S. Di Carlo, G. Gambardella, I. Martella, P. Prinetto, D. Rolfo, P. Trotta
Publication date: 1 January 2014

Abstract

High computational performance is a predominant requirement in many applications, and Graphics Processing Units (GPUs) are increasingly adopted to meet it. Low prices and high parallelism make GPUs attractive, even in safety-critical applications. Nonetheless, new methodologies must be studied and developed to increase the dependability of GPUs. This paper presents an improved fault mitigation strategy against permanent faults for CUDA Fermi GPUs. The proposed approach exploits the reverse engineering of the block scheduling policy in CUDA Fermi GPUs to minimize the timing overhead of fault mitigation. The graceful performance degradation achieved by the proposed technique outperforms multithreaded CPU implementations and other fault mitigation strategies for CUDA GPUs, even in the presence of multiple permanent faults.

Available Versions

PORTO Publications Open Repository TOrino

oai:porto.polito.it:2571949

Last time updated on 16/02/2017

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

oai:iris.polito.it:11583/25719...

Last time updated on 30/10/2019