
2 research outputs found

About Performance Faults in Microprocessor Core in-field Testing

Author: Acle Julio Perez
Reorda Matteo Sonza
Sanchez Ernesto
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

A Multi-level Approach to Evaluate the Impact of GPU Permanent Faults on CNN's Reliability

Author: Dos Santos Fernando F.
Guerrero-Balaguera Juan-David
Rech Paolo
Reorda Matteo Sonza
Rodriguez Condia Josie E.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022

Graphics processing units (GPUs) are widely used to accelerate Artificial Intelligence applications, such as those based on Convolutional Neural Networks (CNNs). Since the expected lifetime of GPUs exceeds ten years in some domains where CNNs are heavily employed (e.g., automotive and robotics), it is of paramount importance to study the impact of permanent faults (e.g., those due to aging). Crucially, while the impact of transient faults on GPUs running CNNs has been widely studied, an accurate evaluation of the impact of permanent faults is still lacking. Performing this evaluation is challenging due to the complexity of both GPU devices and the software implementing a CNN. In this work, we propose a methodology that combines the accuracy of gate-level fault simulation with the speed and flexibility of software fault injection to evaluate the effects of permanent hardware faults affecting a GPU. First, we profile the low-level GPU instructions executed during CNN inference. Then, using extensive gate-level fault injection campaigns, we provide an accurate analysis of the effects of permanent faults on the internal modules executing the targeted instructions. Finally, we propagate these effects using fast software-based fault injection. For the first time, the method makes it possible to estimate the percentage of permanent faults that lead the CNN to produce wrong results (i.e., that change the result of its work). The method's feasibility is shown using LeNet running on an NVIDIA Ampere GPU as a case study, and the method allows accuracy to be flexibly traded off against the required computational effort. It reduces the computational effort of the evaluation by several orders of magnitude with respect to plain gate- and RTL-level fault simulation.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)
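
The last step described in the abstract above (propagating a permanent fault through the application in software and counting how often the final result changes) can be illustrated with a toy sketch. This is not the authors' tool or fault model: the 16-bit width, the stuck-at fault on one output bit of a multiplier, the tiny integer "classifier", and every function name are assumptions made for illustration only. In the paper, the error effects would come from gate-level fault injection rather than from an assumed stuck-at bit.

```python
from itertools import product

WIDTH = 16                 # assumed width of the multiplier output, in bits
MASK = (1 << WIDTH) - 1


def faulty_mul(a, b, fault=None):
    """Unsigned multiply. A permanent fault (bit, stuck_value) hits every result."""
    result = (a * b) & MASK
    if fault is not None:
        bit, stuck = fault
        if stuck:
            result |= 1 << bit       # stuck-at-1: the bit is always set
        else:
            result &= ~(1 << bit)    # stuck-at-0: the bit is always cleared
    return result


def classify(inputs, weights, fault=None):
    """Toy classifier: one dot product per class, then argmax over the scores."""
    scores = [
        sum(faulty_mul(x, w, fault) for x, w in zip(inputs, row))
        for row in weights
    ]
    return scores.index(max(scores))


def critical_fault_fraction(samples, weights):
    """Fraction of single stuck-at faults that change at least one prediction."""
    faults = list(product(range(WIDTH), (0, 1)))
    golden = [classify(s, weights) for s in samples]   # fault-free reference
    critical = sum(
        1
        for f in faults
        if any(classify(s, weights, f) != g for s, g in zip(samples, golden))
    )
    return critical / len(faults)


if __name__ == "__main__":
    weights = [[1, 0], [0, 1]]        # hypothetical two-class weights
    samples = [[2, 1], [1, 5], [7, 3]]
    fraction = critical_fault_fraction(samples, weights)
    print(f"{fraction:.1%} of injected faults change a prediction")
```

Because the fault is permanent, it is applied to every multiplication rather than to a single one, which is the main difference from a transient-fault injection loop. Comparing each faulty run against the fault-free ("golden") prediction is what yields the percentage of faults that corrupt the final result.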