
    Cross-layer system reliability assessment framework for hardware faults

    System reliability estimation during early design phases facilitates informed decisions about the integration of effective protection mechanisms against different classes of hardware faults. When not all system abstraction layers (technology, circuit, microarchitecture, software) are factored into such an estimation model, the delivered reliability reports are bound to be excessively pessimistic and thus lead to unacceptably expensive, over-designed systems. We propose a scalable, cross-layer methodology and a supporting suite of tools for accurate but fast estimation of computing system reliability. The backbone of the methodology is a component-based Bayesian model, which calculates system reliability from the masking probabilities of individual hardware and software components while considering their complex interactions. Our detailed experimental evaluation for different technologies, microarchitectures, and benchmarks demonstrates that the proposed model delivers very accurate reliability estimations (FIT rates) compared to statistically significant but slow fault injection campaigns at the microarchitecture level.
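    The abstract leaves the model's mathematics unspecified, so the following is only a minimal sketch of how per-layer masking probabilities might combine into a system-level FIT rate. The component names, FIT values, masking probabilities, and the independence assumption across layers are all illustrative; the paper's Bayesian model additionally captures the complex interactions between components that this simplification ignores.

```python
# Minimal illustrative sketch (not the paper's model): combine hypothetical
# per-layer masking probabilities into a system FIT estimate, assuming
# layers mask faults independently of one another.

# Hypothetical raw FIT rates per component (failures per 10^9 device-hours).
raw_fit = {"register_file": 120.0, "alu": 45.0, "cache": 300.0}

# Hypothetical masking probabilities per abstraction layer and component.
masking = {
    "register_file": {"circuit": 0.20, "microarch": 0.55, "software": 0.40},
    "alu":           {"circuit": 0.15, "microarch": 0.70, "software": 0.60},
    "cache":         {"circuit": 0.10, "microarch": 0.80, "software": 0.50},
}

def effective_fit(component: str) -> float:
    """A fault contributes to system failure only if no layer masks it."""
    propagate = 1.0
    for p_mask in masking[component].values():
        propagate *= 1.0 - p_mask  # independence assumption across layers
    return raw_fit[component] * propagate

system_fit = sum(effective_fit(c) for c in raw_fit)
print(f"Estimated system FIT: {system_fit:.1f}")
```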

    On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation

    Machine Learning (ML) is making a strong resurgence, in step with the massive generation of unstructured data, which in turn requires massive computational resources. Due to the inherently compute- and power-intensive structure of Neural Networks (NNs), hardware accelerators emerge as a promising solution. However, with technology nodes scaling below 10nm, hardware accelerators become more susceptible to faults, which in turn can impact NN accuracy. In this paper, we study the resilience of Register-Transfer Level (RTL) models of NN accelerators, in particular fault characterization and mitigation. Following a High-Level Synthesis (HLS) approach, we first characterize the vulnerability of the various components of an RTL NN. We observe that the severity of faults depends on both i) application-level specifications, i.e., NN data (inputs, weights, or intermediate values), NN layers, and NN activation functions, and ii) architectural-level specifications, i.e., the data representation model and the parallelism degree of the underlying accelerator. Second, motivated by the characterization results, we present a low-overhead fault mitigation technique that corrects bit flips 47.3% more effectively than state-of-the-art methods.
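    The characterization runs at RTL and cannot be reproduced from the abstract alone; the sketch below only illustrates the application-level effect being characterized: a single bit flip in a quantized weight perturbing a layer's output, with severity depending on which bit is hit. The int8 representation, the tiny fully connected layer, and the fault site are all illustrative assumptions, not the paper's setup.

```python
# Illustrative single bit-flip injection into a quantized weight matrix.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.integers(-128, 128, size=(4, 8), dtype=np.int8)  # int8 weights
inputs = rng.integers(0, 128, size=8, dtype=np.int8)           # int8 activations

def flip_bit(w: int, bit: int) -> int:
    """Flip one bit of an 8-bit two's-complement value."""
    u = (int(w) & 0xFF) ^ (1 << bit)
    return u - 256 if u >= 128 else u

# Golden (fault-free) output of one tiny fully connected layer.
golden = weights.astype(np.int32) @ inputs.astype(np.int32)

# Inject a single bit flip at an arbitrary fault site and re-run.
r, c, bit = 2, 5, 6
faulty = weights.copy()
faulty[r, c] = flip_bit(faulty[r, c], bit)
out = faulty.astype(np.int32) @ inputs.astype(np.int32)

print("golden output:", golden)
print("faulty output:", out)
print("max deviation:", np.max(np.abs(out - golden)))  # grows with bit position
```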

    Modeling of a latent fault detector in a digital system

    Methods of modeling the detection time, or latency period, of a hardware fault in a digital system are proposed that explain how a computer detects faults in a computational mode. The objectives were to study how software reacts to a fault, to account for as many variables as possible affecting detection, and to forecast a given program's detecting ability prior to computation. A series of experiments was conducted on a small emulated microprocessor with fault injection capability. Results indicate that the detecting capability of a program depends largely on the instruction subset used during computation and the frequency of its use, and has little direct dependence on such variables as fault mode, number set, degree of branching, and program length. A model is discussed which employs a balls-in-an-urn analogy to explain the rate at which subsequent repetitions of an instruction or instruction set detect a given fault.
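    The balls-in-an-urn analogy suggests a simple baseline rate argument: if each repetition of an instruction that can exercise the fault detects it independently with probability p, detection latency is geometric with mean 1/p. The sketch below simulates only that baseline; the value of p is an illustrative assumption, and the paper's model is more detailed.

```python
# Geometric-latency baseline implied by the urn analogy (illustrative only).
import random

def detection_latency(p: float) -> int:
    """Repetitions until the fault is first detected (geometric draw)."""
    n = 1
    while random.random() >= p:
        n += 1
    return n

p = 0.05  # hypothetical per-repetition detection probability
trials = [detection_latency(p) for _ in range(10_000)]
mean_latency = sum(trials) / len(trials)
print(f"empirical mean latency: {mean_latency:.1f} repetitions (theory: {1 / p:.1f})")
```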

    Probabilistic Monte-Carlo method for modelling and prediction of electronics component life

    Power electronics are widely used in electric vehicles, railway locomotives, and new-generation aircraft. The reliability of these components directly affects the reliability and performance of these vehicular platforms. In recent years, extensive research on reliability, failure modes, and aging analysis has been carried out, and there is a need for an efficient algorithm able to predict the life of a power electronics component. In this paper, a probabilistic Monte-Carlo framework is developed and applied to predict the remaining useful life of a component. Probability distributions are used to model the component's degradation process, with the model parameters learned using Maximum Likelihood Estimation. The prognostics are carried out by means of simulation: Monte-Carlo simulation propagates multiple possible degradation paths from the current health state of the component, and the remaining useful life and its confidence bounds are calculated by estimating the mean, median, and percentile statistics of the simulated degradation paths. Results from different probabilistic models are compared and their prognostic performance is evaluated.
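    The abstract does not fix a particular degradation distribution, so the skeleton below assumes, purely for illustration, Gaussian degradation increments with parameters from a simple MLE and failure at a fixed threshold; only the simulate-many-paths-then-take-percentiles structure reflects the described approach.

```python
# Monte-Carlo remaining-useful-life (RUL) skeleton under assumed Gaussian increments.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical observed degradation increments per duty cycle.
observed = np.array([0.8, 1.1, 0.9, 1.3, 1.0, 1.2])
mu, sigma = observed.mean(), observed.std()  # Gaussian MLE of the increments

current_level, threshold = 40.0, 100.0       # current health state / failure limit
n_paths, horizon = 5_000, 500                # simulated paths, max cycles per path

cycles_to_failure = []
for _ in range(n_paths):
    level, t = current_level, 0
    while level < threshold and t < horizon:
        level += rng.normal(mu, sigma)       # propagate one degradation step
        t += 1
    cycles_to_failure.append(t)

rul = np.array(cycles_to_failure)
print("median RUL:", np.median(rul), "cycles")
print("5th-95th percentile bounds:", np.percentile(rul, [5, 95]))
```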

    Autonomous fault emulation: a new FPGA-based acceleration system for hardness evaluation

    The advent of nanometer technologies has significantly increased the sensitivity of integrated circuits to radiation, making soft errors much more frequent, not only in applications working in harsh environments, like aerospace circuits, but also in applications operating at the Earth's surface. Hardened circuits are therefore now demanded in many applications where fault tolerance was not a concern until very recently. To this end, efficient hardness evaluation solutions are required to deal with the increasing size and complexity of modern VLSI circuits. In this paper, a very fast and cost-effective solution for SEU sensitivity evaluation is presented. The proposed approach uses FPGA emulation in an autonomous manner to fully exploit FPGA emulation speed. Three different techniques to implement it are proposed and analyzed. Experimental results show that the proposed Autonomous Emulation approach can reach execution rates higher than one million faults per second, a performance improvement of two orders of magnitude with respect to previous approaches. These rates make it feasible to run very large fault injection campaigns that were not possible in the past. This work was supported by the Directorate of Research of the Madrid Community Government, Spain (Code 07/0052/2003 2), and by the European Commission and Spanish Government under the MEDEA+ Project (PARACHUTE-2A701) and PROFIT Project (CIRCE-FIT-330100-2005-60).
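    The emulation itself runs on an FPGA and cannot be shown here; the toy loop below only illustrates the structure of an exhaustive SEU campaign: one injection per (storage element, clock cycle) pair, each run classified against a golden reference. The 4-bit shift register stands in for a real netlist and is purely hypothetical; note how upsets shifted out of the register before the end of the run are logically masked.

```python
# Toy exhaustive SEU injection campaign over a 4-bit shift register.
def run(cycles: int, fault: tuple[int, int] | None = None) -> int:
    """Simulate the circuit; optionally flip one state bit at a given cycle."""
    state = 0
    for cycle in range(cycles):
        state = ((state << 1) | (cycle & 1)) & 0xF  # toy next-state logic
        if fault is not None and cycle == fault[0]:
            state ^= 1 << fault[1]                  # single-event upset
    return state

CYCLES, BITS = 16, 4
golden = run(CYCLES)                                # fault-free reference run

failures = 0
for cycle in range(CYCLES):                         # every injection instant...
    for bit in range(BITS):                         # ...in every storage element
        if run(CYCLES, fault=(cycle, bit)) != golden:
            failures += 1                           # observable error
        # otherwise the upset was shifted out: logically masked

total = CYCLES * BITS
print(f"{failures}/{total} injections produced an observable error")
```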
