41 research outputs found

    TFI-FTS: An efficient transient fault injection and fault-tolerant system for asynchronous circuits on FPGA platform

    Get PDF
    Designing VLSI digital circuits is challenging tasks because of testing the circuits concerning design time. The reliability and productivity of digital integrated circuits are primarily affected by the defects in the manufacturing process or systems. If the defects are more in the systems, which leads the fault in the systems. The fault tolerant systems are necessary to overcome the faults in the VLSI digital circuits. In this research article, an asynchronous circuits based an effective transient fault injection (TFI) and fault tolerant system (FTS) are modelled. The TFI system generates the faults based on BMA based LFSR with faulty logic insertion and one hot encoded register. The BMA based LFSR reduces the hardware complexity with less power consumption on-chip than standard LFSR method. The FTS uses triple mode redundancy (TMR) based majority voter logic (MVL) to tolerant the faults for asynchronous circuits. The benchmarked 74X-series circuits are considered as an asynchronous circuit for TMR logic. The TFI-FTS module is modeled using Verilog-HDL on Xilinx-ISE and synthesized on hardware platform. The Performance parameters are tabulated for TFI-FTS based asynchronous circuits. The performance of TFI-FTS Module is analyzed with 100% fault coverage. The fault coverage is validated using functional simulation of each asynchronous circuit with fault injection in TFI-FTS Module

    A Survey of Fault-Injection Methodologies for Soft Error Rate Modeling in Systems-on-Chips

    Get PDF
    The development of process technology has increased system performance, but the system failure probability has also significantly increased. It is important to consider the system reliability in addition to the cost, performance, and power consumption. In this paper, we describe the types of faults that occur in a system and where these faults originate. Then, fault-injection techniques, which are used to characterize the fault rate of a system-on-chip (SoC), are investigated to provide a guideline to SoC designers for the realization of resilient SoCs

    An extended model to support detailed GPGPU reliability analysis

    Get PDF
    General Purpose Graphics Processing Units (GPGPUs) have been used in the last decades as accelerators in high demanding data processing applications, such as multimedia processing and high-performance computing. Nowadays, these devices are becoming popular even in safety-critical applications, such as autonomous and semi-autonomous vehicles. However, these devices can suffer from the effects of transient faults, such as those produced by radiation effects. These effects can be represented in the system as Single Event Upsets (SEUs) and are able to generate intolerable application misbehaviors in safety critical environments. In this work, we extended the capabilities of an open-source VHDL GPGPU model (FlexGrip) in order to study and analyze in a much more detailed manner the effects of SEUs in some critical modules within a GPGPU. Simulation results showed that scheduler controller has different levels of SEU sensibility depending on the affected location. Moreover, a reduced number of execution units, in the GPGPU can decrease the system reliability

    Testing permanent faults in pipeline registers of GPGPUs: A multi-kernel approach

    Get PDF
    In the last decade, General Purpose Graphics Processing Units (GPGPUs) have been widely employed in high demanding data processing applications including multimedia and high-performance computing due to their parallel processing capabilities. Nowadays, these devices are considered as promising solutions also for high-performance safety-critical applications, such as autonomous and semi-autonomous vehicles. Current GPGPUs are designed targeting challenging execution requirements, e.g., related to performance and power constraints, forcing designers to use aggressive technology scaling solutions. Nevertheless, some implementation technologies are prone to introduce faults in the device during the operative life adding unaffordable effects and errors for the safety-critical domain. Hence, effective in-field test solutions are required to guarantee the target reliability levels. In this paper, we propose in-field test solutions based on Software-Based Self-Test (SBST) targeting the control-path of pipeline registers located in the Streaming Multiprocessor (SM) of a GPGPU. We resort to a multiple-kernel approach to detect permanent faults in these register fields. The solutions were designed employing NVIDIA CUDA, when possible, and lower level constructs elsewhere. Several usages and compilation restrictions are also described. Fault simulation results on an open-source VHDL GPGPU (FlexGrip) implementation of the G80 architecture of NVIDIA are reported, showing the effectiveness and limitations of the approach

    Sensitivity Analysis of a Neural Network based Avionic System by Simulated Fault and Noise Injection

    Get PDF
    The application of virtual sensor is widely discussed in literature as a cost effective solution compared to classical physical architectures. RAMS (Reliability, Availability, Maintainability and Safety) performance of the entire avionic system seem to be greatly improved using analytical redundancy. However, commercial applications are still uncommon. A complete analysis of the behavior of these models must be conducted before implementing them as an effective alternative for aircraft sensors. In this paper, a virtual sensor based on neural network called Smart-ADAHRS (Smart Air Data, Attitude and Heading Reference System) is analyzed through simulation. The model simulates realistic input signals of typical inertial and air data MEMS (Micro Electro-Mechanical Systems) sensors. A procedure to define the background noise model is applied and two different cases are shown. The first considers only the sensor noise whereas the latter uses the same procedure with the operative flight noise. Flight tests have been conducted to measure the disturbances on the inertial and air data sensors. Comparison of the Power Spectral Density function is carried out between operative and background noise. A model for GNSS (Global Navigation Satellite System) receiver, complete with constellation simulator and atmospheric delay evaluation, is also implemented. Eventually, a simple multi-sensor data fusion technique is modeled. Results show good robustness of the Smart-ADAHRS to the sensor faults and a marginal sensitivity to the temperature-related faults. Solution for this kind of degradation is indicated at the end of the paper. Influences of noise on input signals is also discussed

    Ground-truth prediction to accelerate soft-error impact analysis for iterative methods

    Get PDF
    Understanding the impact of soft errors on applications can be expensive. Often, it requires an extensive error injection campaign involving numerous runs of the full application in the presence of errors. In this paper, we present a novel approach to arriving at the ground truth-the true impact of an error on the final output-for iterative methods by observing a small number of iterations to learn deviations between normal and error-impacted execution. We develop a machine learning based predictor for three iterative methods to generate ground-truth results without running them to completion for every error injected. We demonstrate that this approach achieves greater accuracy than alternative prediction strategies, including three existing soft error detection strategies. We demonstrate the effectiveness of the ground truth prediction model in evaluating vulnerability and the effectiveness of soft error detection strategies in the context of iterative methods.This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under Award Number 66905, program manager Lucy Nowell. Pacific Northwest National Laboratory is operated by Battelle for DOE under Contract DE-AC05-76RL01830.Peer ReviewedPostprint (author's final draft

    Software Fault Tolerance in Real-Time Systems: Identifying the Future Research Questions

    Get PDF
    Tolerating hardware faults in modern architectures is becoming a prominent problem due to the miniaturization of the hardware components, their increasing complexity, and the necessity to reduce the costs. Software-Implemented Hardware Fault Tolerance approaches have been developed to improve the system dependability to hardware faults without resorting to custom hardware solutions. However, these come at the expense of making the satisfaction of the timing constraints of the applications/activities harder from a scheduling standpoint. This paper surveys the current state of the art of fault tolerance approaches when used in the context real-time systems, identifying the main challenges and the cross-links between these two topics. We propose a joint scheduling-failure analysis model that highlights the formal interactions among software fault tolerance mechanisms and timing properties. This model allows us to present and discuss many open research questions with the final aim to spur the future research activities
    corecore