563 research outputs found

    Fault Injection for Embedded Microprocessor-based Systems

    Microprocessor-based embedded systems are increasingly used to control safety-critical systems (e.g., air and railway traffic control, nuclear plant control, aircraft and car control). In such cases, fault tolerance mechanisms are introduced at the hardware and software levels. Debugging these mechanisms and verifying their correct design and implementation call for effective environments, and Fault Injection is a viable way to implement them. In this paper we present a Fault Injection environment, named FlexFI, suitable for assessing the correctness of the design and implementation of the hardware and software mechanisms in embedded microprocessor-based systems, and for computing the fault coverage they provide. The paper describes and analyzes different solutions for implementing the most critical modules, which differ in terms of cost, speed, and intrusiveness in the original system behavior.

    Maximizing the Switching Activity of Different Modules Within a Processor Core via Evolutionary Techniques

    One key aspect to be considered during device testing is the minimization of the switching activity of the circuit under test (CUT), which avoids possible problems caused by overheating. There are also scenarios, however, where maximizing the switching activity of certain modules of a circuit can prove useful (e.g., during Burn-In) in order to exercise the circuit under extreme operating conditions in terms of temperature (and temperature gradients). Resorting to a functional approach based on Software-based Self-test guarantees that the high induced activity can neither damage the CUT nor produce any yield loss. However, generating effective test programs remains a challenging task. In this paper, we consider a scenario where the modules to be stressed are sub-modules of a fully pipelined processor. We present a technique, based on an evolutionary approach, able to automatically generate stress test programs, i.e., sequences of instructions achieving a high toggling activity in the target module. With respect to previous approaches, the generated sequences are short and repeatable, which makes them easy to use to stress a module (and increase its temperature). The processor we used for our experiments is the OpenRISC 1200. Results demonstrate that the proposed method is effective in achieving a high value of sustained toggling activity with short (3 instructions) and repeatable sequences.
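
    As a rough illustration of the idea in this abstract, the sketch below runs a small evolutionary search for a short, repeatable sequence that maximizes bit toggling. It is a toy abstraction, not the authors' tool: each "instruction" is reduced to the 32-bit word it is assumed to drive onto a module's inputs, and the names `toggles`, `mutate`, `crossover`, and `evolve`, as well as all parameter values, are hypothetical.

```python
import random

WIDTH = 32          # assumed bit width of the toy module's input bus
SEQ_LEN = 3         # short sequence, mirroring the 3-instruction loops in the abstract
MASK = (1 << WIDTH) - 1


def toggles(seq):
    """Count bit flips over one pass of the loop, including the wrap-around."""
    return sum(bin(seq[i] ^ seq[(i + 1) % len(seq)]).count("1")
               for i in range(len(seq)))


def mutate(seq, rng):
    """Flip one random bit in one random word."""
    child = list(seq)
    i = rng.randrange(len(child))
    child[i] ^= 1 << rng.randrange(WIDTH)
    return child


def crossover(a, b, rng):
    """One-point crossover between two parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(generations=200, pop_size=30, seed=0):
    """Return (best fitness in the initial population, best evolved sequence)."""
    rng = random.Random(seed)
    pop = [[rng.getrandbits(WIDTH) for _ in range(SEQ_LEN)]
           for _ in range(pop_size)]
    initial_best = max(toggles(s) for s in pop)
    for _ in range(generations):
        pop.sort(key=toggles, reverse=True)
        elite = pop[: pop_size // 2]      # keep the better half unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return initial_best, max(pop, key=toggles)


if __name__ == "__main__":
    initial_best, best = evolve()
    print(initial_best, toggles(best), [hex(w) for w in best])
```

    In this toy model a bit can flip at most twice per pass of a 3-word loop, so the fitness is bounded by 2 × 32 = 64 toggles per pass. A real flow would instead measure the toggling activity of the target module by simulating candidate instruction sequences on the processor model.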

    A Fault Injection Environment for Microprocessor-based Boards

    Evaluating the faulty behaviour of low-cost microprocessor-based boards is an increasingly important issue, due to their usage in many safety-critical systems. To address this issue, the paper describes a software-implemented fault injection system based on the trace exception mode available in most microprocessors. The architecture of the complete fault injection environment is proposed, integrating modules for generating a fault list, for injecting the faults, and for gathering the results. Data gathered from some sample benchmark applications are presented. The main advantages of the approach are low cost, good portability, and high efficiency.
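
    The three modules named in this abstract (fault list generation, injection, result gathering) can be sketched on a toy workload as follows. This is a minimal illustration under stated assumptions, not the environment described in the paper: the workload, the register layout, the duplication check, and the names `run`, `fault_list`, and `campaign` are all hypothetical, and a plain Python hook stands in for a trace-exception handler.

```python
from collections import Counter
from itertools import product

DATA = [3, 1, 4, 1, 5, 9, 2, 6]   # toy workload input (assumed)
N_REGS, N_BITS = 3, 8             # assumed register file: 3 registers of 8 bits


def run(fault=None):
    """Execute the toy workload; optionally flip one bit after a given step."""
    regs = [0, 0, 0]   # r0: sum, r1: redundant copy of the sum, r2: running max
    for step, value in enumerate(DATA):
        regs[0] = (regs[0] + value) & 0xFF
        regs[1] = (regs[1] + value) & 0xFF
        regs[2] = max(regs[2], value)
        # Stand-in for a trace-exception handler: it runs after every step
        if fault is not None and fault[0] == step:
            _, reg, bit = fault
            regs[reg] ^= 1 << bit
    detected = regs[0] != regs[1]          # duplication check on the sum
    return (regs[0], regs[2]), detected


def fault_list():
    """Exhaustive list of (step, register, bit) single bit-flip faults."""
    return list(product(range(len(DATA)), range(N_REGS), range(N_BITS)))


def campaign():
    """Inject every fault once and classify the outcome."""
    golden, _ = run()
    results = Counter()
    for fault in fault_list():
        output, detected = run(fault)
        if detected:
            results["detected"] += 1
        elif output == golden:
            results["silent"] += 1      # fault had no visible effect
        else:
            results["failure"] += 1     # wrong output, not flagged
    return results


if __name__ == "__main__":
    print(dict(campaign()))
```

    In this sketch every flip in the two duplicated sum registers is caught by the comparison, while flips in the unprotected running-max register either vanish or corrupt the output unnoticed. That split is the kind of result a fault injection campaign is meant to expose.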

    On the testing of special memories in GPGPUs

    Nowadays, data-intensive processing applications, such as multimedia, high-performance computing and safety-critical ones (e.g., in automotive), employ General Purpose Graphics Processing Units (GPGPUs) due to their parallel processing capabilities and high performance. These devices employ multiple levels of memory to hide latency and increase performance during the operation of a kernel. Moreover, modern GPGPU architectures implement cutting-edge semiconductor technologies, reducing their size and power consumption. However, some studies have shown that these technologies are prone to faults during the operative life of a device, thus compromising reliability. In this work, we developed functional test techniques based on parallel Software-Based Self-Test routines to test memory structures in the memory hierarchy of a GPGPU (FlexGripPlus) implementing the G80 architecture of Nvidia.

    An Error-Detection and Self-Repairing Method for Dynamically and Partially Reconfigurable Systems

    Reconfigurable systems are attracting increasing interest in the domain of safety-critical applications, for example in the space and avionic domains. The capability of reconfiguring the system at run time and the high computational power of modern Field Programmable Gate Arrays (FPGAs) make these devices suitable for intensive data processing tasks. Moreover, such systems must also guarantee self-awareness, self-diagnosis and self-repair in order to cope with errors due to the harsh conditions typical of some environments. In this paper we propose a self-repairing method for partially and dynamically reconfigurable systems, applied at a fine-grain granularity level. Our method is able to detect, correct and recover from errors using the run-time capabilities offered by modern SRAM-based FPGAs. Fault injection campaigns have been executed on a dynamically reconfigurable system embedding a number of benchmark circuits. Experimental results demonstrate that our method achieves full detection of single and multiple errors, while significantly improving the system availability with respect to traditional error detection and correction methods.

    Evaluating the Impact of Transition Delay Faults in GPUs

    This work proposes a method to evaluate the effects of transition delay faults (TDFs) in GPUs. The method takes advantage of low-level (i.e., RT- and gate-level) descriptions of a GPU, thus paving the way to modeling these faults as errors at the instruction level, which can contribute to the resilience evaluation of large and complex applications. For this purpose, the paper describes a setup that efficiently simulates transition delay faults. The results allow us to compare their effects with those of stuck-at faults (SAFs) and to perform an error classification correlating these faults with instruction-level errors. We resort to an open-source model of a GPU (FlexGripPlus) and a set of workloads for the evaluation. The experimental results show that, depending on the application code style, TDFs can compromise the operation of an application from 1.3 to 11.63 times less than SAFs. Moreover, for all the analyzed applications, a considerable percentage of sites of the Integer (5.4% to 51.7%), Floating-point (0.9% to 2.4%), and Special Function unit (17.0% to 35.6%) can become critical if affected by a SAF or TDF. Finally, a correlation between the fault's impact from both fault models and the instructions executed by the applications reveals that SAFs in the functional units are more prone (from 45.6% to 60.4%) to propagate errors to the software level for all units than TDFs (from 17.9% to 58.8%).
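
    To make the two fault models in this abstract concrete, the sketch below applies a stuck-at-0 fault and a slow-to-rise transition delay fault to the same one-bit signal trace and counts the cycles in which the observed value is wrong. It is a deliberately simplified single-signal, one-cycle-delay model, not the gate-level simulation setup used in the paper; the names `stuck_at`, `slow_to_rise`, and `mismatches` and the sample trace are hypothetical.

```python
def stuck_at(trace, value):
    """Stuck-at fault: the line shows the same value in every cycle."""
    return [value for _ in trace]


def slow_to_rise(trace, initial=0):
    """Toy TDF: a 0 -> 1 transition is seen one cycle late; nothing else changes."""
    observed, prev = [], initial
    for bit in trace:
        observed.append(0 if (prev == 0 and bit == 1) else bit)
        prev = bit   # track the fault-free value to locate rising edges
    return observed


def mismatches(good, faulty):
    """Number of cycles in which the faulty line differs from the fault-free one."""
    return sum(g != f for g, f in zip(good, faulty))


trace = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]   # assumed fault-free values, one per cycle

if __name__ == "__main__":
    print("stuck-at-0 mismatches:  ", mismatches(trace, stuck_at(trace, 0)))
    print("slow-to-rise mismatches:", mismatches(trace, slow_to_rise(trace)))
```

    In this model the delay fault corrupts only the cycles that contain a rising edge, whereas the stuck-at-0 fault corrupts every cycle in which the line should be 1. This offers one intuition for why a TDF can affect an application less than a SAF at the same site, although the actual ratios reported above come from the paper's experiments, not from this sketch.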

    RESCUE: Cross-Sectoral PhD Training Concept for Interdependent Reliability, Security and Quality

    The recently started European Training Network (ETN) RESCUE advances scientific competences in the demanding and mutually dependent aspects of nano-electronic systems design, i.e. reliability, security and quality, as well as in related electronic design automation tools. It also provides early-stage researchers with innovative cross-sectoral training in the involved disciplines and beyond, preparing them to face today’s and future challenges in nano-electronics design. Furthermore, they are trained to be innovative and creative and, more importantly, to have an entrepreneurial mentality. The latter will help to turn ideas into products and services for economic and social benefit, and to create a qualified workforce and knowledge for the industry. The consortium consists of leading European research groups competent to tackle the interdependent challenges in a holistic manner, and is excellently balanced in terms of academic and industrial training and research facilities.

    Testing permanent faults in pipeline registers of GPGPUs: A multi-kernel approach

    In the last decade, General Purpose Graphics Processing Units (GPGPUs) have been widely employed in highly demanding data processing applications, including multimedia and high-performance computing, due to their parallel processing capabilities. Nowadays, these devices are also considered promising solutions for high-performance safety-critical applications, such as autonomous and semi-autonomous vehicles. Current GPGPUs are designed to meet challenging execution requirements, e.g., related to performance and power constraints, forcing designers to use aggressive technology scaling solutions. Nevertheless, some implementation technologies are prone to introduce faults in the device during its operative life, causing effects and errors that are unacceptable in the safety-critical domain. Hence, effective in-field test solutions are required to guarantee the target reliability levels. In this paper, we propose in-field test solutions based on Software-Based Self-Test (SBST) targeting the control-path of pipeline registers located in the Streaming Multiprocessor (SM) of a GPGPU. We resort to a multiple-kernel approach to detect permanent faults in these register fields. The solutions were designed employing NVIDIA CUDA, when possible, and lower level constructs elsewhere. Several usage and compilation restrictions are also described. Fault simulation results on an open-source VHDL GPGPU (FlexGrip) implementation of the G80 architecture of NVIDIA are reported, showing the effectiveness and limitations of the approach.