78 research outputs found
Performances VS Reliability: how to exploit Approximate Computing for Safety-Critical applications
The Approximate Computing (AxC) paradigm aims at designing energy-efficient systems, saving computational resources, and presenting better execution times. AxC aims to selectively violate the specifications, trading accuracy off for efficiency. The literature has demonstrated the effectiveness of imprecise computation for both software and hardware components implementing inexact algorithms, showing an inherent resiliency to errors. On the other hand, the hidden cost of AxC is the reduction of the inherent resiliency to errors of an application. This paper aims at analyzing the impact of AxC on the reliability
Evaluating Architectural, Redundancy, and Implementation Strategies for Radiation Hardening of FinFET Integrated Circuits
In this article, authors explore radiation hardening techniques through the design of a test chip implemented in 16-nm FinFET technology, along with architectural and redundancy design space exploration of its modules. Nine variants of matrix multiplication were taped out and irradiated with neutrons. The results obtained from the neutron campaign revealed that the radiation-hardened variants present superior resiliency when either local or global triple modular redundancy (TMR) schemes are employed. Furthermore, simulation-based fault injection was utilized to validate the measurements and to explore the effects of different implementation strategies on failure rates. We further show that the interplay between these different implementation strategies is not trivial to capture and that synthesis optimizations can effectively break assumptions about the effectiveness of redundancy schemes
Application-Based Analysis of Register File Criticality for Reliability Assessment in Embedded Microprocessors
There is an increasing concern to reduce the cost and overheads during the development of reliable systems. Selective protection of the most critical parts of a system represents a viable solution to obtain a high level of reliability at a fraction of the cost. In particular, to design a selective fault mitigation strategy for processor-based systems, it is mandatory to identify and prioritize the most vulnerable registers in the register file as the best candidates to be protected (hardened). This paper presents an application-based metric to estimate the criticality of each register in the register file of microprocessor-based systems. The proposed metric relies on the combination of three different criteria based on common features of executed applications. The applicability and accuracy of our proposal have been evaluated in a set of applications running on different microprocessors. Results show a significant improvement in accuracy compared to previous approaches, regardless of the underlying architecture. This work was funded in part by the Spanish Ministry of Education, Culture and Sports with the project “Developing hybrid fault tolerance techniques for embedded microprocessors” (PHB2012-0158-PC)
S-SETA: Selective Software-Only Error-Detection Technique Using Assertions
Software-based techniques offer several advantages to increase the reliability of processor-based systems at very low cost, but they cause performance degradation and an increase in code size. To meet constraints in performance and memory, we propose SETA, a new control-flow software-only technique that uses assertions to detect errors affecting the program flow. SETA is an independent technique, but it was conceived to work together with previously proposed data-flow techniques that aim at reducing performance and memory overheads. Thus, SETA is combined with such data-flow techniques and submitted to a fault injection campaign. Simulation and neutron-induced SEE tests show high fault coverage at performance and memory overheads lower than the state of the art. This work was supported in part by CNPq and CAPES, Brazilian agencies
Development of fault-tolerance techniques for SRAM-programmable components (Desenvolvimento de técnicas de tolerância à falhas para componentes programáveis por SRAM)
This paper discusses fault-tolerant techniques for programmable devices, the well-known FPGAs (Field Programmable Gate Arrays). These techniques are based on circuit-level modifications, implemented at the high-level description, without modification in the FPGA architecture. The high-level method is based on Triple Modular Redundancy (TMR) and a combination of Duplication Modular Redundancy (DMR) with Concurrent Error Detection (CED) techniques, which are able to cope with upsets in the combinational and in the sequential logic. The methodology was validated by fault injection experiments in an emulation board. Results have been analyzed in terms of reliability, input and output pin count, area and performance
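The TMR scheme named in this abstract can be illustrated with a minimal sketch of a bitwise 2-of-3 majority voter. This is not the authors' implementation: the function names `majority_vote` and `flip_bit`, the 8-bit example word, and the software model of an upset as a single bit flip are all assumptions made for illustration.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Model a single event upset as one flipped bit (an assumption for this sketch)."""
    return word ^ (1 << bit)


# Hypothetical 8-bit value produced by three redundant copies of a module.
golden = 0b1011_0010

# One copy is hit by an upset; the voter masks it.
upset_copy = flip_bit(golden, 3)
voted_single = majority_vote(golden, upset_copy, golden)

# Two copies hit in the same bit defeat the voter: the wrong value wins.
voted_double = majority_vote(upset_copy, upset_copy, golden)
```

In this model a single upset is masked, while two copies corrupted in the same bit position outvote the correct one, which is the basic limit of a plain TMR voter.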
Testing a fault tolerant mixed-signal design under TID and heavy ions
This work presents results of three distinct radiation tests performed upon a fault tolerant data acquisition system comprising a design diversity redundancy technique. The first and second experiments are Total Ionizing Dose (TID) tests, comprising gamma and X-ray irradiations. The last experiment considers single event effects, in which two heavy ion irradiation campaigns are carried out. The case study system comprises three analog-to-digital converters and two software-based voters, besides additional software and hardware resources used for controlling, monitoring and memory management. The applied Diversity Triple Modular Redundancy (DTMR) technique comprises different levels of diversity (temporal and architectural). The circuit was designed in a programmable System-on-Chip (PSoC), fabricated in a 130nm CMOS technology process. Results show that the technique may increase the lifetime of the system under TID when compared with a non-redundant implementation. Considering the heavy ion experiments, the system proved effective in tolerating 100% of the observed errors originating in the converters, while errors in the processing unit present a higher criticality. Critical errors occurring in one of the voters were also observed. A second heavy ion campaign was then carried out to investigate the voters' reliability, comparing the dynamic cross section of three different software-based voter schemes implemented in the considered PSoC
Reliability on ARM Processors Against Soft Errors Through SIHFT Techniques
ARM processors are leaders in embedded systems, delivering high-performance computing, power efficiency, and reduced cost. For this reason, there is considerable interest in their use in the aerospace industry. However, the use of sub-micron technologies has increased the sensitivity to radiation-induced transient faults. Thus, the mitigation of soft errors has become a major concern. Software-Implemented Hardware Fault Tolerance (SIHFT) techniques are a low-cost way to protect processors against soft errors. On the other hand, they cause high overheads in execution time and memory, which consequently increase the energy consumption. In this work, we implement a set of software techniques based on different redundancy and checking rules. Furthermore, a low-overhead technique to protect the program execution flow is included. Tests are performed using the ARM Cortex-A9 processor. Simulated fault injection campaigns and radiation tests with heavy ions have been performed. Results evaluate the trade-offs among fault detection, execution time, and memory footprint. They show significant improvements in the overheads when compared to previously reported techniques. This work was supported in part by CNPq and CAPES, Brazilian agencies
FPGAs and parallel architectures for aerospace applications: soft errors and fault-tolerant design
This book introduces the concepts of soft errors in FPGAs, as well as the motivation for using commercial, off-the-shelf (COTS) FPGAs in mission-critical and remote applications, such as aerospace. The authors describe the effects of radiation in FPGAs, present a large set of soft-error mitigation techniques that can be applied in these circuits, as well as methods for qualifying these circuits under radiation. Coverage includes radiation effects in FPGAs, fault-tolerant techniques for FPGAs, use of COTS FPGAs in aerospace applications, experimental data of FPGAs under radiation, FPGA embedded processors under radiation, and fault injection in FPGAs. Since dedicated parallel processing architectures such as GPUs have become more desirable in aerospace applications due to high computational power, GPU analysis under radiation is also discussed. · Discusses features and drawbacks of reconfigurability methods for FPGAs, focused on aerospace applications; · Explains how radiation from space causes soft errors in FPGAs and how to mitigate them; · Enables readers to qualify the target application on FPGA under radiation and by fault injection
Development of transient fault-tolerance techniques for SRAM-programmable components (Desenvolvimento de técnicas de tolerância a falhas transientes em componentes programáveis por SRAM)
This thesis presents the study and development of fault-tolerant techniques for programmable architectures, the well-known Field Programmable Gate Arrays (FPGAs), customizable by SRAM. FPGAs are becoming more valuable for space applications because of their high density, high performance, reduced development cost and re-programmability. In particular, SRAM-based FPGAs are very valuable for remote missions because they can be reprogrammed by the user as many times as necessary in a very short period. SRAM-based FPGAs and micro-controllers represent a wide range of components in space applications, and as a result will be the focus of this work, more specifically the Virtex® family from Xilinx and the architecture of the 8051 micro-controller from Intel. Triple Modular Redundancy (TMR) with voters is a common high-level technique to protect ASICs against single event upsets (SEUs), and it can also be applied to FPGAs. The TMR technique was first tested in the Virtex® FPGA architecture by using a small design based on counters. Faults were injected in all sensitive parts of the FPGA, and a detailed analysis of the effect of a fault in a TMR design synthesized in the Virtex® platform was performed. Results from fault injection and from a radiation ground test facility showed the efficiency of TMR for the related case study circuit. Although TMR has shown high reliability, this technique presents some limitations, such as area overhead, three times more input and output pins and, consequently, a significant increase in power dissipation. Aiming to reduce TMR costs and improve reliability, an innovative high-level technique for designing fault-tolerant systems in SRAM-based FPGAs was developed, without modification in the FPGA architecture. This technique combines time and hardware redundancy to reduce overhead and to ensure reliability. It is based on duplication with comparison and concurrent error detection.
The new technique proposed in this work was specifically developed for FPGAs to cope with transient faults in the user combinational and sequential logic, while also reducing pin count, area and power dissipation. The methodology was validated by fault injection experiments in an emulation board. The thesis presents comparison results in fault coverage, area and performance between the discussed techniques. This work consists of the study and development of techniques for protection against transient faults, also called single event upsets (SEUs), in programmable circuits customizable by SRAM cells. Electronic circuit designers are increasingly inclined to use programmable circuits, known as Field Programmable Gate Arrays (FPGAs), for space applications because of their high logic flexibility, high performance, low development cost, fast prototyping and, above all, reconfigurability. In particular, SRAM-based FPGAs are very important for space missions because they can be quickly reprogrammed remotely as many times as necessary. The protection technique based on triple redundancy, known as TMR, is commonly used in application-specific integrated circuits and can also be applied to programmable circuits such as FPGAs. The TMR technique was tested on the Xilinx Virtex® FPGA in applications such as counters and micro-controllers. Faults were injected in all sensitive parts of the architecture and their effects were analyzed in detail. Results from fault injection and from laboratory radiation experiments confirmed the effectiveness of TMR in protecting circuits synthesized in SRAM-based FPGAs. However, this technique has some limitations, such as increased area, the use of three times more input and output (I/O) pins and, consequently, increased power dissipation.
Aiming to reduce TMR costs and improve reliability, an innovative fault-tolerance technique for SRAM-based FPGAs was developed to be implemented at a high level, without modifications to the component architecture. This technique combines spatial and temporal redundancy to reduce costs and ensure reliability. It is based on duplication with a comparator circuit and a concurrent fault detection block. The new technique proposed in this work was specifically designed to handle the effect of transient faults in combinational and sequential blocks of the reconfigurable architecture, while reducing I/O pin usage, area and power dissipation. The methodology was validated by injecting emulated faults on a prototyping board. The work presents a comparison of fault coverage, area and performance results between the presented techniques
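The idea of combining duplication with comparison (hardware redundancy) and recomputation (time redundancy) can be sketched in software as follows. This is only a rough illustration under stated assumptions and not the thesis's circuit-level technique: `dwc_execute`, `make_transient_faulty`, the retry policy, and the model of a transient fault as a bit flip on the first call only are all hypothetical.

```python
from typing import Callable, Tuple


def make_transient_faulty(op: Callable[[int], int], bit: int) -> Callable[[int], int]:
    """Wrap op so that only its first call returns a result with one flipped bit."""
    state = {"armed": True}

    def wrapped(x: int) -> int:
        y = op(x)
        if state["armed"]:
            state["armed"] = False  # the fault is transient: it hits once
            return y ^ (1 << bit)
        return y

    return wrapped


def dwc_execute(copy0: Callable[[int], int], copy1: Callable[[int], int],
                x: int, max_retries: int = 1) -> Tuple[int, int]:
    """Run two copies and compare; on mismatch, recompute (time redundancy).

    Returns (result, mismatches_seen). Raises RuntimeError if the copies
    still disagree after max_retries recomputations.
    """
    mismatches = 0
    for _ in range(max_retries + 1):
        r0, r1 = copy0(x), copy1(x)
        if r0 == r1:
            return r0, mismatches
        mismatches += 1  # comparator flagged an error
    raise RuntimeError("persistent mismatch between duplicated copies")


def square(x: int) -> int:
    return x * x


# One copy suffers a transient upset on its first evaluation only.
result, mismatches = dwc_execute(square, make_transient_faulty(square, 2), 7)
```

Here the comparator detects the disagreement on the first pass and the recomputation yields matching outputs. Unlike TMR, duplication alone cannot tell which copy was wrong, which is why a detection-plus-recompute step is needed in this sketch.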
- …