247 research outputs found
On the Analysis of Radiation-induced Failures in the AXI Interconnect Module
Due to the increasing demand for high performance in embedded systems, SRAM-based programmable devices are becoming an appealing solution to reach high performance at limited cost. However, SRAM-based programmable devices are subject to various sources of radiation-induced faults that affect their reliability, such as ionizing radiation and particles, even at sea level. In this paper, we evaluate the reliability of the interconnection module, implemented on the programmable hardware, against radiation-induced faults in the configuration layer. To do so, we performed a fault injection campaign to emulate the radiation-induced effects impacting the configuration memory of the Zynq-7000 AP-SoC, specifically targeting the configuration memory section that programs the interconnection module implemented on the programmable logic. This interconnection module is a crucial element for a wide range of applications and mitigation techniques such as hardware-accelerated designs, Dynamic Partial Reconfiguration, or Triple Modular Redundancy, especially when they are adopted to achieve high performance, high bandwidth, and high reliability. The fault injection results have been analyzed and classified according to the effect observed on the processor-system side, in terms of availability and of the fault model affecting data computed by cores implemented on the programmable logic side.
On the evaluation of SEU effects on AXI interconnect within AP-SoCs
All-Programmable System-on-Chips, which combine a processor system with programmable hardware, have given rise to applications that use hardware acceleration to offload and parallelize computationally demanding tasks. Due to the flexibility and performance they provide at low cost, these devices are also appealing for several applications in the avionics, aerospace, and automotive sectors, where reliability is the main concern. In particular, the interconnection architecture, and especially the AXI Interconnect for FPGA-accelerated applications, plays a critical role in these systems. This paper presents a reliability analysis of the AXI Interconnect IP Core implemented on the Zynq-7000 AP-SoC against SEUs in the configuration memory of the programmable logic. The analysis has been conducted by performing a fault injection campaign on the specific section of the configuration memory implementing the IP Core under test, which has been implemented within a benchmark design. The results are analyzed and classified, highlighting the criticality of the AXI Interconnect IP Core as a point of failure, especially for SEU-hardened hardware accelerators relying on mitigation techniques based on fine-grained and coarse-grained replication.
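As a rough illustration of what a configuration-memory fault injection campaign does, the following minimal sketch flips each bit of a toy "configuration memory" in turn, reruns a workload, and compares the result against a fault-free (golden) run. Everything here is an assumption made for illustration: `toy_design`, the 16-byte memory, and the two outcome labels are hypothetical and do not model the Zynq-7000 bitstream or the tooling used in the papers above.

```python
from collections import Counter


def flip_bit(memory: bytearray, bit_index: int) -> None:
    """Invert one bit in place, emulating a single-event upset (SEU)."""
    memory[bit_index // 8] ^= 1 << (bit_index % 8)


def toy_design(config: bytes, data: bytes) -> bytes:
    """Hypothetical design under test.

    Only the first 8 configuration bytes are "used" (as an XOR mask);
    the remaining bytes stand in for unused configuration bits.
    """
    return bytes(d ^ m for d, m in zip(data, config[:8]))


def run_campaign(golden_config: bytes, data: bytes) -> Counter:
    """Exhaustively inject one bit flip per run and classify the effect."""
    golden_output = toy_design(golden_config, data)
    counts = Counter()
    for bit_index in range(len(golden_config) * 8):
        faulty = bytearray(golden_config)  # fresh copy for every injection
        flip_bit(faulty, bit_index)
        output = toy_design(bytes(faulty), data)
        counts["wrong_output" if output != golden_output else "no_effect"] += 1
    return counts


golden_config = bytes(range(16))   # 128 toy configuration bits
data = bytes([0xAA] * 8)           # fixed input workload
counts = run_campaign(golden_config, data)
print(dict(counts))
```

In this toy setup only flips in the used half of the memory change the output, which mirrors the general idea of targeting the section of configuration memory that implements the module under test.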
FireNN: Neural Networks Reliability Evaluation on Hybrid Platforms
The growth in neural network complexity has led to the adoption of hardware accelerators to cope with the computational power required by the new architectures. The possibility of adapting the network to different platforms has increased the interest of safety-critical applications. The reliability evaluation of neural networks is still premature and requires platforms to measure the safety standards required by mission-critical applications. For this reason, the interest in studying the reliability of neural networks is growing. We propose a new approach for evaluating the resiliency of neural networks by using hybrid platforms. The approach relies on reconfigurable hardware for emulating the target hardware platform and performing the fault injection process. The main advantage of the proposed approach is that it involves the on-hardware execution of the neural network in the reliability analysis without any intrusiveness into the network algorithm, while addressing specific fault models. The implementation of FireNN, the platform based on the proposed approach, is described in the paper. Experimental analyses are performed using fault injection on AlexNet. The analyses are carried out using the FireNN platform, and the results are compared with the outcome of traditional software-level evaluations. Results are discussed considering the insight into the hardware level achieved using FireNN.
Analysis and Mitigation of Soft-Errors on High Performance Embedded GPUs
Multiprocessor systems-on-chip such as embedded GPUs are becoming very popular in safety-critical applications, such as autonomous and semi-autonomous vehicles. However, these devices can suffer from the effects of soft errors, such as those produced by radiation. These effects can generate unpredictable misbehaviors. Fault tolerance oriented to multi-threaded software introduces severe performance degradation due to the redundancy, voting, and correction thread operations. In this paper, we propose a new fault injection environment for NVIDIA GPGPU devices and a fault tolerance approach based on error detection and correction threads executed during data transfer operations on embedded GPUs. The fault injection environment is capable of automatically injecting faults into instructions at the SASS level by instrumenting the CUDA binary executable file. The mitigation approach is based on concurrent error detection threads running simultaneously with the memory stream device-to-host data transfer operations. With several benchmark applications, we evaluate the impact of soft errors, classifying outcomes as Silent Data Corruption, Detection, Unrecoverable Error, and Hang. Finally, the proposed mitigation approach has been validated by soft-error fault injection campaigns on an NVIDIA Pascal architecture GPU controlled by a quad-core ARM A57 processor (Jetson TX2), demonstrating an advantage of more than 37% with respect to the state-of-the-art solution.
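The four outcome classes named in the abstract above can be sketched as a simple decision procedure applied to each injection run. This is only an illustrative sketch: the `RunResult` fields, the precedence of the checks, and the extra `MASKED` class (a fault with no visible effect) are assumptions, not the classification logic of the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Outcome(Enum):
    MASKED = "masked"                      # assumption: fault had no visible effect
    SDC = "silent data corruption"
    DETECTED = "detection"
    UNRECOVERABLE = "unrecoverable error"
    HANG = "hang"


@dataclass
class RunResult:
    """Hypothetical record of one fault injection run."""
    output: Sequence[int] = ()
    error_flag: bool = False   # raised by an error detection mechanism
    crashed: bool = False      # run aborted with an error
    timed_out: bool = False    # run exceeded its time budget


def classify(run: RunResult, golden: Sequence[int]) -> Outcome:
    """Map one run to an outcome class by comparing it with the golden output."""
    if run.timed_out:
        return Outcome.HANG
    if run.crashed:
        return Outcome.UNRECOVERABLE
    if run.error_flag:
        return Outcome.DETECTED
    if list(run.output) != list(golden):
        return Outcome.SDC  # wrong result and nothing noticed it
    return Outcome.MASKED


golden = [1, 2, 3]
print(classify(RunResult(output=[1, 2, 4]), golden))
```

The checks are ordered so that a run that never finishes or aborts is not also counted as a data corruption.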
FPGA Qualification and Failure Rate Estimation Methodology for LHC Environments Using Benchmarks Test Circuits
When studying the behavior of a field programmable gate array (FPGA) under radiation, the most commonly used methodology consists of evaluating the single-event effect (SEE) cross section of its elements individually. However, this method does not allow the estimation of the device failure rate when using a custom design. An alternative approach based on benchmark circuits is presented in this article. It allows standardized application-level testing, which makes the comparison between different FPGAs easier. Moreover, it allows the evaluation of the FPGA failure rate independent of the application that will be implemented. The employed benchmark circuit belongs to the ITC’99 benchmark suite developed at Politecnico di Torino. Using the proposed methodology, the response of four FPGAs (the NG-Medium, the ProASIC3, the SmartFusion2, and the PolarFire) was evaluated under high-energy protons. Radiation tests with thermal neutrons were also conducted on the PolarFire to assess its potential sensitivity to them. Moreover, its performance in terms of total ionizing dose (TID) effects has been evaluated by measuring the degradation of the propagation delay during irradiation.
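For readers unfamiliar with the quantities involved, the sketch below shows the usual arithmetic linking a radiation test to a failure rate, assuming the common definition of cross section as observed events divided by particle fluence, and a failure rate obtained by multiplying that cross section by the flux expected in the target environment. The function names and all numeric values are hypothetical and are not taken from the article.

```python
def cross_section_cm2(observed_events: int, fluence_per_cm2: float) -> float:
    """Cross section = observed events / fluence (particles per cm^2)."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return observed_events / fluence_per_cm2


def failure_rate_per_s(sigma_cm2: float, flux_per_cm2_s: float) -> float:
    """Expected failures per second = cross section x environment flux."""
    return sigma_cm2 * flux_per_cm2_s


# Illustrative values only (assumptions, not measured data).
sigma = cross_section_cm2(observed_events=50, fluence_per_cm2=1e11)
rate = failure_rate_per_s(sigma, flux_per_cm2_s=1e5)
print(f"cross section: {sigma:.3e} cm^2, failure rate: {rate:.3e} 1/s")
```

Applied to a benchmark circuit rather than to individual FPGA elements, the same arithmetic yields a design-level figure, which is the kind of application-level estimate the abstract argues for.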
SRAM-Based FPGA Systems for Safety-Critical Applications: A Survey on Design Standards and Proposed Methodologies
As the ASIC design cost becomes affordable only for very large-scale productions, FPGA technology is becoming the leading technology for applications that require small-scale production. FPGAs can be considered a technology at the crossing between hardware and software. Only a small number of standards for the design of safety-critical systems give guidelines and recommendations that take the peculiarities of FPGA technology into consideration. The main contribution of this paper is an overview of the existing design standards that regulate the design and verification of FPGA-based systems in safety-critical application fields. Moreover, the paper surveys significant published research proposals and existing industrial guidelines on the topic, and collects and reports some lessons learned from industrial and research projects involving the use of FPGA devices.