Single Event Effects Assessment of UltraScale+ MPSoC Systems under Atmospheric Radiation
The AMD UltraScale+ XCZU9EG device is a Multi-Processor System-on-Chip
(MPSoC) with embedded Programmable Logic (PL) that excels in many Edge (e.g.,
automotive or avionics) and Cloud (e.g., data centres) terrestrial
applications. However, it incorporates a large number of SRAM cells, making the
device vulnerable to Neutron-induced Single Event Upsets (NSEUs), also known as
soft errors. Semiconductor vendors incorporate soft error mitigation mechanisms
to recover memory upsets (i.e., faults) before they propagate to the
application output and become an error. But how effective are the MPSoC's
mitigation schemes? Can they effectively recover upsets in high altitude or
large scale applications under different workloads? This article answers the
above research questions through a solid study that entails accelerated neutron
radiation testing and dependability analysis. We test the device on a broad
range of workloads, like multi-threaded software used for pose estimation and
weather prediction or a software/hardware (SW/HW) co-design image
classification application running on the AMD Deep Learning Processing Unit
(DPU). Assuming a one-node MPSoC system in New York City (NYC) at 40k feet, all
tested software applications achieve a Mean Time To Failure (MTTF) greater than
148 months, which shows that upsets are effectively recovered in the processing
system of the MPSoC. However, the SW/HW co-design (i.e., DPU) in the same
one-node system at 40k feet has an MTTF = 4 months due to the high failure rate
of its PL accelerator, which emphasises that some MPSoC workloads may require
additional NSEU mitigation schemes. Nevertheless, we show that the MTTF of the
DPU can increase to 87 months without any overhead if one disregards the
failure rate of tolerable errors since they do not affect the correctness of
the classification output.
Comment: This manuscript is under review at IEEE Transactions on Reliability.
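The MTTF figures quoted above can be related to each other with a little arithmetic. The sketch below is not taken from the manuscript: it assumes a constant failure rate (so that MTTF = 1 / rate) and independent, identical nodes, and the function names are hypothetical. Under those assumptions it estimates which share of DPU failures would have to be tolerable for the MTTF to rise from 4 to 87 months, and how a one-node MTTF shrinks in a multi-node deployment.

```python
def failure_rate(mttf_months: float) -> float:
    """Failures per month, assuming a constant rate so that MTTF = 1 / rate."""
    return 1.0 / mttf_months


def tolerable_fraction(mttf_all: float, mttf_critical: float) -> float:
    """Share of failures that are tolerable.

    mttf_all counts every failure; mttf_critical counts only the failures
    that corrupt the output. Both are in the same time unit.
    """
    return 1.0 - failure_rate(mttf_critical) / failure_rate(mttf_all)


def system_mttf(node_mttf_months: float, nodes: int) -> float:
    """MTTF of a system of independent, identical nodes in which any
    single node failure counts as a system failure (rates add up)."""
    return node_mttf_months / nodes


if __name__ == "__main__":
    # DPU figures from the abstract: 4 months (all failures), 87 months (critical only).
    print(round(tolerable_fraction(4, 87), 3))  # 0.954
    # Hypothetical 100-node deployment of the same DPU workload.
    print(system_mttf(4, 100))
```

Read this way, the abstract's two DPU figures imply that roughly 95% of the observed DPU failures are tolerable, provided the constant-rate assumption holds.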
Analyzing the Resilience to SEUs of an Image Data Compression Core in a COTS SRAM FPGA
In this paper, we evaluate the error resilience of an image data
compression IP core, an FPGA-based accelerator of the CCSDS 121.0-B-2
algorithm used to compress the ESA PROBA-3 ASPIICS Coronagraph System
Payload image data. We have enhanced a fault injection platform
previously proposed for the SEU evaluation of FPGA soft processor cores
to interface with the target image data compression IP core and
calculate the image quality metrics required for failure analysis.
Through an extensive fault injection campaign, we analyze the
vulnerability of the image compression core against Single Event Upsets
(SEUs) in an SRAM FPGA configuration memory. The soft errors are
classified and evaluated according to their effects on the operation of
the compression core and the quality of the reconstructed images based
on the structural similarity index metric (SSIM). The experimental fault
injection results demonstrate error resiliency inherent to the image
compression algorithm implementation that can be exploited to trade off
an acceptable lossless compression performance degradation or a
negligible effect on compression fidelity for significant savings in
FPGA resource utilization (23% LUTs and 17% FFs) using a selective
protection of the compression core modules.
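For readers unfamiliar with the metric, the following is a minimal sketch of the structural similarity index, not the fault injection platform's implementation. It makes several simplifying assumptions: it evaluates the standard SSIM formula once over the whole image instead of averaging over local windows, it uses population (not sample) variances, it takes 8-bit pixels, and it uses the commonly cited constants K1 = 0.01 and K2 = 0.03. The function name `global_ssim` is hypothetical.

```python
def global_ssim(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized, flattened images.

    x, y: sequences of pixel intensities (e.g. 0..255).
    Returns 1.0 for identical images and lower values as they diverge.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and of equal size")
    n = len(x)
    c1 = (k1 * dynamic_range) * (k1 * dynamic_range)  # stabilises the luminance term
    c2 = (k2 * dynamic_range) * (k2 * dynamic_range)  # stabilises the contrast term
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) * (a - mean_x) for a in x) / n
    var_y = sum((b - mean_y) * (b - mean_y) for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x * mean_x + mean_y * mean_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    reference = [10, 50, 90, 130, 170, 210, 250, 30]
    corrupted = list(reference)
    corrupted[3] = 0  # stand-in for a pixel damaged by an injected fault
    print(global_ssim(reference, reference))
    print(global_ssim(reference, corrupted))
```

In a fault injection campaign, a score like this would be computed between the image reconstructed from a fault-free run and the one reconstructed after each injected upset; any threshold separating acceptable from unacceptable degradation is a design choice not specified in the abstract.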
Configuration Memory Scrubbing of SRAM-Based FPGAs Using a Mixed 2-D Coding Technique
SRAM-based field-programmable gate array (FPGA) vendors typically integrate error correction codes (ECCs) into the configuration memory to assist designers in implementing scrubbing mechanisms. In most cases, these ECC schemes guarantee the correction of single- and double-bit errors per configuration frame but fail to correct upsets with higher multiplicity in a single frame caused by a single event. This phenomenon has been observed in modern commercial-off-the-shelf FPGAs. Bit interleaving schemes are used in some FPGA families to scatter the multiple upsets into more than one frame, but this does not fully resolve the problem of uncorrectable errors. In this article, we propose a configuration memory scrubbing approach for SRAM-based FPGA devices, which combines the embedded ECC logic with an interframe, interleaved parity code to build a mixed 2-D coding technique. The proposed technique improves the multiple-bit error correction capabilities of the on-chip ECC scheme while keeping the error correction latency and hardware cost low. The scrubbing concept has been validated under heavy-ion irradiation, where it succeeded in correcting all the single and multiple upsets observed during the radiation experiment.
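The idea of adding a second, interframe coding dimension can be illustrated with a toy model. The sketch below is not the authors' scheme: it models only the interframe parity, represents each configuration frame as a Python integer, and assumes that the per-frame ECC has already flagged which frames it could not correct. Frames are assigned to parity groups by `index % groups`, so physically adjacent frames hit by one event fall into different groups. All names are hypothetical.

```python
def build_parity(frames, groups):
    """Return one parity word per group; group g covers frames with index % groups == g."""
    parity = [0] * groups
    for index, frame in enumerate(frames):
        parity[index % groups] ^= frame
    return parity


def repair(frames, parity, uncorrectable):
    """Rebuild frames flagged as uncorrectable by the (assumed) per-frame ECC.

    Each parity group can recover at most one damaged frame, because the
    missing frame is the XOR of the group's parity word with all the
    other frames in that group.
    """
    groups = len(parity)
    repaired = list(frames)
    for g in range(groups):
        damaged = [i for i in uncorrectable if i % groups == g]
        if len(damaged) > 1:
            raise ValueError(f"group {g} has more than one damaged frame")
        if damaged:
            target = damaged[0]
            value = parity[g]
            for i in range(g, len(frames), groups):
                if i != target:
                    value ^= frames[i]
            repaired[target] = value
    return repaired


if __name__ == "__main__":
    golden = [0xA5A5, 0x0F0F, 0x1234, 0xFFFF, 0x0001, 0x8000]
    parity = build_parity(golden, groups=2)
    # One event corrupts several bits in two adjacent frames (indices 2 and 3).
    upset = list(golden)
    upset[2] ^= 0x00F0
    upset[3] ^= 0x0707
    print(repair(upset, parity, uncorrectable={2, 3}) == golden)  # True
```

The interleaving is what lets two adjacent damaged frames be recovered here: with a single parity group, the same event would leave two unknowns in one XOR equation.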
Single Event Effects Characterization of the Programmable Logic of Xilinx Zynq-7000 FPGA Using Very/Ultra High-Energy Heavy Ions
This article studies the impact of radiation-induced single-event effects (SEEs) in the Zynq-7000 field programmable gate array (FPGA) and presents an in-depth analysis of the SEE susceptibility of all the memories of the programmable logic. The radiation experiments were performed in the CERN North Area facility and in the GSI Helmholtz Centre for Heavy Ion Research using very/ultra high-energy heavy ions. The offline analysis of the radiation experimental results produced a deep understanding of various SEE phenomena observed in the Zynq-7000 FPGAs, such as single-event function interrupts (SEFIs), single-event transients (SETs) in global signals, and multiple bit upsets, which could be key issues for the design of an effective SEE mitigation approach.