3 research outputs found

    Single Event Effects Assessment of UltraScale+ MPSoC Systems under Atmospheric Radiation

    Get PDF
    The AMD UltraScale+ XCZU9EG device is a Multi-Processor System-on-Chip (MPSoC) with embedded Programmable Logic (PL) that excels in many Edge (e.g., automotive or avionics) and Cloud (e.g., data centres) terrestrial applications. However, it incorporates a large amount of SRAM cells, making the device vulnerable to Neutron-induced Single Event Upsets (NSEUs) or otherwise soft errors. Semiconductor vendors incorporate soft error mitigation mechanisms to recover memory upsets (i.e., faults) before they propagate to the application output and become an error. But how effective are the MPSoC's mitigation schemes? Can they effectively recover upsets in high altitude or large scale applications under different workloads? This article answers the above research questions through a solid study that entails accelerated neutron radiation testing and dependability analysis. We test the device on a broad range of workloads, like multi-threaded software used for pose estimation and weather prediction or a software/hardware (SW/HW) co-design image classification application running on the AMD Deep Learning Processing Unit (DPU). Assuming a one-node MPSoC system in New York City (NYC) at 40k feet, all tested software applications achieve a Mean Time To Failure (MTTF) greater than 148 months, which shows that upsets are effectively recovered in the processing system of the MPSoC. However, the SW/HW co-design (i.e., DPU) in the same one-node system at 40k feet has an MTTF = 4 months due to the high failure rate of its PL accelerator, which emphasises that some MPSoC workloads may require additional NSEU mitigation schemes. Nevertheless, we show that the MTTF of the DPU can increase to 87 months without any overhead if one disregards the failure rate of tolerable errors since they do not affect the correctness of the classification output.Comment: This manuscript is under review at IEEE Transactions on Reliabilit

    Configuration Memory Scrubbing of SRAM-Based FPGAs Using a Mixed 2-D Coding Technique

    No full text
    SRAM-based field-programmable gate array (FPGA) vendors typically integrate error correction codes (ECCs) into the configuration memory to assist designers in implementing scrubbing mechanisms. In most cases, these ECC schemes guarantee the correction of single- and double-bit errors per configuration frame but fail to correct upsets with higher multiplicity in a single frame caused by a single event. This phenomenon has been observed in modern commercial-off-the-shelf FPGAs. Bit interleaving schemes are used in some FPGA families to scatter the multiple upsets into more than one frame, but this does not fully resolve the problem of uncorrectable errors. In this article, we propose a configuration memory scrubbing approach for SRAM-based FPGA devices, which combines the embedded ECC logic with an interframe, interleaved parity code to build a mixed 2-D coding technique. The proposed technique improves the multiple-bit error correction capabilities of the on-chip ECC scheme while keeping the error correction latency and hardware cost low. The scrubbing concept has been validated under heavy-ion irradiation, where it succeeded in correcting all the single and multiple upsets observed during the radiation experiment

    Single Event Effects Characterization of the Programmable Logic of Xilinx Zynq-7000 FPGA Using Very/Ultra High-Energy Heavy Ions

    No full text
    This article studies the impact of radiation-induced single-event effects (SEEs) in the Zynq-7000 field programmable gate array (FPGA) and presents an in-depth analysis of the SEE susceptibility of all the memories of the programmable logic. The radiation experiments were performed in the CERN North Area facility and in the GSI Helmholtz Centre for Heavy Ion Research using very/ultra high-energy heavy ions. The offline analysis of the radiation experimental results produced a deep understanding for various SEE phenomena observed in the Zynq-7000 FPGAs, such as single-event function interrupts (SEFIs), single-event transient (SET) in global signals, and multiple bit upsets that could be key issues for the design of an effective SEE mitigation approach
    corecore