7 research outputs found
EuroEXA - D2.6: Final ported application software
This document describes the ported software of the EuroEXA applications to the single CRDB testbed and it discusses the experiences extracted from porting and optimization activities that should be actively taken into account in future redesign and optimization. This document accompanies the ported application software, found in the EuroEXA private repository (https://github.com/euroexa). In particular, this document describes the status of the software for each of the EuroEXA applications, sketches the redesign and optimization strategy for each application, discusses issues and difficulties faced during the porting activities and the relative lesson learned. A few preliminary evaluation results have been presented, however the full evaluation will be discussed in deliverable 2.8
Single Event Effects Assessment of UltraScale+ MPSoC Systems under Atmospheric Radiation
The AMD UltraScale+ XCZU9EG device is a Multi-Processor System-on-Chip
(MPSoC) with embedded Programmable Logic (PL) that excels in many Edge (e.g.,
automotive or avionics) and Cloud (e.g., data centres) terrestrial
applications. However, it incorporates a large amount of SRAM cells, making the
device vulnerable to Neutron-induced Single Event Upsets (NSEUs) or otherwise
soft errors. Semiconductor vendors incorporate soft error mitigation mechanisms
to recover memory upsets (i.e., faults) before they propagate to the
application output and become an error. But how effective are the MPSoC's
mitigation schemes? Can they effectively recover upsets in high altitude or
large scale applications under different workloads? This article answers the
above research questions through a solid study that entails accelerated neutron
radiation testing and dependability analysis. We test the device on a broad
range of workloads, like multi-threaded software used for pose estimation and
weather prediction or a software/hardware (SW/HW) co-design image
classification application running on the AMD Deep Learning Processing Unit
(DPU). Assuming a one-node MPSoC system in New York City (NYC) at 40k feet, all
tested software applications achieve a Mean Time To Failure (MTTF) greater than
148 months, which shows that upsets are effectively recovered in the processing
system of the MPSoC. However, the SW/HW co-design (i.e., DPU) in the same
one-node system at 40k feet has an MTTF = 4 months due to the high failure rate
of its PL accelerator, which emphasises that some MPSoC workloads may require
additional NSEU mitigation schemes. Nevertheless, we show that the MTTF of the
DPU can increase to 87 months without any overhead if one disregards the
failure rate of tolerable errors since they do not affect the correctness of
the classification output.Comment: This manuscript is under review at IEEE Transactions on Reliabilit
Exploring the acceleration of the Met Office NERC Cloud model using FPGAs
The use of Field Programmable Gate Arrays (FPGAs) to accelerate computational
kernels has the potential to be of great benefit to scientific codes and the
HPC community in general. With the recent developments in FPGA programming
technology, the ability to port kernels is becoming far more accessible.
However, to gain reasonable performance from this technology it is not enough
to simple transfer a code onto the FPGA, instead the algorithm must be
rethought and recast in a data-flow style to suit the target architecture. In
this paper we describe the porting, via HLS, of one of the most computationally
intensive kernels of the Met Office NERC Cloud model (MONC), an atmospheric
model used by climate and weather researchers, onto an FPGA. We describe in
detail the steps taken to adapt the algorithm to make it suitable for the
architecture and the impact this has on kernel performance. Using a PCIe
mounted FPGA with on-board DRAM, we consider the integration on this kernel
within a larger infrastructure and explore the performance characteristics of
our approach in contrast to Intel CPUs that are popular in modern HPC machines,
over problem sizes involving very large grids. The result of this work is an
experience report detailing the challenges faced and lessons learnt in porting
this complex computational kernel to FPGAs, as well as exploring the role that
FPGAs can play and their fundamental limits in accelerating traditional HPC
workloads.Comment: Preprint of article in proceedings, ISC High Performance 2019.
Lecture Notes in Computer Science, vol 1188
First steps in porting the LFRic Weather and Climate model to the FPGAs of the EuroExa architecture
In recent years, there has been renewed interest in the use of field-programmable gate arrays (FPGAs) for high-performance computing (HPC). In this paper, we explore the techniques required by traditional HPC programmers in porting HPC applications to FPGAs, using as an example the LFRic weather and climate model. We report on the first steps in porting LFRic to the FPGAs of the EuroExa architecture. We have used Vivado High-Level Syntheusywwi to implement a matrix-vector kernel from the LFRic code on a Xilinx UltraScale+ development board containing an XCZU9EG multiprocessor system-on-chip. We describe the porting of the code, discuss the optimization decisions, and report performance of 5.34 Gflop/s with double precision and 5.58 Gflop/s with single precision. We discuss sources of inefficiencies, comparisons with peak performance, comparisons with CPU and GPU performance (taking into account power and price), comparisons with published techniques, and comparisons with published performance, and we conclude with some comments on the prospects for future progress with FPGA acceleration of the weather forecast model. The realization of practical exascale-class high-performance computinems requires significant improvements in the energy efficiency of such systems and their components. This has generated interest in computer architectures which utilize accelerators alongside traditional CPUs. FPGAs offer huge potential as an accelerator which can deliver performance for scientific applications at high levels of energy efficiency. The EuroExa project is developing and building a high-performance architecture based upon ARM CPUs with FPGA acceleration targeting exascale-class performance within a realistic power budget