Power Aware Scheduling of Tasks on FPGAs in Data Centers
A variety of computing platforms, such as Field Programmable Gate Arrays (FPGAs),
Graphics Processing Units (GPUs) and multicore Central Processing Units (CPUs),
are suitable for accelerating data-intensive workloads in data centers. FPGA
platforms in particular are gaining popularity in data centers for
high-performance computing due to their high speed, reconfigurability and
cost-effectiveness. Such heterogeneous, highly parallel computational
architectures in data centers, combined with high-speed communication
technologies like 5G, are becoming increasingly suitable for real-time
applications. However, flexibility, cost-effectiveness, high computational
capabilities, and energy efficiency remain challenging issues in FPGA-based
data centers. In this context, an energy-efficient scheduling solution is
required to maximize the profitable use of FPGA resources. This paper introduces
a power-aware scheduling methodology aimed at accommodating periodic hardware
tasks within the available FPGAs of a data center at their maximum achievable
speed. The proposed methodology guarantees the execution of these tasks using
the maximum number of parallel computation units that can be implemented in the
FPGAs, with minimum power consumption. The proposed scheduling methodology is
implemented in a data center with multiple Alveo-50 Xilinx-AMD FPGAs and the
Vitis 2023 tool. Results from the implementation show that the proposed
scheduling methodology is more efficient than existing solutions.
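The abstract does not spell out the scheduling algorithm, so the following is only a hypothetical sketch of one way to combine the two stated goals (deadline-guaranteed parallel execution and low power). It assumes each periodic task is replicated into identical compute units (CUs), uses first-fit-decreasing bin packing to keep the number of powered-on FPGAs small, and then fills leftover resources with extra CUs. The names `Task`, `Fpga`, `schedule` and `average_power`, the abstract resource units and the linear power model are all illustrative assumptions, not the authors' method.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    work: float    # time to process one period's data on a single CU (ms)
    period: float  # period and implicit deadline (ms)
    cu_area: int   # FPGA resources used by one CU (abstract units)

    def min_cus(self) -> int:
        # Fewest replicated CUs that finish one period's work before the deadline.
        return math.ceil(self.work / self.period)


@dataclass
class Fpga:
    capacity: int
    used: int = 0
    cus: dict = field(default_factory=dict)  # task name -> number of CUs placed here

    def free(self) -> int:
        return self.capacity - self.used


def schedule(tasks, n_fpgas, capacity):
    boards = [Fpga(capacity) for _ in range(n_fpgas)]
    # Step 1: first-fit decreasing packing of each task's minimum CU set, so that
    # as few boards as possible have to be powered on.
    for t in sorted(tasks, key=lambda t: t.min_cus() * t.cu_area, reverse=True):
        need = t.min_cus() * t.cu_area
        board = next((b for b in boards if b.free() >= need), None)
        if board is None:
            raise ValueError(f"task {t.name} does not fit on any FPGA")
        board.used += need
        board.cus[t.name] = t.min_cus()
    # Step 2: on each active board, spend leftover resources on extra CUs,
    # favouring the task with the longest per-CU execution time.
    by_name = {t.name: t for t in tasks}
    for b in boards:
        grew = True
        while grew:
            grew = False
            for name in sorted(b.cus, key=lambda n: by_name[n].work / b.cus[n], reverse=True):
                if b.free() >= by_name[name].cu_area:
                    b.used += by_name[name].cu_area
                    b.cus[name] += 1
                    grew = True
                    break
    return boards


def average_power(boards, tasks, p_static, p_cu):
    # Assumed linear model: each powered-on board draws p_static, and a CU draws
    # p_cu only while busy, so dynamic energy depends on total work, not CU count.
    active = sum(1 for b in boards if b.cus)
    return active * p_static + sum(t.work / t.period for t in tasks) * p_cu


tasks = [Task("a", 30, 10, 10), Task("b", 5, 10, 20), Task("c", 12, 10, 15)]
boards = schedule(tasks, n_fpgas=3, capacity=64)
print([b.cus for b in boards])  # [{'a': 3, 'c': 2}, {'b': 3}, {}]
```

Under this assumed power model, extra CUs shorten completion time without adding dynamic energy, so the dominant saving comes from leaving the third board unused; a real methodology would also need measured per-board and per-CU power figures.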
EXTRA: Towards an efficient open platform for reconfigurable High Performance Computing
To handle the stringent performance requirements of future exascale-class applications, High Performance Computing (HPC) systems need ultra-efficient heterogeneous compute nodes. To reduce power and increase performance, such compute nodes will require hardware accelerators with a high degree of specialization. Ideally, dynamic reconfiguration will be an intrinsic feature, so that specific HPC application features can be optimally accelerated, even if they regularly change over time. In the EXTRA project, we create a new and flexible exploration platform for developing reconfigurable architectures, design tools and HPC applications with run-time reconfiguration built in as a core feature instead of an add-on. EXTRA covers the entire stack from architecture up to the application, focusing on the fundamental building blocks for run-time reconfigurable exascale HPC systems: new chip architectures with very low reconfiguration overhead, new tools that treat reconfiguration as a central design concept, and applications that are tuned to benefit maximally from the proposed run-time reconfiguration techniques. Ultimately, this open platform will improve Europe's competitive advantage and leadership in the field.
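Why low reconfiguration overhead matters can be shown with a simple amortization argument: loading a specialized configuration only pays off if the time it saves over the following calls exceeds the one-off reconfiguration cost. The sketch below is a generic break-even model under that assumption; the function names and the example timings are hypothetical and are not taken from EXTRA.

```python
import math


def should_reconfigure(t_current_ms, t_specialized_ms, n_calls, t_reconfig_ms):
    # Reconfigure only if the time saved over the upcoming calls exceeds the
    # one-off cost of loading the specialized configuration.
    saved = (t_current_ms - t_specialized_ms) * n_calls
    return saved > t_reconfig_ms


def break_even_calls(t_current_ms, t_specialized_ms, t_reconfig_ms):
    # Smallest number of calls for which reconfiguring is strictly faster.
    gain = t_current_ms - t_specialized_ms
    if gain <= 0:
        return math.inf  # the specialized accelerator is never worth loading
    return math.floor(t_reconfig_ms / gain) + 1


print(break_even_calls(2.0, 0.5, 300.0))  # 201
print(break_even_calls(2.0, 0.5, 10.0))   # 7
```

Cutting the (hypothetical) reconfiguration cost from 300 ms to 10 ms lowers the break-even point from 201 calls to 7, which is what lets an application switch accelerators even when its hot features change frequently.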
SwiftSpatial: Spatial Joins on Modern Hardware
Spatial joins are among the most time-consuming queries in spatial data
management systems. In this paper, we propose SwiftSpatial, a specialized
accelerator architecture tailored for spatial joins. SwiftSpatial contains
multiple high-performance join units with innovative hybrid parallelism,
several efficient memory management units, and an integrated on-chip join
scheduler. We prototype SwiftSpatial on an FPGA and incorporate the R-tree
synchronous traversal algorithm as the control flow. Benchmarked against
various CPU- and GPU-based spatial data processing systems, SwiftSpatial
demonstrates a latency reduction of up to 5.36x relative to the best-performing
baseline, while requiring 6.16x less power. The remarkable performance and
energy efficiency of SwiftSpatial lay a solid foundation for its future
integration into spatial data management systems, both in data centers and at
the edge.
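SwiftSpatial uses the R-tree synchronous traversal algorithm as its control flow. As a point of reference, the textbook version of that algorithm can be sketched in software: both trees are descended together, and a pair of subtrees is explored only when their minimum bounding rectangles (MBRs) intersect. This is a plain Python sketch of the general algorithm under simplifying assumptions (hand-built trees, closed rectangles); it does not reflect SwiftSpatial's hardware join units or scheduler, and the helper names are illustrative.

```python
from dataclasses import dataclass, field


def intersects(a, b):
    # Rectangles are (xmin, ymin, xmax, ymax); touching edges count as overlap.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


@dataclass
class Node:
    mbr: tuple
    children: list = field(default_factory=list)  # internal node: child Nodes
    entries: list = field(default_factory=list)   # leaf node: (object_id, rect)

    @property
    def is_leaf(self):
        return not self.children


def leaf(entries):
    return Node(union([r for _, r in entries]), entries=entries)


def internal(children):
    return Node(union([c.mbr for c in children]), children=children)


def sync_traversal(r, s, out):
    """Append every (r_id, s_id) pair whose rectangles intersect."""
    if not intersects(r.mbr, s.mbr):
        return  # prune: no pair below these two nodes can intersect
    if r.is_leaf and s.is_leaf:
        for rid, rrect in r.entries:
            for sid, srect in s.entries:
                if intersects(rrect, srect):
                    out.append((rid, sid))
    elif r.is_leaf:
        for sc in s.children:
            sync_traversal(r, sc, out)
    elif s.is_leaf:
        for rc in r.children:
            sync_traversal(rc, s, out)
    else:
        for rc in r.children:
            for sc in s.children:
                sync_traversal(rc, sc, out)  # MBR check happens at the top


R = internal([leaf([("r1", (0, 0, 1, 1)), ("r2", (2, 2, 3, 3))]),
              leaf([("r3", (10, 10, 11, 11))])])
S = internal([leaf([("s1", (0.5, 0.5, 2.5, 2.5))]),
              leaf([("s2", (20, 20, 21, 21))])])
pairs = []
sync_traversal(R, S, pairs)
print(pairs)  # [('r1', 's1'), ('r2', 's1')]
```

Three of the four node pairs at the second level are pruned by the MBR test, which is the property that makes the traversal attractive to parallelize across many join units.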
Semantic Caching Framework: An FPGA-Based Application for IoT Security Monitoring
Security monitoring is a subdomain of cybersecurity that aims to guarantee the safety of systems by continuously monitoring for unusual events. The development of the Internet of Things generates huge amounts of heterogeneous information that must be managed efficiently. Cloud Computing provides software and hardware resources for large-scale data management. However, the performance of sequences of online queries over long-term historical data may not meet the response-time requirements of emergency security monitoring. This work addresses this problem by proposing a semantic caching framework and its application to FPGA-based hardware acceleration for fast and sufficiently accurate log processing across various data stores and execution engines.
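In the general semantic-caching idea, cached results are described by the query predicate that produced them, so a new query can be split into a probe part answered from the cache and a remainder part sent to the data store. The sketch below illustrates that split for the simplest case, time-range predicates over logs. Restricting predicates to half-open time ranges, and the name `split_query`, are assumptions for illustration; the abstract does not describe the framework's predicate model or its FPGA implementation.

```python
def split_query(query, cached):
    """Split a half-open time range into a probe part (answerable from cached
    ranges) and a remainder part (must be fetched from the backing store).

    query: (lo, hi); cached: non-overlapping (lo, hi) ranges already in cache.
    """
    lo, hi = query
    probe, remainder = [], []
    cur = lo  # everything before `cur` has already been classified
    for c_lo, c_hi in sorted(cached):
        if c_hi <= cur or c_lo >= hi:
            continue  # cached range lies outside the unclassified part of the query
        if c_lo > cur:
            remainder.append((cur, c_lo))  # gap before this cached range
        end = min(c_hi, hi)
        probe.append((max(c_lo, cur), end))
        cur = end
        if cur >= hi:
            break
    if cur < hi:
        remainder.append((cur, hi))
    return probe, remainder


probe, remainder = split_query((0, 100), [(10, 30), (50, 60), (90, 120)])
print(probe)      # [(10, 30), (50, 60), (90, 100)]
print(remainder)  # [(0, 10), (30, 50), (60, 90)]
```

Only the remainder ranges would need to reach the slower historical store, which is where the latency of long-term monitoring queries is saved.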
OmpSs@cloudFPGA: An FPGA task-based programming model with message passing
Nowadays, a new parallel paradigm for energy-efficient heterogeneous hardware infrastructures is required to achieve better performance at a reasonable cost in high-performance computing applications. Under this new paradigm, some application parts are offloaded to specialized accelerators that run faster or are more energy-efficient than CPUs.
Field-Programmable Gate Arrays (FPGA) are one of those types of accelerators that are becoming widely available in data centers.
This paper proposes OmpSs@cloudFPGA, which includes novel extensions to parallel task-based programming models that enable easy and efficient programming of heterogeneous clusters with FPGAs.
The programmer only needs to annotate, with OpenMP-like pragmas, the tasks of the application that should be accelerated in the cluster of FPGAs.
Next, the proposed programming model framework automatically extracts parts annotated with High-Level Synthesis (HLS) pragmas and synthesizes them into hardware accelerator cores for FPGAs.
Additionally, our extensions include and support two novel features: 1) FPGA-to-FPGA direct communication using an MPI-like (Message Passing Interface) Application Programming Interface (API) with one-to-one and collective communications to alleviate the host communication channel bottleneck, and 2) creating and spawning work from inside the FPGAs to their own accelerator cores based on an MPI rank-like identification.
These features break the classical host-accelerator model, where the host (typically the CPU) generates all the work and distributes it to each accelerator.
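One common way to parallelize an all-pairs N-body step without host involvement is a ring pattern: each rank owns a block of bodies and, in each of P steps, computes interactions against the block it currently holds, then passes that block to its neighbour with a one-to-one send/receive. The sketch below simulates that pattern in plain Python to show what rank-based decomposition and direct device-to-device exchange buy; it is not the OmpSs@cloudFPGA API, the paper's strategies may differ, and the 1-D softened-gravity model and function names are illustrative assumptions.

```python
import math


def direct_forces(pos, mass, eps=1e-3):
    # Reference all-pairs sum on one node (1-D, softened gravity, G = 1).
    n = len(pos)
    return [sum(mass[j] * (pos[j] - pos[i]) / ((pos[j] - pos[i]) ** 2 + eps) ** 1.5
                for j in range(n)) for i in range(n)]


def ring_forces(pos, mass, n_ranks, eps=1e-3):
    """Simulate P ranks that each own a block of bodies and pass blocks around
    a ring, so every rank meets every block once. Returns (forces, pairs)."""
    n = len(pos)
    bounds = [(r * n // n_ranks, (r + 1) * n // n_ranks) for r in range(n_ranks)]
    local = [[0.0] * (hi - lo) for lo, hi in bounds]
    held = list(range(n_ranks))  # index of the block each rank currently holds
    pairs = 0
    for _ in range(n_ranks):
        for r, (lo, hi) in enumerate(bounds):
            blo, bhi = bounds[held[r]]
            for i in range(lo, hi):
                for j in range(blo, bhi):
                    dx = pos[j] - pos[i]
                    local[r][i - lo] += mass[j] * dx / (dx * dx + eps) ** 1.5
                    pairs += 1
        # Ring shift: rank r receives the block rank r-1 held (a one-to-one send/recv).
        held = [held[(r - 1) % n_ranks] for r in range(n_ranks)]
    return [f for block in local for f in block], pairs


pos = [0.0, 1.0, 2.5, 4.0, 7.0]
mass = [1.0, 2.0, 1.0, 3.0, 1.0]
forces, pairs = ring_forces(pos, mass, n_ranks=2)
print(pairs)  # 25, i.e. N * N interactions per step (self-pairs contribute zero)
print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
          for a, b in zip(forces, direct_forces(pos, mass))))  # True
```

Because each rank only ever talks to its ring neighbour, no block needs to pass through a host, which is the bottleneck the direct FPGA-to-FPGA communication is meant to remove.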
We also present an evaluation of OmpSs@cloudFPGA for different parallel strategies of the N-Body application on the IBM cloudFPGA research platform.
Results show that for cluster sizes up to 56 FPGAs, the performance scales linearly.
To the best of our knowledge, this is the best performance obtained for N-body over FPGA platforms, reaching 344 Gpairs/s with 56 FPGAs.
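The reported throughput can be put in per-device and per-step terms with simple arithmetic. Assuming one "pair" is one body-body interaction and a direct step evaluates about N² of them (an assumption; the abstract does not define the unit), and taking N = 10⁶ purely as an illustrative problem size:

```latex
\frac{344\ \text{Gpairs/s}}{56\ \text{FPGAs}} \approx 6.14\ \text{Gpairs/s per FPGA},
\qquad
t_{\text{step}} \approx \frac{N^2}{344 \times 10^{9}\ \text{pairs/s}}
= \frac{10^{12}}{3.44 \times 10^{11}} \approx 2.91\ \text{s} \quad (N = 10^{6}).
```

Linear scaling up to 56 FPGAs means the per-FPGA figure stays roughly constant as boards are added.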
Finally, we compare the performance and power consumption of the proposed approach with those obtained by a classical execution on the MareNostrum 4 supercomputer, demonstrating that our FPGA approach reduces power consumption by an order of magnitude.
This work has been done in the context of the IBM/BSC Deep Learning Center initiative. This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 754337 (EuroEXA), from the Spanish Government (PID2019-107255GBC21/AEI/10.13039/501100011033), and from Generalitat de Catalunya (2017-SGR-1414 and 2017-SGR-1328).
Peer Reviewed. Postprint (author's final draft).
Single Event Effects Assessment of UltraScale+ MPSoC Systems under Atmospheric Radiation
The AMD UltraScale+ XCZU9EG device is a Multi-Processor System-on-Chip
(MPSoC) with embedded Programmable Logic (PL) that excels in many Edge (e.g.,
automotive or avionics) and Cloud (e.g., data centres) terrestrial
applications. However, it incorporates a large number of SRAM cells, making the
device vulnerable to Neutron-induced Single Event Upsets (NSEUs), also known as
soft errors. Semiconductor vendors incorporate soft error mitigation mechanisms
to correct memory upsets (i.e., faults) before they propagate to the
application output and become errors. But how effective are the MPSoC's
mitigation schemes? Can they effectively recover from upsets in high-altitude or
large-scale applications under different workloads? This article answers these
research questions through a study that combines accelerated neutron
radiation testing and dependability analysis. We test the device on a broad
range of workloads, like multi-threaded software used for pose estimation and
weather prediction or a software/hardware (SW/HW) co-design image
classification application running on the AMD Deep Learning Processing Unit
(DPU). Assuming a one-node MPSoC system in New York City (NYC) at 40k feet, all
tested software applications achieve a Mean Time To Failure (MTTF) greater than
148 months, which shows that upsets are effectively recovered in the processing
system of the MPSoC. However, the SW/HW co-design (i.e., DPU) in the same
one-node system at 40k feet has an MTTF of 4 months due to the high failure rate
of its PL accelerator, which emphasises that some MPSoC workloads may require
additional NSEU mitigation schemes. Nevertheless, we show that the MTTF of the
DPU can increase to 87 months without any overhead if one disregards the
failure rate of tolerable errors, since they do not affect the correctness of
the classification output.
Comment: This manuscript is under review at IEEE Transactions on Reliability
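The MTTF figures follow from standard constant-failure-rate relations used in neutron testing: a beam test gives a cross-section σ (errors per unit fluence), the failure rate at a location is λ = σ·Φ for neutron flux Φ, and MTTF = 1/λ. The sketch below encodes those relations and uses the abstract's two DPU figures to derive how much of the failure rate the tolerable errors account for. The function names and the 730-hour average month are assumptions; the abstract gives no cross-sections or flux values, so none are filled in here.

```python
HOURS_PER_MONTH = 8760.0 / 12  # average month length; a unit-conversion assumption


def cross_section(errors, fluence_n_per_cm2):
    # Per-device cross-section from a beam test: observed errors / neutron fluence.
    return errors / fluence_n_per_cm2


def mttf_months(sigma_cm2, flux_n_per_cm2_h):
    # Constant failure rate lambda = sigma * flux (failures per hour); MTTF = 1 / lambda.
    return 1.0 / (sigma_cm2 * flux_n_per_cm2_h) / HOURS_PER_MONTH


def excluded_share(mttf_all, mttf_without_excluded):
    # With lambda_all = lambda_kept + lambda_excluded and MTTF = 1 / lambda,
    # the excluded errors' share of the total failure rate is 1 - MTTF_all / MTTF_kept.
    return 1.0 - mttf_all / mttf_without_excluded


# DPU figures from the abstract: 4 months counting all failures, 87 months
# when tolerable errors are disregarded.
print(round(excluded_share(4.0, 87.0), 3))  # 0.954
```

Under the constant-rate assumption, roughly 95% of the DPU's observed failure rate comes from tolerable errors, which is why discounting them raises the MTTF by more than a factor of 20.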