Search CORE

31 research outputs found

Power Aware Scheduling of Tasks on FPGAs in Data Centers

Author: Danelutto Marco
Paul Rourab
Publication venue
Publication date: 18/11/2023
Field of study

A variety of computing platform like Field Programmable Gate Array (FPGA), Graphics Processing Unit (GPU) and multicore Central Processing Unit (CPU) in data centers are suitable for acceleration of data-intensive workloads. Especially, FPGA platforms in data centers are gaining popularity for high-performance computations due to their high speed, reconfigurable nature and cost effectiveness. Such heterogeneous, highly parallel computational architectures in data centers, combined with high-speed communication technologies like 5G, are becoming increasingly suitable for real-time applications. However, flexibility, cost-effectiveness, high computational capabilities, and energy efficiency remain challenging issues in FPGA based data centers. In this context an energy efficient scheduling solution is required to maximize the resource profitability of FPGA. This paper introduces a power-aware scheduling methodology aimed at accommodating periodic hardware tasks within the available FPGAs of a data center at their potentially maximum speed. This proposed methodology guarantees the execution of these tasks us ing the maximum number of parallel computation units possible to implement in the FPGAs, with minimum power consumption. The proposed scheduling methodology is implemented in a data center with multiple Alveo-50 Xilinx-AMD FPGAs and Vitis 2023 tool. The evidence from the implementation shows the proposed scheduling methodology is efficient compared to existing solutions

arXiv.org e-Print Archive

EXTRA: Towards an efficient open platform for reconfigurable High Performance Computing

Author: Becker Tobias
Brokalakis Andreas
Charitopoulos George
Ciobanu CǍtǍlin Bogdan
Gaydadjiev Georgi
Huebner Michael
Kadi Muhammed Al
Luk Wayne
Nikitakis Antonis
Niu Xinyu
Pnevmatikatos Dionisios
Santambrogio Marco D.
Sciuto Donatella
Stroobandt Dirk
Thom Alex J. W.
Vansteenkiste Elias
Varbanescu Ana Lucia
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015
Field of study

To handle the stringent performance requirements of future exascale-class applications, High Performance Computing (HPC) systems need ultra-efficient heterogeneous compute nodes. To reduce power and increase performance, such compute nodes will require hardware accelerators with a high degree of specialization. Ideally, dynamic reconfiguration will be an intrinsic feature, so that specific HPC application features can be optimally accelerated, even if they regularly change over time. In the EXTRA project, we create a new and flexible exploration platform for developing reconfigurable architectures, design tools and HPC applications with run-time reconfiguration built-in as a core fundamental feature instead of an add-on. EXTRA covers the entire stack from architecture up to the application, focusing on the fundamental building blocks for run-time reconfigurable exascale HPC systems: new chip architectures with very low reconfiguration overhead, new tools that truly take reconfiguration as a central design concept, and applications that are tuned to maximally benefit from the proposed run-time reconfiguration techniques. Ultimately, this open platform will improve Europe's competitive advantage and leadership in the field

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Ghent University Academic Bibliography

SwiftSpatial: Spatial Joins on Modern Hardware

Author: Alonso Gustavo
Jiang Wenqi
Parvanov Martin
Publication venue
Publication date: 28/09/2023
Field of study

Spatial joins are among the most time-consuming queries in spatial data management systems. In this paper, we propose SwiftSpatial, a specialized accelerator architecture tailored for spatial joins. SwiftSpatial contains multiple high-performance join units with innovative hybrid parallelism, several efficient memory management units, and an integrated on-chip join scheduler. We prototype SwiftSpatial on an FPGA and incorporate the R-tree synchronous traversal algorithm as the control flow. Benchmarked against various CPU and GPU-based spatial data processing systems, SwiftSpatial demonstrates a latency reduction of up to 5.36x relative to the best-performing baseline, while requiring 6.16x less power. The remarkable performance and energy efficiency of SwiftSpatial lay a solid foundation for its future integration into spatial data management systems, both in data centers and at the edge

arXiv.org e-Print Archive

Semantic Caching Framework: An FPGA-Based Application for IoT Security Monitoring

Author: Julien Lallet
Laurent d'Orazio
Publication venue: RonPub
Publication date: 01/01/2018
Field of study

Security monitoring is one subdomain of cybersecurity which aims to guarantee the safety of systems, continuously monitoring unusual events. The development of Internet Of Things leads to huge amounts of information, being heterogeneous and requiring to be efficiently managed. Cloud Computing provides software and hardware resources for large scale data management. However, performances for sequences of on-line queries on long term historical data may be not compatible with the emergency security monitoring. This work aims to address this problem by proposing a semantic caching framework and its application to acceleration hardware with FPGA for fast- and accurate-enough logs processing for various data stores and execution engines

RonPub -- Research Online Publishing

Directory of Open Access Journals

OmpSs@cloudFPGA: An FPGA task-based programming model with message passing

Author: Abel François
Ayguadé Parra Eduard
Cano Rubén
Haro Ruiz Juan Miguel de
Jiménez González Daniel
Labarta Mancho Jesús José
Martorell Bofill Xavier
Ringlein Burkhard
Weiss Beat
Álvarez Martínez Carlos
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022
Field of study

Nowadays, a new parallel paradigm for energy-efficient heterogeneous hardware infrastructures is required to achieve better performance at a reasonable cost on high-performance computing applications. Under this new paradigm, some application parts are offloaded to specialized accelerators that run faster or are more energy-efficient than CPUs. Field-Programmable Gate Arrays (FPGA) are one of those types of accelerators that are becoming widely available in data centers. This paper proposes OmpSs@cloudFPGA, which includes novel extensions to parallel task-based programming models that enable easy and efficient programming of heterogeneous clusters with FPGAs. The programmer only needs to annotate, with OpenMP-like pragmas, the tasks of the application that should be accelerated in the cluster of FPGAs. Next, the proposed programming model framework automatically extracts parts annotated with High-Level Synthesis (HLS) pragmas and synthesizes them into hardware accelerator cores for FPGAs. Additionally, our extensions include and support two novel features: 1) FPGA-to-FPGA direct communication using a Message Passing Interface (MPI) similar Application Programming Interface (API) with one-to-one and collective communications to alleviate host communication channel bottleneck, and 2) creating and spawning work from inside the FPGAs to their own accelerator cores based on an MPI rank-like identification. These features break the classical host-accelerator model, where the host (typically the CPU) generates all the work and distributes it to each accelerator. We also present an evaluation of OmpSs@cloudFPGA for different parallel strategies of the N-Body application on the IBM cloudFPGA research platform. Results show that for cluster sizes up to 56 FPGAs, the performance scales linearly. To the best of our knowledge, this is the best performance obtained for N-body over FPGA platforms, reaching 344 Gpairs/s with 56 FPGAs. Finally, we compare the performance and power consumption of the proposed approach with the ones obtained by a classical execution on the MareNostrum 4 supercomputer, demonstrating that our FPGA approach reduces power consumption by an order of magnitude.This work has been done in the context of the IBM/BSC Deep Learning Center initiative. This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 754337 (EuroEXA), from Spanish Government (PID2019-107255GBC21/AEI/10.13039/501100011033), and from Generalitat de Catalunya (2017-SGR-1414 and 2017-SGR-1328).Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

Single Event Effects Assessment of UltraScale+ MPSoC Systems under Atmospheric Radiation

Author: Agiakatsikas Dimitris
Cazzaniga Carlo
Foutris Nikos
Frost Chris
Goodacre John
Kastrioto Maria
Lujan Mikel
Psarakis Mihalis
Sari Aitzan
Souvatzoglou Ioanna
Vlagkoulis Vasileios
Ye Ruiqi
Publication venue
Publication date: 21/02/2023
Field of study

The AMD UltraScale+ XCZU9EG device is a Multi-Processor System-on-Chip (MPSoC) with embedded Programmable Logic (PL) that excels in many Edge (e.g., automotive or avionics) and Cloud (e.g., data centres) terrestrial applications. However, it incorporates a large amount of SRAM cells, making the device vulnerable to Neutron-induced Single Event Upsets (NSEUs) or otherwise soft errors. Semiconductor vendors incorporate soft error mitigation mechanisms to recover memory upsets (i.e., faults) before they propagate to the application output and become an error. But how effective are the MPSoC's mitigation schemes? Can they effectively recover upsets in high altitude or large scale applications under different workloads? This article answers the above research questions through a solid study that entails accelerated neutron radiation testing and dependability analysis. We test the device on a broad range of workloads, like multi-threaded software used for pose estimation and weather prediction or a software/hardware (SW/HW) co-design image classification application running on the AMD Deep Learning Processing Unit (DPU). Assuming a one-node MPSoC system in New York City (NYC) at 40k feet, all tested software applications achieve a Mean Time To Failure (MTTF) greater than 148 months, which shows that upsets are effectively recovered in the processing system of the MPSoC. However, the SW/HW co-design (i.e., DPU) in the same one-node system at 40k feet has an MTTF = 4 months due to the high failure rate of its PL accelerator, which emphasises that some MPSoC workloads may require additional NSEU mitigation schemes. Nevertheless, we show that the MTTF of the DPU can increase to 87 months without any overhead if one disregards the failure rate of tolerable errors since they do not affect the correctness of the classification output.Comment: This manuscript is under review at IEEE Transactions on Reliabilit

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository