Application Acceleration on FPGAs with OmpSs@FPGA

Ayguadé Parra, Eduard; Bosch, Jaume; Filgueras Izquierdo, Antonio; Jiménez-González, Daniel; Labarta Mancho, Jesús José; Martorell Bofill, Xavier; Mateu, Marc; Tan, Xubin; Vidal, Miquel; Álvarez, Carlos

Publisher: 'Institute of Electrical and Electronics Engineers (IEEE)'

Abstract

© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

OmpSs@FPGA is the flavor of OmpSs that allows offloading application functionality to FPGAs. Like OpenMP, it is based on compiler directives. While the OpenMP specification also includes support for heterogeneous execution, we use OmpSs and OmpSs@FPGA as a prototype implementation to develop new ideas for OpenMP. OmpSs@FPGA implements the tasking model with runtime support to automatically exploit all SMP and FPGA resources available in the execution platform. In this paper, we present the OmpSs@FPGA ecosystem, based on the Mercurium compiler and the Nanos++ runtime system. We show how applications are transformed to run on the SMP cores and the FPGA. The application kernels defined as tasks to be accelerated using the OmpSs directives are: 1) transformed by the compiler into kernels connected with the proper synchronization and communication ports, 2) extracted to intermediate files, 3) compiled through the FPGA vendor HLS tool, and 4) used to configure the FPGA. Our Nanos++ runtime system schedules the application tasks on the platform and is able to use the SMP cores and the FPGA accelerators at the same time. We present the evaluation of the OmpSs@FPGA environment with the Matrix Multiplication, Cholesky and N-Body benchmarks, showing the internal details of the execution and the performance obtained on a Zynq UltraScale+ MPSoC (up to 128x). The source code uses OmpSs@FPGA annotations, and different Vivado HLS optimization directives are applied for acceleration.

This work is partially supported by the European Union H2020 program through the EuroEXA project (grant 754337) and HiPEAC (GA 687698), by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology (TIN2015-65316-P), and by the Departament d’Innovació Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programació i Entorns d’Execució Paral·lels (2014-SGR-1051).

Peer Reviewed
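
The directive-based offloading described in the abstract can be illustrated with a minimal sketch of a blocked matrix multiplication in C. It is not taken from the paper: the block size BS, the function names matmul_block and matmul_blocked, and the blocked storage layout are hypothetical, and the exact clauses accepted on the target directive (here device(fpga) with copy_deps) are an assumption that may differ between OmpSs@FPGA releases.

```c
#include <assert.h>
#include <stddef.h>

#define BS 4 /* hypothetical block size, kept small for illustration */

/* Block kernel: c += a * b for row-major BS x BS blocks.
 * The two pragmas mark this function as a task that may be offloaded to
 * an FPGA accelerator; the in/inout clauses give the runtime the data
 * dependences. Clause spelling is an assumption, not the paper's code. */
#pragma omp target device(fpga) copy_deps
#pragma omp task in([BS*BS]a, [BS*BS]b) inout([BS*BS]c)
void matmul_block(const float *a, const float *b, float *c)
{
    for (int i = 0; i < BS; ++i)
        for (int j = 0; j < BS; ++j) {
            float acc = c[i * BS + j];
            for (int k = 0; k < BS; ++k)
                acc += a[i * BS + k] * b[k * BS + j];
            c[i * BS + j] = acc;
        }
}

/* Driver: matrices are stored as an nb x nb grid of contiguous
 * BS*BS blocks, so every task argument is one contiguous region. */
void matmul_blocked(const float *A, const float *B, float *C, int nb)
{
    assert(nb > 0);
    const size_t blk = (size_t)BS * BS;
    for (int i = 0; i < nb; ++i)
        for (int j = 0; j < nb; ++j)
            for (int k = 0; k < nb; ++k)
                matmul_block(&A[((size_t)i * nb + k) * blk],
                             &B[((size_t)k * nb + j) * blk],
                             &C[((size_t)i * nb + j) * blk]);
    /* Wait for all block tasks created above before returning. */
    #pragma omp taskwait
}
```

Built with an ordinary C compiler, the pragmas are ignored and the code runs sequentially on the host. Under an OmpSs@FPGA toolchain, each call to matmul_block would be expected to become a task that the runtime can schedule on an SMP core or an FPGA accelerator, with the inout dependence on each C block ordering the updates over k.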

Available Versions

RECERCAT

oai:recercat.cat:2072/359897

Last time updated on 05/04/2020