23 research outputs found

    The AXIOM software layers

    Get PDF
    AXIOM project aims at developing a heterogeneous computing board (SMP-FPGA).The Software Layers developed at the AXIOM project are explained.OmpSs provides an easy way to execute heterogeneous codes in multiple cores. People and objects will soon share the same digital network for information exchange in a world named as the age of the cyber-physical systems. The general expectation is that people and systems will interact in real-time. This poses pressure onto systems design to support increasing demands on computational power, while keeping a low power envelop. Additionally, modular scaling and easy programmability are also important to ensure these systems to become widespread. The whole set of expectations impose scientific and technological challenges that need to be properly addressed.The AXIOM project (Agile, eXtensible, fast I/O Module) will research new hardware/software architectures for cyber-physical systems to meet such expectations. The technical approach aims at solving fundamental problems to enable easy programmability of heterogeneous multi-core multi-board systems. AXIOM proposes the use of the task-based OmpSs programming model, leveraging low-level communication interfaces provided by the hardware. Modular scalability will be possible thanks to a fast interconnect embedded into each module. To this aim, an innovative ARM and FPGA-based board will be designed, with enhanced capabilities for interfacing with the physical world. Its effectiveness will be demonstrated with key scenarios such as Smart Video-Surveillance and Smart Living/Home (domotics).Peer ReviewedPostprint (author's final draft

    Implementation of the K-Means Algorithm on Heterogeneous Devices: A Use Case Based on an Industrial Dataset

    Get PDF
    This paper presents and analyzes a heterogeneous implementation of an industrial use case based on K-means that targets symmetric multiprocessing (SMP), GPUs and FPGAs. We present how the application can be optimized from an algorithmic point of view and how this optimization performs on two heterogeneous platforms. The presented implementation relies on the OmpSs programming model, which introduces a simplified pragma-based syntax for the communication between the main processor and the accelerators. Performance improvement can be achieved by the programmer explicitly specifying the data memory accesses or copies. As expected, the newer SMP+GPU system studied is more powerful than the older SMP+FPGA system. However the latter is enough to fulfill the requirements of our use case and we show that uses less energy when considering only the active power of the execution.This work is partially supported by the European Union H2020 project AXIOM (grant agreement n. 645496), HiPEAC (grant agreement n. 687698), and Mont-Blanc (grant agreements n. 288777, 610402 and 671697), the Spanish Government Programa Severo Ochoa (SEV-2015-0493), the Spanish Ministry of Science and Technology (TIN2015- 65316-P) and the Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programaci´o i Entorns d’Execució Paral·lels (2014-SGR-1051).Peer ReviewedPostprint (author's final draft

    Application Acceleration on FPGAs with OmpSs@FPGA

    Get PDF
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.OmpSs@FPGA is the flavor of OmpSs that allows offloading application functionality to FPGAs. Similarly to OpenMP, it is based on compiler directives. While the OpenMP specification also includes support for heterogeneous execution, we use OmpSs and OmpSs@FPGA as prototype implementation to develop new ideas for OpenMP. OmpSs@FPGA implements the tasking model with runtime support to automatically exploit all SMP and FPGA resources available in the execution platform. In this paper, we present the OmpSs@FPGA ecosystem, based on the Mercurium compiler and the Nanos++ runtime system. We show how the applications are transformed to run on the SMP cores and the FPGA. The application kernels defined as tasks to be accelerated, using the OmpSs directives are: 1) transformed by the compiler into kernels connected with the proper synchronization and communication ports, 2) extracted to intermediate files, 3) compiled through the FPGA vendor HLS tool, and 4) used to configure the FPGA. Our Nanos++ runtime system schedules the application tasks on the platform, being able to use the SMP cores and the FPGA accelerators at the same time. We present the evaluation of the OmpSs@FPGA environment with the Matrix Multiplication, Cholesky and N-Body benchmarks, showing the internal details of the execution, and the performance obtained on a Zynq Ultrascale+ MPSoC (up to 128x). The source code uses OmpSs@FPGA annotations and different Vivado HLS optimization directives are applied for acceleration.This work is partially supported by the European Union H2020 program through the EuroEXA project (grant 754337), and HiPEAC (GA 687698), by the Spanish Government through Programa Severo Ochoa (SEV-2015- 0493), by the Spanish Ministry of Science and Technology (TIN2015-65316-P) and the Departament d’Innovació Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programació i Entorns d’Execució Paral·lels (2014-SGR-1051).Peer ReviewedPostprint (author's final draft

    Synchronization / communication techniques for OmpSs@FPGA

    Get PDF
    HPC machines are introducing more and more heterogeneity in their architecture on the road to exascale systems. The increasing complexity of the machines due to the variety of hardware architectures and accelerators makes efficient programming a task harder than ever. Heterogeneous parallel programming models, such as OmpSs@FPGA, help the programmer handle the most unfriendly parts of working with accelerators. This master thesis analyzes the OmpSs@FPGA communication system and proposes a set of techniques to overcome the problems related to it and potentially improve the performance of the applications. The results show that the techniques proposed speed up the applications under certain conditions and, most importantly, solves some of the limitations that had the previous communication system. In particular, the new techniques specially improve the explotation of fine-grain parallelism and open the door to explore new possibilities with regard to data communication and re-use. Moreover, a tool (autoVivado) that automatically manages the process of bitstream generation, from the synthesis of the HLS code to the generation of the device-tree, has been developed as part of this master thesis. autoVivado has been fully integrated with the OmpSs@FPGA compiler infrastructure, providing the programmers a way to transparently generate parallel heterogenous programs and bitstreams from OmpSs applications that use FPGA accelerators

    LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

    Get PDF
    LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.Peer ReviewedPostprint (author's final draft

    The AXIOM Software Layers

    Get PDF
    25siopenPeople and objects will soon share the same digital network for information exchange in a world named as the age of the cyber-physical systems. The general expectation is that people and systems will interact in real-time. This poses pressure onto systems design to support increasing demands on computational power, while keeping a low power envelop. Additionally, modular scaling and easy programmability are also important to ensure these systems to become widespread. The whole set of expectations impose scientific and technological challenges that need to be properly addressed. The AXIOM project (Agile, eXtensible, fast I/O Module) will research new hardware/software architectures for cyber-physical systems to meet such expectations. The technical approach aims at solving fundamental problems to enable easy programmability of heterogeneous multi-core multi-board systems. AXIOM proposes the use of the task-based OmpSs programming model, leveraging low-level communication interfaces provided by the hardware. Modular scalability will be possible thanks to a fast interconnect embedded into each module. To this aim, an innovative ARM and FPGA-based board will be designed, with enhanced capabilities for interfacing with the physical world. Its effectiveness will be demonstrated with key scenarios such as Smart Video-Surveillance and Smart Living/Home (domotics).openAlvarez, C.; Ayguade, E.; Bosch, J.; Bueno, J.; Cherkashin, A.; Filgueras, A.; Jiminez-Gonzalez, D.; Martorell, X.; Navarro, N.; Vidal, M.; Theodoropoulos, D.; Pnevmatikatos, D.; Catani, D.; Oro, D.; Fernandez, C.; Segura, C.; Rodriguez, J.; Hernando, J.; Scordino, C.; Gai, P.; Passera, P.; Pomella, A.; Bettin, N.; Rizzo, A.; Giorgi, R.Alvarez, C.; Ayguade, E.; Bosch, J.; Bueno, J.; Cherkashin, A.; Filgueras, A.; Jiminez-Gonzalez, D.; Martorell, X.; Navarro, N.; Vidal, M.; Theodoropoulos, D.; Pnevmatikatos, D.; Catani, D.; Oro, D.; Fernandez, C.; Segura, C.; Rodriguez, J.; Hernando, J.; Scordino, C.; Gai, P.; Passera, P.; Pomella, A.; Bettin, N.; Rizzo, A.; Giorgi, R

    Software acceleration on Xilinx FPGAs using OmpSs@FPGA ecosystem

    Get PDF
    The OmpSs@FPGA programming model allows offloading application functionality to Xilinx Field Programmable Gate Arrays (FPGAs). The OmpSs compiler splits the code (written in C/C++ high level language) in two parts, targeting the host and the FPGA. The first is usually compiled by the GNU Compiler Collection (GCC), while the latter is given to the Xilinx Vivado HLS tool (hereafter HLS) for high level synthesis to VHDL and bitstream used to program the FPGA. OmpSs@FPGA is based on compiler directives, which allow the programmer to annotate the part of the code to automatically exploit all Symmetric MultiProcessor system (smp) and FPGA resource available in the execution platform. This technical report provides both descriptive and hands-on introductions to build application-specific FPGA systems using the high-level OmpSs@FPGA tool. The goal is to give the reader a baseline view of the process of creating an optimized hardware design annotating C-based code with HLS directives. We assume the reader has a working knowledge of C/C++, and familiarity with basic computer architecture concepts (e.g. speedup, parallelism, pipelining)

    An architectural journey into RISC architectures for HPC workloads

    Get PDF
    The thesis evaluates the current state-of-the-art of RISC architectures in HPC. Studying the performance, power, and energy to solution in heterogeneous SoCs. For the evaluation 2 arm platforms (CPU+GPU, CPU+FPGA), 1 RISC-V platform and 1 Open Source RISC-V core running in an FPGA have been tested

    Plataforma per la programació automàtica del sistema hardware reconfigurable d'una màquina Zynq

    Get PDF
    Las máquinas Zynq son una familia de SoC que integran una FPGA. La intención de este proyecto es automatizar todo el proceso que se requiere para obtener el bitstream a partir de una aplicación escrita en C/C++ con directivas OmpSs
    corecore