
    Transformations of High-Level Synthesis Codes for High-Performance Computing

    Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large-scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C/C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target specialized hardware, the optimization principles known from traditional software design are no longer sufficient to implement high-performance codes. Fast and efficient codes for reconfigurable platforms are thus still challenging to design. To alleviate this, we present a set of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. Our work provides a toolbox for developers, where we systematically identify classes of transformations, the characteristics of their effect on the HLS code and the resulting hardware (e.g., increased data reuse or resource consumption), and the objectives that each transformation can target (e.g., resolving interface contention or increasing parallelism). We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip streaming dataflow, allowing for massively parallel architectures. To quantify the effect of our transformations, we use them to optimize a set of throughput-oriented FPGA kernels, demonstrating that our enhancements are sufficient to scale up parallelism within the hardware constraints. With the transformations covered, we hope to establish a common framework for performance engineers, compiler developers, and hardware developers to tap into the performance potential offered by specialized hardware architectures using HLS.
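
    To make the flavor of these transformations concrete, the following is a minimal sketch (not taken from the paper) of a 3-point 1D stencil kernel written for HLS: a small on-chip shift register provides data reuse so each input element is read from memory only once, the loop is pipelined, and input and output are placed on separate interface bundles to avoid contention. Pragmas follow Xilinx Vitis HLS syntax; the kernel name and bundle names are illustrative.

```cpp
// Illustrative HLS kernel (hypothetical, Vitis HLS-style pragmas): a 3-point
// 1D stencil that keeps a small on-chip window for data reuse and pipelines
// the loop so one iteration can start per clock cycle.
extern "C" void stencil1d(const float *in, float *out, int n) {
// Separate bundles for input and output avoid interface contention.
#pragma HLS INTERFACE m_axi port=in  bundle=gmem0
#pragma HLS INTERFACE m_axi port=out bundle=gmem1
  float window[3] = {0.f, 0.f, 0.f};   // on-chip shift register (data reuse)
#pragma HLS ARRAY_PARTITION variable=window complete
  for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
    window[0] = window[1];             // shift in the next element; each
    window[1] = window[2];             // input is loaded exactly once
    window[2] = in[i];
    if (i >= 2)
      out[i - 1] = 0.25f * window[0] + 0.5f * window[1] + 0.25f * window[2];
  }
}
```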

    CORBYS cognitive control architecture for robotic follower

    In this paper, the novel generic cognitive robot control architecture CORBYS is presented. The objective of the CORBYS architecture is the integration of high-level cognitive modules to support robot functioning in dynamic environments, including interaction with humans. This paper presents the preliminary integration of the CORBYS architecture to support a robotic follower. Experimental results on high-level empowerment-based trajectory planning have demonstrated the effectiveness of ROS-based communication between distributed modules developed in a multi-site research environment, as is typical of distributed collaborative projects such as CORBYS.
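
    As a minimal illustration of the ROS-based communication pattern mentioned above (this is not code from the CORBYS project; the node name, topic, and message type are hypothetical), a roscpp module can broadcast planning results to the other distributed modules as follows:

```cpp
// Hypothetical ROS 1 (roscpp) publisher: one module broadcasting planned
// waypoints on a topic that other distributed modules can subscribe to.
#include <ros/ros.h>
#include <std_msgs/Float64MultiArray.h>

int main(int argc, char **argv) {
  ros::init(argc, argv, "trajectory_planner");   // hypothetical node name
  ros::NodeHandle nh;
  ros::Publisher pub =
      nh.advertise<std_msgs::Float64MultiArray>("planned_trajectory", 10);

  ros::Rate rate(10);                            // publish at 10 Hz
  while (ros::ok()) {
    std_msgs::Float64MultiArray msg;
    msg.data = {0.0, 0.1, 0.2};                  // placeholder waypoint data
    pub.publish(msg);
    ros::spinOnce();
    rate.sleep();
  }
  return 0;
}
```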

    Design, implementation and experimental validation of a 5G energy-aware reconfigurable hotspot

    Flexibility and energy efficiency are considered two principal requirements of future fifth-generation (5G) systems. From an architectural point of view, centralized processing and a dense deployment of small cells will play a vital role in enabling the efficient and dynamic operation of 5G networks. In this context, reconfigurable hotspots will provide on-demand services and adapt their operation in accordance with traffic requirements, constituting a vital element of the heterogeneous 5G network infrastructure. In this paper we present a reconfigurable hotspot which is able to flexibly distribute its underlying communication functions across the network, as well as to adapt various parameters affecting the generation of the transmitted signal. The reconfiguration of the hotspot focuses on minimizing its energy footprint while accounting for the current operational requirements. A real-time hotspot prototype has been developed to facilitate the realistic evaluation of the energy saving gains of the proposed scheme. The development flexibly combines software (SW) and hardware-accelerated (HWA) functions in order to enable the agile reconfiguration of the hotspot. Actual power consumption measurements are presented for various relevant 5G networking scenarios and hotspot configurations. This thorough characterization of the energy footprint of the different subsystems of the prototype allows reconfiguration strategies to be mapped to different use cases. Finally, the energy-aware design and implementation of the hotspot prototype is described in detail in an effort to underline its importance for providing flexibility and energy efficiency in future 5G systems. This work was supported by the European Commission in the framework of the H2020-ICT-2014-2 project Flex5Gware (grant agreement no. 671563). The work of CTTC was also partially supported by the Generalitat de Catalunya (2017 SGR 891) and by the Spanish Government under project TEC2014-58341-C4-4-R.
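
    The mapping of reconfiguration strategies to use cases can be thought of as an energy-driven placement decision. The sketch below is purely conceptual and is not the prototype's code: it compares a SW and an HWA variant of a communication function under a simple duty-cycle power model (all figures invented) and selects the lower-power feasible placement for a given traffic load.

```cpp
// Conceptual sketch (all numbers invented): choose the SW or HWA variant of
// a communication function that minimizes estimated average power at the
// current traffic load.
#include <iostream>

struct Variant {
  const char *name;
  double idle_w;         // static/idle power [W]
  double active_w;       // power while actively processing [W]
  double throughput_fps; // sustained throughput [frames/s]
};

// Average power at a given offered load, assuming the function is busy for
// load/throughput of the time and idle otherwise; overloaded variants are
// treated as infeasible.
double avgPower(const Variant &v, double load_fps) {
  if (load_fps > v.throughput_fps) return 1e30;   // infeasible placement
  double duty = load_fps / v.throughput_fps;      // fraction of time busy
  return v.idle_w + duty * (v.active_w - v.idle_w);
}

int main() {
  Variant sw  = {"SW  (general-purpose core)", 2.0, 15.0,  400.0};
  Variant hwa = {"HWA (FPGA accelerator)",     8.0, 12.0, 2000.0};

  for (double load : {100.0, 350.0, 1800.0}) {    // example traffic levels
    const Variant &pick =
        avgPower(sw, load) <= avgPower(hwa, load) ? sw : hwa;
    std::cout << "load " << load << " frames/s -> " << pick.name << "\n";
  }
  return 0;
}
```

    Under these invented numbers the SW placement wins at low load (its idle power is lower) and the HWA placement wins as load grows, which is the kind of load-dependent reconfiguration policy the abstract describes.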

    Automatic Creation of High-Bandwidth Memory Architectures from Domain-Specific Languages: The Case of Computational Fluid Dynamics

    Numerical simulations can help solve complex problems. Most of these algorithms are massively parallel and thus good candidates for FPGA acceleration thanks to spatial parallelism. Modern FPGA devices can leverage high-bandwidth memory (HBM) technologies, but when applications are memory-bound, designers must craft advanced communication and memory architectures for efficient data movement and on-chip storage. This development process requires hardware design skills that are uncommon in domain experts. In this paper, we propose an automated tool flow from a domain-specific language (DSL) for tensor expressions to generate massively parallel accelerators on HBM-equipped FPGAs. Designers can use this flow to integrate and evaluate various compiler or hardware optimizations. We use computational fluid dynamics (CFD) as a paradigmatic example. Our flow starts from the high-level specification of tensor operations and combines an MLIR-based compiler with an in-house hardware generation flow to produce systems with parallel accelerators and a specialized memory architecture that moves data efficiently, aiming to fully exploit the available CPU-FPGA bandwidth. We simulated applications with millions of elements, achieving up to 103 GFLOPS with one compute unit and custom precision when targeting a Xilinx Alveo U280. Our FPGA implementation is up to 25x more energy efficient than expert-crafted Intel CPU implementations.
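
    As an illustration of the kind of accelerator such a flow can emit (a hypothetical sketch, not the paper's generated code), the compute unit below is written in Vitis HLS-style C++ with its input and output arrays on separate memory interface bundles, which the linker can map to distinct HBM pseudo-channels; a generated system would instantiate several such compute units to exploit the aggregate HBM bandwidth.

```cpp
// Hypothetical compute unit (Vitis HLS-style pragmas, illustrative names):
// a trivial tensor expression out = alpha*x + y, pipelined to produce one
// result per cycle. Each array uses its own m_axi bundle, which the linker
// can bind to a distinct HBM pseudo-channel so that parallel compute units
// do not contend for memory bandwidth.
extern "C" void saxpy_cu(const float *x, const float *y, float *out,
                         float alpha, int n) {
#pragma HLS INTERFACE m_axi port=x   bundle=hbm0
#pragma HLS INTERFACE m_axi port=y   bundle=hbm1
#pragma HLS INTERFACE m_axi port=out bundle=hbm2
  for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
    out[i] = alpha * x[i] + y[i];   // stand-in for a real tensor/CFD kernel
  }
}
```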