9 research outputs found

    TEXTAROSSA: Towards EXtreme scale Technologies and Accelerators for euROhpc hw/Sw Supercomputing Applications for exascale

    Get PDF
    To achieve high performance and high energy efficiency on near-future exascale computing systems, three key technology gaps needs to be bridged. These gaps include: energy efficiency and thermal control; extreme computation efficiency via HW acceleration and new arithmetics; methods and tools for seamless integration of reconfigurable accelerators in heterogeneous HPC multi-node platforms. TEXTAROSSA aims at tackling this gap through a co-design approach to heterogeneous HPC solutions, supported by the integration and extension of HW and SW IPs, programming models and tools derived from European research.This work is supported by the TEXTAROSSA project G.A. n.956831, as part of the EuroHPC initiative.Peer ReviewedArticle signat per 51 autors/es: Giovanni Agosta, Daniele Cattaneo, William Fornaciari, Andrea Galimberti, Giuseppe Massari, Federico Reghenzani, Federico Terraneo, Davide Zoni, Carlo Brandolese (DEIB – Politecnico di Milano, Italy, [email protected]) | Massimo Celino, Francesco Iannone, Paolo Palazzari, Giuseppe Zummo (ENEA, Italy, [email protected]) | Massimo Bernaschi, Pasqua D’Ambra (Istituto per le Applicazioni del Calcolo (IAC) - CNR, Italy, [email protected]) | Sergio Saponara, Marco Danelutto, Massimo Torquati (University of Pisa, Italy, [email protected]) | Marco Aldinucci, Yasir Arfat, Barbara Cantalupo, Iacopo Colonnelli, Roberto Esposito, Alberto R. Martinelli, Gianluca Mittone (University of Torino, Italy, [email protected]) | Olivier Beaumont, Berenger Bramas, Lionel Eyraud-Dubois, Brice Goglin, Abdou Guermouche, Raymond Namyst, Samuel Thibault (Inria - France, [email protected]) | Antonio Filgueras, Miquel Vidal, Carlos Alvarez, Xavier Martorell (BSC - Spain, [email protected]) | Ariel Oleksiak, Michal Kulczewski (PSNC, Poland, [email protected], [email protected]) | Alessandro Lonardo, Piero Vicini, Francesca Lo Cicero, Francesco Simula, Andrea Biagioni, Paolo Cretaro, Ottorino Frezza, Pier Stanislao Paolucci, Matteo Turisini (INFN Sezione di Roma - Italy, [email protected]) | Francesco Giacomini (INFN CNAF - Italy, [email protected]) | Tommaso Boccali (INFN Sezione di Pisa - Italy, [email protected]) | Simone Montangero (University of Padova and INFN Sezione di Padova - Italy, [email protected]) | Roberto Ammendola (INFN Sezione di Roma Tor Vergata - Italy, [email protected])Postprint (author's final draft

    TEXTAROSSA: Towards EXtreme scale Technologies and Accelerators for euROhpc hw/Sw Supercomputing Applications for exascale

    Get PDF
    International audienceTo achieve high performance and high energy efficiency on near-future exascale computing systems, three key technology gaps needs to be bridged. These gaps include: energy efficiency and thermal control; extreme computation efficiency via HW acceleration and new arithmetics; methods andtools for seamless integration of reconfigurable accelerators in heterogeneous HPC multi-node platforms. TEXTAROSSA aims at tackling this gap through a co-design approach to heterogeneous HPC solutions, supported by the integration and extension of HW and SW IPs, programming models and tools derived from European research

    High-level synthesis for FPGAs: From prototyping to deployment

    Get PDF
    Abstract-Escalating System-on-Chip design complexity is pushing the design community to raise the level of abstraction beyond RTL. Despite the unsuccessful adoptions of early generations of commercial high-level synthesis (HLS) systems, we believe that the tipping point for transitioning to HLS methodology is happening now, especially for FPGA designs. The latest generation of HLS tools has made significant progress in providing wide language coverage and robust compilation technology, platform-based modeling, advancement in core HLS algorithms, and a domain-specific approach. In this paper we use AutoESL's AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx as an example to demonstrate the effectiveness of state-of-art C-to-FPGA synthesis solutions targeting multiple application domains. Complex industrial designs targeting Xilinx FPGAs are also presented as case studies, including comparison of HLS solutions versus optimized manual designs. Index Terms-Domain-specific design, field-programmable gate array (FPGA), high-level synthesis (HLS), quality of results (QoR)

    SdrLift: A Domain-Specific Intermediate Hardware Synthesis Framework for Prototyping Software-Defined Radios

    Get PDF
    Modern design of Software-Defined Radio (SDR) applications is based on Field Programmable Gate Arrays (FPGA) due to their ability to be configured into solution architectures that are well suited to domain-specific problems while achieving the best trade-off between performance, power, area, and flexibility. FPGAs are well known for rich computational resources, which traditionally include logic, register, and routing resources. The increased technological advances have seen FPGAs incorporating more complex components that comprise sophisticated memory blocks, Digital Signal Processing (DSP) blocks, and high-speed interfacing to Gigabit Ethernet (GbE) and Peripheral Component Interconnect Express (PCIe) bus. Gateware for programming FPGAs is described at a lowlevel of design abstraction using Register Transfer Language (RTL), typically using either VHSIC-HDL (VHDL) or Verilog code. In practice, the low-level description languages have a very steep learning curve, provide low productivity for hardware designers and lack readily available open-source library support for fundamental designs, and consequently limit the design to only hardware experts. These limitations have led to the adoption of High-Level Synthesis (HLS) tools that raise design abstraction using syntax, semantics, and software development notations that are well-known to most software developers. However, while HLS has made programming of FPGAs more accessible and can increase the productivity of design, they are still not widely adopted in the design community due to the low-level skills that are still required to produce efficient designs. Additionally, the resultant RTL code from HLS tools is often difficult to decipher, modify and optimize due to the functionality and micro-architecture that are coupled together in a single High-Level Language (HLL). In order to alleviate these problems, Domain-Specific Languages (DSL) have been introduced to capture algorithms at a high level of abstraction with more expressive power and providing domain-specific optimizations that factor in new transformations and the trade-off between resource utilization and system performance. The problem of existing DSLs is that they are designed around imperative languages with an instruction sequence that does not match the hardware structure and intrinsics, leading to hardware designs with system properties that are unconformable to the high-level specifications and constraints. The aim of this thesis is, therefore, to design and implement an intermediatelevel framework namely SdrLift for use in high-level rapid prototyping of SDR applications that are based on an FPGA. The SdrLift input is a HLL developed using functional language constructs and design patterns that specify the structural behavior of the application design. The functionality of the SdrLift language is two-fold, first, it can be used directly by a designer to develop the SDR applications, secondly, it can be used as the Intermediate Representation (IR) step that is generated by a higher-level language or a DSL. The SdrLift compiler uses the dataflow graph as an IR to structurally represent the accelerator micro-architecture in which the components correspond to the fine-level and coarse-level Hardware blocks (HW Block) which are either auto-synthesized or integrated from existing reusable Intellectual Property (IP) core libraries. Another IR is in the form of a dataflow model and it is used for composition and global interconnection of the HW Blocks while making efficient interfacing decisions in an attempt to satisfy speed and resource usage objectives. Moreover, the dataflow model provides rules and properties that will be used to provide a theoretical framework that formally analyzes the characteristics of SDR applications (i.e. the throughput, sample rate, latency, and buffer size among other factors). Using both the directed graph flow (DFG) and the dataflow model in the SdrLift compiler provides two benefits: an abstraction of the microarchitecture from the high-level algorithm specifications and also decoupling of the microarchitecture from the low-level RTL implementation. Following the IR creation and model analyses is the VHDL code generation which employs the low-level optimizations that ensure optimal hardware design results. The code generation process per forms analysis to ensure the resultant hardware system conforms to the high-level design specifications and constraints. SdrLift is evaluated by developing representative SDR case studies, in which the VHDL code for eight different SDR applications is generated. The experimental results show that SdrLift achieves the desired performance and flexibility, while also conserving the hardware resources utilized

    Design and Verification Environment for High-Performance Video-Based Embedded Systems

    Get PDF
    In this dissertation, a method and a tool to enable design and verification of computation demanding embedded vision-based systems is presented. Starting with an executable specification in OpenCV, we provide subsequent refinements and verification down to a system-on-chip prototype into an FPGA-Based smart camera. At each level of abstraction, properties of image processing applications are used along with structure composition to provide a generic architecture that can be automatically verified and mapped to the lower abstraction level. The result is a framework that encapsulates the computer vision library OpenCV at the highest level, integrates Accelera\u27s System-C/TLM with UVM and QEMU-OS for virtual prototyping and verification and mapping to a lower level, the last of which is the FPGA. This will relieve hardware designers from time-consuming and error-prone manual implementations, thus allowing them to focus on other steps of the design process. We also propose a novel streaming interface, called Component Interconnect and Data Access (CIDA), for embedded video designs, along with a formal model and a component composition mechanism to cluster components in logical and operational groups that reduce resource usage and power consumption

    A computing origami: Optimized code generation for emerging parallel platforms

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Estimation and Optimization of the Performance of Polyhedral Process Networks

    Get PDF
    A system-level design methodology such as Daedalus provides designers with a forward synthesis flow for automated design, programming, and implementation of multiprocessor systems-on-chip. Daedalus employs the polyhedral process network model of computation to represent applications. These networks are automatically derived from sequential C code. A forward synthesis flow greatly increases designer productivity. Still, the designer needs to perform a time-consuming forward synthesis step to learn if a network satisfies his performance constraints. Furthermore, it is not trivial to select a set of transformations and transformation parameters for a network such that performance requirements are met. A forward synthesis flow thus solves only part of a design problem, as it does not provide fast feedback on the transformations a designer should apply to meet his performance constraints. This dissertation intro duces different performance estimation techniques for polyhedral process networks. The most promising technique is the profiling-based cprof technique that works directly on the sequential application code. This makes cprof an easy-to-use, robust, and fast technique, without the need to derive a polyhedral process network. This dissertation then discusses four transformations and analyzes factors that affect the efficacy of each transformation.Computer Systems, Imagery and Medi
    corecore