21 research outputs found

Emerging accelerator platforms for data centers

Author: Ozdal M. M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/12/2017
CPU and GPU platforms may not be the best options for many emerging compute patterns, which has led to a new breed of accelerator platforms. This article gives a comprehensive overview with a focus on commercial platforms.

Bilkent University Institutional Repository

FOS: A Modular FPGA Operating System for Dynamic Workloads

Author: Koch Dirk
Pham Khoa
Powell Joseph
Vaishnav Anuj
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/01/2020
With FPGAs now being deployed in the cloud and at the edge, there is a need for scalable design methods which can incorporate the heterogeneity present in the hardware and software components of FPGA systems. Moreover, these FPGA systems need to be maintainable and adaptable to changing workloads while improving accessibility for application developers. However, current FPGA systems fail to achieve modularity and support for multi-tenancy due to dependencies between system components and a lack of standardised abstraction layers. To solve this, we introduce a modular FPGA operating system, FOS, which adopts a modular FPGA development flow to allow each system component to be changed and to be agnostic to the heterogeneity of EDA tool versions, hardware and software layers. Further, to dynamically maximise utilisation in a way that is transparent to users, FOS employs resource-elastic scheduling to arbitrate the FPGA resources in both the time and spatial domains for any type of accelerator. Our evaluation on different FPGA boards shows that FOS can provide performance improvements in both single-tenant and multi-tenant environments while substantially reducing development time and, at the same time, improving flexibility.

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository

Transmuter: Bridging the Efficiency Gap using Memory and Dataflow Reconfiguration

Author: Amarnath Aporva
Beaumont Jonathan
Blaauw David
Chakrabarti Chaitali
Cole Murray
Dreslinski Ronald
Feng Siying
He Xin
Kaszyk Kuba
Kim Hun-Seok
Kim Sung
May Kyle
Morton John Magnus
Mudge Trevor
O'Boyle Michael
Pal Subhankar
Park Dong-hyeon
Sun Jiawen
Xiong Yan
Yang Chi-Sheng
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 30/09/2020
Crossref

Edinburgh Research Explorer

Automatic Creation of High-Bandwidth Memory Architectures from Domain-Specific Languages: The Case of Computational Fluid Dynamics

Author: Christian Pilato
Gerald Hempel
Jeronimo Castrillon
Karl F. A. Friebel
Mattia Tibaldi
Stephanie Soldavini
Publication date: 27/07/2022

Numerical simulations can help solve complex problems. Most of these algorithms are massively parallel and thus good candidates for FPGA acceleration thanks to spatial parallelism. Modern FPGA devices can leverage high-bandwidth memory technologies, but when applications are memory-bound, designers must craft advanced communication and memory architectures for efficient data movement and on-chip storage. This development process requires hardware design skills that are uncommon among domain experts. In this paper, we propose an automated tool flow from a domain-specific language (DSL) for tensor expressions to generate massively parallel accelerators on HBM-equipped FPGAs. Designers can use this flow to integrate and evaluate various compiler or hardware optimizations. We use computational fluid dynamics (CFD) as a paradigmatic example. Our flow starts from the high-level specification of tensor operations and combines an MLIR-based compiler with an in-house hardware generation flow to generate systems with parallel accelerators and a specialized memory architecture that moves data efficiently, aiming at fully exploiting the available CPU-FPGA bandwidth. We simulated applications with millions of elements, achieving up to 103 GFLOPS with one compute unit and custom precision when targeting a Xilinx Alveo U280. Our FPGA implementation is up to 25x more energy efficient than expert-crafted Intel CPU implementations.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

D5.1: Accelerator Deployment Models

Author: Angelos Bilas
Christoforos Kachris
Eleni Kanellou
Nikolaos Chrysos

In this deliverable, we explore this question by studying accelerator deployment models. By accelerator, we mean, for example, application-specific GPUs or specially programmed FPGAs. A deployment specifies the types, number, and connectivity of accelerators in a datacenter. With these definitions in mind, we created a theoretical model of the datacenter, its components, expected workloads, and finally, its possible deployments. We have developed VineSim, a software simulator of a datacenter, based on the aforementioned theoretical modeling. VineSim takes as inputs a workload and a deployment description and outputs performance metrics of interest, such as job latency and resource utilization. In VineSim, one can configure several parameters, including how tasks are allocated to nodes and estimates of how fast they execute on different accelerators. VineSim can be used to explore how different deployments respond to different kinds of workloads, thus allowing one to determine how best to compose a datacenter based on particular workload, performance, or budgeting requirements.

ZENODO
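
The input/output shape described in the D5.1 abstract above (a workload and a deployment description go in, job latency and resource utilization come out) can be illustrated with a toy sketch. This is not VineSim and does not reflect its actual interface: the names `Job` and `simulate`, the earliest-finish allocation policy, and the speed table are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Job:
    arrival: float  # time at which the job enters the system
    work: float     # abstract work units (hypothetical measure)


def simulate(jobs, deployment, speed):
    """Toy deployment simulator (illustrative only, not VineSim).

    deployment: accelerator kind -> number of instances, e.g. {"gpu": 1}
    speed:      accelerator kind -> estimated work units per time unit
    Returns mean job latency and overall accelerator utilization.
    """
    # One record per accelerator instance: [kind, time it becomes free, busy time]
    nodes = [[kind, 0.0, 0.0]
             for kind, count in deployment.items()
             for _ in range(count)]
    latencies = []

    for job in sorted(jobs, key=lambda j: j.arrival):
        # Assumed allocation policy: choose the node that would finish the job first.
        def finish_time(node):
            start = max(node[1], job.arrival)
            return start + job.work / speed[node[0]]

        node = min(nodes, key=finish_time)
        end = finish_time(node)
        node[2] += job.work / speed[node[0]]  # accumulate busy time
        node[1] = end                         # node is occupied until the job ends
        latencies.append(end - job.arrival)

    makespan = max(n[1] for n in nodes)
    busy = sum(n[2] for n in nodes)
    return {
        "mean_latency": sum(latencies) / len(latencies) if latencies else 0.0,
        "utilization": busy / (len(nodes) * makespan) if makespan > 0 else 0.0,
    }
```

Calling `simulate` with the same workload but different `deployment` dictionaries (for example, two slow accelerators versus one fast one) gives a rough feel for the kind of what-if comparison the abstract describes. A real simulator would also need to model connectivity and richer task allocation, which this sketch omits.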