Search CORE

2,250 research outputs found

AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

Author: Cong Jason
Wei Peng
Yu Cody Hao
Zhang Peng
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/07/2018
Field of study

CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs whose programming is conventionally a RTL design practice. Although recent advances in high-level synthesis (HLS) significantly improve the FPGA programmability, it still leaves programmers facing the challenge of identifying the optimal design configuration in a tremendous design space. This paper aims to address this challenge and pave the path from software programs towards high-quality FPGA accelerators. Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs. Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels, and more importantly, drastically reduce the design space. Also, we introduce an analytical model to capture the performance and resource trade-offs among different design configurations of the CPP microarchitecture, which lays the foundation for fast design space exploration. On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to make the entire accelerator generation automated. AutoAccel accepts a software program as an input and performs a series of code transformations based on the result of the analytical-model-based design space exploration to construct the desired CPP microarchitecture. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72x for a broad class of computation kernels

arXiv.org e-Print Archive

Crossref

Scipedia

From FPGA to ASIC: A RISC-V processor experience

Author: Rojas Morales Carlos
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2019
Field of study

This work document a correct design flow using these tools in the Lagarto RISC- V Processor and the RTL design considerations that must be taken into account, to move from a design for FPGA to design for ASIC

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Generation of Application Specific Hardware Extensions for Hybrid Architectures: The Development of PIRANHA - A GCC Plugin for High-Level-Synthesis

Author: Hempel Gerald
Publication venue
Publication date: 11/11/2019
Field of study

Architectures combining a field programmable gate array (FPGA) and a general-purpose processor on a single chip became increasingly popular in recent years. On the one hand, such hybrid architectures facilitate the use of application specific hardware accelerators that improve the performance of the software on the host processor. On the other hand, it obliges system designers to handle the whole process of hardware/software co-design. The complexity of this process is still one of the main reasons, that hinders the widespread use of hybrid architectures. Thus, an automated process that aids programmers with the hardware/software partitioning and the generation of application specific accelerators is an important issue. The method presented in this thesis neither requires restrictions of the used high-level-language nor special source code annotations. Usually, this is an entry barrier for programmers without deeper understanding of the underlying hardware platform. This thesis introduces a seamless programming flow that allows generating hardware accelerators for unrestricted, legacy C code. The implementation consists of a GCC plugin that automatically identifies application hot-spots and generates hardware accelerators accordingly. Apart from the accelerator implementation in a hardware description language, the compiler plugin provides the generation of a host processor interfaces and, if necessary, a prototypical integration with the host operating system. An evaluation with typical embedded applications shows general benefits of the approach, but also reveals limiting factors that hamper possible performance improvements

Technische Universität Dresden: Qucosa

Dimensionality reduction using parallel ICA and its implementation on FPGA in hyperspectral image analysis

Author: Du Hongtao
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/08/2003
Field of study

Hyperspectral images, although providing abundant information of the object, also bring high computational burden to data processing. This thesis studies the challenging problem of dimensionality reduction in Hyperspectral Image (HSI) analysis. Currently, there are two methods to reduce the dimension: band selection and feature extraction. This thesis presents a band selection technique based on Independent Component Analysis (ICA), an unsupervised signal separation algorithm. Given only the observations of hyperspectral images, the ICA –based band selection picks the independent bands which contain most of the spectral information of the original images. Due to the high volume of hyperspectral images, ICA -based band selection is a time consuming process. This thesis develops a parallel ICA algorithm which divides the decorrelation process into internal decorrelation and external decorrelation such that computation burden can be distributed from single processor to multiple processors, and the ICA process can be run in a parallel mode. Hardware implementation is always a faster and real -time solution to HSI analysis. Until now, there are few hardware designs for ICA -related processes. This thesis synthesizes the parallel ICA -based band selection on Field Programmable Gate Array (FPGA), which is the best choice for moderate designs and fast implementations. Compared to other design syntheses, the synthesis present in this thesis develops three ICA re-configurable components for the purpose of reusability. In addition, this thesis demonstrates the relationship between the design and the capacity utilization of a single FPGA, then discusses the features of High Performance Reconfigurable Computing (HPRC) to accomodate large capacity and design requirements. Experiments are conducted on three data sets obtained from different sources. Experimental results show the effectiveness of the proposed ICA -based band selection, parallel ICA and its synthesis on FPGA

University of Tennessee, Knoxville: Trace

FPGA design methodology for industrial control systems—a review

Author: Cirstea Marcian N.
Monmasson Eric
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2007
Field of study

This paper reviews the state of the art of fieldprogrammable gate array (FPGA) design methodologies with a focus on industrial control system applications. This paper starts with an overview of FPGA technology development, followed by a presentation of design methodologies, development tools and relevant CAD environments, including the use of portable hardware description languages and system level programming/design tools. They enable a holistic functional approach with the major advantage of setting up a unique modeling and evaluation environment for complete industrial electronics systems. Three main design rules are then presented. These are algorithm refinement, modularity, and systematic search for the best compromise between the control performance and the architectural constraints. An overview of contributions and limits of FPGAs is also given, followed by a short survey of FPGA-based intelligent controllers for modern industrial systems. Finally, two complete and timely case studies are presented to illustrate the benefits of an FPGA implementation when using the proposed system modeling and design methodology. These consist of the direct torque control for induction motor drives and the control of a diesel-driven synchronous stand-alone generator with the help of fuzzy logic

Anglia Ruskin Research