Search CORE

38,542 research outputs found

AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

Author: Cong Jason
Wei Peng
Yu Cody Hao
Zhang Peng
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/07/2018
Field of study

CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs whose programming is conventionally a RTL design practice. Although recent advances in high-level synthesis (HLS) significantly improve the FPGA programmability, it still leaves programmers facing the challenge of identifying the optimal design configuration in a tremendous design space. This paper aims to address this challenge and pave the path from software programs towards high-quality FPGA accelerators. Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs. Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels, and more importantly, drastically reduce the design space. Also, we introduce an analytical model to capture the performance and resource trade-offs among different design configurations of the CPP microarchitecture, which lays the foundation for fast design space exploration. On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to make the entire accelerator generation automated. AutoAccel accepts a software program as an input and performs a series of code transformations based on the result of the analytical-model-based design space exploration to construct the desired CPP microarchitecture. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72x for a broad class of computation kernels

arXiv.org e-Print Archive

Crossref

Scipedia

3D high definition video coding on a GPU-based heterogeneous system

Author: Claver Jose M
De Cock Jan
Fernandez-Escribano Gerardo
Martinez Jose Luis
Pieters Bart
Rodriguez-Sanchez Rafael
Sanchez Jose L
Van de Walle Rik
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013
Field of study

H.264/MVC is a standard for supporting the sensation of 3D, based on coding from 2 (stereo) to N views. H.264/MVC adopts many coding options inherited from single view H.264/AVC, and thus its complexity is even higher, mainly because the number of processing views is higher. In this manuscript, we aim at an efficient parallelization of the most computationally intensive video encoding module for stereo sequences. In particular, inter prediction and its collaborative execution on a heterogeneous platform. The proposal is based on an efficient dynamic load balancing algorithm and on breaking encoding dependencies. Experimental results demonstrate the proposed algorithm's ability to reduce the encoding time for different stereo high definition sequences. Speed-up values of up to 90× were obtained when compared with the reference encoder on the same platform. Moreover, the proposed algorithm also provides a more energy-efficient approach and hence requires less energy than the sequential reference algorith

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Ghent University Academic Bibliography

Repositori Institucional de la Universitat Jaume I