Search CORE

337 research outputs found

Coarse-grained reconfigurable array architectures

Author: A Lambrechts
B Bougard
B Bougard
B Mei
B Mei
B Mei
B Sutter De
G Venkataramani
H Park
H Park
J Lee
JMP Cardoso
JW Waerdt van de
K Berkel van
K Bondalapati
K Sankaralingam
KE Coons
LH Lee
M Ahn
M Gebhart
M Schlansker
M Taylor
M Woh
MD Galanis
MH Lee
S Friedman
SA Mahlke
T Oh
Y Kim
Y Kim
Y Kim
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

Coarse-Grained Reconﬁgurable Array (CGRA) architectures accelerate the same inner loops that beneﬁt from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efﬁciently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on ﬂexibility, performance, and power-efﬁciency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual ﬁne-tuning of source code

Crossref

Ghent University Academic Bibliography

Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

Author: Bouganis Christos-Savvas
Kouris Alexandros
Venieris Stylianos I.
Publication venue
Publication date: 19/02/2018
Field of study

In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

Restoring the Broken Covenant Between Compilers and Deep Learning Accelerators

Author: Ahn Byung Hoon
Esmaeilzadeh Hadi
Ghodrati Soroush
Kinzer Sean
Li Xiaolong
Mahapatra Rohan
Mascarenhas Edwin
Matai Janarbek
Zhang Liang
Publication venue
Publication date: 27/10/2023
Field of study

Deep learning accelerators address the computational demands of Deep Neural Networks (DNNs), departing from the traditional Von Neumann execution model. They leverage specialized hardware to align with the application domain's structure. Compilers for these accelerators face distinct challenges compared to those for general-purpose processors. These challenges include exposing and managing more micro-architectural features, handling software-managed scratch pads for on-chip storage, explicitly managing data movement, and matching DNN layers with varying hardware capabilities. These complexities necessitate a new approach to compiler design, as traditional compilers mainly focused on generating fine-grained instruction sequences while abstracting micro-architecture details. This paper introduces the Architecture Covenant Graph (ACG), an abstract representation of an architectural structure's components and their programmable capabilities. By enabling the compiler to work with the ACG, it allows for adaptable compilation workflows when making changes to accelerator design, reducing the need for a complete compiler redevelopment. Codelets, which express DNN operation functionality and evolve into execution mappings on the ACG, are key to this process. The Covenant compiler efficiently targets diverse deep learning accelerators, achieving 93.8% performance compared to state-of-the-art, hand-tuned DNN layer implementations when compiling 14 DNN layers from various models on two different architectures

arXiv.org e-Print Archive

fpgaConvNet: A framework for mapping convolutional neural networks on FPGAs

Author: Bouganis C-S
Venieris SI
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2016
Field of study

Convolutional Neural Networks (ConvNets) are a powerful Deep Learning model, providing state-of-the-art accuracy to many emerging classification problems. However, ConvNet classification is a computationally heavy task, suffering from rapid complexity scaling. This paper presents fpgaConvNet, a novel domain-specific modelling framework together with an automated design methodology for the mapping of ConvNets onto reconfigurable FPGA-based platforms. By interpreting ConvNet classification as a streaming application, the proposed framework employs the Synchronous Dataflow (SDF) model of computation as its basis and proposes a set of transformations on the SDF graph that explore the performance-resource design space, while taking into account platform-specific resource constraints. A comparison with existing ConvNet FPGA works shows that the proposed fully-automated methodology yields hardware designs that improve the performance density by up to 1.62× and reach up to 90.75% of the raw performance of architectures that are hand-tuned for particular ConvNets

Crossref

Spiral - Imperial College Digital Repository

P2IP: A novel low-latency Programmable Pipeline Image Processor

Author: Dokladalova Eva
Harb Naim
Possa Paulo
Valderrama Carlos
Publication venue: 'Elsevier BV'
Publication date: 28/06/2015
Field of study

International audienceThis paper presents a novel systolic Coarse-Grained Reconfigurable Architecture for real-time image and video processing called P 2 IP. The P 2 IP is a scalable architecture that combines the low-latency characteristic of systolic array architectures with a runtime reconfigurable datapath. Reconfigurabil-ity of the P 2 IP enables it to perform a wide range of image pre-processing tasks directly on a pixel stream. The versatility of the P 2 IP is demonstrated through three image processing algorithms mapped onto the architecture, implemented in an FPGA-based platform. The obtained results show that the P 2 IP can achieve up to 129 fps in Full HD 1080p and 32 fps in 4K 2160p what makes it suitable for modern high-definition applications

Crossref

HAL Descartes

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Computer vision algorithms on reconfigurable logic arrays

Author: A.K. Jain
N.K. Ratha
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

End-to-End Synthesis of Dynamically Controlled Machine Learning Accelerators

Author: Bohm Agostini N.
Brooks D.
Castellana V. G.
Curzel S.
Ferrandi F.
Limaye A.
Manzano J.
Minutoli M.
Tumeo A.
Wei G.
Zhang J. J.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022
Field of study

Archivio istituzionale della ricerca - Politecnico di Milano