Search CORE

126 research outputs found

Power-Optimal Mapping of CNN Applications to Cloud-Based Multi-FPGA Platforms

Author: Casu Mario R.
Cortadella Jordi
Lavagno Luciano
Lazarescu Mihai T.
Shan Junnan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020
Field of study

Multi-FPGA platforms like Amazon Web Services F1 are perfect to accelerate multi-kernel pipelined applications, like Convolutional Neural Networks (CNNs). To reduce energy consumption, we propose to upload at runtime the best power-optimized CNN implementation for a given throughput constraint. Our design method gives the best number of parallel instances of each kernel, their allocation to the FPGAs, the number of powered-on FPGAs and their clock frequency. This is obtained by solving a mixed-integer, non-linear optimization problem that models power and performance of each component, as well as the duration of the computation phases—data transfer between a host CPU and the FPGA memory (typically DDR), data transfer between DDR and FPGA, and FPGA computation. The results show that the power saved compared to simply clock gating the fastest implementation is obviously very high, but it is also much more significant than simply scaling the frequency of the fastest implementation or replicating the slowest implementation on multiple FPGAs

UPCommons. Portal del coneixement obert de la UPC

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Automatic generation of multi-precision multi-arithmetic CNN accelerators for FPGAs

Author: Cheung PYK
Constantinides G
Gao X
Guo X
Liu J
Mullins R
Wang E
Xu CZ
Zhao Y
Publication venue: Proceedings - 2019 International Conference on Field-Programmable Technology, ICFPT 2019
Publication date: 01/01/2019
Field of study

Modern deep Convolutional Neural Networks (CNNs) are computationally demanding, yet real applications often require high throughput and low latency. To help tackle these problems, we propose Tomato, a framework designed to automate the process of generating efficient CNN accelerators. The generated design is pipelined and each convolution layer uses different arithmetics at various precisions. Using Tomato, we showcase state-of-the-art multi-precision multi-arithmetic networks, including MobileNet-V1, running on FPGAs. To our knowledge, this is the first multi-precision multi-arithmetic auto-generation framework for CNNs. In software, Tomato fine-tunes pretrained networks to use a mixture of short powers-of-2 and fixed-point weights with a minimal loss in classification accuracy. The fine-tuned parameters are combined with the templated hardware designs to automatically produce efficient inference circuits in FPGAs. We demonstrate how our approach significantly reduces model sizes and computation complexities, and permits us to pack a complete ImageNet network onto a single FPGA without accessing off-chip memories for the first time. Furthermore, we show how Tomato produces implementations of networks with various sizes running on single or multiple FPGAs. To the best of our knowledge, our automatically generated accelerators outperform closest FPGA-based competitors by at least 2-4x for lantency and throughput; the generated accelerator runs ImageNet classification at a rate of more than 3000 frames per second.EPSRC Doctoral Scholarship Peterhouse Graduate Studentshi

arXiv.org e-Print Archive

Crossref

Spiral - Imperial College Digital Repository

Apollo (Cambridge)

Low-Latency In Situ Image Analytics With FPGA-Based Quantized Convolutional Neural Network

Author: B Sharat Chandra Varma
Chung Bob M.F.
Lee Kelvin C.M.
Ng Ho Cheung
Shum Ho Cheung
So Hayden Kwok-Hay
Tsia Kevin K
Wang Maolin
Wong Justin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021
Field of study

Crossref

Ulster University's Research Portal