305 research outputs found

Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

Authors: Bouganis Christos-Savvas, Kouris Alexandros, Venieris Stylianos I.
Publication date: 19/02/2018

In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics, which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.

Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

Chipmunk: A Systolically Scalable 0.9 mm², 3.08 Gop/s/mW @ 1.2 mW Accelerator for Near-Sensor Recurrent Neural Network Inference

Authors: Benini Luca, Cavigelli Lukas, Conti Francesco, Paulin Gianna, Susmelj Igor
Publication date: 01/01/2018

Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition. On-device computation of RNNs on low-power mobile and wearable devices would be key to applications such as zero-latency voice-based human-machine interfaces. Here we present Chipmunk, a small (<1 mm²) hardware accelerator for Long Short-Term Memory RNNs in UMC 65 nm technology, capable of operating at a measured peak efficiency of up to 3.08 Gop/s/mW at 1.24 mW peak power. To implement large RNN models without incurring huge memory transfer overhead, multiple Chipmunk engines can cooperate to form a single systolic array. In this way, the Chipmunk architecture in a 75-tile configuration can achieve real-time phoneme extraction on a demanding RNN topology proposed by Graves et al., consuming less than 13 mW of average power.

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Using Write Buffers in Systolic Array Architectures to Mitigate the Number of Memory Accesses Produced by Row Stationary Dataflows

Author: Peralta Velazquez Daniel U
Publication date: 24/07/2021

New applications of Deep Neural Networks (DNNs) are being designed, such as fraud detection, short-term precipitation forecasting, and cancer prognosis prediction. However, the underlying models are becoming more complex as the number of layers increases. These models require millions of computations, which take a significant amount of time on conventional CPU and GPU architectures. The data distribution of these models is well known: they consist mostly of dot-product operations between inputs and filters. Applications such as self-driving cars require fast response times and accurate predictions. Current research introduces accelerator architectures based on 2D systolic arrays, as they perform multiply-accumulate operations with high efficiency. Computational and power costs define performance, and memory accesses account for the highest cost in current architecture models. To enhance the performance of DNN accelerators, parallelism is extracted by breaking convolution into partial computations, at the expense of segmenting output memory accesses. This thesis explores the implementation of an accumulator microarchitecture component, based on column-pipelined adder trees, that collects and aggregates computed output values by destination address. The results of this work show a 3.3x and 2.15x speedup for Tiny-YOLO and AlexNet CNNs using a 32x64 systolic array. By reducing the computed values, developers will be able to explore novel data mappings that extract parallelism based on data locality.

Texas A&M Repository
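
The aggregation idea in the abstract above (collecting partial results and merging them by destination address before they reach memory) can be illustrated in software. This is only a minimal sketch, not the thesis's column-pipelined adder-tree hardware: the function name `accumulate_by_address` and the sample `stream` of (address, value) pairs are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def accumulate_by_address(partial_sums):
    """Merge (address, value) partial results so each address is written once.

    Software stand-in for a write buffer: values that share a destination
    address are summed before any memory write is issued.
    """
    totals = defaultdict(int)
    for address, value in partial_sums:
        totals[address] += value
    return dict(totals)


# Hypothetical stream of partial sums emitted by an accelerator array.
stream = [(0, 3), (1, 5), (0, 4), (2, 1), (1, -2), (0, 1)]

merged = accumulate_by_address(stream)

# Without buffering, every partial sum is its own memory write;
# with buffering, there is one write per distinct destination address.
writes_without_buffer = len(stream)
writes_with_buffer = len(merged)

print(merged)
print(writes_without_buffer, writes_with_buffer)
```

In this toy stream, six partial sums collapse into three writes. The saving in a real accelerator depends on the dataflow and on how many partial sums target the same output address.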

Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine

Authors: Andri Renzo, Benini Luca, Cavigelli Lukas, Rossi Davide
Publication date: 01/01/2019

Deep neural networks have achieved impressive results in computer vision and machine learning. Unfortunately, state-of-the-art networks are extremely compute- and memory-intensive, which makes them unsuitable for mW-devices such as IoT end-nodes. Aggressive quantization of these networks dramatically reduces the computation and memory footprint. Binary-weight neural networks (BWNs) follow this trend, pushing weight quantization to the limit. Hardware accelerators for BWNs presented up to now have focused on core efficiency, disregarding I/O bandwidth and system-level efficiency, which are crucial for deployment of accelerators in ultra-low power devices. We present Hyperdrive, a BWN accelerator that dramatically reduces the I/O bandwidth by exploiting a novel binary-weight streaming approach. The approach can be used for arbitrarily sized convolutional neural network architectures and input resolutions: it exploits the natural scalability of the compute units at both chip level and system level by arranging Hyperdrive chips systolically in a 2D mesh while processing the entire feature map together in parallel. Hyperdrive achieves 4.3 TOp/s/W system-level efficiency (i.e., including I/Os), 3.1x higher than state-of-the-art BWN accelerators, even though its core uses resource-intensive FP16 arithmetic for increased robustness.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna
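
To make the binary-weight idea in the abstract above concrete: when every weight is restricted to +1 or -1, a dot product needs no multiplications, only additions and subtractions, and each weight can be stored in a single bit. The sketch below shows this arithmetic only, under the assumption that a stored bit of 1 encodes +1 and 0 encodes -1. The function name `binary_weight_dot` and the sample values are hypothetical, and nothing here models Hyperdrive's streaming scheme or FP16 datapath.

```python
def binary_weight_dot(activations, weight_bits):
    """Dot product with weights restricted to +1 / -1.

    Assumed encoding: weight_bits[i] == 1 means weight +1, 0 means weight -1.
    Each term is an add or a subtract, so no multiplier is needed.
    """
    if len(activations) != len(weight_bits):
        raise ValueError("activations and weight_bits must have equal length")
    acc = 0.0
    for a, bit in zip(activations, weight_bits):
        acc = acc + a if bit else acc - a
    return acc


# Hypothetical activations and 1-bit weights (+1, -1, -1, +1).
acts = [0.5, -1.0, 2.0, 0.25]
bits = [1, 0, 0, 1]

result = binary_weight_dot(acts, bits)

# Reference computation using explicit +1 / -1 multiplications.
reference = sum(a * (1 if b else -1) for a, b in zip(acts, bits))

print(result, reference)
```

The two values agree, which shows that the add/subtract form is an exact replacement for multiplying by ±1.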