Search CORE

1,341 research outputs found

Extended Bit-Plane Compression for Convolutional Neural Network Accelerators

Author: Benini Luca
Cavigelli Lukas
Publication venue
Publication date: 01/10/2018
Field of study

After the tremendous success of convolutional neural networks in image classification, object detection, speech recognition, etc., there is now rising demand for deployment of these compute-intensive ML models on tightly power constrained embedded and mobile systems at low cost as well as for pushing the throughput in data centers. This has triggered a wave of research towards specialized hardware accelerators. Their performance is often constrained by I/O bandwidth and the energy consumption is dominated by I/O transfers to off-chip memory. We introduce and evaluate a novel, hardware-friendly compression scheme for the feature maps present within convolutional neural networks. We show that an average compression ratio of 4.4x relative to uncompressed data and a gain of 60% over existing method can be achieved for ResNet-34 with a compression block requiring <300 bit of sequential cells and minimal combinational logic

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

CAS-CNN: A Deep Convolutional Neural Network for Image Compression Artifact Suppression

Author: Benini Luca
Cavigelli Lukas
Hager Pascal
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/11/2016
Field of study

Lossy image compression algorithms are pervasively used to reduce the size of images transmitted over the web and recorded on data storage media. However, we pay for their high compression rate with visual artifacts degrading the user experience. Deep convolutional neural networks have become a widespread tool to address high-level computer vision tasks very successfully. Recently, they have found their way into the areas of low-level computer vision and image processing to solve regression problems mostly with relatively shallow networks. We present a novel 12-layer deep convolutional network for image compression artifact suppression with hierarchical skip connections and a multi-scale loss function. We achieve a boost of up to 1.79 dB in PSNR over ordinary JPEG and an improvement of up to 0.36 dB over the best previous ConvNet result. We show that a network trained for a specific quality factor (QF) is resilient to the QF used to compress the input image - a single network trained for QF 60 provides a PSNR gain of more than 1.5 dB over the wide QF range from 40 to 76.Comment: 8 page

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Slotted ALOHA Overlay on LoRaWAN: a Distributed Synchronization Approach

Author: Benini Luca
Brunelli Davide
Polonelli Tommaso
Publication venue
Publication date: 06/09/2018
Field of study

LoRaWAN is one of the most promising standards for IoT applications. Nevertheless, the high density of end-devices expected for each gateway, the absence of an effective synchronization scheme between gateway and end-devices, challenge the scalability of these networks. In this article, we propose to regulate the communication of LoRaWAN networks using a Slotted-ALOHA (S-ALOHA) instead of the classic ALOHA approach used by LoRa. The implementation is an overlay on top of the standard LoRaWAN; thus no modification in pre-existing LoRaWAN firmware and libraries is necessary. Our method is based on a novel distributed synchronization service that is suitable for low-cost IoT end-nodes. S-ALOHA supported by our synchronization service significantly improves the performance of traditional LoRaWAN networks regarding packet loss rate and network throughput.Comment: 4 pages, 8 figure

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Biosensors - Vision

Author: Benini Luca
Publication venue
Publication date: 18/12/2009
Field of study

Almae Matris Studiorum Campus

XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference

Author: Benini Luca
Conti Francesco
Schiavone Pasquale Davide
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully digital configurable hardware accelerator IP for BNNs, integrated within a microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid SRAM / standard cell memory. The XNE is able to fully compute convolutional and dense layers in autonomy or in cooperation with the core in the MCU to realize more complex behaviors. We show post-synthesis results in 65nm and 22nm technology for the XNE IP and post-layout results in 22nm for the full MCU indicating that this system can drop the energy cost per binary operation to 21.6fJ per operation at 0.4V, and at the same time is flexible and performant enough to execute state-of-the-art BNN topologies such as ResNet-34 in less than 2.2mJ per frame at 8.9 fps.Comment: 11 pages, 8 figures, 2 tables, 3 listings. Accepted for presentation at CODES'18 and for publication in IEEE Transactions on Computer-Aided Design of Circuits and Systems (TCAD) as part of the ESWEEK-TCAD special issu

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration

Author: Andri Renzo
Benini Luca
Cavigelli Lukas
Rossi Davide
Publication venue
Publication date: 24/02/2017
Field of study

Convolutional neural networks (CNNs) have revolutionized the world of computer vision over the last few years, pushing image classification beyond human accuracy. The computational effort of today's CNNs requires power-hungry parallel processors or GP-GPUs. Recent developments in CNN accelerators for system-on-chip integration have reduced energy consumption significantly. Unfortunately, even these highly optimized devices are above the power envelope imposed by mobile and deeply embedded applications and face hard limitations caused by CNN weight I/O and storage. This prevents the adoption of CNNs in future ultra-low power Internet of Things end-nodes for near-sensor analytics. Recent algorithmic and theoretical advancements enable competitive classification accuracy even when limiting CNNs to binary (+1/-1) weights during training. These new findings bring major optimization opportunities in the arithmetic core by removing the need for expensive multiplications, as well as reducing I/O bandwidth and storage. In this work, we present an accelerator optimized for binary-weight CNNs that achieves 1510 GOp/s at 1.2 V on a core area of only 1.33 MGE (Million Gate Equivalent) or 0.19 mm

^2

and with a power dissipation of 895 {\mu}W in UMC 65 nm technology at 0.6 V. Our accelerator significantly outperforms the state-of-the-art in terms of energy and area efficiency achieving 61.2 TOp/s/[email protected] V and 1135 GOp/s/[email protected] V, respectively

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna