Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks
Loom (LM), a hardware inference accelerator for Convolutional Neural Networks (CNNs), is presented. In LM, every bit of data precision that can be saved
translates to proportional performance gains. Specifically, for convolutional
layers, LM's execution time scales inversely with the product of the weight and activation precisions. For fully-connected layers, LM's performance scales inversely with the weight precision. LM targets
area- and bandwidth-constrained System-on-a-Chip designs such as those found on
mobile devices that cannot afford the multi-megabyte buffers that would be
needed to store each layer on-chip. Accordingly, given a data bandwidth budget,
LM boosts energy efficiency and performance over an equivalent bit-parallel
accelerator. For both weights and activations, LM can exploit profile-derived per-layer precisions. At runtime, however, LM further trims activation precisions at a granularity much finer than a layer. Moreover, it can naturally exploit weight precision variability at sub-layer granularity as well. On average,
across several image classification CNNs and for a configuration that can perform the equivalent of 128 16b x 16b multiply-accumulate operations per cycle, LM outperforms a state-of-the-art bit-parallel accelerator [1] by 4.38x without any loss in accuracy while being 3.54x more energy efficient. LM can trade off accuracy for additional improvements in execution performance and energy efficiency, and it compares favorably to an accelerator that targeted only activation precisions. We also study 2- and 4-bit LM variants and find that the 2-bit-per-cycle variant is the most energy efficient.
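As a rough illustration of the scaling the abstract describes (a sketch of the first-order model only, not the paper's evaluation; the function names and example precisions below are hypothetical):

```python
# Illustrative sketch only: the ideal, first-order speedup implied by Loom's
# bit-serial execution model, relative to a 16b x 16b bit-parallel baseline.
# The example precisions are hypothetical, not measurements from the paper.

BASELINE_BITS = 16

def ideal_speedup(layer_type: str, p_w: int, p_a: int) -> float:
    """Ideal speedup over a 16b bit-parallel accelerator.

    Convolutional layers scale with both precisions; fully-connected
    layers scale with the weight precision only, per the abstract.
    """
    if layer_type == "conv":
        return (BASELINE_BITS * BASELINE_BITS) / (p_w * p_a)
    if layer_type == "fc":
        return BASELINE_BITS / p_w
    raise ValueError(f"unknown layer type: {layer_type}")

def needed_bits(int_values) -> int:
    """Smallest bitlength covering every magnitude in a group of integer
    activations; runtime trimming at sub-layer granularity works in this
    spirit (illustrative, not the exact hardware mechanism)."""
    return max(1, max(abs(v) for v in int_values).bit_length())

print(ideal_speedup("conv", p_w=5, p_a=7))  # 256 / 35 ≈ 7.3x
print(ideal_speedup("fc", p_w=5, p_a=7))    # 16 / 5 = 3.2x
print(needed_bits([3, -12, 5, 0]))          # 4 bits suffice for this group
```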
BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization
Neural networks have demonstrably achieved state-of-the-art accuracy using low-bitlength integer quantization, yielding both execution-time and energy benefits on existing hardware designs that support short bitlengths. However, the question of finding the minimum bitlength for a desired accuracy remains open. We introduce a training method for minimizing inference bitlength at any granularity while maintaining accuracy. Furthermore, we propose a regularizer that penalizes large-bitlength representations throughout the architecture and show how it can be modified to minimize other quantifiable criteria, such as number of operations or memory footprint. We demonstrate that our method learns thrifty representations while maintaining accuracy. On ImageNet, the method produces an average per-layer bitlength of 4.13 bits on AlexNet and 3.76 bits on ResNet18, remaining within 2.0% and 0.5% of the baseline TOP-1 accuracy, respectively.
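A minimal sketch of the core idea (not the authors' code; `ste_round`, `LAMBDA`, and the per-tensor granularity are assumptions made for illustration): treat the bitlength as a continuous learnable parameter, quantize through a straight-through estimator so the bitlength receives gradients, and add the bitlength itself to the loss as a penalty.

```python
# Sketch under stated assumptions, not the paper's implementation.
import torch
import torch.nn as nn

def ste_round(t: torch.Tensor) -> torch.Tensor:
    """Round in the forward pass; identity gradient in the backward pass."""
    return t + (t.round() - t).detach()

class LearnedBitlengthQuant(nn.Module):
    """Fake-quantizer with one learnable (continuous) bitlength per tensor."""
    def __init__(self, init_bits: float = 8.0):
        super().__init__()
        self.bits = nn.Parameter(torch.tensor(init_bits))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b = ste_round(self.bits.clamp(min=1.0))       # integer bits, STE grad
        levels = 2.0 ** b - 1.0                       # representable levels
        scale = levels / x.abs().max().clamp(min=1e-8)
        return ste_round(x * scale) / scale           # fake-quantized tensor

quant = LearnedBitlengthQuant()
w = torch.randn(64, 64)
task_loss = (quant(w) - w).pow(2).mean()  # stand-in for the real task loss
LAMBDA = 1e-2                             # hypothetical trade-off weight
loss = task_loss + LAMBDA * quant.bits    # regularizer: penalize large bitlengths
loss.backward()
print(quant.bits.grad)                    # the bitlength receives a gradient
```

Per the abstract, the same penalty can be reweighted, e.g. by a layer's parameter or operation count, to target memory footprint or number of operations instead of raw bitlength.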