Bayesian Compression for Deep Learning
Compression and computational efficiency in deep learning have become problems
of great significance. In this work, we argue that the most principled
and effective way to attack this problem is by adopting a Bayesian point of
view, where through sparsity inducing priors we prune large parts of the
network. We introduce two novelties in this paper: 1) we use hierarchical
priors to prune nodes instead of individual weights, and 2) we use the
posterior uncertainties to determine the optimal fixed-point precision to
encode the weights. Both factors significantly contribute to achieving the
state of the art in terms of compression rates, while still staying competitive
with methods designed to optimize for speed or energy efficiency.
Comment: Published as a conference paper at NIPS 2017.
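
To make the second idea concrete, here is a minimal sketch (hypothetical code, not from the paper; the function bits_for_weight and all values are illustrative assumptions) of how posterior uncertainty could translate into a per-weight bit width: quantization steps finer than the posterior standard deviation convey no usable information, so noisier weights can be encoded with fewer fixed-point bits.

    import numpy as np

    def bits_for_weight(posterior_std, weight_range):
        # Quantization steps finer than the posterior std are wasted,
        # so the number of useful levels is roughly range / std.
        levels = np.maximum(weight_range / posterior_std, 1.0)
        return np.ceil(np.log2(levels)).astype(int)

    # Weights with increasingly uncertain posteriors need fewer bits:
    std = np.array([0.01, 0.05, 0.2])
    print(bits_for_weight(std, weight_range=2.0))  # [8 6 4]
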
FP8 Formats for Deep Learning
FP8 is a natural progression for accelerating deep learning training and
inference beyond the 16-bit formats common in modern processors. In this paper,
we propose an 8-bit floating point (FP8) binary interchange format consisting
of two encodings - E4M3 (4-bit exponent and 3-bit mantissa) and E5M2 (5-bit
exponent and 2-bit mantissa). While E5M2 follows IEEE 754 conventions for
representation of special values, E4M3's dynamic range is extended by not
representing infinities and having only one mantissa bit-pattern for NaNs. We
demonstrate the efficacy of the FP8 format on a variety of image and language
tasks, effectively matching the result quality achieved by 16-bit training
sessions. Our study covers the main modern neural network architectures - CNNs,
RNNs, and Transformer-based models, leaving all the hyperparameters unchanged
from the 16-bit baseline training sessions. Our training experiments include
large, up to 175B parameter, language models. We also examine FP8
post-training quantization of language models trained using 16-bit formats that
resisted fixed-point int8 quantization.
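
As a rough illustration of the two encodings (a minimal sketch, not NVIDIA's reference implementation; decode_fp8 is a hypothetical helper), the snippet below decodes an 8-bit pattern under E4M3 (bias 7, no infinities, a single NaN mantissa pattern) and E5M2 (bias 15, IEEE 754 special values):

    def decode_fp8(bits: int, exp_bits: int, man_bits: int) -> float:
        """Decode one FP8 pattern; exp_bits/man_bits are 4/3 (E4M3) or 5/2 (E5M2)."""
        sign = -1.0 if (bits >> 7) & 1 else 1.0
        exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
        man = bits & ((1 << man_bits) - 1)
        bias = (1 << (exp_bits - 1)) - 1  # 7 for E4M3, 15 for E5M2
        if exp_bits == 5 and exp == 0b11111:  # E5M2 keeps IEEE 754 specials
            return float("nan") if man else sign * float("inf")
        if exp_bits == 4 and exp == 0b1111 and man == 0b111:  # E4M3's one NaN pattern
            return float("nan")
        if exp == 0:  # subnormal: no implicit leading 1
            return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
        return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

    # Largest finite values under each encoding:
    print(decode_fp8(0b0_1111_110, exp_bits=4, man_bits=3))  # E4M3 -> 448.0
    print(decode_fp8(0b0_11110_11, exp_bits=5, man_bits=2))  # E5M2 -> 57344.0

Dropping infinities buys E4M3 one extra exponent value of dynamic range, which is why its largest finite value sits at 448 rather than ending one binade earlier.
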
Train me if you can: decentralized learning on the deep edge
The end of Moore’s Law, combined with growing data privacy concerns, is forcing machine learning (ML) to shift from the cloud to the deep edge. In next-generation ML systems, inference and part of the training process will be performed at the edge, while the cloud remains responsible for major updates. This new computing paradigm, called federated learning (FL), relieves the cloud and network infrastructure while increasing data privacy. Recent advances have enabled the inference pass of quantized artificial neural networks (ANNs) on Arm Cortex-M and RISC-V microcontroller units (MCUs). Nevertheless, training remains confined to the cloud, imposing the transfer of high volumes of private data over the network and leading to unpredictable delays when ML applications attempt to adapt to adversarial environments. To fill this gap, we make the first attempt to evaluate the feasibility of ANN training on Arm Cortex-M MCUs. Among the available optimization algorithms, stochastic gradient descent (SGD) offers the best trade-off between accuracy, memory footprint, and latency. However, its original form and the variants available in the literature still do not fit the stringent requirements of Arm Cortex-M MCUs. We propose L-SGD, a lightweight implementation of SGD optimized for maximum speed and minimal memory footprint on this class of MCUs. We developed a floating-point version and another that operates over quantized weights. For a fully-connected ANN trained on the MNIST dataset, L-SGD (float-32) is 4.20× faster than standard SGD while requiring only 2.80% of the memory, with negligible accuracy loss. Results also show that quantized training is still infeasible for training an ANN from scratch, but is a lightweight solution for performing minor model fixes and counteracting the fairness problem in typical FL systems.

This work has been supported by FCT - Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020. This work has also been supported by FCT within the PhD Scholarship Project Scope: SFRH/BD/146780/2019.
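
The abstract's choice of SGD rests on its statelessness. The toy sketch below (assumed NumPy code; the name l_sgd_step is hypothetical, and the authors' L-SGD is a C implementation with further MCU-specific optimizations not shown here) illustrates the in-place update rule that keeps the memory footprint to the weight and gradient buffers alone, with no per-parameter optimizer state such as Adam's moment estimates:

    import numpy as np

    def l_sgd_step(weights, grads, lr=0.01):
        # Plain SGD is stateless: the update touches only the weight and
        # gradient buffers, with no per-parameter moments to store, which
        # is what makes it attractive on memory-constrained Cortex-M MCUs.
        for w, g in zip(weights, grads):
            w -= lr * g  # updates the weight buffer in place

    # Toy usage on a tiny fully-connected layer:
    W = [np.zeros((16, 8), dtype=np.float32), np.zeros(8, dtype=np.float32)]
    dW = [np.ones_like(p) for p in W]
    l_sgd_step(W, dW, lr=0.1)
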