Approximation speed of quantized vs. unquantized ReLU neural networks and beyond
We deal with two complementary questions about the approximation properties of
ReLU networks. First, we study how the uniform quantization of ReLU networks
with real-valued weights impacts their approximation properties. We establish
an upper bound on the minimal number of bits per coordinate needed for
uniformly quantized ReLU networks to keep the same polynomial asymptotic
approximation speeds as unquantized ones. We also characterize the error of
nearest-neighbour uniform quantization of ReLU networks. This is achieved using
a new lower bound on the Lipschitz constant of the map that associates the
parameters of ReLU networks to their realization, and an upper bound
generalizing classical results. Second, we investigate when ReLU networks can
be expected, or not, to have better approximation properties than other
classical approximation families. Indeed, several approximation families share
the following common limitation: their polynomial asymptotic approximation
speed of any set is bounded from above by the encoding speed of that set. We
introduce a new abstract property of approximation families, called
infinite-encodability, which implies this upper bound. Many classical
approximation families, defined with dictionaries or ReLU networks, are shown
to be infinite-encodable. This unifies and generalizes several situations where
this upper bound is known.
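
As a concrete illustration of the nearest-neighbour uniform quantization analysed above, the following minimal NumPy sketch (the symmetric-grid convention and the toy network are our own assumptions, not the paper's exact scheme) rounds each parameter of a ReLU network to the closest point of a uniform grid determined by a per-coordinate bit budget:

```python
import numpy as np

def uniform_quantize(x, num_bits):
    # Nearest-neighbour rounding onto a symmetric uniform grid with
    # 2**(num_bits - 1) - 1 levels on each side of zero.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax          # grid step
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale                          # per-coordinate error <= scale / 2

# Quantize each layer of a small random ReLU network at 4 bits per coordinate.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((16, 8)), rng.standard_normal((8, 1))]
quantized = [uniform_quantize(w, num_bits=4) for w in weights]
```

Turning this per-coordinate rounding error of at most scale/2 into a bound on the realized function is exactly where the Lipschitz bounds on the parameters-to-realization map mentioned above enter.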
Quantized Deep Transfer Learning - Gearbox Fault Diagnosis on Edge Devices
This study designs and implements a deep transfer learning (DTL) framework that takes as input a time series of gearbox vibration patterns recorded by accelerometers and classifies the gear’s damage type from a predefined catalog. Industrial gearboxes are often operated even after damage occurs because damage detection is difficult; this causes additional wear and tear and drives up repair costs. With the proposed DTL framework, gearbox damage can be detected at an early stage so that gears can be replaced promptly at lower repair cost. The methodology trains a convolutional neural network (CNN) via transfer learning on a predefined dataset of eight gearbox conditions, then uses quantization to reduce the CNN’s size, enabling inference on edge and embedded devices. Transfer learning with a VGG16 model pre-trained on the ImageNet dataset achieves an accuracy of 99.49%; other models and architectures were also tested, but VGG16 performed best. The methodology also addresses deployment on edge/embedded devices, where accurate models are often too heavy for industrial use because of memory and compute constraints. Quantization makes it possible to deploy the proposed model on devices such as the Raspberry Pi, enabling on-device inference without internet access or cloud computing. Overall, INT8 quantization yields a 4x reduction in model size.
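
The abstract does not name a toolchain; as one plausible realization of the pipeline it describes, the sketch below (TensorFlow/TFLite assumed; the input shape, calibration data, and file name are illustrative placeholders) attaches a small classification head to an ImageNet-pretrained VGG16 and applies full-integer INT8 post-training quantization for deployment on devices like the Raspberry Pi:

```python
import numpy as np
import tensorflow as tf

# Transfer learning: frozen VGG16 backbone (ImageNet weights) plus a small
# classification head for the eight gearbox conditions.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(8, activation="softmax"),  # 8 gearbox conditions
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(train_images, train_labels, epochs=...)  # train the head

# Post-training full-integer (INT8) quantization with TFLite.
def representative_data():
    for _ in range(100):  # calibration samples (placeholder random arrays)
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()
open("gearbox_int8.tflite", "wb").write(tflite_model)
```

Full-integer quantization needs a small representative dataset to calibrate activation ranges; in practice this would be a sample of the preprocessed gearbox vibration inputs rather than random arrays.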
ProxQuant: Quantized Neural Networks via Proximal Operators
To make deep neural networks feasible in resource-constrained environments
(such as mobile devices), it is beneficial to quantize models by using
low-precision weights. One common technique for quantizing neural networks is
the straight-through gradient method, which enables back-propagation through
the quantization mapping. Despite its empirical success, little is understood
about why the straight-through gradient method works.
Building upon a novel observation that the straight-through gradient method
is in fact identical to the well-known Nesterov's dual-averaging algorithm on a
quantization constrained optimization problem, we propose a more principled
alternative approach, called ProxQuant, that formulates quantized network
training as a regularized learning problem instead and optimizes it via the
prox-gradient method. ProxQuant does back-propagation on the underlying
full-precision vector and applies an efficient prox-operator in between
stochastic gradient steps to encourage quantizedness. For quantizing ResNets
and LSTMs, ProxQuant outperforms state-of-the-art results on binary
quantization and is on par with state-of-the-art on multi-bit quantization. For
binary quantization, our analysis shows both theoretically and experimentally
that ProxQuant is more stable than the straight-through gradient method (i.e.,
BinaryConnect), challenging the indispensability of the straight-through
gradient method and providing a powerful alternative.
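
A minimal PyTorch sketch of this prox-gradient scheme for binary quantization (the distance-to-{-1, +1} regularizer, the toy model, and the lambda schedule are plausible assumptions on our part, not necessarily the paper's exact configuration): gradients flow through the full-precision weights, and a closed-form prox-operator is applied between stochastic gradient steps.

```python
import torch

def binary_prox(theta, lam):
    # Prox of lam * sum_i min(|t_i - 1|, |t_i + 1|): soft-threshold each
    # coordinate toward its nearest binary value, pulling weights to {-1, +1}.
    q = torch.sign(theta)
    q[q == 0] = 1.0                     # break ties at zero
    z = theta - q
    return q + torch.sign(z) * torch.clamp(z.abs() - lam, min=0.0)

model = torch.nn.Linear(10, 2)          # toy full-precision model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for step in range(1000):
    x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
    loss = torch.nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()                          # gradient step on full-precision weights
    with torch.no_grad():               # prox step encouraging quantizedness
        lam = 1e-4 * step               # growing strength drives weights to {-1, +1}
        for p in model.parameters():
            p.copy_(binary_prox(p, lam))
```

By contrast, the straight-through estimator forward-propagates the quantized weights and pretends the quantization map has identity gradient; ProxQuant replaces that heuristic with an explicit regularized objective and its prox-operator.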