GHN-Q: Parameter Prediction for Unseen Quantized Convolutional Architectures via Graph Hypernetworks
Deep convolutional neural network (CNN) training via iterative optimization
has had incredible success in finding optimal parameters. However, modern CNN
architectures often contain millions of parameters. Thus, any given model for a
single architecture resides in a massive parameter space. Models with similar
loss could have drastically different characteristics such as adversarial
robustness, generalizability, and quantization robustness. For deep learning on
the edge, quantization robustness is often crucial. Finding a model that is
quantization-robust can sometimes require significant effort. Recent works
using Graph Hypernetworks (GHN) have shown remarkable performance in predicting
high-performing parameters for varying CNN architectures. Inspired by these
successes, we ask whether the graph representations of GHN-2 can also be
leveraged to predict quantization-robust parameters, an approach we call GHN-Q. We conduct
the first-ever study exploring the use of graph hypernetworks for predicting
parameters of unseen quantized CNN architectures. We focus on a reduced CNN
search space and find that GHN-Q can in fact predict quantization-robust
parameters for various 8-bit quantized CNNs. Decent quantized accuracies are
observed even with 4-bit quantization despite GHN-Q not being trained on it.
Quantized finetuning of GHN-Q at lower bitwidths may bring further improvements
and is currently being explored.
Comment: Updated Figure 1 and added additional results in Table 1. Initial
extended abstract version accepted at Edge Intelligence Workshop 2022 for
poster presentation.
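Since the abstract evaluates predicted parameters at 8-bit and 4-bit precision, a rough sketch of that kind of check may be helpful: the function below applies simulated uniform symmetric quantization (quantize, then dequantize) to a weight tensor so its post-quantization behavior can be measured. The per-tensor symmetric scheme and the names `quantize_dequantize` and `predicted_weights` are illustrative assumptions, not the GHN-Q pipeline; in GHN-Q the weights would be produced by the hypernetwork rather than a random stand-in.

```python
import numpy as np

def quantize_dequantize(w: np.ndarray, num_bits: int = 8) -> np.ndarray:
    """Simulated uniform symmetric quantization of a weight tensor.

    Rounds weights onto a num_bits integer grid and maps them back to float,
    a common way to probe quantization robustness. (Illustrative sketch,
    not the GHN-Q evaluation code.)
    """
    qmax = 2 ** (num_bits - 1) - 1        # e.g. 127 for signed 8-bit
    scale = np.max(np.abs(w)) / qmax      # per-tensor scale (assumption)
    if scale == 0:
        return w
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)  # integer grid
    return q * scale                      # dequantize for evaluation

# Hypothetical usage: in GHN-Q the weights would come from the hypernetwork;
# a random tensor stands in for them here.
predicted_weights = np.random.randn(64, 3, 3, 3).astype(np.float32)
w8 = quantize_dequantize(predicted_weights, num_bits=8)
w4 = quantize_dequantize(predicted_weights, num_bits=4)
print(np.abs(predicted_weights - w8).mean(), np.abs(predicted_weights - w4).mean())
```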
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
The rising popularity of intelligent mobile devices and the daunting
computational cost of deep learning-based models call for efficient and
accurate on-device inference schemes. We propose a quantization scheme that
allows inference to be carried out using integer-only arithmetic, which can be
implemented more efficiently than floating point inference on commonly
available integer-only hardware. We also co-design a training procedure to
preserve end-to-end model accuracy post-quantization. As a result, the proposed
quantization scheme improves the tradeoff between accuracy and on-device
latency. The improvements are significant even on MobileNets, a model family
known for run-time efficiency, and are demonstrated in ImageNet classification
and COCO detection on popular CPUs.
Comment: 14 pages, 12 figures.
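The quantization scheme this paper is known for maps real values to integers affinely, roughly r ≈ S(q − Z) with a floating-point scale S and an integer zero-point Z, so that matrix multiplies can accumulate in int32 and need only a single rescale at the end. The sketch below illustrates that general idea under stated assumptions; the helper names (`affine_quantize`, `int_matmul`) and the floating-point rescale are simplifications for clarity, not the paper's reference implementation (which uses a fixed-point multiplier on integer-only hardware).

```python
import numpy as np

def affine_quantize(x: np.ndarray, num_bits: int = 8):
    """Affine quantization r ≈ S * (q - Z), with q in [0, 2**num_bits - 1].

    Returns the uint8 tensor plus its scale and zero-point.
    (Illustrative sketch of the general scheme, not the paper's code.)
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return q.astype(np.uint8), scale, zero_point

def int_matmul(qa, sa, za, qb, sb, zb, sc, zc):
    """Integer-only matmul: accumulate in int32, then rescale to uint8.

    On integer-only hardware the rescale factor m = sa*sb/sc would be applied
    as a fixed-point multiply; plain float is used here for clarity.
    """
    acc = (qa.astype(np.int32) - za) @ (qb.astype(np.int32) - zb)
    m = sa * sb / sc
    return np.clip(np.round(acc * m) + zc, 0, 255).astype(np.uint8)

# Hypothetical end-to-end check against the float result.
a = np.random.randn(4, 8).astype(np.float32)
b = np.random.randn(8, 3).astype(np.float32)
qa, sa, za = affine_quantize(a)
qb, sb, zb = affine_quantize(b)
c = a @ b
_, sc, zc = affine_quantize(c)   # output scale taken from the float reference
qc = int_matmul(qa, sa, za, qb, sb, zb, sc, zc)
print(np.abs(c - sc * (qc.astype(np.float32) - zc)).max())
```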