Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search
Recent work in network quantization has substantially reduced the time and
space complexity of neural network inference, enabling their deployment on
embedded and mobile devices with limited computational and memory resources.
However, existing quantization methods often represent all weights and
activations with the same precision (bit-width). In this paper, we explore a
new dimension of the design space: quantizing different layers with different
bit-widths. We formulate this problem as a neural architecture search problem
and propose a novel differentiable neural architecture search (DNAS) framework
to efficiently explore its exponential search space with gradient-based
optimization. Experiments show we surpass the state-of-the-art compression of
ResNet on CIFAR-10 and ImageNet. Our quantized models with 21.1x smaller model
size or 103.9x lower computational cost can still outperform baseline quantized
or even full-precision models.
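
The idea of searching per-layer bit-widths with gradients can be illustrated with a small sketch. It assumes a softmax relaxation over a handful of candidate bit-widths and a simple uniform quantizer on [-1, 1]; the abstract specifies neither, and the names `quantize`, `mixed_precision_weight`, and `expected_bits` are hypothetical, not the paper's implementation.

```python
import math

def quantize(x, bits, lo=-1.0, hi=1.0):
    """Uniformly quantize x onto 2**bits levels in [lo, hi] (assumed quantizer)."""
    levels = 2 ** bits - 1
    x = min(max(x, lo), hi)          # clip to the representable range
    step = (hi - lo) / levels
    return lo + round((x - lo) / step) * step

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_precision_weight(w, logits, candidate_bits):
    """Relaxed weight: probability-weighted sum of w quantized at each bit-width.

    `logits` are the architecture parameters for one layer (an assumption);
    because the mixture is smooth in `logits`, they could be trained by
    gradient descent alongside the weights.
    """
    probs = softmax(logits)
    return sum(p * quantize(w, b) for p, b in zip(probs, candidate_bits))

def expected_bits(logits, candidate_bits):
    """Expected bit-width under the current distribution, usable as a cost term."""
    probs = softmax(logits)
    return sum(p * b for p, b in zip(probs, candidate_bits))

if __name__ == "__main__":
    bits = [2, 4, 8]
    print(mixed_precision_weight(0.3, [0.0, 0.0, 0.0], bits))
    print(expected_bits([0.0, 0.0, 0.0], bits))
```

After the search, one would typically keep only the highest-probability bit-width for each layer; that selection step is also an assumption here.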
LBS: Loss-aware Bit Sharing for Automatic Model Compression
Low-bitwidth model compression is an effective method to reduce the model
size and computational overhead. Existing compression methods rely on some
compression configurations (such as pruning rates, and/or bitwidths), which are
often determined manually and not optimal. Some attempts have been made to
search them automatically, but the optimization process is often very
expensive. To alleviate this, we devise a simple yet effective method named
Loss-aware Bit Sharing (LBS) to automatically search for optimal model
compression configurations. To this end, we propose a novel single-path model
to encode all candidate compression configurations, where a high bitwidth
quantized value can be decomposed into the sum of the lowest bitwidth quantized
value and a series of re-assignment offsets. We then introduce learnable binary
gates to encode the choice of bitwidth, including filter-wise 0-bit for filter
pruning. By jointly training the binary gates in conjunction with network
parameters, the compression configurations of each layer can be automatically
determined. Extensive experiments on both CIFAR-100 and ImageNet show that LBS
is able to significantly reduce computational cost while preserving promising
performance.
Comment: 22 page
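
The decomposition described in this abstract, a high-bitwidth quantized value written as the lowest-bitwidth value plus a series of re-assignment offsets, can be sketched as follows. This is a minimal illustration and not the authors' code: the uniform quantizer on [0, 1], the nested gating order, and the names `quantize` and `shared_quantize` are assumptions, and the gates are fixed 0/1 values here rather than learned.

```python
def quantize(x, bits):
    """Uniformly quantize x onto 2**bits levels in [0, 1] (assumed quantizer)."""
    levels = 2 ** bits - 1
    x = min(max(x, 0.0), 1.0)
    return round(x * levels) / levels

def shared_quantize(x, bits, gates):
    """Single-path quantization with binary gates.

    bits  : candidate bit-widths in ascending order, e.g. [2, 4, 8]
    gates : gates[0] keeps (1) or prunes (0) the value, modelling the 0-bit case;
            gates[i] for i >= 1 enables the offset that lifts the value from
            bits[i-1] to bits[i]. Gates are nested, so a higher bit-width is
            reachable only when every lower gate is open.
    """
    values = [quantize(x, b) for b in bits]
    # Re-assignment offsets between consecutive bit-widths.
    offsets = [values[i] - values[i - 1] for i in range(1, len(bits))]
    acc = 0.0
    # Fold from the highest bit-width inward: g_i * (r_i + g_{i+1} * (...)).
    for offset, gate in zip(reversed(offsets), reversed(gates[1:])):
        acc = gate * (offset + acc)
    return gates[0] * (values[0] + acc)

if __name__ == "__main__":
    x, bits = 0.37, [2, 4, 8]
    for gates in ([1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]):
        print(gates, shared_quantize(x, bits, gates))
```

With all gates open the offsets telescope to the highest-bitwidth value, and closing a gate truncates the sum at the corresponding lower bit-width, which is why a single path can encode every candidate configuration.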
Low-Power Computer Vision: Improve the Efficiency of Artificial Intelligence
Energy efficiency is critical for running computer vision on battery-powered systems, such as mobile phones or UAVs (unmanned aerial vehicles, or drones). This book collects the methods that have won the annual IEEE Low-Power Computer Vision Challenges since 2015. The winners share their solutions and provide insight on how to improve the efficiency of machine learning systems.
Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators
While maximizing deep neural networks' (DNNs') acceleration efficiency
requires a joint search/design of three different yet highly coupled aspects,
including the networks, bitwidths, and accelerators, the challenges associated
with such a joint search have not yet been fully understood and addressed. The
key challenges include (1) the dilemma of whether to explode the memory
consumption due to the huge joint space or achieve sub-optimal designs, (2) the
discrete nature of the accelerator design space that is coupled yet different
from that of the networks and bitwidths, and (3) the chicken and egg problem
associated with network-accelerator co-search, i.e., co-search requires
operation-wise hardware cost, which is lacking during search as the optimal
accelerator depending on the whole network is still unknown during search. To
tackle these daunting challenges towards optimal and fast development of DNN
accelerators, we propose a framework dubbed Auto-NBA to enable jointly
searching for the Networks, Bitwidths, and Accelerators, by efficiently
localizing the optimal design within the huge joint design space for each
target dataset and acceleration specification. Our Auto-NBA integrates a
heterogeneous sampling strategy to achieve unbiased search with constant memory
consumption, and a novel joint-search pipeline equipped with a generic
differentiable accelerator search engine. Extensive experiments and ablation
studies validate that both Auto-NBA generated networks and accelerators
consistently outperform state-of-the-art designs (including
co-search/exploration techniques, hardware-aware NAS methods, and DNN
accelerators), in terms of search time, task accuracy, and accelerator
efficiency. Our codes are available at: https://github.com/RICE-EIC/Auto-NBA.
Comment: Accepted at ICML 202