38 research outputs found

    Information-Distilling Quantizers

    Let X and Y be dependent random variables. This paper considers the problem of designing a scalar quantizer for Y to maximize the mutual information between the quantizer's output and X, and develops fundamental properties and bounds for this form of quantization, which is connected to the log-loss distortion criterion. The main focus is the regime of low I(X;Y), where it is shown that, if X is binary, a constant fraction of the mutual information can always be preserved using O(log(1/I(X;Y))) quantization levels, and there exist distributions for which this many quantization levels are necessary. Furthermore, for larger finite alphabets 2 < |𝒳| < ∞, it is established that an η-fraction of the mutual information can be preserved using roughly (log(|𝒳|/I(X;Y)))^(η·(|𝒳|−1)) quantization levels.
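
    As a rough illustration of the objective described above, the sketch below searches for the scalar quantizer of Y that preserves the most mutual information with X. It is a toy brute-force search, not the paper's construction or bounds; the function names, the use of numpy, and the restriction to quantizer cells that are contiguous in the posterior P(X=0|Y=y) (a known structural property for binary inputs) are assumptions made for the example.

    import itertools
    import numpy as np

    def mutual_information(joint):
        """I(A;B) in nats for a joint pmf given as a 2-D array."""
        joint = joint / joint.sum()
        pa = joint.sum(axis=1, keepdims=True)
        pb = joint.sum(axis=0, keepdims=True)
        mask = joint > 0
        return float((joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])).sum())

    def best_quantizer(p_xy, num_levels):
        """Try every quantizer whose cells are contiguous after sorting Y by posterior."""
        n_y = p_xy.shape[1]
        order = np.argsort(p_xy[0] / p_xy.sum(axis=0))      # sort y by P(X=0 | Y=y)
        sorted_p = p_xy[:, order]
        best_mi, best_bounds = -1.0, None
        for cuts in itertools.combinations(range(1, n_y), num_levels - 1):
            bounds = (0,) + cuts + (n_y,)
            p_xz = np.stack([sorted_p[:, a:b].sum(axis=1)
                             for a, b in zip(bounds, bounds[1:])], axis=1)
            mi = mutual_information(p_xz)
            if mi > best_mi:
                best_mi, best_bounds = mi, bounds
        return best_mi, best_bounds

    # Toy example: binary X observed through a noisy 8-letter Y.
    rng = np.random.default_rng(0)
    p_xy = rng.dirichlet(np.ones(16)).reshape(2, 8)
    print("I(X;Y) =", mutual_information(p_xy))
    print("best 2-level quantizer:", best_quantizer(p_xy, 2))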

    Switchable Precision Neural Networks

    Instantaneous, on-demand accuracy-efficiency trade-offs have recently been explored in the context of neural network slimming. In this paper, we propose a flexible quantization strategy, termed Switchable Precision neural Networks (SP-Nets), to train a shared network capable of operating at multiple quantization levels. At runtime, the network can adjust its precision on the fly according to instant memory, latency, power-consumption and accuracy demands. For example, by constraining the network weights to 1 bit with switchable-precision activations, our shared network spans from BinaryConnect to Binarized Neural Networks, allowing dot-products to be performed using only summations or bit operations. In addition, a self-distillation scheme is proposed to increase the performance of the quantized switches. We test our approach with three different quantizers and demonstrate the performance of SP-Nets against independently trained quantized models, in terms of classification accuracy, on the Tiny ImageNet and ImageNet datasets using ResNet-18 and MobileNet architectures.
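
    A minimal sketch of the runtime-switchable precision idea follows, assuming PyTorch and a plain uniform quantizer with a straight-through estimator; this is not the authors' SP-Nets code, which evaluates three different quantizers and adds self-distillation.

    import torch
    import torch.nn as nn

    def quantize_ste(x, bits):
        """Uniformly quantize x in [-1, 1] to 2^bits - 1 steps; gradients pass straight through."""
        if bits >= 32:
            return x
        levels = 2 ** bits - 1
        x = torch.clamp(x, -1.0, 1.0)
        q = torch.round((x + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0
        return x + (q - x).detach()            # straight-through estimator

    class SwitchableLinear(nn.Module):
        def __init__(self, in_features, out_features):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
            self.bits = 32                     # precision selected at runtime

        def set_precision(self, bits):
            self.bits = bits

        def forward(self, x):
            w = quantize_ste(torch.tanh(self.weight), self.bits)
            a = quantize_ste(x, self.bits)
            return a @ w.t()

    layer = SwitchableLinear(16, 8)
    x = torch.rand(4, 16)
    for bits in (1, 2, 4, 32):                 # same shared weights, different precisions
        layer.set_precision(bits)
        print(bits, layer(x).shape)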

    Greedy-Merge Degrading has Optimal Power-Law

    Consider a channel with a given input distribution. Our aim is to degrade it to a channel with at most L output letters. One such degradation method is the so-called "greedy-merge" algorithm. We derive an upper bound on the resulting reduction in mutual information between input and output. For a fixed input alphabet size and variable L, this upper bound is within a constant factor of an algorithm-independent lower bound. Thus, we establish that greedy-merge is optimal in the power-law sense. Comment: 5 pages, submitted to ISIT 201
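
    The greedy-merge procedure analyzed above is simple to state; the sketch below (illustrative only, using numpy and a naive search over all letter pairs) merges the two output letters whose merger costs the least mutual information, repeating until at most L letters remain.

    import numpy as np

    def mutual_information(joint):
        """I(X;Y) in nats for a joint pmf given as a 2-D array."""
        joint = joint / joint.sum()
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        mask = joint > 0
        return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())

    def greedy_merge(p_xy, max_letters):
        """Merge columns (output letters) of the joint pmf p_xy down to max_letters."""
        p = p_xy.copy()
        while p.shape[1] > max_letters:
            best = None
            for i in range(p.shape[1]):
                for j in range(i + 1, p.shape[1]):
                    merged = np.delete(p, j, axis=1)
                    merged[:, i] = p[:, i] + p[:, j]
                    loss = mutual_information(p) - mutual_information(merged)
                    if best is None or loss < best[0]:
                        best = (loss, merged)
            p = best[1]
        return p

    # Toy example: binary input, 12 output letters, degraded to L = 4 letters.
    rng = np.random.default_rng(1)
    p_xy = rng.dirichlet(np.ones(24)).reshape(2, 12)
    degraded = greedy_merge(p_xy, 4)
    print(mutual_information(p_xy), "->", mutual_information(degraded))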

    NICE: Noise Injection and Clamping Estimation for Neural Network Quantization

    Convolutional Neural Networks (CNNs) are very popular in many fields, including computer vision, speech recognition and natural language processing, to name a few. Though deep learning leads to groundbreaking performance in these domains, the networks used are computationally very demanding and far from real-time even on a GPU, which is not power-efficient and therefore does not suit low-power systems such as mobile devices. To overcome this challenge, some solutions have been proposed for quantizing the weights and activations of these networks, which accelerates the runtime significantly. Yet, this acceleration comes at the cost of a larger error. The NICE method proposed in this work trains quantized neural networks by noise injection and a learned clamping, which improve the accuracy. This leads to state-of-the-art results on various regression and classification tasks, e.g., ImageNet classification with architectures such as ResNet-18/34/50 with weights and activations as low as 3 bits. We implement the proposed solution on an FPGA to demonstrate its applicability for low-power real-time applications. The implementation of the paper is available at https://github.com/Lancer555/NIC
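
    The two ingredients named in the abstract, noise injection into the weights and a learned clamp on the activations, can be sketched roughly as below. This assumes PyTorch; the module and parameter names are illustrative, and the exact NICE formulation may differ.

    import torch
    import torch.nn as nn

    def uniform_quantize(x, bits, max_val):
        """Quantize non-negative x to 2^bits - 1 uniform steps on [0, max_val]."""
        levels = 2 ** bits - 1
        x = torch.minimum(torch.clamp(x, min=0.0), max_val)
        return torch.round(x / max_val * levels) / levels * max_val

    class NoisyClampedReLU(nn.Module):
        """ReLU with a learned clamping bound; activations are fake-quantized with a
        straight-through estimator so gradients reach both the input and the bound."""
        def __init__(self, bits=3):
            super().__init__()
            self.bits = bits
            self.clamp_val = nn.Parameter(torch.tensor(6.0))   # learned clamping bound

        def forward(self, x):
            x = torch.minimum(torch.relu(x), self.clamp_val)   # gradient flows to clamp_val
            q = uniform_quantize(x, self.bits, self.clamp_val.detach())
            return x + (q - x).detach()

    def noisy_weights(weight, bits, training):
        """During training, add uniform noise of one quantization step instead of rounding."""
        max_val = weight.abs().max().detach()
        step = 2 * max_val / (2 ** bits - 1)
        if training:
            return weight + (torch.rand_like(weight) - 0.5) * step
        return torch.round(weight / step) * step

    w = torch.randn(8, 8)
    print(noisy_weights(w, 3, training=True).shape)
    print(NoisyClampedReLU(bits=3)(torch.randn(4, 8)).shape)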

    On the Uniqueness of Binary Quantizers for Maximizing Mutual Information

    We consider a channel with a binary input X that is corrupted by continuous-valued noise, resulting in a continuous-valued output Y. An optimal binary quantizer is used to quantize the continuous-valued output Y to the final binary output Z so as to maximize the mutual information I(X; Z). We show that when the ratio of the channel conditional densities r(y) = P(Y=y|X=0)/P(Y=y|X=1) is a strictly increasing or decreasing function of y, a quantizer with a single threshold can maximize the mutual information. Furthermore, we show that an optimal quantizer (possibly with multiple thresholds) is the one whose thresholding vector's elements are exactly the solutions of r(y) = r* for some constant r* > 0. Interestingly, this optimal constant r* is unique. The uniqueness property allows for fast algorithmic implementations, such as a bisection algorithm, to find the optimal quantizer. Our results also confirm some previous results using alternative elementary proofs. We show some numerical examples of applying our results to channels with additive Gaussian noise.
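
    As a concrete illustration of the single-threshold case, the sketch below takes BPSK inputs over additive Gaussian noise, where the likelihood ratio is monotone in y, and finds the threshold maximizing I(X;Z) by a simple grid sweep. The paper's bisection on the optimal constant r* would be faster; all names here are illustrative.

    import math

    def q_func(x):
        """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
        return 0.5 * math.erfc(x / math.sqrt(2.0))

    def binary_mi(p1_x0, p1_x1, p_x0=0.5):
        """I(X;Z) in bits for binary X, binary Z, given P(Z=1|X=0) and P(Z=1|X=1)."""
        p_x1 = 1.0 - p_x0
        mi = 0.0
        for z in (0, 1):
            pz_x0 = p1_x0 if z == 1 else 1.0 - p1_x0
            pz_x1 = p1_x1 if z == 1 else 1.0 - p1_x1
            pz = p_x0 * pz_x0 + p_x1 * pz_x1
            for px, pz_x in ((p_x0, pz_x0), (p_x1, pz_x1)):
                if pz_x > 0:
                    mi += px * pz_x * math.log2(pz_x / pz)
        return mi

    def best_threshold(sigma, lo=-3.0, hi=3.0, steps=2000):
        """X=0 -> Y ~ N(-1, sigma^2), X=1 -> Y ~ N(+1, sigma^2); quantize as Z = 1{Y > t}."""
        best = (-1.0, None)
        for k in range(steps + 1):
            t = lo + (hi - lo) * k / steps
            p1_x0 = q_func((t + 1.0) / sigma)   # P(Z=1 | X=0)
            p1_x1 = q_func((t - 1.0) / sigma)   # P(Z=1 | X=1)
            mi = binary_mi(p1_x0, p1_x1)
            if mi > best[0]:
                best = (mi, t)
        return best

    print(best_threshold(sigma=0.8))            # symmetric channel: optimal threshold is near 0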

    From Quantized DNNs to Quantizable DNNs

    This paper proposes Quantizable DNNs, a special type of DNN that can flexibly quantize its bit-width (denoted as `bit modes' hereafter) during execution without further re-training. To simultaneously optimize for all bit modes, a combinational loss over all bit modes is proposed, which enforces consistent predictions from the low-bit modes up to the 32-bit mode. This consistency-based loss may also be viewed as a certain form of regularization during training. Because the outputs of matrix multiplication in different bit modes have different distributions, we introduce Bit-Specific Batch Normalization to reduce conflicts among the different bit modes. Experiments on CIFAR100 and ImageNet show that, compared to quantized DNNs, Quantizable DNNs not only offer much better flexibility but also achieve even higher classification accuracy. Ablation studies further verify that the regularization through the consistency-based loss indeed improves the model's generalization performance.
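
    A hedged sketch of a consistency-style objective across bit modes is given below, assuming PyTorch; the KL-to-full-precision formulation and the function names are assumptions, not necessarily the paper's exact combinational loss.

    import torch
    import torch.nn.functional as F

    def consistency_loss(logits_by_mode, labels, full_precision_key=32):
        """Cross-entropy for the 32-bit mode plus KL terms pulling every low-bit
        mode's predictions toward the full-precision predictions."""
        fp_logits = logits_by_mode[full_precision_key]
        loss = F.cross_entropy(fp_logits, labels)
        target = F.softmax(fp_logits.detach(), dim=1)
        for bits, logits in logits_by_mode.items():
            if bits == full_precision_key:
                continue
            loss = loss + F.kl_div(F.log_softmax(logits, dim=1), target, reduction="batchmean")
        return loss

    # Toy usage with random logits standing in for one forward pass per bit mode.
    labels = torch.randint(0, 10, (8,))
    logits_by_mode = {bits: torch.randn(8, 10, requires_grad=True) for bits in (2, 4, 8, 32)}
    print(consistency_loss(logits_by_mode, labels))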

    Quantization-Guided Training for Compact TinyML Models

    We propose a Quantization Guided Training (QGT) method to guide DNN training towards optimized low-bit-precision targets and reach extreme compression levels below 8-bit precision. Unlike standard quantization-aware training (QAT) approaches, QGT uses customized regularization to encourage weight values towards a distribution that maximizes accuracy while reducing quantization errors. One of the main benefits of this approach is the ability to identify compression bottlenecks. We validate QGT using state-of-the-art model architectures on vision datasets. We also demonstrate the effectiveness of QGT with an 81 KB tiny model for person detection quantized down to 2-bit precision (a 17.7x size reduction), with an accuracy drop of only 3% compared to a floating-point baseline. Comment: TinyML Summit, March 202
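
    In the spirit of the customized regularization described above, the sketch below adds a penalty on the distance between each weight tensor and its nearest quantization grid point to the task loss. This assumes PyTorch; the exact QGT regularizer may differ, and per-layer penalty values could be inspected to locate compression bottlenecks.

    import torch
    import torch.nn as nn

    def quantization_penalty(weight, bits):
        """Mean squared distance between weights and the nearest uniform-quantization grid points."""
        max_val = weight.abs().max().detach() + 1e-12
        step = 2 * max_val / (2 ** bits - 1)
        quantized = torch.round(weight / step) * step
        return ((weight - quantized) ** 2).mean()

    def qgt_loss(task_loss, named_parameters, bits=2, strength=1e-2):
        """Task loss plus the quantization penalty summed over all weight tensors."""
        penalty = sum(quantization_penalty(p, bits) for n, p in named_parameters if "weight" in n)
        return task_loss + strength * penalty

    # Toy usage with a small model; per-layer penalties could also be logged individually.
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
    task_loss = torch.tensor(0.7)
    print(qgt_loss(task_loss, list(model.named_parameters()), bits=2))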

    Rate-Distortion Theory in Coding for Machines and its Application

    Recent years have seen tremendous growth in both the capability and the popularity of automatic machine analysis of images and video. As a result, a growing need has emerged for efficient compression methods optimized for machine vision rather than human vision. To meet this growing demand, several methods have been developed for image and video coding for machines. Unfortunately, while there is a substantial body of knowledge regarding rate-distortion theory for human vision, the same cannot be said of machine analysis. In this paper, we extend the current rate-distortion theory for machines, providing insight into important design considerations for machine-vision codecs. We then utilize this newfound understanding to improve several methods for learnable image coding for machines. Our proposed methods achieve state-of-the-art rate-distortion performance on several computer vision tasks, such as classification, instance segmentation, and object detection.
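
    For orientation, the generic objective such machine-vision codecs optimize is the rate-distortion Lagrangian below, with the distortion measured by the downstream task loss on the decoded image rather than by pixel fidelity. This is the standard formulation, not a result specific to this paper, and the symbols are illustrative.

    % R(\theta): expected bit-rate of the encoded representation; \hat{x}_\theta(x): decoded image;
    % f: the machine-vision model; \ell_{\mathrm{task}}: its loss against the label y(x).
    \min_{\theta}\; R(\theta) + \lambda\, D(\theta),
    \qquad
    D(\theta) = \mathbb{E}_{x}\!\left[\, \ell_{\mathrm{task}}\big(f(\hat{x}_{\theta}(x)),\, y(x)\big) \right]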

    Model-enhanced Vector Index

    Embedding-based retrieval methods construct vector indices to search for the document representations that are most similar to the query representations. They are widely used in document retrieval due to low latency and decent recall performance. Recent research indicates that deep retrieval solutions offer better model quality, but are hindered by unacceptable serving latency and the inability to support document updates. In this paper, we aim to enhance the vector index with end-to-end deep generative models, leveraging the differentiable advantages of deep retrieval models while maintaining desirable serving efficiency. We propose the Model-enhanced Vector Index (MEVI), a differentiable model-enhanced index empowered by a twin-tower representation model. MEVI leverages a Residual Quantization (RQ) codebook to bridge sequence-to-sequence deep retrieval and embedding-based models. To substantially reduce the inference time, instead of decoding the unique document IDs in long sequential steps, we first generate the semantic virtual cluster IDs of candidate documents in a small number of steps, and then leverage the well-adapted embedding vectors to perform a fine-grained search for the relevant documents within the candidate virtual clusters. We empirically show that our model achieves better performance on the commonly used academic benchmarks MSMARCO Passage and Natural Questions, with serving latency comparable to dense retrieval solutions.
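
    A much-simplified sketch of the two-stage idea follows (not the MEVI implementation): documents receive coarse cluster IDs from a residual-quantization codebook, and only documents in the predicted clusters are re-ranked with full embeddings. Here the codebooks are random rather than learned, and the query's cluster IDs come from nearest-codeword search instead of a sequence-to-sequence model.

    import numpy as np

    rng = np.random.default_rng(0)
    docs = rng.normal(size=(1000, 64)).astype(np.float32)   # document embeddings
    query = rng.normal(size=(64,)).astype(np.float32)       # query embedding

    # Offline: a toy two-level residual-quantization codebook built from random centroids
    # (a real system would learn the codebooks, e.g. with k-means on the residuals).
    codebook1 = docs[rng.choice(len(docs), 16, replace=False)]
    assign1 = np.argmin(np.linalg.norm(docs[:, None] - codebook1, axis=2), axis=1)
    codebook2 = (docs - codebook1[assign1])[rng.choice(len(docs), 16, replace=False)]

    def rq_codes(x):
        """Assign a (level-1, level-2) cluster ID pair to each row of x."""
        c1 = np.argmin(np.linalg.norm(x[:, None] - codebook1, axis=2), axis=1)
        res = x - codebook1[c1]
        c2 = np.argmin(np.linalg.norm(res[:, None] - codebook2, axis=2), axis=1)
        return np.stack([c1, c2], axis=1)

    doc_codes = rq_codes(docs)

    # Online, stage 1: pick candidate clusters for the query; only the coarse level-1
    # code is used in this toy example.
    q_codes = rq_codes(query[None])[0]
    candidates = np.where(doc_codes[:, 0] == q_codes[0])[0]

    # Online, stage 2: fine-grained scoring of the candidates with the full embeddings.
    scores = docs[candidates] @ query
    top = candidates[np.argsort(-scores)[:10]]
    print("candidates:", len(candidates), "top doc ids:", top)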

    Binary Neural Networks: A Survey

    The binary neural network, which largely saves storage and computation, serves as a promising technique for deploying deep models on resource-limited devices. However, binarization inevitably causes severe information loss and, even worse, its discontinuity brings difficulty to the optimization of the deep network. To address these issues, a variety of algorithms have been proposed, and they have achieved satisfactory progress in recent years. In this paper, we present a comprehensive survey of these algorithms, mainly categorized into native solutions that directly conduct binarization, and optimized ones that use techniques such as minimizing the quantization error, improving the network loss function, and reducing the gradient error. We also investigate other practical aspects of binary neural networks, such as hardware-friendly design and training tricks. We then give evaluations and discussions on different tasks, including image classification, object detection and semantic segmentation. Finally, we outline the challenges that may be faced in future research.
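
    As a brief illustration of the basic recipe the survey covers, the sketch below binarizes weights to {-1, +1} with a per-tensor scale (the XNOR-Net-style scale that minimizes the quantization error) and uses the straight-through estimator so gradients reach the real-valued weights. It assumes PyTorch and is illustrative only.

    import torch
    import torch.nn as nn

    def binarize_ste(w):
        """sign(w) scaled by mean(|w|), with straight-through gradients."""
        alpha = w.abs().mean().detach()        # scale minimizing ||w - alpha * sign(w)||^2
        b = alpha * torch.sign(w)
        return w + (b - w).detach()

    class BinaryLinear(nn.Module):
        def __init__(self, in_features, out_features):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)

        def forward(self, x):
            return x @ binarize_ste(self.weight).t()

    layer = BinaryLinear(16, 8)
    out = layer(torch.randn(4, 16))
    out.sum().backward()                       # gradients reach the real-valued weights
    print(out.shape, layer.weight.grad.shape)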