Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks
Quantized Neural Networks (QNNs), which use low bitwidth numbers for
representing parameters and performing computations, have been proposed to
reduce the computational complexity, storage size, and memory usage. In QNNs,
parameters and activations are uniformly quantized, such that the
multiplications and additions can be accelerated by bitwise operations.
However, the distributions of parameters in Neural Networks are often imbalanced,
so a uniform quantization determined from extremal values may underutilize the
available bitwidth. In this paper, we propose a novel quantization
method that can ensure the balance of distributions of quantized values. Our
method first recursively partitions the parameters by percentiles into balanced
bins, and then applies uniform quantization. We also introduce computationally
cheaper approximations of percentiles to reduce the computational overhead they
introduce. Overall, our method improves the prediction accuracies of QNNs
without introducing extra computation during inference, has negligible impact
on training speed, and is applicable to both Convolutional Neural Networks and
Recurrent Neural Networks. Experiments on standard datasets including ImageNet
and Penn Treebank confirm the effectiveness of our method. On ImageNet, the
top-5 error rate of our 4-bit quantized GoogLeNet model is 12.7%, surpassing
the previous state of the art for QNNs.
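The core idea lends itself to a short illustration. Below is a minimal NumPy sketch of percentile-balanced quantization, assuming that the recursive median partition described in the abstract is realized via evenly spaced percentiles and that bins are mapped onto an evenly spaced codebook; `balanced_quantize` and its parameters are illustrative names, not the authors' implementation.

```python
import numpy as np

def balanced_quantize(w, bits=4):
    # Hypothetical helper illustrating percentile-balanced quantization;
    # not the paper's reference implementation.
    n_bins = 2 ** bits
    flat = w.ravel()
    # Percentile edges give equally populated bins; recursively splitting
    # sub-ranges at their medians yields the same edges.
    edges = np.percentile(flat, np.linspace(0, 100, n_bins + 1))
    # Assign every weight to its balanced bin.
    idx = np.clip(np.searchsorted(edges, flat, side="right") - 1, 0, n_bins - 1)
    # Then apply uniform quantization: an evenly spaced codebook over the range.
    codebook = np.linspace(flat.min(), flat.max(), n_bins)
    return codebook[idx].reshape(w.shape)

# Example: a heavy-tailed (imbalanced) weight tensor still uses all levels.
w = np.random.laplace(size=(256, 256)).astype(np.float32)
wq = balanced_quantize(w, bits=4)
print(np.unique(wq).size)  # at most 2**4 = 16 distinct values
```

Because the bin edges come from percentiles rather than extremal values, each codebook entry is used by roughly the same number of weights, which is the balance property the abstract refers to.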
AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
Highly distributed training of Deep Neural Networks (DNNs) on future compute
platforms (offering 100s of TeraOps/s of computational capacity) is expected to
be severely communication constrained. To overcome this limitation, new
gradient compression techniques are needed that are computationally friendly,
applicable to a wide variety of layers seen in Deep Neural Networks and
adaptable to variations in network architectures as well as their
hyper-parameters. In this paper we introduce a novel technique - the Adaptive
Residual Gradient Compression (AdaComp) scheme. AdaComp is based on localized
selection of gradient residues and automatically tunes the compression rate
depending on local activity. We show excellent results on a wide spectrum of
state-of-the-art Deep Learning models in multiple domains (vision, speech,
language), datasets (MNIST, CIFAR10, ImageNet, BN50, Shakespeare), optimizers
(SGD with momentum, Adam) and network parameters (number of learners,
minibatch-size etc.). Exploiting both sparsity and quantization, we demonstrate
end-to-end compression rates of ~200X for fully-connected and recurrent layers,
and ~40X for convolutional layers, without any noticeable degradation in model
accuracies.
Comment: IBM Research AI, 9 pages, 7 figures, AAAI18 accepted.
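As a rough illustration of the localized selection the abstract describes, here is a hedged NumPy sketch of one AdaComp-style compression step; the bin size, the exact thresholding rule, and the function name are assumptions made for exposition rather than the paper's precise algorithm.

```python
import numpy as np

def adacomp_select(residue, grad, bin_size=256):
    # Illustrative sketch of localized residual-gradient selection
    # (assumed details; not the authors' exact scheme).
    acc = residue + grad                      # accumulate this step's gradient
    sent = np.zeros_like(acc, dtype=bool)
    for start in range(0, acc.size, bin_size):
        sl = slice(start, min(start + bin_size, acc.size))
        local_max = np.abs(acc[sl]).max()
        # "Over-accumulate" by one more gradient step; entries that would
        # then reach the local maximum count as active and are transmitted,
        # so the compression rate adapts to local gradient activity.
        sent[sl] = np.abs(acc[sl] + grad[sl]) >= local_max
    update = np.where(sent, acc, 0.0)         # sparse update actually sent
    new_residue = np.where(sent, 0.0, acc)    # unsent residues carry over
    return update, new_residue

grad = np.random.randn(1024).astype(np.float32)
residue = np.zeros_like(grad)
update, residue = adacomp_select(residue, grad)
print(f"sent {np.count_nonzero(update)} / {grad.size} values")
```

Because the threshold is the local maximum within each bin rather than a global hyperparameter, busy regions of the gradient send more entries and quiet regions send fewer, which is how the compression rate self-tunes.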
A quick search method for audio signals based on a piecewise linear representation of feature trajectories
This paper presents a new method for a quick similarity-based search through
long unlabeled audio streams to detect and locate audio clips provided by
users. The method involves feature-dimension reduction based on a piecewise
linear representation of a sequential feature trajectory extracted from a long
audio stream. Two techniques enable us to obtain a piecewise linear
representation: the dynamic segmentation of feature trajectories and the
segment-based Karhunen-Loève (KL) transform. In principle, the proposed search
method guarantees the same search results as a search performed without the
feature-dimension reduction. Experimental results indicate significant
improvements in search speed. For example, the proposed method
reduced the total search time to approximately 1/12 that of previous methods
and detected queries in approximately 0.3 seconds from a 200-hour audio
database.
Comment: 20 pages, to appear in IEEE Transactions on Audio, Speech and Language Processing.
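To make the two techniques concrete, the following is a hedged NumPy sketch: a greedy dynamic segmentation that cuts the feature trajectory whenever a straight line between segment endpoints stops fitting, followed by a per-segment KL (PCA) transform. The error measure, the threshold, the greedy policy, and all function names are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def segment_trajectory(features, max_err=0.5):
    # Greedy dynamic segmentation (assumed policy): extend the current
    # segment while all frames stay within max_err of the straight line
    # joining the segment's endpoints; otherwise cut.
    bounds, start = [0], 0
    for t in range(2, len(features)):
        p0, p1 = features[start], features[t]
        span = np.linspace(0.0, 1.0, t - start + 1)[:, None]
        line = p0 + span * (p1 - p0)          # linear interpolation
        if np.abs(features[start:t + 1] - line).max() > max_err:
            bounds.append(t - 1)              # line no longer fits: cut here
            start = t - 1
    bounds.append(len(features) - 1)
    return bounds

def segment_kl(features, bounds, dims=8):
    # Per-segment KL transform: project each segment's frames onto its
    # own top principal directions (PCA via SVD).
    out = []
    for s, e in zip(bounds[:-1], bounds[1:]):
        seg = features[s:e + 1]
        centered = seg - seg.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        out.append(centered @ vt[:dims].T)
    return out

feats = np.cumsum(np.random.randn(1000, 24), axis=0)  # toy feature trajectory
bounds = segment_trajectory(feats, max_err=5.0)
reduced = segment_kl(feats, bounds, dims=8)
print(len(bounds) - 1, "segments")
```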
Graded quantization for multiple description coding of compressive measurements
Compressed sensing (CS) is an emerging paradigm for acquisition of compressed
representations of a sparse signal. Its low complexity is appealing for
resource-constrained scenarios like sensor networks. However, such scenarios
are often coupled with unreliable communication channels and providing robust
transmission of the acquired data to a receiver is an issue. Multiple
description coding (MDC) effectively combats channel losses for systems without
feedback, thus raising the interest in developing MDC methods explicitly
designed for the CS framework, and exploiting its properties. We propose a
method called Graded Quantization (CS-GQ) that leverages the democratic
property of compressive measurements to effectively implement MDC, and we
provide methods to optimize its performance. A novel decoding algorithm based
on the alternating direction method of multipliers is derived to reconstruct
signals from a limited number of received descriptions. Simulations are
performed to assess the performance of CS-GQ against other methods in the
presence of packet losses. The proposed method provides robust coding of CS
measurements and outperforms other schemes on the considered test metrics.
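As a rough sketch of the graded-quantization idea (not the paper's optimized scheme), the NumPy fragment below splits a measurement vector into two descriptions, each carrying half of the measurements at a fine rate and the other half at a coarse rate; the bit allocations and helper names are assumptions for illustration.

```python
import numpy as np

def uniform_quantize(x, bits):
    # Simple midrise uniform quantizer over the empirical range (sketch).
    lo, hi = x.min(), x.max()
    levels = 2 ** bits
    step = max((hi - lo) / levels, 1e-12)
    q = np.clip(np.floor((x - lo) / step), 0, levels - 1)
    return lo + (q + 0.5) * step

def cs_gq_encode(y, fine_bits=6, coarse_bits=3):
    # Graded quantization (illustrative rates): each description holds one
    # half of the measurements finely quantized and the other half coarsely.
    m = y.size // 2
    d1 = np.concatenate([uniform_quantize(y[:m], fine_bits),
                         uniform_quantize(y[m:], coarse_bits)])
    d2 = np.concatenate([uniform_quantize(y[:m], coarse_bits),
                         uniform_quantize(y[m:], fine_bits)])
    return d1, d2

def cs_gq_combine(d1, d2):
    # Central decoder: with both descriptions, keep the finely quantized
    # half from each. A single description alone remains decodable because
    # CS measurements are "democratic" (each carries similar information).
    m = d1.size // 2
    return np.concatenate([d1[:m], d2[m:]])

y = np.random.randn(128)              # toy compressive measurements
d1, d2 = cs_gq_encode(y)
y_hat = cs_gq_combine(d1, d2)
print(np.abs(y - y_hat).max())
```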