
    Pairwise Quantization

    We consider the task of lossy compression of high-dimensional vectors through quantization. We propose an approach that learns quantization parameters by minimizing the distortion of scalar products and squared distances between pairs of points. This is in contrast to previous works, which obtain these parameters by minimizing the reconstruction error of individual points. The proposed approach proceeds by finding a linear transformation of the data that effectively reduces the minimization of the pairwise distortions to the minimization of individual reconstruction errors. After this transformation, any of the previously proposed quantization approaches can be used. Despite the simplicity of this transformation, the experiments demonstrate that it achieves a considerable reduction of the pairwise distortions compared to applying quantization directly to the untransformed data.
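
    The two objectives contrasted in this abstract can be made concrete with a minimal sketch. The toy uniform rounding quantizer, the sample points, and all function names below are assumptions made for illustration; this is not the paper's method and it does not implement the learned linear transformation. It only shows how the reconstruction error of individual points differs from the distortion of pairwise scalar products.

```python
from itertools import combinations


def quantize(v, step=0.5):
    """Toy uniform scalar quantizer (an illustrative assumption)."""
    return [step * round(x / step) for x in v]


def dot(a, b):
    """Scalar product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def reconstruction_error(points, step=0.5):
    """Mean squared error between each point and its quantized version."""
    total = 0.0
    for p in points:
        q = quantize(p, step)
        total += sum((x - y) ** 2 for x, y in zip(p, q))
    return total / len(points)


def scalar_product_distortion(points, step=0.5):
    """Mean squared error of scalar products over all pairs of points."""
    quantized = [quantize(p, step) for p in points]
    pairs = list(combinations(range(len(points)), 2))
    total = 0.0
    for i, j in pairs:
        exact = dot(points[i], points[j])
        approx = dot(quantized[i], quantized[j])
        total += (exact - approx) ** 2
    return total / len(pairs)


# Hypothetical sample data.
points = [[0.9, 0.2], [0.1, 1.1], [0.6, 0.6]]
print(reconstruction_error(points))
print(scalar_product_distortion(points))
```

    The two numbers measure different things: a quantizer tuned to make the first one small is not automatically the one that makes the second one small, which is the gap the abstract addresses.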

    Bolt: Accelerated Data Mining with Fast Vector Compression

    Vectors of data are at the heart of machine learning and data mining. Recently, vector quantization methods have shown great promise in reducing both the time and space costs of operating on vectors. We introduce a vector quantization algorithm that can compress vectors over 12x faster than existing techniques while also accelerating approximate vector operations such as distance and dot product computations by up to 10x. Because it can encode over 2GB of vectors per second, it makes vector quantization cheap enough to employ in many more circumstances. For example, using our technique to compute approximate dot products in a nested loop can multiply matrices faster than a state-of-the-art BLAS implementation, even when our algorithm must first compress the matrices. In addition to showing the above speedups, we demonstrate that our approach can accelerate nearest neighbor search and maximum inner product search by over 100x compared to floating point operations and up to 10x compared to other vector quantization methods. Our approximate Euclidean distance and dot product computations are not only faster than those of related algorithms with slower encodings, but also faster than Hamming distance computations, which have direct hardware support on the tested platforms. We also assess the errors of our algorithm's approximate distances and dot products, and find that it is competitive with existing, slower vector quantization algorithms. Comment: Research track paper at KDD 201
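
    The abstract does not describe the algorithm itself, so the following is only a minimal sketch of generic product quantization with lookup tables, one common way to compute approximate dot products on compressed vectors. It is not claimed to be Bolt's algorithm. The hand-written codebooks, the data, and all function names are assumptions for illustration.

```python
def split(v, m):
    """Split vector v into m contiguous subvectors of equal length."""
    size = len(v) // m
    return [v[i * size:(i + 1) * size] for i in range(m)]


def encode(v, codebooks):
    """Replace each subvector by the index of its nearest centroid."""
    codes = []
    for sub, book in zip(split(v, len(codebooks)), codebooks):
        dists = [sum((a - b) ** 2 for a, b in zip(sub, c)) for c in book]
        codes.append(dists.index(min(dists)))
    return codes


def build_tables(query, codebooks):
    """Precompute the dot product of each query subvector with every centroid."""
    tables = []
    for sub, book in zip(split(query, len(codebooks)), codebooks):
        tables.append([sum(a * b for a, b in zip(sub, c)) for c in book])
    return tables


def approx_dot(codes, tables):
    """Approximate dot product: one table lookup per subvector, then a sum."""
    return sum(table[code] for code, table in zip(codes, tables))


# Hypothetical codebooks: two subspaces, two centroids each.
codebooks = [
    [[0, 0], [1, 1]],
    [[0, 1], [1, 0]],
]
x = [0.9, 1.1, 0.1, 0.8]
query = [1, 2, 3, 4]

codes = encode(x, codebooks)
tables = build_tables(query, codebooks)
print(codes, approx_dot(codes, tables))
```

    The tables are built once per query and reused for every encoded vector, so each approximate dot product costs only a few lookups and additions. Real systems learn the codebooks from data rather than writing them by hand.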

    Filterbank optimization with convex objectives and the optimality of principal component forms

    This paper proposes a general framework for the optimization of orthonormal filterbanks (FBs) for given input statistics. This includes, as special cases, many previous results on FB optimization for compression. It also solves problems that have not been considered thus far. FB optimization for coding gain maximization (for compression applications) has been well studied before. The optimum FB has been known to satisfy the principal component property, i.e., it minimizes the mean-square error caused by reconstruction after dropping the P weakest (lowest variance) subbands for any P. We point out a much stronger connection between this property and the optimality of the FB. The main result is that a principal component FB (PCFB) is optimum whenever the minimization objective is a concave function of the subband variances produced by the FB. This result has its grounding in majorization and convex function theory and, in particular, explains the optimality of PCFBs for compression. We use the result to show various other optimality properties of PCFBs, especially for noise-suppression applications. Suppose the FB input is a signal corrupted by additive white noise, the desired output is the pure signal, and the subbands of the FB are processed to minimize the output noise. If each subband processor is a zeroth-order Wiener filter for its input, we can show that the expected mean square value of the output noise is a concave function of the subband signal variances. Hence, a PCFB is optimum in the sense of minimizing this mean square error. The above-mentioned concavity of the error and, hence, PCFB optimality, continues to hold even with certain other subband processors such as subband hard thresholds and constant multipliers, although these are not of serious practical interest. We prove certain extensions of this PCFB optimality result to cases where the input noise is colored and the FB optimization is over a larger class that includes biorthogonal FBs. We also show that PCFBs do not exist for the classes of DFT and cosine-modulated FBs.
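
    The link between concavity and majorization in the noise-suppression case can be checked numerically with a small sketch. It assumes every subband sees the same noise variance (white input noise through an orthonormal FB) and uses the standard error of a zeroth-order Wiener filter (a scalar multiplier), s*n/(s+n) for signal variance s and noise variance n. The variance vectors and function names are hypothetical.

```python
def wiener_error(signal_var, noise_var):
    """Mean square error of the optimal scalar multiplier for one subband."""
    return signal_var * noise_var / (signal_var + noise_var)


def total_error(signal_vars, noise_var):
    """Sum of per-subband Wiener errors; concave in each signal variance."""
    return sum(wiener_error(s, noise_var) for s in signal_vars)


def majorizes(a, b, tol=1e-12):
    """True if a majorizes b: equal totals, and every partial sum of the
    descending-sorted a is at least the matching partial sum of b.
    Both vectors are assumed to have the same length."""
    a = sorted(a, reverse=True)
    b = sorted(b, reverse=True)
    if abs(sum(a) - sum(b)) > tol:
        return False
    partial_a = partial_b = 0.0
    for x, y in zip(a, b):
        partial_a += x
        partial_b += y
        if partial_a < partial_b - tol:
            return False
    return True


# Two hypothetical subband variance vectors with the same total variance.
spread = [3.0, 1.0]   # more energy compaction
flat = [2.0, 2.0]     # variance split evenly
noise = 1.0

print(majorizes(spread, flat))
print(total_error(spread, noise), total_error(flat, noise))
```

    The vector with more energy compaction majorizes the flat one and gives the smaller total error, which is the behaviour expected when a sum of concave functions is minimized over variance vectors ordered by majorization.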

    A statistical reduced-reference method for color image quality assessment

    Although color is a fundamental feature of human visual perception, it has been largely unexplored in reduced-reference (RR) image quality assessment (IQA) schemes. In this paper, we propose a natural scene statistic (NSS) method that uses this information efficiently. It is based on the statistical deviation between the steerable pyramid coefficients of the reference color image and those of the degraded one. We propose and analyze the multivariate generalized Gaussian distribution (MGGD) to model the underlying statistics. To quantify the degradation, we develop and evaluate two measures, based respectively on the geodesic distance between two MGGDs and on the closed form of the Kullback-Leibler divergence (KLD). We performed an extensive evaluation of both metrics in various color spaces (RGB, HSV, CIELAB and YCrCb) using the TID 2008 benchmark and the FRTV Phase I validation process. Experimental results demonstrate that the proposed framework achieves good consistency with human visual perception. Furthermore, the best configuration is obtained with the CIELAB color space combined with the KLD deviation measure.

    Integer-Forcing Source Coding

    Integer-Forcing (IF) is a new framework, based on compute-and-forward, for decoding multiple integer linear combinations from the output of a Gaussian multiple-input multiple-output channel. This work applies the IF approach to arrive at a new low-complexity scheme, IF source coding, for distributed lossy compression of correlated Gaussian sources under a minimum mean squared error distortion measure. All encoders use the same nested lattice codebook. Each encoder quantizes its observation using the fine lattice as a quantizer and reduces the result modulo the coarse lattice, which plays the role of binning. Rather than directly recovering the individual quantized signals, the decoder first recovers a full-rank set of judiciously chosen integer linear combinations of the quantized signals, and then inverts it. In general, the linear combinations have smaller average powers than the original signals. This allows the density of the coarse lattice to be increased, which in turn translates to smaller compression rates. We also propose and analyze a one-shot version of IF source coding that is simple enough to potentially lead to a new design principle for analog-to-digital converters that can exploit spatial correlations between the sampled signals. Comment: Submitted to IEEE Transactions on Information Theor
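
    The role of the modulo reduction can be illustrated with a one-dimensional sketch, assuming scalar lattices: the fine lattice is the set of multiples of a step size and the coarse lattice is every m-th fine point. The step size, m, the sample values, and all function names are hypothetical. The sketch recovers a single low-power integer combination only; a complete scheme needs a full-rank set of such combinations before it can invert them.

```python
def quantize(x, step):
    """Index of the nearest point on the fine lattice step * Z."""
    return round(x / step)


def mod_coarse(index, m):
    """Centered reduction modulo the coarse lattice m * Z (m even).
    The result lies in the range [-m/2, m/2)."""
    return (index + m // 2) % m - m // 2


def encode(x, step, m):
    """Encoder: quantize with the fine lattice, then reduce modulo the coarse one."""
    return mod_coarse(quantize(x, step), m)


def recover_combination(codewords, coeffs, m):
    """Decoder: form an integer combination of the received codewords and
    reduce it modulo the coarse lattice. This equals the same combination of
    the unreduced quantized signals whenever that combination is small enough
    to lie in [-m/2, m/2)."""
    combo = sum(a * c for a, c in zip(coeffs, codewords))
    return mod_coarse(combo, m)


step, m = 0.5, 8
x1, x2 = 10.3, 9.6            # two hypothetical correlated observations
c1, c2 = encode(x1, step, m), encode(x2, step, m)

true_diff = quantize(x1, step) - quantize(x2, step)
print(c1, c2, recover_combination([c1, c2], [1, -1], m), true_diff)
```

    Each codeword alone is ambiguous because the quantized values fall far outside one coarse cell, yet their difference is small and survives the modulo operation. This is why combinations with lower power permit a denser coarse lattice.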