    Lossy compression of discrete sources via Viterbi algorithm

    We present a new lossy compressor for discrete-valued sources. For coding a sequence $x^n$, the encoder starts by assigning a certain cost to each possible reconstruction sequence. It then finds the one that minimizes this cost and describes it losslessly to the decoder via a universal lossless compressor. The cost of each sequence is a linear combination of its distance from the sequence $x^n$ and a linear function of its $k^{\rm th}$ order empirical distribution. The structure of the cost function allows the encoder to employ the Viterbi algorithm to recover the minimizer of the cost. We identify a choice of the coefficients comprising the linear function of the empirical distribution used in the cost function which ensures that the algorithm universally achieves the optimum rate-distortion performance of any stationary ergodic source in the limit of large $n$, provided that $k$ diverges as $o(\log n)$. Iterative techniques for approximating the coefficients, which alleviate the computational burden of finding the optimal coefficients, are proposed and studied. Comment: 26 pages, 6 figures, Submitted to IEEE Transactions on Information Theor
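
The minimization described in this abstract can be illustrated with a small dynamic program. The following is a minimal sketch, not the authors' implementation: it assumes a Hamming distortion, a zero-padded initial context, and a hypothetical coefficient table `coef` indexed by (context, symbol). The names `viterbi_lossy`, `coef` and `alpha` are chosen here for illustration. Because the cost is a sum of per-position terms that depend only on the last k reconstruction symbols, the state of the trellis is that length-k context.

```python
def viterbi_lossy(x, k, coef, alpha, alphabet=(0, 1)):
    """Minimize sum_i coef[(ctx_i, y_i)] + alpha * [x_i != y_i] over y.

    ctx_i is the tuple of the k symbols preceding y_i (zero-padded at the
    start, an assumption of this sketch). Returns (min_cost, y).
    """
    start = (alphabet[0],) * k
    # best[state] = (cost of cheapest path ending in state, that path)
    best = {start: (0.0, [])}
    for xi in x:
        nxt = {}
        for ctx, (cost, path) in best.items():
            for b in alphabet:
                c = cost + coef[(ctx, b)] + alpha * (xi != b)
                new_ctx = (ctx + (b,))[1:]  # slide the context window
                if new_ctx not in nxt or c < nxt[new_ctx][0]:
                    nxt[new_ctx] = (c, path + [b])
        best = nxt
    return min(best.values(), key=lambda t: t[0])


# Hypothetical first-order coefficients that charge 1 per symbol change.
coef = {((a,), b): (0.0 if a == b else 1.0) for a in (0, 1) for b in (0, 1)}
x = [0, 0, 1, 0, 0, 1, 1, 1]
cost, y = viterbi_lossy(x, k=1, coef=coef, alpha=0.6)
```

The sketch copies paths for readability; a practical version would store backpointers, and the number of states grows as the alphabet size to the power k.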

    Universal Sampling Rate Distortion

    We examine the coordinated and universal rate-efficient sampling of a subset of correlated discrete memoryless sources followed by lossy compression of the sampled sources. The goal is to reconstruct a predesignated subset of sources within a specified level of distortion. The combined sampling mechanism and rate distortion code are universal in that they are devised to perform robustly without exact knowledge of the underlying joint probability distribution of the sources. In Bayesian as well as nonBayesian settings, single-letter characterizations are provided for the universal sampling rate distortion function for fixed-set sampling, independent random sampling and memoryless random sampling. It is illustrated how these sampling mechanisms are successively better. Our achievability proofs bring forth new schemes for joint source distribution-learning and lossy compression

    Compression-Based Compressed Sensing

    Modern compression algorithms exploit complex structures that are present in signals to describe them very efficiently. On the other hand, the field of compressed sensing is built upon the observation that "structured" signals can be recovered from their under-determined set of linear projections. Currently, there is a large gap between the complexity of the structures studied in the area of compressed sensing and those employed by the state-of-the-art compression codes. Recent results in the literature on deterministic signals aim at bridging this gap through devising compressed sensing decoders that employ compression codes. This paper focuses on structured stochastic processes and studies the application of rate-distortion codes to compressed sensing of such signals. The performance of the previously proposed compressible signal pursuit (CSP) algorithm is studied in this stochastic setting. It is proved that in the very low distortion regime, as the blocklength grows to infinity, the CSP algorithm reliably and robustly recovers $n$ instances of a stationary process from random linear projections as long as their count is slightly more than $n$ times the rate-distortion dimension (RDD) of the source. It is also shown that under some regularity conditions, the RDD of a stationary process is equal to its information dimension (ID). This connection establishes the optimality of the CSP algorithm at least for memoryless stationary sources, for which the fundamental limits are known. Finally, it is shown that the CSP algorithm combined with a family of universal variable-length fixed-distortion compression codes yields a family of universal compressed sensing recovery algorithms
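
As a rough illustration of a compression-based decoder of the kind this abstract discusses, the sketch below searches a codebook for the codeword whose projections best match the measurements. It is a toy under stated assumptions, not the algorithm as analyzed in the paper: the codebook (1-sparse vectors on a small grid), the dimensions, and the names `measure` and `csp_decode` are all hypothetical, and the exhaustive search is only feasible for tiny codebooks.

```python
import random


def measure(A, x):
    """Linear projections y = A x for a matrix given as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def csp_decode(A, y, codebook):
    """Return the codeword c minimizing ||A c - y||^2 (exhaustive search)."""
    def residual(c):
        return sum((p - q) ** 2 for p, q in zip(measure(A, c), y))
    return min(codebook, key=residual)


rng = random.Random(0)
n, m = 8, 3  # signal length and number of measurements (m < n)
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

# Hypothetical codebook: the zero vector plus all 1-sparse grid vectors.
levels = [-1.0, -0.5, 0.5, 1.0]
codebook = [[0.0] * n] + [
    [v if j == i else 0.0 for j in range(n)] for i in range(n) for v in levels
]

x = [0.0] * n
x[5] = 0.5  # a signal that happens to lie in the codebook
x_hat = csp_decode(A, measure(A, x), codebook)
```

Here the signal is recovered from 3 measurements of an 8-dimensional vector because the codebook encodes its structure; for signals outside the codebook the decoder returns the closest codeword in the measurement domain.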

    Universal Compressed Sensing

    In this paper, the problem of developing universal algorithms for compressed sensing of stochastic processes is studied. First, Rényi's notion of information dimension (ID) is generalized to analog stationary processes. This provides a measure of complexity for such processes and is connected to the number of measurements required for their accurate recovery. Then a minimum entropy pursuit (MEP) optimization approach is proposed, and it is proven that it can reliably recover any stationary process satisfying some mixing constraints from a sufficient number of randomized linear measurements, without having any prior information about the distribution of the process. It is proved that a Lagrangian-type approximation of the MEP optimization problem, referred to as the Lagrangian-MEP problem, is identical to a heuristic implementable algorithm proposed by Baron et al. It is shown that for the right choice of parameters the Lagrangian-MEP algorithm, in addition to having the same asymptotic performance as MEP optimization, is also robust to measurement noise. For memoryless sources with a discrete-continuous mixture distribution, the fundamental limits on the minimum number of measurements required by a non-universal compressed sensing decoder are characterized by Wu et al. For such sources, it is proved that there is no loss in universal coding, and both the MEP and the Lagrangian-MEP asymptotically achieve the optimal performance
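
For reference, Rényi's information dimension of a scalar random variable, which this abstract generalizes to stationary processes, can be written as below. The symbols $X$, $m$, $p$, $P_d$ and $P_c$ are notation chosen here for illustration; the process-level generalization is defined in the paper and is not reproduced.

```latex
% Information dimension of a real-valued random variable X (when the limit exists):
d(X) \;=\; \lim_{m \to \infty} \frac{H\!\left(\lfloor m X \rfloor\right)}{\log m}

% Discrete-continuous mixture: if the law of X is
%   (1 - p)\, P_d + p\, P_c,
% with P_d discrete, P_c absolutely continuous, and H(\lfloor X \rfloor) < \infty, then
d(X) \;=\; p
```

For example, a source that is exactly zero with probability 0.9 and Gaussian otherwise has $d(X) = 0.1$, which matches the intuition that only the continuous part of each sample needs to be measured.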

    Rate-Distortion via Markov Chain Monte Carlo

    We propose an approach to lossy source coding, utilizing ideas from Gibbs sampling, simulated annealing, and Markov Chain Monte Carlo (MCMC). The idea is to sample a reconstruction sequence from a Boltzmann distribution associated with an energy function that incorporates the distortion between the source and reconstruction, the compressibility of the reconstruction, and the point sought on the rate-distortion curve. To sample from this distribution, we use a `heat bath algorithm': Starting from an initial candidate reconstruction (say the original source sequence), at every iteration, an index i is chosen and the i-th sequence component is replaced by drawing from the conditional probability distribution for that component given all the rest. At the end of this process, the encoder conveys the reconstruction to the decoder using universal lossless compression. The complexity of each iteration is independent of the sequence length and only linearly dependent on a certain context parameter (which grows sub-logarithmically with the sequence length). We show that the proposed algorithms achieve optimum rate-distortion performance in the limit of a large number of iterations and large sequence length, when employed on any stationary ergodic source. Experimentation shows promising initial results. Employing our lossy compressors on noisy data, with appropriately chosen distortion measure and level, followed by a simple de-randomization operation, results in a family of denoisers that compares favorably (both theoretically and in practice) with other MCMC-based schemes, and with the Discrete Universal Denoiser (DUDE). Comment: 35 pages, 16 figures, Submitted to IEEE Transactions on Information Theor
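
The heat bath update described in this abstract can be sketched as follows. This is an illustrative toy, not the authors' code, and it rests on several assumptions: the energy is taken to be n times the sum of the k-th order conditional empirical entropy of the reconstruction and `alpha` times the normalized Hamming distortion, contexts wrap around cyclically, and the inverse temperature `beta` is held fixed rather than annealed. The function names are hypothetical. For simplicity the energy is recomputed from scratch at every step, so this sketch does not have the per-iteration cost independent of sequence length that the abstract reports.

```python
import math
import random
from collections import Counter


def cond_entropy(y, k):
    """k-th order conditional empirical entropy in bits/symbol (cyclic contexts)."""
    n = len(y)
    joint = Counter()
    for i in range(n):
        ctx = tuple(y[(i - j) % n] for j in range(k, 0, -1))
        joint[(ctx, y[i])] += 1
    ctx_tot = Counter()
    for (ctx, _), c in joint.items():
        ctx_tot[ctx] += c
    return -sum(c * math.log2(c / ctx_tot[ctx]) for (ctx, _), c in joint.items()) / n


def energy(x, y, k, alpha):
    """Assumed energy: n * (H_k(y) + alpha * normalized Hamming distortion)."""
    n = len(x)
    hamming = sum(a != b for a, b in zip(x, y)) / n
    return n * (cond_entropy(y, k) + alpha * hamming)


def heat_bath(x, k, alpha, beta, sweeps, alphabet=(0, 1), seed=0):
    """Gibbs-sample a reconstruction, starting from the source sequence itself."""
    rng = random.Random(seed)
    y = list(x)
    for _ in range(sweeps * len(y)):
        i = rng.randrange(len(y))
        energies = []
        for b in alphabet:  # energy of each candidate value at position i
            y[i] = b
            energies.append(energy(x, y, k, alpha))
        lo = min(energies)  # subtract the minimum for numerical stability
        weights = [math.exp(-beta * (e - lo)) for e in energies]
        y[i] = rng.choices(alphabet, weights=weights)[0]
    return y


x = [0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1]
y = heat_bath(x, k=1, alpha=1.0, beta=2.0, sweeps=3)
```

Larger `alpha` favors low distortion over compressibility, which is one way the sketch exposes the trade-off along the rate-distortion curve; an annealing schedule would increase `beta` across iterations.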

    Ergodic theory, entropy, and coding problems of information theory

    Sampling Rate Distortion

    Consider a memoryless multiple source with m components of which a (possibly randomized) subset of k ≤ m components is sampled at each time instant and jointly compressed with the objective of reconstructing a prespecified subset of the m components under a given distortion criterion. The combined sampling and lossy compression mechanisms are to be designed to perform robustly with or without exact knowledge of the underlying joint probability distribution of the source. In this dissertation, we introduce a new framework of sampling rate distortion to study the tradeoffs among sampling mechanism, encoder-decoder structure, compression rate and the desired level of accuracy in the reconstruction. We begin with a discrete memoryless multiple source whose joint probability mass function (pmf) is taken to be known. A notion of sampling rate distortion function is introduced to study the mentioned tradeoffs, and is characterized first for fixed-set sampling. Next, for independent random sampling performed without the knowledge of the source outputs, it is shown that the sampling rate distortion function is the same whether or not the decoder is informed of the sequence of sampled sets. For memoryless random sampling, with the sampling depending on the source outputs, it is shown that deterministic sampling, characterized by a conditional point-mass, is optimal and suffices to achieve the sampling rate distortion function. Building on this, we consider a universal setting where the joint pmf of a discrete memoryless multiple source is known only to belong to a finite family of pmfs. In Bayesian and nonBayesian settings, single-letter characterizations are provided for the universal sampling rate distortion function for fixed-set sampling, independent random sampling and memoryless random sampling. We show that these sampling mechanisms successively improve upon each other: (i) in their ability to enable an associated encoder to approximate the underlying joint pmf and (ii) in their ability to choose appropriate subsets of the multiple source for compression by the encoder. Lastly, we consider a jointly Gaussian multiple memoryless source, to be reconstructed under a mean-squared error distortion criterion, with joint probability distribution function known only to belong to an uncountable family of probability density functions (characterized by a convex compact subset in Euclidean space). For fixed-set sampling, we characterize the universal sampling rate distortion function in Bayesian and nonBayesian settings. We also provide optimal reconstruction algorithms, of reduced complexity, which compress and reconstruct the sampled source components first under a modified distortion criterion, and then form MMSE estimates for the unsampled components based on reconstructions of the former. The questions addressed in this dissertation are motivated by various applications, e.g., dynamic thermal management for multicore processors, in-network computation and satellite imaging