
    Variable-length compression allowing errors

    This paper studies the fundamental limits of the minimum average length of lossless and lossy variable-length compression, allowing a nonzero error probability $\epsilon$, for lossless compression. We give non-asymptotic bounds on the minimum average length in terms of Erokhin's rate-distortion function, and we use those bounds to obtain a Gaussian approximation on the speed of approach to the limit which is quite accurate for all but small blocklengths: $(1 - \epsilon) k H(\mathsf S) - \sqrt{\frac{k V(\mathsf S)}{2 \pi}}\, e^{- \frac{(Q^{-1}(\epsilon))^2}{2}}$, where $Q^{-1}(\cdot)$ is the functional inverse of the standard Gaussian complementary cdf and $V(\mathsf S)$ is the source dispersion. A nonzero error probability thus not only reduces the asymptotically achievable rate by a factor of $1 - \epsilon$, but this asymptotic limit is approached from below, i.e., larger source dispersions and shorter blocklengths are beneficial. Variable-length lossy compression under an excess distortion constraint is shown to exhibit similar properties.
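    As a quick sanity check of the displayed approximation, the sketch below (not from the paper; the Bernoulli source and parameter values are assumptions for illustration) evaluates $(1 - \epsilon) k H(\mathsf S) - \sqrt{k V(\mathsf S)/(2\pi)}\, e^{-(Q^{-1}(\epsilon))^2/2}$ numerically, with $H$ the source entropy and $V$ the source dispersion (varentropy).

```python
# Minimal numerical sketch of the Gaussian approximation above, assuming an
# i.i.d. Bernoulli(p) source (hypothetical choice for illustration).
import math
from statistics import NormalDist

def gaussian_approx_avg_length(k: int, eps: float, p: float) -> float:
    """(1-eps)*k*H(S) - sqrt(k*V(S)/(2*pi)) * exp(-(Q^{-1}(eps))^2 / 2), in bits."""
    h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))    # entropy H(S)
    # dispersion V(S) = Var[-log2 P(S)] (varentropy) of the Bernoulli(p) source
    v = p * (-math.log2(p) - h) ** 2 + (1 - p) * (-math.log2(1 - p) - h) ** 2
    q_inv = NormalDist().inv_cdf(1 - eps)                    # Q^{-1}(eps)
    return (1 - eps) * k * h - math.sqrt(k * v / (2 * math.pi)) * math.exp(-q_inv ** 2 / 2)

# Example: k = 1000 source symbols, error probability 1%, p = 0.11
print(gaussian_approx_avg_length(k=1000, eps=0.01, p=0.11))
```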

    Universal Source Coding in the Non-Asymptotic Regime

    Abstract: The fundamental limits of fixed-to-variable (F-V) and variable-to-fixed (V-F) length universal source coding at short blocklengths are characterized. For F-V length coding, the Type Size (TS) code has previously been shown to be optimal up to the third-order rate for universal compression of all memoryless sources over finite alphabets. The TS code assigns sequences, ordered by their type class sizes, to binary strings ordered lexicographically. The universal F-V coding problem for the class of first-order stationary, irreducible, and aperiodic Markov sources is considered first. The third-order coding rate of the TS code for this Markov class is derived, and a converse on the third-order coding rate for the general class of F-V codes establishes the optimality of the TS code for such Markov sources. The type class approach is then generalized to compression of parametric sources. A natural scheme is to define two sequences to be in the same type class if and only if they are equiprobable under every model in the parametric class. This natural approach, however, is shown to be suboptimal. A variation of the Type Size code is introduced, in which type classes are defined based on neighborhoods of minimal sufficient statistics. The asymptotics of the overflow rate of this variation are derived, and a converse result establishes its optimality up to the third-order term. These results are derived for parametric families of i.i.d. sources as well as Markov sources. Finally, universal V-F length coding of the class of parametric sources is considered in the short-blocklength regime. The proposed dictionary, which is used to parse the source output stream, consists of sequences at the boundaries of the transition from low to high quantized type complexity, hence the name Type Complexity (TC) code. For a large enough dictionary, the $\epsilon$-coding rate of the TC code is derived, and a converse result shows its optimality up to the third-order term. (Doctoral dissertation, Electrical Engineering.)
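    To make the TS construction concrete, here is a toy sketch (an illustration under stated assumptions, not the thesis's exact scheme): for a binary memoryless source, sequences are sorted by the size of their type class (assumed here to be smallest first, so highly structured sequences receive short codewords) and matched to binary strings enumerated in length-then-lexicographic order.

```python
# Toy Type-Size-style codebook for blocklength n over a binary alphabet.
# Assumption for illustration: smaller type classes are assigned shorter codewords.
from itertools import product
from math import comb

def binary_strings_in_order(count):
    """Return the first `count` binary strings: '', '0', '1', '00', '01', ..."""
    out, length = [], 0
    while len(out) < count:
        out.extend("".join(bits) for bits in product("01", repeat=length))
        length += 1
    return out[:count]

def type_size_codebook(n):
    seqs = ["".join(bits) for bits in product("01", repeat=n)]
    # The type of a binary sequence is its number of ones; its class size is C(n, k).
    seqs.sort(key=lambda s: (comb(n, s.count("1")), s))
    return dict(zip(seqs, binary_strings_in_order(len(seqs))))

book = type_size_codebook(4)
print(repr(book["0000"]), repr(book["1111"]), repr(book["0101"]))  # '' '0' '100'
```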

    Average Redundancy for Known Sources: Ubiquitous Trees in Source Coding

    Analytic information theory aims at studying problems of information theory using analytic techniques of computer science and combinatorics. Following Hadamard's precept, these problems are tackled by complex-analysis methods such as generating functions, the Mellin transform, Fourier series, the saddle point method, analytic poissonization and depoissonization, and singularity analysis. This approach lies at the crossroads of computer science and information theory. In this survey we concentrate on one facet of information theory (i.e., source coding, better known as data compression), namely the redundancy rate problem. The redundancy rate problem determines by how much the actual code length exceeds the optimal code length. We further restrict our interest to the average redundancy for known sources, that is, when the statistics of the information source are known. We present precise analyses of three types of lossless data compression schemes, namely fixed-to-variable (FV) length codes, variable-to-fixed (VF) length codes, and variable-to-variable (VV) length codes. In particular, we investigate the average redundancy of Huffman, Tunstall, and Khodak codes. These codes have succinct representations as trees, either coding or parsing trees, and we analyze some of their parameters (e.g., the average path length from the root to a leaf).
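    For the FV case, the average redundancy in question can be computed directly at small block lengths. The sketch below (a hypothetical example, not taken from the survey) builds a Huffman code for blocks of $n$ symbols from a known Bernoulli(p) source and reports the gap between the expected code length and the entropy $n h(p)$.

```python
# Average Huffman redundancy for blocks of n Bernoulli(p) symbols (illustrative).
import heapq
import math
from itertools import product

def huffman_expected_length(probs):
    """Expected codeword length of a binary Huffman code for `probs`."""
    # Each heap entry carries (subtree probability, tie-break id, accumulated E[length]).
    heap = [(q, i, 0.0) for i, q in enumerate(probs)]
    heapq.heapify(heap)
    next_id = len(probs)
    while len(heap) > 1:
        p1, _, l1 = heapq.heappop(heap)
        p2, _, l2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every leaf beneath them.
        heapq.heappush(heap, (p1 + p2, next_id, l1 + l2 + p1 + p2))
        next_id += 1
    return heap[0][2]

n, p = 8, 0.3
block_probs = [p ** sum(b) * (1 - p) ** (n - sum(b)) for b in product((0, 1), repeat=n)]
h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
print(f"average redundancy: {huffman_expected_length(block_probs) - n * h:.4f} bits per block")
```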

    Network compression via network memory: fundamental performance limits

    The amount of information churned out daily around the world is staggering, and future technological advancements are therefore contingent upon the development of scalable acquisition, inference, and communication mechanisms for this massive data. This Ph.D. dissertation draws upon mathematical tools from information theory and statistics to understand the fundamental performance limits of universal compression of this massive data at the packet level, applied just above layer 3 of the network, when the intermediate network nodes are capable of memorizing previous traffic. Universality of compression imposes an inevitable redundancy (overhead) on the compression performance of universal codes, which is due to the learning of the unknown source statistics. In this work, previous asymptotic results on the redundancy of universal compression are generalized to characterize the performance of universal compression in the finite-length regime (which is applicable to small network packets). Further, network compression via memory is proposed as a compression-based solution for relatively small network packets whenever the network nodes (i.e., the encoder and the decoder) are equipped with memory and have access to massive amounts of previous communication. In a nutshell, network compression via memory learns the patterns and statistics of the packet payloads and uses them for compression and reduction of the traffic. Network compression via memory, at the cost of increased computational overhead in the network nodes, significantly reduces the transmission cost in the network. This leads to a substantial performance improvement, as the cost of transmitting one bit is far greater than the cost of processing it.
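    The "memory" idea can be illustrated with off-the-shelf tools: when the encoder and decoder both hold previously exchanged traffic, they can seed a standard compressor with it as a preset dictionary, which is where most of the gain for short packets comes from. The sketch below is a rough illustration of this effect, not the dissertation's algorithm; the packet contents are hypothetical.

```python
# Memory-assisted compression of a small packet via a shared preset dictionary.
import zlib

# Previously observed traffic, assumed available at both encoder and decoder.
shared_memory = (
    b"GET /api/v1/items HTTP/1.1\r\nHost: example.com\r\nAccept: application/json\r\n" * 40
)
packet = b"GET /api/v1/items?id=42 HTTP/1.1\r\nHost: example.com\r\nAccept: application/json\r\n"

plain = zlib.compress(packet)                       # no memory

enc = zlib.compressobj(zdict=shared_memory)         # encoder seeded with memory
with_memory = enc.compress(packet) + enc.flush()

dec = zlib.decompressobj(zdict=shared_memory)       # decoder seeded with the same memory
assert dec.decompress(with_memory) == packet

print(len(packet), len(plain), len(with_memory))    # memory-assisted output is far smaller
```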

    One-to-One Code and Its Anti-Redundancy

    Abstract — One-to-one codes are “one shot” codes that assign a distinct codeword to each source symbol and are not necessarily prefix codes (or, more generally, uniquely decodable). For example, such codes arise when there exists an “end of message” channel symbol. Interestingly, as Wyner proved in 1972, for such codes the average code length can be smaller than the source entropy. By how much? We call this difference the anti-redundancy. Various authors over the years have shown that the anti-redundancy can be as big as minus the logarithm of the source entropy. However, to the best of our knowledge, precise estimates do not exist. In this note, we consider a block code of length $n$ generated by a binary memoryless source, and prove that the average anti-redundancy is $-\frac{1}{2}\log_2 n + C + F(n) + o(1)$, where $C$ is a constant and either $F(n) = 0$ if $\log_2 \frac{1-p}{p}$ is irrational (where $p$ is the probability of generating a “0”), or otherwise $F(n)$ is a fluctuating function as the code length increases. This relatively simple finding requires a combination of quite sophisticated analytic tools, such as the precise evaluation of Bernoulli sums, the saddle point method, and the theory of distributions of sequences modulo 1.
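    The quantity being estimated can also be computed exactly for small $n$. The sketch below (an illustration, not the paper's analysis) sorts the $2^n$ source sequences by decreasing probability, gives the $i$-th one a codeword of length $\lfloor \log_2 i \rfloor$ (the shortest one-to-one assignment when the empty string is allowed), and reports the resulting gap between average length and entropy, which should track the $-\frac{1}{2}\log_2 n$ behaviour stated above.

```python
# Exact anti-redundancy of the optimal one-to-one block code (illustrative).
import math
from itertools import product

def one_to_one_anti_redundancy(n, p):
    probs = sorted(
        (p ** sum(b) * (1 - p) ** (n - sum(b)) for b in product((0, 1), repeat=n)),
        reverse=True,
    )
    # The i-th most probable sequence (i = 1, 2, ...) gets a codeword of length floor(log2 i).
    avg_len = sum(q * math.floor(math.log2(i)) for i, q in enumerate(probs, start=1))
    entropy = n * (-(p * math.log2(p) + (1 - p) * math.log2(1 - p)))
    return avg_len - entropy   # negative: average length falls below the entropy

for n in (8, 12, 16):
    print(n, round(one_to_one_anti_redundancy(n, p=0.4), 3), round(-0.5 * math.log2(n), 3))
```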