Rate distortion functions of countably infinite alphabet memoryless sources
The Shannon lower bound approach to the evaluation of rate distortion functions R(D) for countably infinite alphabet memoryless sources is considered. Sufficient conditions, based on the Contraction Mapping Theorem, for the existence of the Shannon lower bound RL(D) to R(D) in a region of distortion [0, D1], D1 > 0, are obtained. Sufficient conditions, based on the Schauder Fixed Point Theorem, for the existence of a Dc > 0 such that R(D) = RL(D) for all D ∈ [0, Dc] are derived. Explicit evaluation of R(D) is considered for a class of column-balanced distortion measures. Other results for distortion measures with no symmetry conditions are also discussed.
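As a numerical companion to the analytic conditions described in this abstract, the following is a minimal sketch that computes points on R(D) for a finite truncation of a countable-alphabet source using the standard Blahut-Arimoto iteration. The function name, the truncation level, the geometric source, and the Hamming distortion are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def blahut_arimoto_rd(p, d, s, n_iter=100):
    """One point on the rate-distortion curve via the Blahut-Arimoto iteration.

    p : source pmf over a truncated alphabet, shape (nx,)
    d : distortion matrix d[x, xhat], shape (nx, nxhat)
    s : slope parameter of the R(D) curve (s <= 0; more negative -> smaller D)
    Returns (D, R) with R in nats.
    """
    nxhat = d.shape[1]
    q = np.full(nxhat, 1.0 / nxhat)             # reproduction marginal
    for _ in range(n_iter):
        w = q[None, :] * np.exp(s * d)           # unnormalized q(xhat | x)
        w /= w.sum(axis=1, keepdims=True)
        q = p @ w                                # induced reproduction marginal
    D = float(np.sum(p[:, None] * w * d))
    R = float(np.sum(p[:, None] * w * np.log(w / q[None, :])))
    return D, R

# Illustrative example: geometric source truncated to 20 letters, Hamming distortion.
n = 20
p = 0.3 * 0.7 ** np.arange(n); p /= p.sum()
d = 1.0 - np.eye(n)
print(blahut_arimoto_rd(p, d, s=-3.0))
```

Sweeping the slope parameter s traces out the whole curve; the truncation level controls how closely the countable-alphabet source is approximated.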
A Generalized Typicality for Abstract Alphabets
A new notion of typicality for arbitrary probability measures on standard Borel spaces is proposed, which encompasses the classical notions of weak and strong typicality as special cases. Useful lemmas about strongly typical sets, including the conditional typicality lemma, the joint typicality lemma, and the packing and covering lemmas, which are fundamental tools for deriving inner bounds in many multi-terminal coding problems, are obtained in terms of the proposed notion. This makes it possible to directly generalize many results on finite-alphabet problems to problems involving abstract alphabets, without complicated additional arguments; for instance, a quantization procedure is no longer necessary to achieve such generalizations. Another fundamental lemma, the Markov lemma, is also obtained, but its scope of application is rather limited compared to the others. An alternative theory of typical sets for Gaussian measures, free from this limitation, is also developed. Some remarks on possible generalizations of the proposed notion to sources with memory are also given.
Comment: 44 pages; submitted to IEEE Transactions on Information Theory
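For reference, the finite-alphabet notion that this proposal generalizes can be checked directly. Below is a minimal sketch of a strong (robust) typicality test; the alphabet, tolerance, and function name are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def is_strongly_typical(seq, p, delta):
    """Classical strong (letter) typicality on a finite alphabet.

    seq   : 1-D integer array of symbols from {0, ..., len(p)-1}
    p     : source pmf
    delta : relative per-letter tolerance on empirical frequencies
    """
    n = len(seq)
    emp = np.bincount(seq, minlength=len(p)) / n
    # |emp(a) - p(a)| <= delta * p(a); letters with p(a) = 0 must not appear.
    return bool(np.all(np.abs(emp - p) <= delta * p))

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])
x = rng.choice(3, size=10_000, p=p)
print(is_strongly_typical(x, p, delta=0.05))
```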
A vector quantization approach to universal noiseless coding and quantization
A two-stage code is a block code in which each block of data is coded in two stages: the first stage codes the identity of a block code among a collection of codes, and the second stage codes the data using the identified code. The collection of codes may be noiseless codes, fixed-rate quantizers, or variable-rate quantizers. We take a vector quantization approach to two-stage coding, in which the first-stage code can be regarded as a vector quantizer that “quantizes” the input data of length n to one of a fixed collection of block codes. We apply the generalized Lloyd algorithm to the first-stage quantizer, using induced measures of rate and distortion, to design locally optimal two-stage codes. On a source of medical images, two-stage variable-rate vector quantizers designed in this way outperform standard (one-stage) fixed-rate vector quantizers by over 9 dB. The tail of the operational distortion-rate function of the first-stage quantizer determines the optimal rate of convergence of the redundancy of a universal sequence of two-stage codes. We show that there exist two-stage universal noiseless codes, fixed-rate quantizers, and variable-rate quantizers whose per-letter rate and distortion redundancies converge to zero as (k/2)n^{-1} log n, when the universe of sources has finite dimension k. This extends the achievability part of Rissanen's theorem from universal noiseless codes to universal quantizers. Further, we show that the redundancies converge as O(n^{-1}) when the universe of sources is countable, and as O(n^{-1+ϵ}) when the universe of sources is infinite-dimensional, under appropriate conditions.
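The first-stage "quantization over codes" idea can be illustrated with a short sketch. The code below assigns each source block to the best codebook in a fixed collection under squared-error distortion; the names, the squared-error choice, and the omission of the paper's induced rate/distortion measures and Lloyd-style re-design step are all simplifying assumptions.

```python
import numpy as np

def two_stage_encode(blocks, codebooks):
    """First-stage assignment of each block to the best available codebook.

    blocks    : array of shape (num_blocks, n) of source vectors
    codebooks : list of arrays, each of shape (levels_i, n)
    Returns, per block, (codebook index, codeword index, squared-error distortion).
    """
    out = []
    for x in blocks:
        best = None
        for ci, cb in enumerate(codebooks):
            dists = np.sum((cb - x) ** 2, axis=1)   # distortion to each codeword
            j = int(np.argmin(dists))
            if best is None or dists[j] < best[2]:
                best = (ci, j, float(dists[j]))
        out.append(best)
    return out

rng = np.random.default_rng(0)
blocks = rng.standard_normal((100, 4))
codebooks = [rng.standard_normal((8, 4)) for _ in range(4)]
print(two_stage_encode(blocks, codebooks)[:3])
```

A generalized-Lloyd design loop would alternate this assignment step with re-training each codebook on the blocks assigned to it.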
Estimation of the Rate-Distortion Function
Motivated by questions in lossy data compression and by theoretical
considerations, we examine the problem of estimating the rate-distortion
function of an unknown (not necessarily discrete-valued) source from empirical
data. Our focus is the behavior of the so-called "plug-in" estimator, which is
simply the rate-distortion function of the empirical distribution of the
observed data. Sufficient conditions are given for its consistency, and
examples are provided to demonstrate that in certain cases it fails to converge
to the true rate-distortion function. The analysis of its performance is
complicated by the fact that the rate-distortion function is not continuous in
the source distribution; the underlying mathematical problem is closely related
to the classical problem of establishing the consistency of maximum likelihood
estimators. General consistency results are given for the plug-in estimator
applied to a broad class of sources, including all stationary and ergodic ones.
A more general class of estimation problems is also considered, arising in the
context of lossy data compression when the allowed class of coding
distributions is restricted; analogous results are developed for the plug-in
estimator in that case. Finally, consistency theorems are formulated for
modified (e.g., penalized) versions of the plug-in, and for estimating the
optimal reproduction distribution.
Comment: 18 pages, no figures [v2: removed an example with an error; corrected typos; a shortened version will appear in IEEE Trans. Inform. Theory]
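For a discrete source, the plug-in estimator described in this abstract amounts to evaluating the rate-distortion function of the empirical distribution. The sketch below forms that empirical distribution; the data, the Hamming distortion, and the reuse of the blahut_arimoto_rd helper sketched earlier in this listing are illustrative assumptions (the paper itself also treats sources that are not discrete-valued).

```python
import numpy as np

def empirical_pmf(samples, alphabet_size):
    """Empirical distribution of the observed data (the 'plug-in' input)."""
    return np.bincount(samples, minlength=alphabet_size) / len(samples)

rng = np.random.default_rng(1)
samples = rng.choice(4, size=2_000, p=[0.4, 0.3, 0.2, 0.1])
p_hat = empirical_pmf(samples, 4)
d = 1.0 - np.eye(4)                      # Hamming distortion
# Plug-in estimate: rate-distortion function of p_hat, e.g. via the
# blahut_arimoto_rd sketch shown earlier in this listing:
# D_hat, R_hat = blahut_arimoto_rd(p_hat, d, s=-3.0)
print(p_hat)
```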
Empirical processes, typical sequences and coordinated actions in standard Borel spaces
This paper proposes a new notion of typical sequences on a wide class of
abstract alphabets (so-called standard Borel spaces), which is based on
approximations of memoryless sources by empirical distributions uniformly over
a class of measurable "test functions." In the finite-alphabet case, we can
take all uniformly bounded functions and recover the usual notion of strong
typicality (or typicality under the total variation distance). For a general
alphabet, however, this function class turns out to be too large, and must be
restricted. With this in mind, we define typicality with respect to any
Glivenko-Cantelli function class (i.e., a function class that admits a Uniform
Law of Large Numbers) and demonstrate its power by giving simple derivations of
the fundamental limits on the achievable rates in several source coding
scenarios, in which the relevant operational criteria pertain to reproducing
empirical averages of a general-alphabet stationary memoryless source with
respect to a suitable function class.
Comment: 14 pages, 3 pdf figures; accepted to IEEE Transactions on Information Theory
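The notion of typicality described in this abstract can be made concrete with a small finite class of test functions standing in for a Glivenko-Cantelli class. The sketch below checks whether the empirical averages of a sequence are uniformly close to their expectations; the function class, tolerance, and Gaussian source are illustrative assumptions.

```python
import numpy as np

def typical_wrt_functions(seq, expectations, funcs, eps):
    """Typicality relative to a (finite) class of test functions.

    seq          : observed source sequence (1-D array)
    expectations : E_P[f] for each f, in the same order as funcs
    funcs        : list of vectorized test functions
    eps          : uniform tolerance on the empirical averages
    """
    emp = np.array([f(seq).mean() for f in funcs])
    return bool(np.all(np.abs(emp - np.asarray(expectations)) <= eps))

rng = np.random.default_rng(2)
x = rng.standard_normal(5_000)                 # memoryless Gaussian source
funcs = [lambda t: t, lambda t: t ** 2, lambda t: np.abs(t)]
means = [0.0, 1.0, np.sqrt(2 / np.pi)]         # exact expectations under N(0, 1)
print(typical_wrt_functions(x, means, funcs, eps=0.05))
```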
Tight Bounds on the Rényi Entropy via Majorization with Applications to Guessing and Compression
This paper provides tight bounds on the Rényi entropy of a function of a discrete random variable with a finite number of possible values, where the considered function is not one-to-one. To that end, a tight lower bound on the Rényi entropy of a discrete random variable with a finite support is derived as a function of the size of the support and the ratio of the maximal to minimal probability masses. This work was inspired by the recently published paper by Cicalese et al., which is focused on the Shannon entropy, and it strengthens and generalizes the results of that paper to Rényi entropies of arbitrary positive orders. In view of these generalized bounds and the works by Arikan and Campbell, non-asymptotic bounds are derived for guessing moments and lossless data compression of discrete memoryless sources.
Comment: The paper was published in the Entropy journal (special issue on Probabilistic Methods in Information Theory, Hypothesis Testing, and Coding), vol. 20, no. 12, paper no. 896, November 22, 2018. Available online at https://www.mdpi.com/1099-4300/20/12/89
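To make the quantities in this abstract concrete, here is a short sketch that computes Rényi entropies and optimal guessing moments for a toy pmf; the function names, the example pmf, and the use of natural logarithms are illustrative assumptions, and the paper's majorization-based bounds are not reproduced here.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (in nats); alpha = 1 gives Shannon entropy."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log(p)))
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def guessing_moment(p, rho):
    """rho-th moment of the number of guesses under the optimal
    (probability-descending) guessing order."""
    ranks = np.arange(1, len(p) + 1, dtype=float)
    return float(np.sum(np.sort(p)[::-1] * ranks ** rho))

p = [0.5, 0.25, 0.125, 0.125]
print(renyi_entropy(p, alpha=2.0), guessing_moment(p, rho=1.0))
```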
Joint source-channel coding with feedback
This paper quantifies the fundamental limits of variable-length transmission
of a general (possibly analog) source over a memoryless channel with noiseless
feedback, under a distortion constraint. We consider excess distortion, average
distortion, and guaranteed distortion (d-semifaithful codes). In contrast to
the asymptotic fundamental limit, a general conclusion is that allowing
variable-length codes and feedback leads to a sizable improvement in the
fundamental delay-distortion tradeoff. In addition, we investigate the minimum
energy required to reproduce source samples with a given fidelity after
transmission over a memoryless Gaussian channel, and we show that the required
minimum energy is reduced with feedback and an average (rather than maximal)
power constraint.
Comment: To appear in IEEE Transactions on Information Theory
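As a rough, assumption-laden benchmark for the energy results described above (not the paper's non-asymptotic analysis): over an AWGN channel the minimum received energy per information bit is $N_0 \ln 2$, so reproducing $k$ source samples within distortion $D$, which requires about $k R(D)$ bits, takes roughly
\[
E_{\min}(k, D) \;\approx\; k \, R(D) \, N_0 \ln 2
\]
in the asymptotic, feedback-free regime; the paper quantifies how feedback and an average (rather than maximal) power constraint reduce the energy required at finite lengths relative to such benchmarks.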
Mismatched Rate-Distortion Theory: Ensembles, Bounds, and General Alphabets
In this paper, we consider the mismatched rate-distortion problem, in which
the encoding is done using a codebook, and the encoder chooses the
minimum-distortion codeword according to a mismatched distortion function that
differs from the true one. For the case of discrete memoryless sources, we
establish achievable rate-distortion bounds using multi-user coding techniques,
namely, superposition coding and expurgated parallel coding. We give examples
where these attain the matched rate-distortion trade-off but a standard
ensemble with independent codewords fails to do so. On the other hand, in
contrast with the channel coding counterpart, we show that there are cases
where structured codebooks can perform worse than their unstructured
counterparts. In addition, in view of the difficulties in adapting the existing
and above-mentioned results to general alphabets, we consider a simpler
i.i.d. random coding ensemble, and establish its achievable rate-distortion bounds for general alphabets.
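The mismatched encoding rule itself is simple to state in code. The sketch below picks the codeword that minimizes a mismatched distortion function and then reports the true distortion incurred; the i.i.d. Gaussian codebook, the L1 mismatched metric, and the squared-error true metric are illustrative assumptions, not the paper's structured ensembles.

```python
import numpy as np

def mismatched_encode(x, codebook, d_mismatched, d_true):
    """Choose the codeword minimizing the *mismatched* distortion, then
    evaluate the *true* distortion that choice actually incurs.

    x            : source block, shape (n,)
    codebook     : array of reproduction codewords, shape (M, n)
    d_mismatched : callable(x, c) -> float used by the encoder
    d_true       : callable(x, c) -> float used to measure performance
    """
    scores = np.array([d_mismatched(x, c) for c in codebook])
    j = int(np.argmin(scores))
    return j, d_true(x, codebook[j])

rng = np.random.default_rng(3)
n, M = 8, 64
x = rng.standard_normal(n)
codebook = rng.standard_normal((M, n))                  # i.i.d. random codebook
d_mis = lambda x, c: float(np.sum(np.abs(x - c)))       # encoder's (wrong) metric
d_tru = lambda x, c: float(np.mean((x - c) ** 2))       # true performance metric
print(mismatched_encode(x, codebook, d_mis, d_tru))
```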