Guessing under source uncertainty
This paper considers the problem of guessing the realization of a finite
alphabet source when some side information is provided. The only knowledge the
guesser has about the source and the correlated side information is that the
joint source is one among a family. A notion of redundancy is first defined and
a new divergence quantity that measures this redundancy is identified. This
divergence quantity shares the Pythagorean property with the Kullback-Leibler
divergence. Good guessing strategies that minimize the supremum redundancy
(over the family) are then identified. The min-sup value measures the richness
of the uncertainty set. The min-sup redundancies for two examples - the
families of discrete memoryless sources and finite-state arbitrarily varying
sources - are then determined.
Comment: 27 pages; submitted to IEEE Transactions on Information Theory, March 2006; revised September 2006; contains minor modifications and restructuring based on reviewers' comments.
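For reference, the Pythagorean property alluded to above is the following known identity for the Kullback-Leibler divergence: if $q^*$ is the I-projection of a distribution $q$ onto a closed convex set of distributions $\mathcal{E}$, then for every $p \in \mathcal{E}$,

\[ D(p \,\|\, q) \;\ge\; D(p \,\|\, q^*) + D(q^* \,\|\, q), \]

with equality when $\mathcal{E}$ is a linear family. The paper shows that its new redundancy-measuring divergence admits an analogous decomposition; the exact form of that quantity is given in the paper itself.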
Minimum Rates of Approximate Sufficient Statistics
Given a sufficient statistic for a parametric family of distributions, one
can estimate the parameter without access to the data. However, the memory or
code size for storing the sufficient statistic may nonetheless still be
prohibitive. Indeed, for $n$ independent samples drawn from a $k$-nomial
distribution with $d = k-1$ degrees of freedom, the length of the code scales as
$d \log n + O(1)$. In many applications, we may not have a useful notion of
sufficient statistics (e.g., when the parametric family is not an exponential
family) and we also may not need to reconstruct the generating distribution
exactly. By adopting a Shannon-theoretic approach in which we allow a small
error in estimating the generating distribution, we construct various {\em
approximate sufficient statistics} and show that the code length can be reduced
to $\frac{d}{2}\log n + O(1)$. We consider errors measured according to the
relative entropy and variational distance criteria. For the code constructions,
we leverage Rissanen's minimum description length principle, which yields a
non-vanishing error measured according to the relative entropy. For the
converse parts, we use Clarke and Barron's formula for the relative entropy of
a parametrized distribution and the corresponding mixture distribution.
However, this method only yields a weak converse for the variational distance.
We develop new techniques to achieve vanishing errors and we also prove strong
converses. The latter means that even if the code is allowed to have a
non-vanishing error, its length must still be at least $\frac{d}{2}\log n + O(1)$.
Comment: To appear in the IEEE Transactions on Information Theory.
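To make the $\frac{d}{2}\log n$ scaling concrete, here is a minimal sketch in the spirit of Rissanen's MDL principle: it quantizes the empirical distribution (the type) of the sample to a grid of spacing $1/\sqrt{n}$ in each of the $d$ free coordinates, so each coordinate costs roughly $\frac{1}{2}\log n$ bits. The function names and the uniform-grid coder are illustrative assumptions, not the paper's actual construction.

    import math
    from collections import Counter

    def quantized_type(sample, k, n):
        # Empirical distribution (the "type") of the sample, with each of
        # the d = k-1 free coordinates rounded to a grid of spacing 1/sqrt(n).
        counts = Counter(sample)
        step = 1.0 / math.sqrt(n)
        free = [counts.get(sym, 0) / n for sym in range(k - 1)]
        return tuple(round(f / step) * step for f in free)

    def code_length_bits(k, n):
        # Bits needed to index one grid cell: each of the d free coordinates
        # takes one of about sqrt(n) values, so roughly (d/2) * log2(n) bits.
        d = k - 1
        return d * math.log2(math.ceil(math.sqrt(n)))

    n, k = 10_000, 4                      # 10,000 samples from a 4-nomial source
    sample = [i % k for i in range(n)]    # toy data
    print(quantized_type(sample, k, n))   # quantized empirical distribution
    print(code_length_bits(k, n))         # ~ (3/2) * log2(10000) ≈ 19.9 bits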
Universal Noiseless Compression for Noisy Data
We study universal compression for discrete data sequences that were corrupted by noise. We show that while, as expected, there are many cases in which the entropy of these sequences increases relative to that of the original data, somewhat surprisingly and counter-intuitively, the universal coding redundancy of such sequences cannot increase compared to that of the original data. We derive conditions that guarantee that this redundancy does not decrease asymptotically (in first order) from the original sequence redundancy in the stationary memoryless case. We then provide bounds on the redundancy for coding finite-length (large) noisy blocks generated by stationary memoryless sources and corrupted by some specific memoryless channels. Finally, we propose a sequential probability estimation method that can be used to compress binary data corrupted by some noisy channel. While there is much benefit in using this method to compress short blocks of noise-corrupted data, the method is more general and allows sequential compression of binary sequences for which the probability of a bit is known to be limited within any given interval (not necessarily the whole interval between 0 and 1). Additionally, this method has many other applications, including prediction, sequential channel estimation, and others.
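As an illustration of the kind of estimator the last sentences describe, here is a minimal sketch of a sequential probability assignment for bits whose underlying probability is known to lie in an interval $[a, b]$. It clips the standard Krichevsky-Trofimov (add-1/2) estimate to that interval; the clipping rule is an assumption for illustration only and is not the estimator constructed in the paper.

    import math
    import random

    def clipped_kt(bits, a, b):
        # Sequential probability assignment for a binary sequence whose bit
        # probability is known to lie in [a, b]: the KT (add-1/2) estimate,
        # clipped to the interval (illustrative rule only). Returns the
        # ideal code length, -sum(log2 p), in bits.
        ones = 0
        code_length = 0.0
        for t, bit in enumerate(bits):
            p_one = (ones + 0.5) / (t + 1.0)   # KT estimate of P(bit = 1)
            p_one = min(max(p_one, a), b)      # enforce the prior knowledge
            code_length -= math.log2(p_one if bit == 1 else 1.0 - p_one)
            ones += bit
        return code_length

    random.seed(0)
    data = [1 if random.random() < 0.3 else 0 for _ in range(1000)]
    # near 1000 * h(0.3) ≈ 881 bits, plus a small redundancy term
    print(clipped_kt(data, a=0.1, b=0.9))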
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
The relationship between the Bayesian approach and the minimum description
length approach is established. We sharpen and clarify the general modeling
principles MDL and MML, abstracted as the ideal MDL principle and defined from
Bayes's rule by means of Kolmogorov complexity. The basic condition under which
the ideal principle should be applied is encapsulated as the Fundamental
Inequality, which in broad terms states that the principle is valid when the
data are random relative to every contemplated hypothesis, and these
hypotheses are in turn random relative to the (universal) prior. Basically, the ideal
principle states that the prior probability associated with the hypothesis
should be given by the algorithmic universal probability, and the sum of the
log universal probability of the model plus the log of the probability of the
data given the model should be minimized. If we restrict the model class to the
finite sets then application of the ideal principle turns into Kolmogorov's
minimal sufficient statistic. In general we show that data compression is
almost always the best strategy, both in hypothesis identification and
prediction.
Comment: 35 pages, LaTeX. Submitted to IEEE Trans. Inform. Theory.
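In symbols, the selection rule described above can be sketched as follows, writing $K(\cdot)$ for prefix Kolmogorov complexity and $\mathbf{m}(H) = 2^{-K(H)}$ for the algorithmic universal prior (the paper's precise statement adds the randomness conditions of the Fundamental Inequality):

\[ H_{\mathrm{MDL}} \;=\; \arg\min_{H} \big( -\log \mathbf{m}(H) \,-\, \log P(D \mid H) \big) \;=\; \arg\min_{H} \big( K(H) - \log P(D \mid H) \big). \]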