
28 research outputs found

Guessing under source uncertainty

Author: Sundaresan Rajesh
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

This paper considers the problem of guessing the realization of a finite alphabet source when some side information is provided. The only knowledge the guesser has about the source and the correlated side information is that the joint source is one among a family. A notion of redundancy is first defined and a new divergence quantity that measures this redundancy is identified. This divergence quantity shares the Pythagorean property with the Kullback-Leibler divergence. Good guessing strategies that minimize the supremum redundancy (over the family) are then identified. The min-sup value measures the richness of the uncertainty set. The min-sup redundancies for two examples - the families of discrete memoryless sources and finite-state arbitrarily varying sources - are then determined.

Comment: 27 pages, submitted to IEEE Transactions on Information Theory, March 2006, revised September 2006, contains minor modifications and restructuring based on reviewers' comments
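The cost of guessing under the wrong source model can be illustrated with a small sketch. It ignores side information and uses the plain difference in expected number of guesses as the penalty, which is an assumption made for illustration; the paper's own redundancy definition and divergence are not reproduced here. The function names and the example distributions are hypothetical.

```python
from typing import List, Sequence


def guessing_order(q: Sequence[float]) -> List[int]:
    """Guess symbols in decreasing order of assumed probability q."""
    return sorted(range(len(q)), key=lambda x: -q[x])


def expected_guesses(p: Sequence[float], order: Sequence[int]) -> float:
    """Expected number of guesses when the true pmf is p and guesses follow `order`."""
    return sum((i + 1) * p[x] for i, x in enumerate(order))


# Hypothetical true source and a mismatched model of it.
p_true = [0.5, 0.3, 0.2]
q_model = [0.2, 0.3, 0.5]

matched = expected_guesses(p_true, guessing_order(p_true))
mismatched = expected_guesses(p_true, guessing_order(q_model))
penalty = mismatched - matched  # extra guesses paid for using the wrong model
```

Guessing in decreasing order of the true probabilities minimizes the expected number of guesses, so the penalty is never negative. A min-sup strategy in the sense of the abstract would pick one guessing order that keeps the worst-case penalty over the whole family small.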

arXiv.org e-Print Archive

CiteSeerX

Crossref

Open Access Repository of IISc Research Publications

Minimum Rates of Approximate Sufficient Statistics

Authors: Hayashi Masahito; Tan Vincent Y. F.
Publication date: 16/11/2017

Given a sufficient statistic for a parametric family of distributions, one can estimate the parameter without access to the data. However, the memory or code size for storing the sufficient statistic may nonetheless still be prohibitive. Indeed, for $n$ independent samples drawn from a $k$-nomial distribution with $d=k-1$ degrees of freedom, the length of the code scales as $d\log n+O(1)$. In many applications, we may not have a useful notion of sufficient statistics (e.g., when the parametric family is not an exponential family) and we also may not need to reconstruct the generating distribution exactly. By adopting a Shannon-theoretic approach in which we allow a small error in estimating the generating distribution, we construct various approximate sufficient statistics and show that the code length can be reduced to $\frac{d}{2}\log n+O(1)$. We consider errors measured according to the relative entropy and variational distance criteria. For the code constructions, we leverage Rissanen's minimum description length principle, which yields a non-vanishing error measured according to the relative entropy. For the converse parts, we use Clarke and Barron's formula for the relative entropy of a parametrized distribution and the corresponding mixture distribution. However, this method only yields a weak converse for the variational distance. We develop new techniques to achieve vanishing errors and we also prove strong converses. The latter means that even if the code is allowed to have a non-vanishing error, its length must still be at least $\frac{d}{2}\log n$.

Comment: To appear in the IEEE Transactions on Information Theory
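The two scalings quoted in the abstract can be checked numerically with a rough sketch. Indexing the exact count vector (the type) of $n$ samples over $k$ symbols costs about $d\log n$ bits, while quantizing each of the $d$ free coordinates to a grid of width about $1/\sqrt{n}$ costs about $\frac{d}{2}\log n$ bits. The grid quantizer is an illustrative assumption, not the construction used in the paper, and both function names are hypothetical.

```python
import math


def exact_type_bits(n: int, k: int) -> float:
    """Bits to index every possible count vector of n samples over k symbols."""
    # The number of count vectors is C(n + k - 1, k - 1).
    return math.log2(math.comb(n + k - 1, k - 1))


def coarse_grid_bits(n: int, k: int) -> float:
    """Bits to index a grid with ceil(sqrt(n)) cells along each of d = k - 1 axes.

    Illustrative assumption: a uniform grid of width about 1/sqrt(n).
    """
    d = k - 1
    cells_per_axis = math.isqrt(n - 1) + 1  # equals ceil(sqrt(n)) for n >= 1
    return d * math.log2(cells_per_axis)


n, k = 10**6, 3
d = k - 1
exact = exact_type_bits(n, k)    # close to d * log2(n), up to an O(1) term
coarse = coarse_grid_bits(n, k)  # close to (d / 2) * log2(n)
```

With these values the exact description needs roughly twice as many bits as the coarse one, which matches the factor of two between $d\log n$ and $\frac{d}{2}\log n$.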

arXiv.org e-Print Archive

Crossref

Universal Noiseless Compression for Noisy Data

Authors: Shamir G.I.; Tjalkens T.J.; Willems F.M.J.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

We study universal compression for discrete data sequences that were corrupted by noise. We show that while, as expected, there exist many cases in which the entropy of these sequences increases from that of the original data, somewhat surprisingly and counter-intuitively, universal coding redundancy of such sequences cannot increase compared to the original data. We derive conditions that guarantee that this redundancy does not decrease asymptotically (in first order) from the original sequence redundancy in the stationary memoryless case. We then provide bounds on the redundancy for coding finite-length (large) noisy blocks generated by stationary memoryless sources and corrupted by some specific memoryless channels. Finally, we propose a sequential probability estimation method that can be used to compress binary data corrupted by some noisy channel. While there is much benefit in using this method in compressing short blocks of noise-corrupted data, the new method is more general and allows sequential compression of binary sequences for which the probability of a bit is known to be limited within any given interval (not necessarily between 0 and 1). Additionally, this method has many different applications, including prediction, sequential channel estimation, and others.

Crossref

Pure OAI Repository

Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity

Authors: Li Ming; Vitanyi Paul
Publication date: 01/01/1998

The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random, relative to every contemplated hypothesis and also these hypotheses are random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and the sum of the log universal probability of the model plus the log of the probability of the data given the model should be minimized. If we restrict the model class to the finite sets then application of the ideal principle turns into Kolmogorov's minimal sufficient statistic. In general we show that data compression is almost always the best strategy, both in hypothesis identification and prediction.

Comment: 35 pages, LaTeX. Submitted to IEEE Trans. Inform. Theory
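A two-part code selection in the spirit of the minimized sum described above might be sketched as follows. The universal prior is uncomputable, so this sketch substitutes an ordinary finite prior over two Bernoulli hypotheses; that substitution, the hypothesis names, and the data are all assumptions made for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def two_part_bits(data: Sequence[int], prior: float, theta: float) -> float:
    """Total description length in bits: -log2 P(H) - log2 P(D | H)."""
    ones = sum(data)
    zeros = len(data) - ones
    model_bits = -math.log2(prior)
    data_bits = -(ones * math.log2(theta) + zeros * math.log2(1.0 - theta))
    return model_bits + data_bits


def mdl_select(data: Sequence[int],
               hypotheses: Dict[str, Tuple[float, float]]) -> str:
    """Return the hypothesis name with the shortest two-part description."""
    return min(hypotheses, key=lambda h: two_part_bits(data, *hypotheses[h]))


# Hypothetical hypotheses: name -> (prior probability, Bernoulli parameter).
hypotheses = {"fair": (0.5, 0.5), "biased": (0.5, 0.9)}
data = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]  # nine ones and one zero

best = mdl_select(data, hypotheses)
fair_bits = two_part_bits(data, *hypotheses["fair"])  # 1 model bit + 10 data bits
```

Because the description length is the negative log of prior times likelihood, minimizing it here coincides with choosing the maximum a posteriori hypothesis, which is the Bayesian connection the abstract refers to.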

arXiv.org e-Print Archive

CiteSeerX

CWI's Institutional Repository