Results on the Redundancy of Universal Compression for Finite-Length Sequences
In this paper, we investigate the redundancy of universal coding schemes on
smooth parametric sources in the finite-length regime. We derive an upper bound
on the probability of the event that a sequence of length $n$, chosen using
Jeffreys' prior from the family of parametric sources with $d$ unknown
parameters, is compressed with a redundancy smaller than $(1-\epsilon)\frac{d}{2}\log n$
for any $\epsilon > 0$. Our results also confirm
that for large enough $n$ and $d$, the average minimax redundancy provides a
good estimate for the redundancy of most sources. Our result may be used to
evaluate the performance of universal source coding schemes on finite-length
sequences. Additionally, we precisely characterize the minimax redundancy for
two-stage codes. We demonstrate that the two-stage assumption incurs a
negligible redundancy, especially when the number of source parameters is large.
Finally, we show that the redundancy is significant in the compression of small
sequences.

Comment: Accepted in the 2011 IEEE International Symposium on Information Theory (ISIT 2011).
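To make the $\frac{d}{2}\log n$ scaling above concrete, the short Python sketch below (an illustration only; the function name and the chosen values of $n$ and $d$ are not taken from the paper) evaluates the leading-order average minimax redundancy and its per-symbol share, showing why the redundancy matters most for short sequences.

```python
import math

def avg_minimax_redundancy_bits(n: int, d: int) -> float:
    """Leading-order estimate (d/2) * log2(n), in bits, of the average
    minimax redundancy for a smooth parametric family with d parameters."""
    return 0.5 * d * math.log2(n)

# The redundancy per symbol shrinks slowly with n, so it is significant
# for short sequences, as the abstract notes.
for n in (128, 1024, 65536):
    for d in (1, 8):
        r = avg_minimax_redundancy_bits(n, d)
        print(f"n={n:6d} d={d}  redundancy = {r:7.1f} bits "
              f"({r / n:.4f} bits/symbol)")
```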
Optimal prediction of Markov chains with and without spectral gap
We study the following learning problem with dependent data: Observing a
trajectory of length $n$ from a stationary Markov chain with $k$ states, the
goal is to predict the next state. For $3 \leq k \leq O(\sqrt{n})$, using
techniques from universal compression, the optimal prediction risk in
Kullback-Leibler divergence is shown to be $\Theta(\frac{k^2}{n}\log\frac{n}{k^2})$, in contrast to the optimal rate of $\Theta(\frac{\log\log n}{n})$ for $k = 2$ previously shown in Falahatgar et al., 2016. These rates,
slower than the parametric rate of $O(\frac{k^2}{n})$, can be attributed to the
memory in the data, as the spectral gap of the Markov chain can be arbitrarily
small. To quantify the memory effect, we study irreducible reversible chains
with a prescribed spectral gap. In addition to characterizing the optimal
prediction risk for two states, we show that, as long as the spectral gap is
not excessively small, the prediction risk in the Markov model is
$O(\frac{k^2}{n})$, which coincides with that of an iid model with the same
number of parameters.

Comment: 52 pages.
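As a simple baseline for the prediction problem described above, here is a minimal Python sketch; the add-one smoothing of transition counts and the toy two-state sticky chain are assumptions made for illustration, not the estimator analyzed in the paper.

```python
import numpy as np

def predict_next_state(trajectory, k, alpha=1.0):
    """Predict the distribution of the next state of a k-state chain from an
    observed trajectory, using add-alpha smoothed transition counts
    (a simple baseline, not the optimal estimator in the paper)."""
    counts = np.full((k, k), alpha)
    for s, t in zip(trajectory[:-1], trajectory[1:]):
        counts[s, t] += 1.0
    row = counts[trajectory[-1]]
    return row / row.sum()

# Example: a sticky 2-state chain (small spectral gap, hence long memory).
rng = np.random.default_rng(0)
P = np.array([[0.95, 0.05], [0.05, 0.95]])
x = [0]
for _ in range(1000):
    x.append(rng.choice(2, p=P[x[-1]]))
print(predict_next_state(x, k=2))  # should be close to the row P[x[-1]]
```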
Learning Non-Parametric and High-Dimensional Distributions via Information-Theoretic Methods
Learning distributions that govern the generation of data, and estimating related functionals, are the foundations of many classical statistical problems. In the following dissertation we investigate such topics when either the hypothesized model is non-parametric or the number of free parameters in the model grows along with the sample size. In particular, we study the above scenarios for the following class of problems, with the goal of obtaining minimax rate-optimal methods for learning the target distributions when the sample size is finite. Our techniques are based on information-theoretic divergences and related mutual-information based methods.

(i) Estimation in compound decision and empirical Bayes settings: To estimate the data-generating distribution, one often takes the following two-step approach. In the first step the statistician estimates the distribution of the parameters, either the empirical distribution or the postulated prior, and in the second step plugs in the estimate to approximate the target of interest. In the literature, estimation of the empirical distribution is known as the compound decision problem, and estimation of the prior is known as the problem of empirical Bayes. In our work we use the method of minimum-distance estimation for approximating these distributions (a minimal sketch of such a fit appears after this abstract). Considering certain discrete data setups, we show that the minimum-distance based method provides theoretically and practically sound choices for estimation. The computational and algorithmic aspects of the estimators are also analyzed.

(ii) Prediction with Markov chains: Given observations from an unknown Markov chain, we study the problem of predicting the next entry in the trajectory. Existing analyses for such a dependent setup usually center around concentration inequalities that use various extraneous conditions on the mixing properties, which makes it difficult to obtain results free of such restrictions. We introduce information-theoretic techniques to bypass these issues and obtain fundamental limits for the related minimax problems. We also analyze conditions on the mixing properties that produce a parametric rate of prediction error.
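The following Python sketch illustrates the two-step, minimum-distance idea from item (i) in a Poisson empirical-Bayes setup; the grid support, the L2 objective solved by nonnegative least squares, and all function names are assumptions made for illustration, not the thesis' actual estimator.

```python
import numpy as np
from scipy.optimize import nnls
from scipy.stats import poisson

def min_distance_prior(samples, grid, max_count=None):
    """Estimate a discrete prior supported on `grid` for a Poisson
    empirical-Bayes model by (approximately) minimizing the L2 distance
    between the fitted marginal pmf and the empirical pmf of the counts.
    Illustrative sketch only, not the estimator from the thesis."""
    samples = np.asarray(samples)
    if max_count is None:
        max_count = samples.max()
    xs = np.arange(max_count + 1)
    emp = np.bincount(samples, minlength=max_count + 1)[: max_count + 1]
    emp = emp / emp.sum()
    # Columns: marginal pmf induced by each grid atom theta.
    A = poisson.pmf(xs[:, None], grid[None, :])
    w, _ = nnls(A, emp)          # nonnegative weights
    return w / w.sum()           # renormalize to a probability vector

rng = np.random.default_rng(1)
true_thetas = rng.choice([1.0, 5.0], size=2000, p=[0.7, 0.3])
counts = rng.poisson(true_thetas)
grid = np.linspace(0.1, 10.0, 25)
weights = min_distance_prior(counts, grid)
print(grid[weights > 0.05], weights[weights > 0.05])
```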
R\'enyi Divergence and Kullback-Leibler Divergence
R\'enyi divergence is related to R\'enyi entropy much like Kullback-Leibler
divergence is related to Shannon's entropy, and comes up in many settings. It
was introduced by R\'enyi as a measure of information that satisfies almost the
same axioms as Kullback-Leibler divergence, and depends on a parameter that is
called its order. In particular, the R\'enyi divergence of order 1 equals the
Kullback-Leibler divergence.
We review and extend the most important properties of R\'enyi divergence and
Kullback-Leibler divergence, including convexity, continuity, limits of
$\sigma$-algebras and the relation of the special order 0 to the Gaussian
dichotomy and contiguity. We also show how to generalize the Pythagorean
inequality to orders different from 1, and we extend the known equivalence
between channel capacity and minimax redundancy to continuous channel inputs
(for all orders) and present several other minimax results.

Comment: To appear in IEEE Transactions on Information Theory.
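As a small numerical companion to the order-1 limit mentioned above, the Python sketch below (the function name and the example distributions are arbitrary choices) computes the R\'enyi divergence $D_\alpha(P\|Q) = \frac{1}{\alpha-1}\log\sum_i p_i^\alpha q_i^{1-\alpha}$ for discrete distributions and shows it approaching the Kullback-Leibler divergence as $\alpha \to 1$.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha (in nats) between discrete
    distributions p and q; alpha = 1 is taken as the Kullback-Leibler limit."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.isclose(alpha, 1.0):
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
    return float(np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

p = [0.6, 0.3, 0.1]
q = [0.2, 0.5, 0.3]
for a in (0.5, 0.9, 0.99, 0.999, 1.0):
    print(a, renyi_divergence(p, q, a))  # values approach the KL divergence
```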
Universal Coding on Infinite Alphabets: Exponentially Decreasing Envelopes
This paper deals with the problem of universal lossless coding on a countable
infinite alphabet. It focuses on some classes of sources defined by an envelope
condition on the marginal distribution, namely exponentially decreasing
envelope classes with exponent $\alpha$. The minimax redundancy of
exponentially decreasing envelope classes is proved to be equivalent to
$\frac{1}{4\alpha\log e}\log^2 n$. Then a coding strategy is proposed, with
a Bayes redundancy equivalent to the maximin redundancy. At last, an adaptive
algorithm is provided, whose redundancy is equivalent to the minimax redundancy.
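As an illustration of the envelope condition and the $\log^2 n$ growth of the redundancy, the Python sketch below (a toy check under an assumed geometric source and assumed constants $C$ and $\alpha$; not the coding strategy of the paper) verifies that a candidate marginal sits under an exponentially decreasing envelope and evaluates the leading-order redundancy term quoted above.

```python
import math

def satisfies_envelope(pmf, C, alpha, tol=1e-12):
    """Check the envelope condition pmf(k) <= C * exp(-alpha * k) for all k."""
    return all(p <= C * math.exp(-alpha * k) + tol for k, p in enumerate(pmf))

def leading_redundancy_bits(n, alpha):
    """Leading-order term log2(n)**2 / (4 * alpha * log2(e)), matching the
    expression quoted in the abstract above (base-2 logarithms assumed)."""
    return math.log2(n) ** 2 / (4 * alpha * math.log2(math.e))

# A geometric marginal pmf(k) = (1 - r) * r**k lies under the envelope
# C * exp(-alpha * k) with C = 1 - r and alpha = -ln(r).
r = 0.5
pmf = [(1 - r) * r**k for k in range(200)]
alpha = -math.log(r)
print(satisfies_envelope(pmf, C=1 - r, alpha=alpha))
for n in (10**3, 10**6):
    print(n, round(leading_redundancy_bits(n, alpha), 1), "bits")
```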