Results on the Redundancy of Universal Compression for Finite-Length Sequences
In this paper, we investigate the redundancy of universal coding schemes on
smooth parametric sources in the finite-length regime. We derive an upper bound
on the probability of the event that a sequence of length $n$, chosen using
Jeffreys' prior from the family of parametric sources with $d$ unknown
parameters, is compressed with a redundancy smaller than
$(1-\epsilon)\frac{d}{2}\log n$ for any $\epsilon > 0$. Our results also confirm
that for large enough $n$ and $d$, the average minimax redundancy provides a
good estimate for the redundancy of most sources. Our result may be used to
evaluate the performance of universal source coding schemes on finite-length
sequences. Additionally, we precisely characterize the minimax redundancy for
two-stage codes. We demonstrate that the two-stage assumption incurs a
negligible redundancy especially when the number of source parameters is large.
Finally, we show that the redundancy is significant in the compression of small
sequences.
Comment: Accepted at the 2011 IEEE International Symposium on Information Theory (ISIT 2011).
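For orientation, the quantities involved can be stated in a standard textbook form (this formulation and the symbols $L$, $R_n$, $H_\theta$ are assumptions, not text from the paper): the redundancy of a code with length function $L$ on a source with parameter $\theta$ is

$$ R_n(L, \theta) \;=\; \mathbb{E}_\theta\big[L(X^n)\big] - H_\theta(X^n), $$

and Rissanen's classical lower bound states that for a smooth $d$-parameter family, $R_n(L, \theta) \ge (1-\epsilon)\frac{d}{2}\log n$ for all but a vanishing fraction of parameters $\theta$; the abstract above quantifies, at finite $n$, how rarely a Jeffreys-distributed source beats this level.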
A Parallel Two-Pass MDL Context Tree Algorithm for Universal Source Coding
We present a novel lossless universal source coding algorithm that uses
parallel computational units to increase the throughput. The length-$N$ input
sequence is partitioned into $B$ blocks. Processing each block independently of
the other blocks can accelerate the computation by a factor of $B$, but
degrades the compression quality. Instead, our approach is to first estimate
the minimum description length (MDL) source underlying the entire input, and
then encode each of the $B$ blocks in parallel based on the MDL source. With
this two-pass approach, the compression loss incurred by using more parallel
units is insignificant. Our algorithm is work-efficient, i.e., its
computational complexity is $O(N)$. Its redundancy is approximately
$B\log(N/B)$ bits above Rissanen's lower bound on universal coding performance,
with respect to any tree source whose maximal depth is at most $O(\log(N/B))$.
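A minimal sketch of the two-pass idea in Python, with a hypothetical fixed-order Markov model standing in for the paper's estimated MDL context tree (the model, the KT-style smoothing, and the serial first pass are all simplifying assumptions, not the paper's implementation):

    from collections import Counter
    from concurrent.futures import ProcessPoolExecutor
    from math import log2

    K = 2  # assumed context depth; the paper estimates a context *tree* instead

    def estimate_model(x):
        """Pass 1: context/symbol counts over the ENTIRE input (one shared model)."""
        counts = Counter((x[i - K:i], x[i]) for i in range(K, len(x)))
        ctx_totals = Counter(x[i - K:i] for i in range(K, len(x)))
        return counts, ctx_totals

    def block_code_length(args):
        """Pass 2: ideal code length of one block under the shared model."""
        block, counts, ctx_totals, alphabet = args
        bits = 0.0
        for i in range(K, len(block)):
            ctx, sym = block[i - K:i], block[i]
            p = (counts[(ctx, sym)] + 0.5) / (ctx_totals[ctx] + 0.5 * alphabet)
            bits += -log2(p)
        return bits

    if __name__ == "__main__":
        x = "abracadabra" * 500
        B = 4                                   # number of parallel units
        n = len(x) // B
        blocks = [x[i * n:(i + 1) * n] for i in range(B)]
        counts, ctx_totals = estimate_model(x)  # shared model, not B local ones
        args = [(b, counts, ctx_totals, len(set(x))) for b in blocks]
        with ProcessPoolExecutor(max_workers=B) as ex:
            total_bits = sum(ex.map(block_code_length, args))
        print(f"ideal code length: {total_bits:.0f} bits")

Encoding every block against the single model estimated from the whole input is what keeps the parallel compression loss small relative to encoding each block with its own locally estimated model.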
Strong Asymptotic Assertions for Discrete MDL in Regression and Classification
We study the properties of the MDL (or maximum penalized complexity)
estimator for Regression and Classification, where the underlying model class
is countable. We show in particular a finite bound on the Hellinger losses
under the only assumption that there is a "true" model contained in the class.
This implies almost sure convergence of the predictive distribution to the true
one at a fast rate. It corresponds to Solomonoff's central theorem of universal
induction, however with a bound that is exponentially larger.
Comment: 6 two-column pages.
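For orientation, the discrete MDL estimator studied here is commonly written in the following standard form (the notation is assumed, not quoted from the paper): given a countable class $\mathcal{M}$ with prefix code lengths (complexities) $K(\nu)$ and observed data $x$,

$$ \hat\nu_x \;=\; \arg\max_{\nu \in \mathcal{M}} \big[ \log \nu(x) - K(\nu) \big], $$

i.e., likelihood penalized by description length; the bound asserted above controls the cumulative Hellinger distance between the predictions of $\hat\nu_x$ and those of the true model.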
Estimation of AR and ARMA models by stochastic complexity
In this paper the stochastic complexity criterion is applied to estimation of
the order in AR and ARMA models. The power of the criterion for short strings
is illustrated by simulations. Evaluating the criterion requires an integral of
the square root of the Fisher information, which is computed by a Monte Carlo
technique. The stochastic complexity, which is the negative logarithm of the
Normalized Maximum Likelihood universal density function, is given. Also, exact
asymptotic formulas for the Fisher information matrix are derived.
Comment: Published at http://dx.doi.org/10.1214/074921706000000941 in the IMS Lecture Notes Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org).
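A minimal sketch of the Monte Carlo step, assuming a toy AR(1) model whose asymptotic per-sample Fisher information for the coefficient is $I(\phi) = 1/(1-\phi^2)$ (a standard result); the sampling range, sample count, and the form of the penalty term are illustrative assumptions, not the paper's exact formulas:

    import numpy as np

    def fisher_info_ar1(phi):
        """Asymptotic per-sample Fisher information of the AR(1) coefficient."""
        return 1.0 / (1.0 - phi ** 2)

    def log_complexity_mc(n, lo=-0.99, hi=0.99, m=100_000, seed=0):
        """Monte Carlo estimate of the NML parametric complexity term,
        (d/2) log(n / (2*pi)) + log integral of sqrt(I(phi)), with d = 1."""
        rng = np.random.default_rng(seed)
        phis = rng.uniform(lo, hi, m)   # uniform sampling over the parameter range
        integral = (hi - lo) * np.mean(np.sqrt(fisher_info_ar1(phis)))
        return 0.5 * np.log(n / (2 * np.pi)) + np.log(integral)

    for n in (50, 100, 500):
        print(n, round(log_complexity_mc(n), 3))

The stochastic complexity of the data is then the maximized negative log-likelihood plus this penalty; order selection compares that total across candidate AR/ARMA orders.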
On the Convergence Speed of MDL Predictions for Bernoulli Sequences
We consider the Minimum Description Length principle for online sequence
prediction. If the underlying model class is discrete, then the total expected
square loss is a particularly interesting performance measure: (a) this
quantity is bounded, implying convergence with probability one, and (b) it
additionally specifies a 'rate of convergence'. Generally, for MDL only
exponential loss bounds hold, as opposed to the linear bounds for a Bayes
mixture. We show that this is even the case if the model class contains only
Bernoulli distributions. We derive a new upper bound on the prediction error
for countable Bernoulli classes. This implies a small bound (comparable to the
one for Bayes mixtures) for certain important model classes. The results apply
to many Machine Learning tasks including classification and hypothesis testing.
We provide arguments that our theorems generalize to countable classes of
i.i.d. models.
Comment: 17 pages.
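A minimal sketch of the setting, assuming a hypothetical countable class of Bernoulli parameters theta_k with complexity weights K(k) (both chosen here for illustration only): the MDL estimator picks the penalized-likelihood maximizer after each observation and predicts with it, and the cumulative squared prediction error is the quantity the paper bounds.

    import random
    from math import log

    # Hypothetical countable class: theta_k = (k+1)/(k+2), complexity ~ 2*log(k+2) nats.
    thetas = [(k + 1) / (k + 2) for k in range(100)]   # 1/2, 2/3, 3/4, ...
    K = [2 * log(k + 2) for k in range(100)]

    def mdl_predict(ones, n):
        """MDL estimate after n observations containing `ones` ones:
        maximize log-likelihood minus the complexity penalty over the class."""
        def score(i):
            t = thetas[i]
            return ones * log(t) + (n - ones) * log(1 - t) - K[i]
        return thetas[max(range(len(thetas)), key=score)]

    random.seed(0)
    true_theta = 2 / 3                             # contained in the class, as assumed
    ones, total_sq_loss = 0, 0.0
    for n in range(1, 2001):
        pred = mdl_predict(ones, n - 1)
        total_sq_loss += (pred - true_theta) ** 2  # instantaneous squared prediction error
        ones += random.random() < true_theta
    print(f"cumulative squared loss after 2000 steps: {total_sq_loss:.3f}")

The theorems' content is that this cumulative loss stays bounded, which is what forces almost-sure convergence of the MDL predictions to the true parameter.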