Universal Coding and Prediction on Martin-Löf Random Points
We perform an effectivization of classical results concerning universal
coding and prediction for stationary ergodic processes over an arbitrary finite
alphabet. That is, we lift the well-known almost sure statements to statements
about Martin-Löf random sequences. Most of this work is quite mechanical but,
by the way, we complete a result of Ryabko from 2008 by showing that each
universal probability measure in the sense of universal coding induces a
universal predictor in the prequential sense. Surprisingly, the effectivization
of this implication holds true provided the universal measure does not ascribe
too low conditional probabilities to individual symbols. As an example, we show
that the Prediction by Partial Matching (PPM) measure satisfies this
requirement. In the almost sure setting, the requirement is superfluous. Comment: 12 pages
Universal Codes from Switching Strategies
We discuss algorithms for combining sequential prediction strategies, a task
which can be viewed as a natural generalisation of the concept of universal
coding. We describe a graphical language based on Hidden Markov Models for
defining prediction strategies, and we provide both existing and new models as
examples. The models include efficient, parameterless models for switching
between the input strategies over time, including a model for the case where
switches tend to occur in clusters, and finally a new model for the scenario
where the prediction strategies have a known relationship, and where jumps are
typically between strongly related ones. This last model is relevant for coding
time series data where parameter drift is expected. As theoretical contributions
we introduce an interpolation construction that is useful in the development
and analysis of new algorithms, and we establish a new sophisticated lemma for
analysing the individual sequence regret of parameterised models.
Universal Densities Exist for Every Finite Reference Measure
As is known, universal codes, which estimate the entropy rate
consistently, exist for stationary ergodic sources over finite alphabets but
not over countably infinite ones. We generalize universal coding as the problem
of universal densities with respect to a fixed reference measure on a countably
generated measurable space. We show that universal densities, which estimate
the differential entropy rate consistently, exist for finite reference
measures. Thus finite alphabets are not necessary in some sense. To exhibit a
universal density, we adapt the non-parametric differential (NPD) entropy rate
estimator by Feutrill and Roughan. Our modification is analogous to Ryabko's
modification of prediction by partial matching (PPM) by Cleary and Witten.
Whereas Ryabko considered a mixture over Markov orders, we consider a mixture
over quantization levels. Moreover, we demonstrate that any universal density
induces a strongly consistent Cesàro mean estimator of conditional density
given an infinite past. This yields a universal predictor with the loss
for a countable alphabet. Finally, we specialize universal densities to
processes over natural numbers and on the real line. We derive sufficient
conditions for consistent estimation of the entropy rate with respect to
infinite reference measures in these domains. Comment: 28 pages, no figures
Worst-case bounds for the logarithmic loss of predictors
We investigate on-line prediction of individual sequences. Given a class of predictors, the goal is to predict as well as the best predictor in the class, where the loss is measured by the self information (logarithmic) loss function. The excess loss (regret) is closely related to the redundancy of the associated lossless universal code. Using Shtarkov's theorem and tools from empirical process theory, we prove a general upper bound on the best possible (minimax) regret. The bound depends on certain metric properties of the class of predictors. We apply the bound to both parametric and nonparametric classes of predictors. Finally, we point out a suboptimal behavior of the popular Bayesian weighted average algorithm. Keywords: universal prediction, universal coding, empirical processes, on-line learning, metric entropy
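As a small illustration of the quantity that Shtarkov's theorem characterizes, the sketch below computes the minimax logarithmic-loss regret for one simple class, the i.i.d. Bernoulli predictors on binary sequences of length n. It is an assumption-laden toy and not the bound derived in the paper: the function name `shtarkov_regret_bernoulli` is hypothetical, and the class is chosen only because its Shtarkov sum has a closed form over the count of ones.

```python
import math


def shtarkov_regret_bernoulli(n: int) -> float:
    """Minimax log-loss regret, in bits, for the Bernoulli class at length n.

    By Shtarkov's theorem the minimax regret equals the logarithm of the sum,
    over all sequences, of the maximum-likelihood probability of each sequence.
    For Bernoulli models that probability depends only on the number of ones k.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    total = 0.0
    for k in range(n + 1):
        p = k / n  # maximum-likelihood parameter for a sequence with k ones
        # Python evaluates 0.0 ** 0 as 1.0, which is the convention needed here.
        ml_prob = (p ** k) * ((1.0 - p) ** (n - k))
        total += math.comb(n, k) * ml_prob
    return math.log2(total)


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(n, round(shtarkov_regret_bernoulli(n), 4))
```

For this class the regret grows roughly like half the logarithm of n, which matches the familiar parametric redundancy rate for a one-parameter family.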
Mobile Audiovisual Terminal: System Design and Subjective Testing in DECT and UMTS networks
It is anticipated that there will shortly be a requirement
for multimedia terminals that operate via mobile
communications systems. This paper presents a functional specification
for such a terminal operating at 32 kb/s in a digital
European cordless telecommunications (DECT) and universal
mobile telecommunications system (UMTS) radio network. A terminal
has been built, based on a PC with digital signal processor
(DSP) boards for audio and video coding and decoding. Speech
coding is by a phonetically driven code-excited linear prediction
(CELP) speech coder and video coding by a block-oriented hybrid
discrete cosine transform (DCT) coder. Separate channel coding
is provided for the audio and video data. The paper describes the
techniques used for audio and video coding, channel coding, and
synchronization. Methods of subjective testing in a DECT network
and in a UMTS network are also described. These consisted of
subjective tests of first impressions of the mobile audio–visual
terminal (MAVT) quality, interactive tests, and the completion
of an exit questionnaire. The test results showed that the quality
of the audio was sufficiently good for comprehension and the
video was sufficiently good for following and repeating simple
mechanical tasks. However, the quality of the MAVT was not
good enough for general use where high-quality audio and video
were needed, especially when transmission was in a noisy radio
environment.
Universal lossless source coding with the Burrows Wheeler transform
The Burrows Wheeler transform (1994) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: (1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, (2) a variety of very simple new techniques for BWT-based lossless source coding, and (3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory sources.
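To make the reversibility of the transform concrete, here is a minimal sketch of the textbook BWT and its inverse, using sorted rotations and an end-of-string marker. It is a naive illustration for short strings, not one of the coding schemes analysed in the paper; the function names `bwt` and `inverse_bwt` and the marker parameter `eos` are assumptions introduced for this example.

```python
def bwt(text: str, eos: str = "\0") -> str:
    """Return the last column of the sorted rotations of text + eos."""
    if eos in text:
        raise ValueError("end-of-string marker must not occur in the input")
    s = text + eos
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, eos: str = "\0") -> str:
    """Invert bwt() by repeatedly prepending the last column and re-sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original string is the unique rotation that ends with the marker.
    original = next(row for row in table if row.endswith(eos))
    return original[:-1]


if __name__ == "__main__":
    encoded = bwt("banana", eos="$")
    print(encoded)
    print(inverse_bwt(encoded, eos="$"))
```

The output groups equal symbols together, which is what the lossless code applied after the transform tries to exploit. Practical implementations build the transform from a suffix array rather than materialising every rotation.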
Scanning and Sequential Decision Making for Multi-Dimensional Data - Part I: the Noiseless Case
We investigate the problem of scanning and prediction ("scandiction", for
short) of multidimensional data arrays. This problem arises in several aspects
of image and video processing, such as predictive coding, for example, where an
image is compressed by coding the error sequence resulting from scandicting it.
Thus, it is natural to ask what is the optimal method to scan and predict a
given image, what is the resulting minimum prediction loss, and whether there
exist specific scandiction schemes which are universal in some sense.
Specifically, we investigate the following problems: First, modeling the data
array as a random field, we wish to examine whether there exists a scandiction
scheme which is independent of the field's distribution, yet asymptotically
achieves the same performance as if this distribution was known. This question
is answered in the affirmative for the set of all spatially stationary random
fields and under mild conditions on the loss function. We then discuss the
scenario where a non-optimal scanning order is used, yet accompanied by an
optimal predictor, and derive bounds on the excess loss compared to optimal
scanning and prediction.
This paper is the first part of a two-part paper on sequential decision
making for multi-dimensional data. It deals with clean, noiseless data arrays.
The second part deals with noisy data arrays, namely, with the case where the
decision maker observes only a noisy version of the data, yet it is judged with
respect to the original, clean data. Comment: 46 pages, 2 figures. Revised version: title changed, section 1
revised, section 3.1 added, a few minor/technical corrections made
Universal Noiseless Compression for Noisy Data
We study universal compression for discrete data sequences that were corrupted by noise. We show that while, as expected, there exist many cases in which the entropy of these sequences increases from that of the original data, somewhat surprisingly and counter-intuitively, universal coding redundancy of such sequences cannot increase compared to the original data. We derive conditions that guarantee that this redundancy does not decrease asymptotically (in first order) from the original sequence redundancy in the stationary memoryless case. We then provide bounds on the redundancy for coding finite length (large) noisy blocks generated by stationary memoryless sources and corrupted by some specific memoryless channels. Finally, we propose a sequential probability estimation method that can be used to compress binary data corrupted by some noisy channel. While there is much benefit in using this method in compressing short blocks of noise corrupted data, the new method is more general and allows sequential compression of binary sequences for which the probability of a bit is known to be limited within any given interval (not necessarily between 0 and 1). Additionally, this method has many different applications, including prediction, sequential channel estimation, and others.
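For readers unfamiliar with sequential probability estimation, the sketch below shows the standard Krichevsky-Trofimov (KT) estimator for binary data together with the ideal code length it implies. This is a swapped-in baseline for illustration only: it is not the interval-restricted method proposed in the paper, and the function name `kt_code_length` is hypothetical.

```python
import math
from typing import Iterable


def kt_code_length(bits: Iterable[int]) -> float:
    """Ideal code length, in bits, of a binary sequence under the KT estimator.

    Before each symbol the estimator assigns probability
    (count_of_that_symbol + 1/2) / (symbols_seen + 1), and the sequence cost is
    the sum of the negative base-2 logarithms of those probabilities.
    """
    counts = [0, 0]
    total = 0.0
    for b in bits:
        if b not in (0, 1):
            raise ValueError("input must consist of 0s and 1s")
        prob = (counts[b] + 0.5) / (counts[0] + counts[1] + 1.0)
        total -= math.log2(prob)
        counts[b] += 1
    return total


if __name__ == "__main__":
    print(kt_code_length([0] * 32))     # highly predictable input
    print(kt_code_length([0, 1] * 16))  # balanced input
```

An arithmetic coder driven by these probabilities would come within a few bits of the reported length. A method that knows the bit probability lies in a restricted interval, as the abstract describes, could in principle spend less on learning the parameter.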