MDL Convergence Speed for Bernoulli Sequences
The Minimum Description Length principle for online sequence
estimation/prediction in a proper learning setup is studied. If the underlying
model class is discrete, then the total expected square loss is a particularly
interesting performance measure: (a) this quantity is finitely bounded,
implying convergence with probability one, and (b) it additionally specifies
the convergence speed. For MDL, in general one can only have loss bounds which
are finite but exponentially larger than those for Bayes mixtures. We show that
this is even the case if the model class contains only Bernoulli distributions.
We derive a new upper bound on the prediction error for countable Bernoulli
classes. This implies a small bound (comparable to the one for Bayes mixtures)
for certain important model classes. We discuss the application to Machine
Learning tasks such as classification and hypothesis testing, and the
generalization to countable classes of i.i.d. models.
Comment: 28 pages
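Here is a minimal Python sketch of the comparison in this abstract. It makes several assumptions. The Bernoulli class is finite (truncated) on a grid of 20 parameters. The prior weights are hypothetical, proportional to 1/(k(k+1)). The two-part MDL predictor uses the single model with the shortest code length -log w - log P(x). The loss is the squared deviation of the predicted probability of a 1 from the true parameter, summed along one simulated sequence rather than taken in expectation. The sketch therefore only illustrates the kind of quantity the paper bounds.

```python
import math
import random


def log_weighted_likelihood(theta, weight, ones, zeros):
    """log(w_theta) + log P(sequence | theta) for a Bernoulli(theta) model."""
    return math.log(weight) + ones * math.log(theta) + zeros * math.log(1.0 - theta)


def bayes_mixture_predict(thetas, weights, ones, zeros):
    """Bayes mixture: posterior-weighted probability that the next bit is 1."""
    logs = [log_weighted_likelihood(t, w, ones, zeros) for t, w in zip(thetas, weights)]
    top = max(logs)  # subtract the maximum for numerical stability
    post = [math.exp(x - top) for x in logs]
    return sum(p * t for p, t in zip(post, thetas)) / sum(post)


def mdl_predict(thetas, weights, ones, zeros):
    """Two-part MDL: use the single model with the shortest code length
    -log w_theta - log P(sequence | theta), i.e. the largest weighted likelihood."""
    best = max(range(len(thetas)),
               key=lambda i: log_weighted_likelihood(thetas[i], weights[i], ones, zeros))
    return thetas[best]


def cumulative_square_loss(predict, thetas, weights, true_theta, n, seed=0):
    """Sum over t of (P(x_t = 1 | past) - true_theta)^2 along one simulated sequence."""
    rng = random.Random(seed)
    ones = zeros = 0
    loss = 0.0
    for _ in range(n):
        p = predict(thetas, weights, ones, zeros)
        loss += (p - true_theta) ** 2
        if rng.random() < true_theta:
            ones += 1
        else:
            zeros += 1
    return loss


# Hypothetical finite Bernoulli class with prior weights proportional to 1/(k(k+1)).
K = 20
thetas = [k / (K + 1) for k in range(1, K + 1)]
raw = [1.0 / (k * (k + 1)) for k in range(1, K + 1)]
weights = [r / sum(raw) for r in raw]
true_theta = thetas[7]

for name, predictor in [("Bayes", bayes_mixture_predict), ("MDL", mdl_predict)]:
    print(name, cumulative_square_loss(predictor, thetas, weights, true_theta, n=1000))
```

You can rerun it with other seeds, grids, or true parameters to see how the two cumulative losses behave. No particular numbers are claimed here.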
On the Convergence Speed of MDL Predictions for Bernoulli Sequences
We consider the Minimum Description Length principle for online sequence
prediction. If the underlying model class is discrete, then the total expected
square loss is a particularly interesting performance measure: (a) this
quantity is bounded, implying convergence with probability one, and (b) it
additionally specifies a 'rate of convergence'. Generally, for MDL only
exponential loss bounds hold, as opposed to the linear bounds for a Bayes
mixture. We show that this is even the case if the model class contains only
Bernoulli distributions. We derive a new upper bound on the prediction error
for countable Bernoulli classes. This implies a small bound (comparable to the
one for Bayes mixtures) for certain important model classes. The results apply
to many Machine Learning tasks including classification and hypothesis testing.
We provide arguments that our theorems generalize to countable classes of
i.i.d. models.
Comment: 17 pages
Asymptotics of Discrete MDL for Online Prediction
Minimum Description Length (MDL) is an important principle for induction and
prediction, with strong relations to optimal Bayesian learning. This paper
deals with learning non-i.i.d. processes by means of two-part MDL, where the
underlying model class is countable. We consider the online learning framework,
i.e. observations come in one by one, and the predictor is allowed to update
his state of mind after each time step. We identify two ways of predicting by
MDL for this setup, namely a static and a dynamic one. (A third variant,
hybrid MDL, turns out to be inferior.) We prove that, under the sole
assumption that the data is generated by a distribution contained in the model
class, the MDL predictions converge to the true values almost surely. This is
accomplished by proving finite bounds on the quadratic, the Hellinger, and the
Kullback-Leibler loss of the MDL learner, which are however exponentially worse
than for Bayesian prediction. We demonstrate that these bounds are sharp, even
for model classes containing only Bernoulli distributions. We show how these
bounds imply regret bounds for arbitrary loss functions. Our results apply to a
wide range of setups, namely sequence prediction, pattern classification,
regression, and universal induction in the sense of Algorithmic Information
Theory, among others.
Comment: 34 pages
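The static and dynamic variants can be sketched for a Bernoulli class as follows. The sketch assumes one reading of the two predictors. Static MDL uses the prediction of the model maximizing w_nu * nu(x_{1:t}). Dynamic MDL compares rho(x_{1:t}a) = max_nu w_nu * nu(x_{1:t}a) for both possible next symbols a and normalizes. The function names and the explicit normalization are illustrative assumptions, not the paper's notation.

```python
import math


def log_mdl_weight(theta, weight, ones, zeros):
    """log(w_nu * nu(x)) for a Bernoulli(theta) model nu and a sequence x
    containing `ones` 1s and `zeros` 0s (theta must lie strictly in (0, 1))."""
    return math.log(weight) + ones * math.log(theta) + zeros * math.log(1.0 - theta)


def static_mdl(thetas, weights, ones, zeros):
    """Static MDL: pick the MAP model for the past and use its probability of a 1.
    Ties are broken in favour of the earlier model in the list."""
    best = max(range(len(thetas)),
               key=lambda i: log_mdl_weight(thetas[i], weights[i], ones, zeros))
    return thetas[best]


def dynamic_mdl(thetas, weights, ones, zeros):
    """Dynamic MDL (normalized form assumed here): compare
    rho(x a) = max_nu w_nu * nu(x a) for both possible next symbols a."""
    def log_rho(o, z):
        return max(log_mdl_weight(t, w, o, z) for t, w in zip(thetas, weights))

    log_one = log_rho(ones + 1, zeros)
    log_zero = log_rho(ones, zeros + 1)
    top = max(log_one, log_zero)  # for numerical stability
    return math.exp(log_one - top) / (math.exp(log_one - top) + math.exp(log_zero - top))


# Two hypothetical models with equal prior weight.
thetas, weights = [0.2, 0.8], [0.5, 0.5]
print(static_mdl(thetas, weights, 0, 0), dynamic_mdl(thetas, weights, 0, 0))
```

With no data, the static predictor must break a tie (here in favour of 0.2), whereas the dynamic one returns about 0.5. After three 1s, both predict 0.8.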
Sequential Predictions based on Algorithmic Complexity
This paper studies sequence prediction based on the monotone Kolmogorov
complexity Km=-log m, i.e. based on universal deterministic/one-part MDL. m is
extremely close to Solomonoff's universal prior M, the latter being an
excellent predictor in deterministic as well as probabilistic environments,
where performance is measured in terms of convergence of posteriors or losses.
Despite this closeness to M, it is difficult to assess the prediction quality
of m, since little is known about the closeness of their posteriors, which are
the important quantities for prediction. We show that for deterministic
computable environments, the "posterior" and losses of m converge, but rapid
convergence could only be shown on-sequence; the off-sequence convergence can
be slow. In probabilistic environments, neither the posterior nor the losses
converge in general.
Comment: 26 pages, LaTeX
On Universal Prediction and Bayesian Confirmation
The Bayesian framework is a well-studied and successful framework for
inductive reasoning, which includes hypothesis testing and confirmation,
parameter estimation, sequence prediction, classification, and regression. But
standard statistical guidelines for choosing the model class and prior are not
always available or fail, in particular in complex situations. Solomonoff
completed the Bayesian framework by providing a rigorous, unique, formal, and
universal choice for the model class and the prior. We discuss in breadth how
and in which sense universal (non-i.i.d.) sequence prediction solves various
(philosophical) problems of traditional Bayesian sequence prediction. We show
that Solomonoff's model possesses many desirable properties: strong total and
weak instantaneous bounds; and, in contrast to most classical continuous prior
densities, it has no zero p(oste)rior problem, i.e. it can confirm universal
hypotheses, is reparametrization and regrouping invariant, and avoids the
old-evidence and updating problem. It even performs well (actually better) in
non-computable environments.
Comment: 24 pages
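A toy Bernoulli example illustrates the zero p(oste)rior problem (this is not Solomonoff's prior). Under a uniform prior density on theta, the universal hypothesis theta = 1 ("every observation is 1") has prior, and hence posterior, probability zero, however many 1s are observed. Suppose instead that a hypothetical point mass w is placed on theta = 1 and the rest is spread uniformly. After n ones, the posterior is w / (w + (1 - w)/(n + 1)), because the uniform part assigns the sequence probability 1/(n + 1). The sketch uses exact rational arithmetic.

```python
from fractions import Fraction


def posterior_all_ones(n, w):
    """Posterior of the hypothesis theta = 1 after observing n ones, under the
    illustrative prior  w * (point mass at theta = 1) + (1 - w) * Uniform[0, 1]."""
    w = Fraction(w)
    evidence_point = Fraction(1)             # P(1^n | theta = 1)
    evidence_uniform = Fraction(1, n + 1)    # integral of theta^n d(theta) over [0, 1]
    return w * evidence_point / (w * evidence_point + (1 - w) * evidence_uniform)


for n in (0, 1, 10, 100):
    # w = 0 is the pure uniform prior: the universal hypothesis is never confirmed.
    print(n, posterior_all_ones(n, 0), float(posterior_all_ones(n, Fraction(1, 2))))
```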
Strong Asymptotic Assertions for Discrete MDL in Regression and Classification
We study the properties of the MDL (or maximum penalized complexity)
estimator for Regression and Classification, where the underlying model class
is countable. We show in particular a finite bound on the Hellinger losses
under the sole assumption that there is a "true" model contained in the class.
This implies almost sure convergence of the predictive distribution to the true
one at a fast rate. It corresponds to Solomonoff's central theorem of universal
induction, however with a bound that is exponentially larger.
Comment: 6 two-column pages
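A short Python sketch makes the Hellinger loss concrete. It uses the convention h^2(p, q) = sum_i (sqrt(p_i) - sqrt(q_i))^2, without the factor 1/2 that some authors include. It accumulates this distance over a sequence of predicted versus true label distributions. The function names are illustrative.

```python
import math


def hellinger_sq(p, q):
    """Squared Hellinger distance sum_i (sqrt(p_i) - sqrt(q_i))^2 between two
    discrete distributions given as equal-length sequences of probabilities."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def cumulative_hellinger_loss(predicted, true):
    """Sum of per-step squared Hellinger distances between predicted and true
    label distributions, e.g. over a sequence of classification inputs."""
    return sum(hellinger_sq(p, q) for p, q in zip(predicted, true))
```

For two Bernoulli distributions with parameters 0.25 and 0.75, hellinger_sq([0.25, 0.75], [0.75, 0.25]) evaluates to 2 - sqrt(3), roughly 0.268.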
An MDL framework for sparse coding and dictionary learning
The power of sparse signal modeling with learned over-complete dictionaries
has been demonstrated in a variety of applications and fields, from signal
processing to statistical inference and machine learning. However, the
statistical properties of these models, such as under-fitting or over-fitting
given sets of data, are still not well characterized in the literature. As a
result, the success of sparse modeling depends on hand-tuning critical
parameters for each data and application. This work aims at addressing this by
providing a practical and objective characterization of sparse models by means
of the Minimum Description Length (MDL) principle -- a well established
information-theoretic approach to model selection in statistical inference. The
resulting framework derives a family of efficient sparse coding and dictionary
learning algorithms which, by virtue of the MDL principle, are completely
parameter free. Furthermore, such a framework allows one to incorporate additional
prior information into existing models, such as Markovian dependencies, or to
define completely new problem formulations, including in the matrix analysis
area, in a natural way. These virtues will be demonstrated with parameter-free
algorithms for the classic image denoising and classification problems, and for
low-rank matrix recovery in video applications.
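The idea of letting code length, rather than a hand-tuned parameter, pick the sparsity level can be sketched in a deliberately simplified setting. This is not the paper's framework. The sketch assumes:

- an orthonormal (identity) dictionary;
- Gaussian residuals quantized to a hypothetical precision delta;
- log2 C(n, k) bits for the support;
- (1/2) log2 n bits per kept coefficient.

The precision delta is itself a choice, so this toy is not fully parameter free. It requires Python 3.8+ for math.comb.

```python
import math


def two_part_codelength(coeffs, k, delta):
    """Crude two-part code length (bits) for keeping the k largest-magnitude
    coefficients of a signal in an orthonormal basis and coding the rest as
    Gaussian noise quantized to precision delta (all choices illustrative)."""
    n = len(coeffs)
    ranked = sorted(coeffs, key=abs, reverse=True)
    rss = sum(c * c for c in ranked[k:])              # energy left in the residual
    sigma2 = max(rss / n, delta * delta)              # noise variance, floored at delta^2
    residual_bits = 0.5 * n * math.log2(2 * math.pi * math.e * sigma2 / (delta * delta))
    support_bits = math.log2(math.comb(n, k))         # which k positions are non-zero
    value_bits = 0.5 * k * math.log2(n)               # (1/2) log2 n bits per kept value
    return residual_bits + support_bits + value_bits


def mdl_sparsity(coeffs, delta=0.01):
    """Choose the sparsity level k with the shortest total code length."""
    return min(range(len(coeffs) + 1),
               key=lambda k: two_part_codelength(coeffs, k, delta))


signal = [100.0] + [0.01] * 15  # one strong atom plus small noise
print(mdl_sparsity(signal))
```

Consider a signal with one large coefficient and fifteen small ones at the level of delta. For it, the shortest total code keeps exactly one coefficient.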
Minimum Description Length Model Selection - Problems and Extensions
The thesis treats a number of open problems in Minimum Description Length model selection, especially prediction problems. It is shown how techniques from the "Prediction with Expert Advice" literature can be used to improve model selection performance, which is particularly useful in nonparametric settings.
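The fixed-share forecaster of Herbster and Warmuth is one concrete instance of an expert-advice technique; the thesis's own methods are not reproduced here. It combines the predictive probabilities of several models in two steps. First, it performs a Bayesian update after each outcome. Then it redistributes a fraction alpha of the weight uniformly, so the mixture can follow a switch from one model to another. This sketch assumes binary outcomes, log loss in bits, and expert probabilities strictly between 0 and 1. The function name and the default alpha are hypothetical.

```python
import math


def fixed_share_log_loss(expert_probs, outcomes, alpha=0.05):
    """Total log loss (bits) of the fixed-share forecaster.

    expert_probs[t][i] is the probability expert i assigns to outcome 1 at
    round t (assumed strictly between 0 and 1); outcomes[t] is 0 or 1.
    """
    m = len(expert_probs[0])
    weights = [1.0 / m] * m
    total = 0.0
    for probs, y in zip(expert_probs, outcomes):
        # Mixture prediction, charged for the outcome that actually occurred.
        p_one = sum(w * p for w, p in zip(weights, probs))
        total -= math.log2(p_one if y == 1 else 1.0 - p_one)
        # Bayesian (exponential-weights, learning rate 1) update under log loss.
        likelihoods = [p if y == 1 else 1.0 - p for p in probs]
        weights = [w * lik for w, lik in zip(weights, likelihoods)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
        # Fixed share: redistribute a fraction alpha of the weight uniformly,
        # so that a model that was poor early on can regain weight later.
        weights = [(1.0 - alpha) * w + alpha / m for w in weights]
    return total
```

With alpha = 0, the update reduces to the ordinary Bayes mixture over the models.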
- …