Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
The relationship between the Bayesian approach and the minimum description
length approach is established. We sharpen and clarify the general modeling
principles MDL and MML, abstracted as the ideal MDL principle and defined from
Bayes's rule by means of Kolmogorov complexity. The basic condition under which
the ideal principle should be applied is encapsulated as the Fundamental
Inequality, which in broad terms states that the principle is valid when the
data are random relative to every contemplated hypothesis, and these
hypotheses are in turn random relative to the (universal) prior. Basically,
the ideal principle states that the prior probability associated with the
hypothesis should be given by the algorithmic universal probability, and that
the sum of the negative log universal probability of the model and the
negative log probability of the data given the model, i.e. the total two-part
code length, should be minimized. If we restrict the model class to finite
sets, then application of the ideal principle reduces to Kolmogorov's minimal
sufficient statistic. In general we show that data compression is almost
always the best strategy, both in hypothesis identification and in prediction.
Comment: 35 pages, LaTeX. Submitted to IEEE Transactions on Information Theory.
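Stated compactly in standard notation (a sketch, with \mathbf{m} the universal
discrete semimeasure over hypotheses and K(\cdot) prefix Kolmogorov
complexity; the precise validity conditions are those of the Fundamental
Inequality above):

\[
  H_{\mathrm{MDL}} \;=\; \operatorname*{arg\,min}_{H \in \mathcal{H}}
  \bigl[\, -\log \mathbf{m}(H) \;-\; \log \Pr(D \mid H) \,\bigr],
  \qquad
  -\log \mathbf{m}(H) \;=\; K(H) + O(1),
\]

so hypothesis selection amounts to minimizing a total two-part code length:
about K(H) bits to describe the hypothesis, plus -log Pr(D | H) bits to
describe the data with its help.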
Applying MDL to Learning Best Model Granularity
The Minimum Description Length (MDL) principle is solidly based on a provably
ideal method of inference using Kolmogorov complexity. We test how the theory
behaves in practice on a general problem in model selection: that of learning
the best model granularity. The performance of a model depends critically on
the granularity, for example the choice of precision of the parameters. Too
high a precision generally leads to modeling accidental noise, while too low
a precision may conflate models that should be distinguished. This precision
is often determined ad hoc. In MDL the best model is the one that most
compresses a two-part code of the data set; this embodies "Occam's Razor."
In two quite different experimental settings the theoretical value
determined using MDL coincides with the best value found experimentally. In the
first experiment the task is to recognize isolated handwritten characters in
one subject's handwriting, irrespective of size and orientation. Based on a new
modification of elastic matching that uses multiple prototypes per character,
MDL selects the most likely value of the learned parameter (the length of the
sampling interval), and the prediction rate at this value is shown to
coincide with the best value found experimentally. In the second experiment
the task is to model a robot arm with two degrees of freedom using a
three-layer feed-forward neural network, where we need to determine the
number of nodes in the hidden layer that gives the best modeling performance.
The optimal model (the one that extrapolates best to unseen examples) occurs
at the number of hidden nodes considered most likely by MDL, which again
coincides with the best value found experimentally.
Comment: LaTeX, 32 pages, 5 figures. Artificial Intelligence journal, to appear.
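A minimal sketch of the two-part-code selection described above, under
stand-in assumptions: zlib serves as a crude substitute for an ideal code,
the model cost is taken as 32 bits per quantization level, and uniform
binning plays the role of the granularity parameter. This is not the paper's
procedure, only an illustration of how "the model that most compresses a
two-part code" fixes a granularity.

    import zlib
    import numpy as np

    def two_part_code_length(data, n_bins):
        """Total two-part code length in bits for one candidate granularity.

        Model part: n_bins quantization levels at an assumed 32 bits each.
        Data part: the bin indices compressed with zlib, standing in for an
        ideal code. Both cost models are illustrative assumptions."""
        lo, hi = data.min(), data.max()
        model_bits = 32 * n_bins
        idx = ((data - lo) / (hi - lo + 1e-12) * n_bins).astype(np.uint8)
        idx = np.clip(idx, 0, n_bins - 1)
        data_bits = 8 * len(zlib.compress(idx.tobytes(), 9))
        return model_bits + data_bits

    def best_granularity(data, candidates=(2, 4, 8, 16, 32, 64, 128)):
        """Pick the granularity with the shortest total description."""
        return min(candidates, key=lambda n: two_part_code_length(data, n))

    # Example: a smooth signal with mild noise; the MDL-style score should
    # favor a moderate number of bins over the finest one available.
    rng = np.random.default_rng(0)
    x = np.sin(np.linspace(0, 8 * np.pi, 4000)) + 0.05 * rng.normal(size=4000)
    print(best_granularity(x))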
A Philosophical Treatise of Universal Induction
Understanding inductive reasoning is a problem that has engaged mankind for
thousands of years. This problem is relevant to a wide range of fields and is
integral to the philosophy of science. It has been tackled by many great minds
ranging from philosophers to scientists to mathematicians, and more recently
computer scientists. In this article we argue the case for Solomonoff
Induction, a formal inductive framework which combines algorithmic information
theory with the Bayesian framework. Although it achieves excellent theoretical
results and is based on solid philosophical foundations, the technical
knowledge required to understand this framework has caused it to remain
largely unknown and unappreciated in the wider scientific community. The
main contribution of this article is to convey Solomonoff induction and its
related concepts in a generally accessible form with the aim of bridging this
current technical gap. In the process we examine the major historical
contributions that have led to the formulation of Solomonoff Induction as well
as criticisms of Solomonoff and induction in general. In particular we examine
how Solomonoff induction addresses many issues that have plagued other
inductive systems, such as the black ravens paradox and the confirmation
problem, and compare this approach with other recent approaches.
Comment: 72 pages, 2 figures, 1 table, LaTeX.
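For reference, the central object the article explains can be written in one
line (standard notation, not quoted from the abstract): Solomonoff's
universal prior weights every program p for a universal prefix machine U by
its length \ell(p), and prediction is Bayesian conditioning on the resulting
mixture:

\[
  M(x) \;=\; \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)},
  \qquad
  M(b \mid x) \;=\; \frac{M(xb)}{M(x)},
\]

where the sum runs over programs whose output begins with the string x.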
Application of Kolmogorov complexity and universal codes to identity testing and nonparametric testing of serial independence for time series
We show that Kolmogorov complexity and its estimators, such as universal
codes (or data compression methods), can be applied to hypothesis testing
within the framework of classical mathematical statistics. Methods for
identity testing and for nonparametric testing of serial independence of time
series are proposed.
Comment: submitted.
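A minimal sketch of the compression idea behind such tests (zlib standing in
for a universal code; the permutation scheme and the p-value construction are
illustrative assumptions, not the authors' statistics): under serial
independence, destroying the time order by shuffling should not make the
series systematically harder to compress.

    import zlib
    import random

    def codelength(seq):
        """Compressed length in bits; zlib stands in for a universal code."""
        return 8 * len(zlib.compress(bytes(seq), 9))

    def independence_pvalue(seq, n_perm=200, seed=0):
        """Permutation test for serial independence: compare the code length
        of the observed order against random reorderings of the series."""
        rng = random.Random(seed)
        observed = codelength(seq)
        count = 0
        for _ in range(n_perm):
            s = list(seq)
            rng.shuffle(s)
            if codelength(s) <= observed:
                count += 1
        # Small p-value: the observed order compresses unusually well,
        # which is evidence against serial independence.
        return count / n_perm

    # A sticky Markov 0/1 chain compresses better than its shuffles, so the
    # p-value should be small; an i.i.d. sequence should give a large one.
    rng = random.Random(1)
    dep = [0]
    for _ in range(2000):
        dep.append(dep[-1] if rng.random() < 0.9 else 1 - dep[-1])
    print(independence_pvalue(dep))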
A Stochastic Complexity Perspective of Induction in Economics and Inference in Dynamics
Rissanen's fertile and pioneering minimum description length principle (MDL) has been viewed from the standpoints of statistical estimation theory, information theory, stochastic complexity theory (i.e., a computable approximation to Kolmogorov complexity), Solomonoff's recursion-theoretic induction principle, and Kolmogorov's sufficient statistics. All these interpretations, and many more, are valid, interesting and fertile. In this paper I view it from two further points of view: those of an algorithmic economist and of a dynamical systems theorist. From these points of view I suggest, first, a recasting of Jevons's sceptical vision of induction in the light of MDL, and second, a complexity interpretation of an undecidable question in dynamics.
Strong Asymptotic Assertions for Discrete MDL in Regression and Classification
We study the properties of the MDL (or maximum penalized complexity)
estimator for regression and classification, where the underlying model class
is countable. We show in particular a finite bound on the Hellinger losses
under the sole assumption that there is a "true" model contained in the
class. This implies almost sure convergence of the predictive distribution to
the true one at a fast rate. It corresponds to Solomonoff's central theorem
of universal induction, though with a bound that is exponentially larger.
Comment: 6 two-column pages.
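In standard notation (a sketch of this line of work from general background,
not quoted from the paper): for a countable class \mathcal{M} with prior
weights w_\nu > 0, the discrete MDL estimator after observing x is

\[
  \nu^{x} \;=\; \operatorname*{arg\,min}_{\nu \in \mathcal{M}}
  \bigl[\, -\log \nu(x) \;-\; \log w_\nu \,\bigr],
\]

and the finite Hellinger bound has the flavor of
\sum_n \mathbf{E}\, h^2(\mu, \nu^{x_{<n}}) \le c\, w_\mu^{-1} for the true
\mu, versus the \ln w_\mu^{-1} bound in Solomonoff's theorem for the Bayes
mixture, hence "exponentially larger".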
m-sophistication
The m-sophistication of a finite binary string x is introduced as a
generalization of a parameter appearing in the proof that complexity of
complexity is rare. A probabilistic near-sufficient statistic of x is given
whose length is upper bounded by the m-sophistication of x within small
additive terms. This shows that m-sophistication is lower bounded by coarse
sophistication and upper bounded by sophistication within small additive
terms. It is also shown that m-sophistication and coarse sophistication
cannot be approximated by an upper or lower semicomputable function, not even
within a very large error.
Comment: 13 pages, draft.
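For orientation, the set-based notion that m-sophistication is compared
against can be stated as follows (the usual definition from the
algorithmic-statistics literature, with slack constant c; this is background,
not the paper's definition of m-sophistication):

\[
  \operatorname{soph}_c(x) \;=\; \min \bigl\{\, K(S) \;:\; x \in S,\;\;
  K(S) + \log |S| \;\le\; K(x) + c \,\bigr\},
\]

where S ranges over finite sets of binary strings and K is prefix Kolmogorov
complexity.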