Sequential Predictions based on Algorithmic Complexity
This paper studies sequence prediction based on the monotone Kolmogorov
complexity Km=-log m, i.e. based on universal deterministic/one-part MDL. m is
extremely close to Solomonoff's universal prior M, the latter being an
excellent predictor in deterministic as well as probabilistic environments,
where performance is measured in terms of convergence of posteriors or losses.
Despite this closeness to M, it is difficult to assess the prediction quality
of m, since little is known about the closeness of their posteriors, which are
the important quantities for prediction. We show that for deterministic
computable environments, the "posterior" and losses of m converge, but rapid
convergence could only be shown on-sequence; the off-sequence convergence can
be slow. In probabilistic environments, neither the posterior nor the losses
converge, in general.
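For reference, the two objects compared above can be written in the standard notation of the algorithmic-information literature (these are the textbook definitions, with U a universal monotone Turing machine; they are assumed here rather than quoted from the paper):

    % Solomonoff's universal a priori semimeasure: sum over all minimal programs p
    % on which U outputs a string starting with x (written U(p)=x*).
    \[ M(x) = \sum_{p\,:\,U(p)=x*} 2^{-\ell(p)} \]
    % Monotone complexity and the induced deterministic predictor m:
    \[ Km(x) = \min\{\ell(p) : U(p)=x*\}, \qquad m(x) = 2^{-Km(x)} \]
    % Prediction relies on the conditionals ("posteriors")
    \[ m(x_t \mid x_{<t}) = \frac{m(x_{<t} x_t)}{m(x_{<t})}, \qquad
       M(x_t \mid x_{<t}) = \frac{M(x_{<t} x_t)}{M(x_{<t})}, \]
    % and the question studied is how close these conditionals are, even though
    % Km = -log m and -log M themselves are known to be extremely close.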
MDL Convergence Speed for Bernoulli Sequences
The Minimum Description Length principle for online sequence
estimation/prediction in a proper learning setup is studied. If the underlying
model class is discrete, then the total expected square loss is a particularly
interesting performance measure: (a) this quantity is finitely bounded,
implying convergence with probability one, and (b) it additionally specifies
the convergence speed. For MDL, in general one can only have loss bounds which
are finite but exponentially larger than those for Bayes mixtures. We show that
this is even the case if the model class contains only Bernoulli distributions.
We derive a new upper bound on the prediction error for countable Bernoulli
classes. This implies a small bound (comparable to the one for Bayes mixtures)
for certain important model classes. We discuss the application to Machine
Learning tasks such as classification and hypothesis testing, and
generalization to countable classes of i.i.d. models.
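Under the usual setup for such results (a countable class with prior weights w_\nu and true distribution \mu; this notation is assumed here, not quoted from the abstract), the quantity referred to in (a) and (b) can be sketched as:

    % Two-part/MDL model choice after observing x_{<t}: maximize weight times likelihood.
    \[ \nu^*_t = \arg\max_{\nu \in \mathcal{M}} \, w_\nu\,\nu(x_{<t}) \]
    % Performance measure: total expected squared difference between the predicted
    % and the true next-symbol probabilities; finiteness implies convergence with
    % probability one, and the size of the bound gives the rate.
    \[ \sum_{t=1}^{\infty} \mathbf{E}\Big[\big(\nu^*_t(1 \mid x_{<t}) - \mu(1 \mid x_{<t})\big)^2\Big] < \infty \]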
On the Convergence Speed of MDL Predictions for Bernoulli Sequences
We consider the Minimum Description Length principle for online sequence
prediction. If the underlying model class is discrete, then the total expected
square loss is a particularly interesting performance measure: (a) this
quantity is bounded, implying convergence with probability one, and (b) it
additionally specifies a `rate of convergence'. Generally, for MDL only
exponential loss bounds hold, as opposed to the linear bounds for a Bayes
mixture. We show that this is even the case if the model class contains only
Bernoulli distributions. We derive a new upper bound on the prediction error
for countable Bernoulli classes. This implies a small bound (comparable to the
one for Bayes mixtures) for certain important model classes. The results apply
to many Machine Learning tasks including classification and hypothesis testing.
We provide arguments that our theorems generalize to countable classes of
i.i.d. models.
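The contrast between the MDL plug-in predictor and the Bayes mixture can be made concrete with a small simulation; the following is an illustrative sketch only (the finite class, uniform weights, and parameter values are arbitrary choices, not the papers' setup):

    import random

    random.seed(0)
    thetas = [0.1, 0.3, 0.5, 0.7, 0.9]            # assumed finite Bernoulli class
    weights = [1.0 / len(thetas)] * len(thetas)   # assumed uniform prior weights
    theta_true = 0.7                              # assumed true parameter

    def likelihood(theta, ones, zeros):
        # Probability of the observed counts under a Bernoulli(theta) model.
        return theta ** ones * (1.0 - theta) ** zeros

    ones = zeros = 0
    mdl_loss = bayes_loss = 0.0
    for t in range(200):
        liks = [w * likelihood(th, ones, zeros) for w, th in zip(weights, thetas)]
        # MDL: predict with the single model maximizing weight times likelihood.
        mdl_pred = thetas[max(range(len(thetas)), key=lambda i: liks[i])]
        # Bayes mixture: predict with the posterior-weighted average over the class.
        bayes_pred = sum(l * th for l, th in zip(liks, thetas)) / sum(liks)
        mdl_loss += (mdl_pred - theta_true) ** 2
        bayes_loss += (bayes_pred - theta_true) ** 2
        x = 1 if random.random() < theta_true else 0
        ones, zeros = ones + x, zeros + 1 - x

    print(f"cumulative squared error  MDL: {mdl_loss:.3f}  Bayes: {bayes_loss:.3f}")

Both cumulative losses stay bounded here because the true parameter lies in the class; the theorems above quantify how much larger the MDL loss can be in the worst case.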
Causality - Complexity - Consistency: Can Space-Time Be Based on Logic and Computation?
The difficulty of explaining non-local correlations in a fixed causal
structure sheds new light on the old debate on whether space and time are to be
seen as fundamental. Refraining from assuming space-time as given a priori has
a number of consequences. First, the usual definitions of randomness depend on
a causal structure and turn meaningless. So motivated, we propose an intrinsic,
physically motivated measure for the randomness of a string of bits: its length
minus its normalized work value, a quantity we closely relate to its Kolmogorov
complexity (the length of the shortest program making a universal Turing
machine output this string). We test this alternative concept of randomness for
the example of non-local correlations, and we arrive at a line of reasoning
that leads to conclusions similar to those of the probabilistic view but is
conceptually more direct, since only the outcomes of measurements that can
actually all be carried out together are put into relation to each other. In the same
context-free spirit, we connect the logical reversibility of an evolution to
the second law of thermodynamics and the arrow of time. Refining this, we end
up with a speculation on the emergence of a space-time structure on bit strings
in terms of data-compressibility relations. Finally, we show that logical
consistency, by which we replace the abandoned causality, is a strictly weaker
constraint than the latter in the multi-party case.
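For orientation, the complexity notion mentioned in parentheses above, and the standard incompressibility view of randomness it supports (this is textbook material, not the paper's exact work-value quantity):

    % Kolmogorov complexity: length of a shortest program for which a fixed
    % universal Turing machine U outputs the string x.
    \[ K(x) = \min\{\ell(p) : U(p) = x\} \]
    % A string is incompressible (algorithmically random) when K(x) is close to
    % its length \ell(x); the gap \ell(x) - K(x) plays a role analogous to the
    % abstract's "length minus normalized work value".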
On Universal Prediction and Bayesian Confirmation
The Bayesian framework is a well-studied and successful framework for
inductive reasoning, which includes hypothesis testing and confirmation,
parameter estimation, sequence prediction, classification, and regression. But
standard statistical guidelines for choosing the model class and prior are not
always available or fail, in particular in complex situations. Solomonoff
completed the Bayesian framework by providing a rigorous, unique, formal, and
universal choice for the model class and the prior. We discuss in breadth how
and in which sense universal (non-i.i.d.) sequence prediction solves various
(philosophical) problems of traditional Bayesian sequence prediction. We show
that Solomonoff's model possesses many desirable properties: strong total and
weak instantaneous bounds; in contrast to most classical continuous prior
densities, it has no zero p(oste)rior problem, i.e. it can confirm universal
hypotheses; it is reparametrization and regrouping invariant; and it avoids the
old-evidence and updating problem. It even performs well (actually better) in
non-computable environments.
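The "no zero p(oste)rior" point can be made explicit with standard Bayesian bookkeeping; the weight choice w_\nu = 2^{-K(\nu)} below follows the Solomonoff/MDL literature and is assumed rather than quoted from this abstract:

    % Discrete universal prior over (semi)computable hypotheses, mixture, posterior:
    \[ w_\nu = 2^{-K(\nu)}, \qquad \xi(x) = \sum_\nu w_\nu\,\nu(x), \qquad
       w(\nu \mid x) = \frac{w_\nu\,\nu(x)}{\xi(x)} \]
    % Every computable hypothesis, including a universal one such as "all ravens
    % are black", receives strictly positive prior weight, so its posterior can
    % grow towards 1 as confirming data arrive; a continuous prior density
    % typically assigns such a hypothesis measure zero, and a zero prior forces
    % a zero posterior forever.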
Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet
Various optimality properties of universal sequence predictors based on
Bayes-mixtures in general, and Solomonoff's prediction scheme in particular,
will be studied. The probability of observing $x_t$ at time $t$, given past
observations $x_1 \dots x_{t-1}$, can be computed with the chain rule if the true
generating distribution $\mu$ of the sequences $x_1 x_2 x_3 \dots$ is known. If
$\mu$ is unknown, but known to belong to a countable or continuous class $\mathcal{M}$,
one can base one's prediction on the Bayes-mixture $\xi$ defined as a
$w_\nu$-weighted sum or integral of the distributions $\nu \in \mathcal{M}$. The cumulative
expected loss of the Bayes-optimal universal prediction scheme based on $\xi$
is shown to be close to the loss of the Bayes-optimal, but infeasible,
prediction scheme based on $\mu$. We show that the bounds are tight and that no
other predictor can lead to significantly smaller bounds. Furthermore, for
various performance measures, we show Pareto-optimality of $\xi$ and give an
Occam's razor argument that the choice $w_\nu = 2^{-K(\nu)}$ for the weights
is optimal, where $K(\nu)$ is the length of the shortest program describing
$\nu$. The results are applied to games of chance, defined as a sequence of
bets, observations, and rewards. The prediction schemes (and bounds) are
compared to the popular predictors based on expert advice. Extensions to
infinite alphabets, partial, delayed and probabilistic prediction,
classification, and more active systems are briefly discussed.
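To fix notation for the loss setting, the Bayes-optimal schemes compared above choose, at each step, the action minimizing expected loss under the respective predictive distribution; the bound below is given only in order-of-magnitude form, since the exact constants are stated in the paper itself:

    % Bayes-optimal action at time t under predictive distribution \rho (\rho = \xi or \mu):
    \[ y_t^{\Lambda_\rho} = \arg\min_{y} \sum_{x_t} \rho(x_t \mid x_{<t})\,\ell(x_t, y) \]
    % The cumulative-loss bounds have roughly the shape
    \[ L_{1:n}^{\Lambda_\xi} - L_{1:n}^{\Lambda_\mu}
         = O\Big(\sqrt{L_{1:n}^{\Lambda_\mu}\,\ln w_\mu^{-1}} + \ln w_\mu^{-1}\Big), \]
    % which is small whenever the prior weight w_\mu of the true distribution is
    % not too small, e.g. for w_\mu = 2^{-K(\mu)} whenever \mu is simple.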