Search CORE

827 research outputs found

On Generalized Computable Universal Priors and their Convergence

Author: Hutter Marcus
Publication venue
Publication date: 11/03/2005
Field of study

Solomonoff unified Occam's razor and Epicurus' principle of multiple explanations to one elegant, formal, universal theory of inductive inference, which initiated the field of algorithmic information theory. His central result is that the posterior of the universal semimeasure M converges rapidly to the true sequence generating posterior mu, if the latter is computable. Hence, M is eligible as a universal predictor in case of unknown mu. The first part of the paper investigates the existence and convergence of computable universal (semi)measures for a hierarchy of computability classes: recursive, estimable, enumerable, and approximable. For instance, M is known to be enumerable, but not estimable, and to dominate all enumerable semimeasures. We present proofs for discrete and continuous semimeasures. The second part investigates more closely the types of convergence, possibly implied by universality: in difference and in ratio, with probability 1, in mean sum, and for Martin-Loef random sequences. We introduce a generalized concept of randomness for individual sequences and use it to exhibit difficulties regarding these issues. In particular, we show that convergence fails (holds) on generalized-random sequences in gappy (dense) Bernoulli classes.Comment: 22 page

arXiv.org e-Print Archive

Elsevier - Publisher Connector

The Australian National University

On Universal Prediction and Bayesian Confirmation

Author: Hutter Marcus
Publication venue
Publication date: 01/01/2007
Field of study

The Bayesian framework is a well-studied and successful framework for inductive reasoning, which includes hypothesis testing and confirmation, parameter estimation, sequence prediction, classification, and regression. But standard statistical guidelines for choosing the model class and prior are not always available or fail, in particular in complex situations. Solomonoff completed the Bayesian framework by providing a rigorous, unique, formal, and universal choice for the model class and the prior. We discuss in breadth how and in which sense universal (non-i.i.d.) sequence prediction solves various (philosophical) problems of traditional Bayesian sequence prediction. We show that Solomonoff's model possesses many desirable properties: Strong total and weak instantaneous bounds, and in contrast to most classical continuous prior densities has no zero p(oste)rior problem, i.e. can confirm universal hypotheses, is reparametrization and regrouping invariant, and avoids the old-evidence and updating problem. It even performs well (actually better) in non-computable environments.Comment: 24 page

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

The Australian National University

Sequential Predictions based on Algorithmic Complexity

Author: Hutter Marcus
Publication venue
Publication date: 05/08/2005
Field of study

This paper studies sequence prediction based on the monotone Kolmogorov complexity Km=-log m, i.e. based on universal deterministic/one-part MDL. m is extremely close to Solomonoff's universal prior M, the latter being an excellent predictor in deterministic as well as probabilistic environments, where performance is measured in terms of convergence of posteriors or losses. Despite this closeness to M, it is difficult to assess the prediction quality of m, since little is known about the closeness of their posteriors, which are the important quantities for prediction. We show that for deterministic computable environments, the "posterior" and losses of m converge, but rapid convergence could only be shown on-sequence; the off-sequence convergence can be slow. In probabilistic environments, neither the posterior nor the losses converge, in general.Comment: 26 pages, LaTe

arXiv.org e-Print Archive

Elsevier - Publisher Connector

The Australian National University

Algorithmic Thermodynamics

Author: Baez John C.
Stay Mike
Publication venue
Publication date: 01/01/2012
Field of study

Algorithmic entropy can be seen as a special case of entropy as studied in statistical mechanics. This viewpoint allows us to apply many techniques developed for use in thermodynamics to the subject of algorithmic information theory. In particular, suppose we fix a universal prefix-free Turing machine and let X be the set of programs that halt for this machine. Then we can regard X as a set of 'microstates', and treat any function on X as an 'observable'. For any collection of observables, we can study the Gibbs ensemble that maximizes entropy subject to constraints on expected values of these observables. We illustrate this by taking the log runtime, length, and output of a program as observables analogous to the energy E, volume V and number of molecules N in a container of gas. The conjugate variables of these observables allow us to define quantities which we call the 'algorithmic temperature' T, 'algorithmic pressure' P and algorithmic potential' mu, since they are analogous to the temperature, pressure and chemical potential. We derive an analogue of the fundamental thermodynamic relation dE = T dS - P d V + mu dN, and use it to study thermodynamic cycles analogous to those for heat engines. We also investigate the values of T, P and mu for which the partition function converges. At some points on the boundary of this domain of convergence, the partition function becomes uncomputable. Indeed, at these points the partition function itself has nontrivial algorithmic entropy.Comment: 20 pages, one encapsulated postscript figur

arXiv.org e-Print Archive

CiteSeerX

eScholarship - University of California

Solomonoff Induction: A Solution to the Problem of the Priors?

Author: Vallinder Aron
Publication venue: Lunds universitet/Teoretisk filosofi
Publication date: 01/01/2012
Field of study

In this essay, I investigate whether Solomonoff’s prior can be used to solve the problem of the priors for Bayesianism. In outline, the idea is to give higher prior probability to hypotheses that are "simpler", where simplicity is given a precise formal deﬁnition. I begin with a review of Bayesianism, including a survey of past proposed solutions of the problem of the priors. I then introduce the formal framework of Solomonoff induction, and go through some of its properties, before ﬁnally turning to some applications. After this, I discuss several potential problems for the framework. Among these are the fact that Solomonoff’s prior is incomputable, that the prior is highly dependent on the choice of a universal Turing machine to use in the deﬁnition, and the fact that it assumes that the hypotheses under consideration are computable. I also discuss whether a bias toward simplicity can be justiﬁed. I argue that there are two main considerations favoring Solomonoff’s prior: (i) it allows us to assign strictly positive probability to every hypothesis in a countably inﬁnite set in a non-arbitrary way, and (ii) it minimizes the number of "retractions" and "errors" in the worst case

Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet

Author: Hutter Marcus
Publication venue
Publication date: 01/01/2002
Field of study

Various optimality properties of universal sequence predictors based on Bayes-mixtures in general, and Solomonoff's prediction scheme in particular, will be studied. The probability of observing

x_t

at time

t

, given past observations

x_1...x_{t-1}

can be computed with the chain rule if the true generating distribution

\mu

of the sequences

x_1x_2x_3...

is known. If

\mu

is unknown, but known to belong to a countable or continuous class \M one can base ones prediction on the Bayes-mixture

\xi

defined as a

w_\nu

-weighted sum or integral of distributions \nu\in\M. The cumulative expected loss of the Bayes-optimal universal prediction scheme based on

\xi

is shown to be close to the loss of the Bayes-optimal, but infeasible prediction scheme based on

\mu

. We show that the bounds are tight and that no other predictor can lead to significantly smaller bounds. Furthermore, for various performance measures, we show Pareto-optimality of

\xi

and give an Occam's razor argument that the choice

w_\nu\sim 2^{-K(\nu)}

for the weights is optimal, where

K(\nu)

is the length of the shortest program describing

\nu

. The results are applied to games of chance, defined as a sequence of bets, observations, and rewards. The prediction schemes (and bounds) are compared to the popular predictors based on expert advice. Extensions to infinite alphabets, partial, delayed and probabilistic prediction, classification, and more active systems are briefly discussed.Comment: 34 page

arXiv.org e-Print Archive

CiteSeerX