
    Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet

    Various optimality properties of universal sequence predictors based on Bayes-mixtures in general, and Solomonoff's prediction scheme in particular, will be studied. The probability of observing $x_t$ at time $t$, given past observations $x_1...x_{t-1}$, can be computed with the chain rule if the true generating distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. If $\mu$ is unknown, but known to belong to a countable or continuous class $\mathcal{M}$, one can base one's prediction on the Bayes-mixture $\xi$, defined as a $w_\nu$-weighted sum or integral of the distributions $\nu\in\mathcal{M}$. The cumulative expected loss of the Bayes-optimal universal prediction scheme based on $\xi$ is shown to be close to the loss of the Bayes-optimal, but infeasible, prediction scheme based on $\mu$. We show that the bounds are tight and that no other predictor can lead to significantly smaller bounds. Furthermore, for various performance measures, we show Pareto-optimality of $\xi$ and give an Occam's razor argument that the choice $w_\nu \sim 2^{-K(\nu)}$ for the weights is optimal, where $K(\nu)$ is the length of the shortest program describing $\nu$. The results are applied to games of chance, defined as a sequence of bets, observations, and rewards. The prediction schemes (and bounds) are compared to the popular predictors based on expert advice. Extensions to infinite alphabets, partial, delayed and probabilistic prediction, classification, and more active systems are briefly discussed. Comment: 34 pages
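
    A minimal sketch of the mixture predictor described above, assuming a hypothetical finite class of Bernoulli sources and 0/1 loss (the paper itself covers arbitrary countable or continuous classes, losses, and alphabets): the weights over the class are updated by Bayes' rule, and the prediction is the Bayes-optimal act under the predictive distribution $\xi$.

```python
import numpy as np

# Toy Bayes-mixture predictor over a finite class M of Bernoulli(theta) sources.
# The class M, the uniform weights w_nu, and the 0/1 loss are illustrative
# assumptions, not the paper's general setting.

thetas = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # candidate environments nu
w = np.full(len(thetas), 1.0 / len(thetas))    # prior weights w_nu

def xi_predictive(w, history):
    """Posterior weights over M and xi(x_t = 1 | x_<t) after observing `history`."""
    post = w.copy()
    for x in history:
        lik = thetas if x == 1 else 1.0 - thetas   # nu(x | x_<t) for each nu
        post *= lik
    post /= post.sum()
    return post, float(post @ thetas)              # xi(next bit = 1 | history)

def bayes_optimal_bit(p_one):
    """Bayes-optimal prediction under 0/1 loss: predict the more probable bit."""
    return 1 if p_one >= 0.5 else 0

history = [1, 1, 0, 1, 1, 1]
posterior, p_one = xi_predictive(w, history)
print("posterior over class:", np.round(posterior, 3))
print("xi(1 | history) =", round(p_one, 3), "-> predict", bayes_optimal_bit(p_one))
```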

    On Universal Prediction and Bayesian Confirmation

    The Bayesian framework is a well-studied and successful framework for inductive reasoning, which includes hypothesis testing and confirmation, parameter estimation, sequence prediction, classification, and regression. But standard statistical guidelines for choosing the model class and prior are not always available or fail, in particular in complex situations. Solomonoff completed the Bayesian framework by providing a rigorous, unique, formal, and universal choice for the model class and the prior. We discuss in breadth how and in which sense universal (non-i.i.d.) sequence prediction solves various (philosophical) problems of traditional Bayesian sequence prediction. We show that Solomonoff's model possesses many desirable properties: strong total and weak instantaneous bounds; in contrast to most classical continuous prior densities, it has no zero p(oste)rior problem, i.e. it can confirm universal hypotheses; it is reparametrization and regrouping invariant; and it avoids the old-evidence and updating problem. It even performs well (actually better) in non-computable environments. Comment: 24 pages
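
    For reference, Solomonoff's model mentioned above is the mixture over all programs of a universal monotone Turing machine $U$; writing $\ell(p)$ for the length of program $p$ and $U(p)=x*$ for "$U$ on input $p$ outputs a string starting with $x$", the prior and the induced predictor are:

```latex
M(x) \;=\; \sum_{p\,:\,U(p)=x*} 2^{-\ell(p)},
\qquad
M(x_t \mid x_{<t}) \;=\; \frac{M(x_{<t}\,x_t)}{M(x_{<t})}.
```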

    Sequential Predictions based on Algorithmic Complexity

    This paper studies sequence prediction based on the monotone Kolmogorov complexity $\mathrm{Km} = -\log m$, i.e. based on universal deterministic/one-part MDL. $m$ is extremely close to Solomonoff's universal prior $M$, the latter being an excellent predictor in deterministic as well as probabilistic environments, where performance is measured in terms of convergence of posteriors or losses. Despite this closeness to $M$, it is difficult to assess the prediction quality of $m$, since little is known about the closeness of their posteriors, which are the important quantities for prediction. We show that for deterministic computable environments, the "posterior" and losses of $m$ converge, but rapid convergence could only be shown on-sequence; the off-sequence convergence can be slow. In probabilistic environments, neither the posterior nor the losses converge, in general. Comment: 26 pages, LaTeX
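
    As a loose, purely illustrative analogue of one-part MDL prediction, the sketch below ranks the two possible continuations of a binary string by compressed length, using zlib as a crude computable stand-in for the incomputable monotone complexity $\mathrm{Km}$; the choice of compressor and the example sequence are assumptions for illustration only.

```python
import zlib

def proxy_km(s: str) -> int:
    """Compressed length in bits: a rough, computable stand-in for Km(s)."""
    return 8 * len(zlib.compress(s.encode(), 9))

def mdl_predict(history: str) -> str:
    """Predict the next bit as the continuation with smaller proxy complexity,
    mimicking 'predict the continuation of minimal Km'."""
    return min("01", key=lambda b: proxy_km(history + b))

# A long deterministic (computable) pattern; ideally the regular continuation
# '0' is favoured, though zlib is only a crude proxy for Km.
print(mdl_predict("01" * 200))
```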

    Ultimate Intelligence Part I: Physical Completeness and Objectivity of Induction

    We propose that Solomonoff induction is complete in the physical sense, supporting this claim with several strong physical arguments. We also argue that Solomonoff induction is fully applicable to quantum mechanics. We show how to choose an objective reference machine for universal induction by defining a physical message complexity and physical message probability, and argue that this choice dissolves some well-known objections to universal induction. We also introduce many more variants of physical message complexity based on energy and action, and discuss the ramifications of our proposals. Comment: Under review at the AGI-2015 conference. An early draft was submitted to ALT-2014. This paper is now being split into two papers, one philosophical and one more technical. We intend that all installments of the paper series will be on arXiv.

    Kolmogorov's Structure Functions and Model Selection

    In 1974 Kolmogorov proposed a non-probabilistic approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal (Kolmogorov) complexity. The "structure function" of the given data expresses the relation between the complexity-level constraint on a model class and the least log-cardinality of a model in the class containing the data. We show that the structure function determines all stochastic properties of the data: for every constrained model class it determines the individual best-fitting model in the class, irrespective of whether the "true" model is in the model class considered or not. In this setting, this happens with certainty, rather than with high probability as in the classical case. We precisely quantify the goodness-of-fit of an individual model with respect to individual data. We show that, within the obvious constraints, every graph is realized by the structure function of some data. We determine the (un)computability properties of the various functions contemplated and of the "algorithmic minimal sufficient statistic". Comment: 25 pages LaTeX, 5 figures. In part in Proc 47th IEEE FOCS; this final version (more explanations, cosmetic modifications) to appear in IEEE Trans Inform Theory.
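
    For concreteness, the structure function discussed above is commonly written as follows, where $x$ is the data string, $S$ ranges over finite sets of binary strings (the models), $K$ denotes (prefix) Kolmogorov complexity, and $\alpha$ is the complexity-level constraint:

```latex
h_x(\alpha) \;=\; \min_{S} \bigl\{\, \log_2 |S| \;:\; x \in S,\ K(S) \le \alpha \,\bigr\}.
```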

    New error bounds for Solomonoff prediction

    Solomonoff sequence prediction is a scheme for predicting the digits of binary strings without knowing the underlying probability distribution. We call a prediction scheme informed when it knows the true probability distribution of the sequence. Several new relations between universal Solomonoff sequence prediction, informed prediction, and general probabilistic prediction schemes will be proved. Among other things, they show that the number of errors in Solomonoff prediction is finite for computable distributions whenever it is finite in the informed case. Deterministic variants will also be studied. The most interesting result is that the deterministic variant of Solomonoff prediction is optimal compared to any other probabilistic or deterministic prediction scheme, apart from additive square-root corrections only. This makes it well suited even for difficult prediction problems, where it does not suffice that the number of errors is minimal only to within some factor greater than one. Solomonoff's original bound and the ones presented here complement each other in a useful way.

    Self-Optimizing and Pareto-Optimal Policies in General Environments based on Bayes-Mixtures

    The problem of making sequential decisions in unknown probabilistic environments is studied. In cycle $t$, action $y_t$ results in perception $x_t$ and reward $r_t$, where all quantities in general may depend on the complete history. The perception $x_t$ and reward $r_t$ are sampled from the (reactive) environmental probability distribution $\mu$. This very general setting includes, but is not limited to, (partially observable, $k$-th order) Markov decision processes. Sequential decision theory tells us how to act in order to maximize the total expected reward, called value, if $\mu$ is known. Reinforcement learning is usually used if $\mu$ is unknown. In the Bayesian approach one defines a mixture distribution $\xi$ as a weighted sum of distributions $\nu\in\mathcal{M}$, where $\mathcal{M}$ is any class of distributions including the true environment $\mu$. We show that the Bayes-optimal policy $p^\xi$ based on the mixture $\xi$ is self-optimizing in the sense that its average value converges asymptotically, for all $\mu\in\mathcal{M}$, to the optimal value achieved by the (infeasible) Bayes-optimal policy $p^\mu$ which knows $\mu$ in advance. We show that the necessary condition that $\mathcal{M}$ admits self-optimizing policies at all is also sufficient. No other structural assumptions are made on $\mathcal{M}$. As an example application, we discuss ergodic Markov decision processes, which allow for self-optimizing policies. Furthermore, we show that $p^\xi$ is Pareto-optimal in the sense that there is no other policy yielding higher or equal value in all environments $\nu\in\mathcal{M}$ and a strictly higher value in at least one. Comment: 15 pages
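
    A minimal sketch of a Bayes-optimal policy $p^\xi$ for a hypothetical class $\mathcal{M}$ of two bandit-like environments: within each $\nu$ the reward probability of every action is known, the agent does not know which $\nu$ it is acting in, and the action is chosen by finite-horizon expectimax against the mixture $\xi$. This only illustrates planning against a mixture; the paper's setting allows arbitrary history-dependent environments and perceptions.

```python
import numpy as np

# Hypothetical class M of two bandit-like environments: entry [i, a] is the
# probability that action a yields reward 1 in environment nu_i.
ENVS = np.array([[0.9, 0.2],     # nu_1
                 [0.1, 0.8]])    # nu_2
PRIOR = np.array([0.5, 0.5])     # prior weights w_nu over M

def q_value(post, action, horizon):
    """Mixture-expected total reward of `action` followed by optimal play."""
    p_reward = float(post @ ENVS[:, action])          # xi(r = 1 | history, action)
    q = 0.0
    for r, p_r in ((1, p_reward), (0, 1.0 - p_reward)):
        if p_r == 0.0:
            continue
        lik = ENVS[:, action] if r == 1 else 1.0 - ENVS[:, action]
        new_post = post * lik
        new_post /= new_post.sum()                    # Bayes update of the weights
        q += p_r * (r + value(new_post, horizon - 1))
    return q

def value(post, horizon):
    """Value of the Bayes-optimal policy p^xi with `horizon` cycles to go."""
    if horizon == 0:
        return 0.0
    return max(q_value(post, a, horizon) for a in range(ENVS.shape[1]))

def bayes_optimal_action(post, horizon):
    return max(range(ENVS.shape[1]), key=lambda a: q_value(post, a, horizon))

print(bayes_optimal_action(PRIOR, horizon=4))   # action chosen by p^xi from the prior
```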

    Gaussian process domain experts for model adaptation in facial behavior analysis

    We present a novel approach to supervised domain adaptation that is based upon the probabilistic framework of Gaussian processes (GPs). Specifically, we introduce domain-specific GPs as local experts for facial expression classification from face images. The adaptation of the classifier is facilitated in a probabilistic fashion by conditioning the target expert on multiple source experts. Furthermore, in contrast to existing adaptation approaches, we also learn a target expert solely from the available target data. A single, confident classifier is then obtained by combining the predictions from the multiple experts based on their confidence. Learning of the model is efficient and requires no retraining or reweighting of the source classifiers. We evaluate the proposed approach on two publicly available datasets for multi-class (MultiPIE) and multi-label (DISFA) facial expression classification. To this end, we perform adaptation of two contextual factors: where (view) and who (subject). Our experiments show that the proposed approach consistently outperforms both the source and target classifiers, while using as few as 30 target examples. It also outperforms state-of-the-art approaches for supervised domain adaptation.
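
    The sketch below captures only the general idea of confidence-based fusion of GP experts, using scikit-learn GP regressors on synthetic one-dimensional data and inverse predictive variance as the confidence weight; the paper's actual model conditions the target expert on the source experts and handles multi-class and multi-label classification, so the data, kernel, and fusion rule here are simplifying assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic source and (smaller, shifted) target domains; illustration only.
rng = np.random.default_rng(0)
X_src = rng.uniform(-3, 3, size=(60, 1))
y_src = np.sin(X_src).ravel() + 0.1 * rng.standard_normal(60)
X_tgt = rng.uniform(-3, 3, size=(15, 1))                 # few labelled target examples
y_tgt = np.sin(X_tgt).ravel() + 0.3 + 0.1 * rng.standard_normal(15)

# One GP expert per domain, trained independently (no retraining of the source).
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gp_src = GaussianProcessRegressor(kernel=kernel).fit(X_src, y_src)
gp_tgt = GaussianProcessRegressor(kernel=kernel).fit(X_tgt, y_tgt)

X_test = np.linspace(-3, 3, 7).reshape(-1, 1)
mu_s, sd_s = gp_src.predict(X_test, return_std=True)     # source expert prediction
mu_t, sd_t = gp_tgt.predict(X_test, return_std=True)     # target expert prediction

# Confidence-weighted fusion: each expert contributes proportionally to its
# inverse predictive variance.
w_s, w_t = 1.0 / sd_s**2, 1.0 / sd_t**2
fused = (w_s * mu_s + w_t * mu_t) / (w_s + w_t)
print(np.round(fused, 2))
```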