Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet
Various optimality properties of universal sequence predictors based on
Bayes-mixtures in general, and Solomonoff's prediction scheme in particular,
will be studied. The probability of observing $x_t$ at time $t$, given past
observations $x_1 \dots x_{t-1}$, can be computed with the chain rule if the true
generating distribution $\mu$ of the sequences $x_1 x_2 x_3 \dots$ is known. If
$\mu$ is unknown, but known to belong to a countable or continuous class $\mathcal{M}$,
one can base one's prediction on the Bayes-mixture $\xi$, defined as a
$w_\nu$-weighted sum or integral of the distributions $\nu \in \mathcal{M}$. The cumulative
expected loss of the Bayes-optimal universal prediction scheme based on
$\xi$ is shown to be close to the loss of the Bayes-optimal, but infeasible,
prediction scheme based on $\mu$. We show that the bounds are tight and that no
other predictor can lead to significantly smaller bounds. Furthermore, for
various performance measures, we show Pareto-optimality of $\xi$ and give an
Occam's razor argument that the choice $w_\nu \sim 2^{-K(\nu)}$ for the weights
is optimal, where $K(\nu)$ is the length of the shortest program describing
$\nu$. The results are applied to games of chance, defined as a sequence of
bets, observations, and rewards. The prediction schemes (and bounds) are
compared to the popular predictors based on expert advice. Extensions to
infinite alphabets, partial, delayed and probabilistic prediction,
classification, and more active systems are briefly discussed.
Comment: 34 pages
On Universal Prediction and Bayesian Confirmation
The Bayesian framework is a well-studied and successful framework for
inductive reasoning, which includes hypothesis testing and confirmation,
parameter estimation, sequence prediction, classification, and regression. But
standard statistical guidelines for choosing the model class and prior are not
always available or fail, in particular in complex situations. Solomonoff
completed the Bayesian framework by providing a rigorous, unique, formal, and
universal choice for the model class and the prior. We discuss in breadth how
and in which sense universal (non-i.i.d.) sequence prediction solves various
(philosophical) problems of traditional Bayesian sequence prediction. We show
that Solomonoff's model possesses many desirable properties: Strong total and
weak instantaneous bounds, and in contrast to most classical continuous prior
densities has no zero p(oste)rior problem, i.e. can confirm universal
hypotheses, is reparametrization and regrouping invariant, and avoids the
old-evidence and updating problem. It even performs well (actually better) in
non-computable environments.
Comment: 24 pages
Strong Asymptotic Assertions for Discrete MDL in Regression and Classification
We study the properties of the MDL (or maximum penalized complexity)
estimator for Regression and Classification, where the underlying model class
is countable. We show in particular a finite bound on the Hellinger losses
under the only assumption that there is a "true" model contained in the class.
This implies almost sure convergence of the predictive distribution to the true
one at a fast rate. It corresponds to Solomonoff's central theorem of universal
induction, albeit with a bound that is exponentially larger.
Comment: 6 two-column pages
Universality of Bayesian Predictions
Given the sequential update nature of Bayes' rule, Bayesian methods find natural application to prediction problems. Advances in computational methods make it routine to use Bayesian methods in econometrics, so there is a strong case for feasible predictions in a Bayesian framework. This paper studies the theoretical properties of Bayesian predictions and shows that, under minimal conditions, we can derive finite-sample bounds for the loss incurred using
Bayesian predictions under the Kullback-Leibler divergence. In particular, the concept of universality of predictions is discussed, and universality is established for Bayesian predictions in a variety of settings. These include predictions under almost arbitrary loss functions, model
averaging, predictions in a non-stationary environment, and predictions under model misspecification.
Given the possibility of regime switches and multiple breaks in economic series, as well as the
need to choose among different forecasting models, which may inevitably be misspecified, the
finite-sample results derived here are of interest for economic and financial forecasting.
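The finite-sample flavour of such results has a classical, easily checked analogue: since the Bayesian mixture dominates each model in the class, $\xi(x_{1:n}) \geq w_\nu \nu(x_{1:n})$, the mixture's cumulative log-loss exceeds that of any single model $\nu$ by at most $\log(1/w_\nu)$. A minimal numerical sketch, with illustrative Bernoulli models and weights (not from the paper):

```python
import math

def cumulative_log_losses(thetas, weights, data):
    """Cumulative log-loss of the Bayesian mixture predictive and of each
    fixed model; the bound mix_loss <= model_loss + log(1/w) holds for
    every model in the class, at every sample size."""
    post = list(weights)          # posterior weights, updated online
    mix_loss = 0.0
    model_loss = [0.0] * len(thetas)
    for x in data:
        # Per-model probability of the next symbol.
        probs = [th if x == 1 else 1 - th for th in thetas]
        # Mixture predictive probability = posterior-weighted average.
        p_mix = sum(w * p for w, p in zip(post, probs))
        mix_loss += -math.log(p_mix)
        for i, p in enumerate(probs):
            model_loss[i] += -math.log(p)
        # Bayes' rule posterior update.
        post = [w * p / p_mix for w, p in zip(post, probs)]
    return mix_loss, model_loss

thetas = [0.3, 0.5, 0.7]
weights = [0.2, 0.3, 0.5]
data = [1, 0, 1, 1, 0, 1, 1, 1]
mix, per_model = cumulative_log_losses(thetas, weights, data)
# The finite-sample bound holds for every model in the class:
for w, loss in zip(weights, per_model):
    assert mix <= loss + math.log(1 / w) + 1e-12
```

The bound is distribution-free in the sense that no assumption on the data was used, which is the kind of minimal-condition guarantee the abstract refers to.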
A Quasi-Bayesian Perspective to Online Clustering
When faced with high frequency streams of data, clustering raises theoretical
and algorithmic pitfalls. We introduce a new and adaptive online clustering
algorithm relying on a quasi-Bayesian approach, with a dynamic (i.e.,
time-dependent) estimation of the (unknown and changing) number of clusters. We
prove that our approach is supported by minimax regret bounds. We also provide
an RJMCMC-flavored implementation (called PACBO, see
https://cran.r-project.org/web/packages/PACBO/index.html) for which we give a
convergence guarantee. Finally, numerical experiments illustrate the potential
of our procedure.
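For intuition about a time-dependent number of clusters, here is a deliberately simplified online scheme: a distance-threshold rule that opens a new cluster whenever a point is far from all current centers. This is only a toy stand-in for the adaptive behaviour described above, not the paper's quasi-Bayesian RJMCMC procedure (PACBO); the stream and radius are illustrative:

```python
def online_cluster(stream, radius):
    """Toy one-dimensional online clustering with a data-driven number of
    clusters: a point joins the nearest existing center if within `radius`,
    otherwise it opens a new cluster. Centers are maintained as running means."""
    centers, counts = [], []
    for x in stream:
        if centers:
            d, j = min((abs(x - c), j) for j, c in enumerate(centers))
        if not centers or d > radius:
            centers.append(x)       # open a new cluster at this point
            counts.append(1)
        else:
            counts[j] += 1          # incremental running-mean update
            centers[j] += (x - centers[j]) / counts[j]
    return centers

stream = [0.1, 0.2, 5.0, 5.1, 0.15, 9.9]
print(online_cluster(stream, radius=1.0))  # three clusters emerge
```

Unlike this fixed-threshold heuristic, the quasi-Bayesian approach trades off fit against complexity probabilistically, which is what makes the minimax regret bounds in the abstract possible.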
Algorithmic Complexity Bounds on Future Prediction Errors
We bound the future loss when predicting any (computably) stochastic sequence
online. Solomonoff finitely bounded the total deviation of his universal
predictor $M$ from the true distribution $\mu$ by the algorithmic complexity of
$\mu$. Here we assume that we are at a time $t > 1$ and have already observed $x = x_1 \dots x_t$.
We bound the future prediction performance on $x_{t+1} x_{t+2} \dots$ by a new
variant of the algorithmic complexity of $\mu$ given $x$, plus the complexity of the
randomness deficiency of $x$. The new complexity is monotone in its condition
in the sense that this complexity can only decrease if the condition is
prolonged. We also briefly discuss potential generalizations to Bayesian model
classes and to classification problems.
Comment: 21 pages
Asymptotics of Discrete MDL for Online Prediction
Minimum Description Length (MDL) is an important principle for induction and
prediction, with strong relations to optimal Bayesian learning. This paper
deals with learning non-i.i.d. processes by means of two-part MDL, where the
underlying model class is countable. We consider the online learning framework,
i.e. observations come in one by one, and the predictor is allowed to update
his state of mind after each time step. We identify two ways of predicting by
MDL for this setup, namely a static and a dynamic one. (A third variant,
hybrid MDL, will turn out inferior.) We will prove that under the only
assumption that the data is generated by a distribution contained in the model
class, the MDL predictions converge to the true values almost surely. This is
accomplished by proving finite bounds on the quadratic, the Hellinger, and the
Kullback-Leibler loss of the MDL learner, which are however exponentially worse
than for Bayesian prediction. We demonstrate that these bounds are sharp, even
for model classes containing only Bernoulli distributions. We show how these
bounds imply regret bounds for arbitrary loss functions. Our results apply to a
wide range of setups, namely sequence prediction, pattern classification,
regression, and universal induction in the sense of Algorithmic Information
Theory, among others.
Comment: 34 pages
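The dynamic variant described above can be sketched for a toy Bernoulli class: after every observation, re-select the model minimizing the two-part code length (model cost plus data cost so far) and predict with that single model. The class, code lengths, and data below are illustrative assumptions, not from the paper:

```python
import math

def mdl_predict(thetas, code_lengths, data):
    """Dynamic two-part MDL prediction over a countable class of
    Bernoulli(theta) models: at each step, pick the model nu minimizing
    K(nu) - log nu(x_{<t}) (code lengths in nats here) and predict with it,
    re-selecting after every observation."""
    preds = []
    loglik = [0.0] * len(thetas)
    for x in data:
        # Two-part code length: model cost plus data cost so far.
        best = min(range(len(thetas)), key=lambda i: code_lengths[i] - loglik[i])
        preds.append(thetas[best])  # predicted P(next bit = 1)
        for i, th in enumerate(thetas):
            loglik[i] += math.log(th if x == 1 else 1 - th)
    return preds

thetas = [0.5, 0.9]
code_lengths = [math.log(2), 2 * math.log(2)]  # the simpler model is cheaper
data = [1, 1, 1, 1, 1, 1]
print(mdl_predict(thetas, code_lengths, data))
# → [0.5, 0.5, 0.9, 0.9, 0.9, 0.9]
```

Note how the predictor starts with the cheaper model and switches once the data cost outweighs the code-length penalty; in contrast to the Bayes-mixture, prediction at each step uses a single model, which is the source of the exponentially worse bounds mentioned in the abstract.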