
    Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet

    Various optimality properties of universal sequence predictors based on Bayes mixtures in general, and Solomonoff's prediction scheme in particular, will be studied. The probability of observing $x_t$ at time $t$, given past observations $x_1 \ldots x_{t-1}$, can be computed with the chain rule if the true generating distribution $\mu$ of the sequences $x_1 x_2 x_3 \ldots$ is known. If $\mu$ is unknown, but known to belong to a countable or continuous class $\mathcal{M}$, one can base one's prediction on the Bayes mixture $\xi$, defined as the $w_\nu$-weighted sum or integral of the distributions $\nu \in \mathcal{M}$. The cumulative expected loss of the Bayes-optimal universal prediction scheme based on $\xi$ is shown to be close to the loss of the Bayes-optimal, but infeasible, prediction scheme based on $\mu$. We show that the bounds are tight and that no other predictor can lead to significantly smaller bounds. Furthermore, for various performance measures, we show Pareto-optimality of $\xi$ and give an Occam's razor argument that the choice $w_\nu \sim 2^{-K(\nu)}$ for the weights is optimal, where $K(\nu)$ is the length of the shortest program describing $\nu$. The results are applied to games of chance, defined as a sequence of bets, observations, and rewards. The prediction schemes (and bounds) are compared to the popular predictors based on expert advice. Extensions to infinite alphabets; partial, delayed, and probabilistic prediction; classification; and more active systems are briefly discussed. (Comment: 34 pages)
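
    To make the scheme concrete, here is a minimal Python sketch (illustrative, not from the paper) of Bayes-mixture prediction over a small finite class: the prior weights stand in for $w_\nu \sim 2^{-K(\nu)}$, the predictive probability is the weighted average of the class members' predictions (the chain rule applied to $\xi$), and the weights are reweighted by likelihood after each observation. The Bernoulli class and the particular weights are assumptions made for illustration.

```python
import numpy as np

# Sketch: Bayes-mixture prediction over a finite class M of Bernoulli
# measures. Prior weights mimic w_nu ~ 2^{-K(nu)} by giving less mass
# to later (notionally "more complex") hypotheses.

thetas = np.array([0.5, 0.3, 0.7, 0.9])            # class M: Bernoulli(theta)
weights = 2.0 ** -np.arange(1, len(thetas) + 1)    # stand-in for 2^{-K(nu)}
weights /= weights.sum()                           # normalised prior w_nu

def predict_one(weights, thetas):
    """Mixture predictive probability xi(x_t = 1 | x_1 ... x_{t-1})."""
    return float(weights @ thetas)

def update(weights, thetas, x):
    """Bayes reweighting after observing the symbol x in {0, 1}."""
    lik = thetas if x == 1 else 1.0 - thetas
    w = weights * lik
    return w / w.sum()

rng = np.random.default_rng(0)
mu = 0.7                                           # true generating measure
for t in range(300):
    p1 = predict_one(weights, thetas)              # basis for the Bayes-optimal action
    weights = update(weights, thetas, int(rng.random() < mu))

print(np.round(weights, 3))                        # mass concentrates on theta = 0.7
```

    For log loss, the familiar bound behind such schemes is that the cumulative regret of $\xi$ against any $\nu \in \mathcal{M}$ is at most $\ln w_\nu^{-1}$, which with $w_\nu \sim 2^{-K(\nu)}$ becomes $K(\nu)\ln 2$.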

    On Probability and Cosmology: Inference Beyond Data?

    Modern scientific cosmology pushes the boundaries of knowledge and the knowable. This is prompting questions on the nature of scientific knowledge. A central issue is what defines a 'good' model. When addressing global properties of the Universe or its initial state, this becomes a particularly pressing issue. How to assess the probability of the Universe as a whole is empirically ambiguous, since we can examine only part of a single realisation of the system under investigation: at some point, data will run out. We review the basics of applying Bayesian statistical explanation to the Universe as a whole. We argue that a conventional Bayesian approach to model inference generally fails in such circumstances, and cannot resolve, e.g., the so-called 'measure problem' in inflationary cosmology. Implicit and non-empirical valuations inevitably enter model assessment in these cases. This undermines the possibility of performing Bayesian model comparison. One must therefore either stay silent or pursue a more general form of systematic and rational model assessment. We outline a generalised axiological Bayesian model inference framework, based on mathematical lattices. This extends inference based on empirical data (evidence) to additionally consider the properties of model structure (elegance) and of the model possibility space (beneficence). We propose this as a natural and theoretically well-motivated framework for introducing an explicit, rational approach to theoretical model prejudice and inference beyond data.

    Universality of Bayesian mixture predictors

    The problem considered is sequential probability forecasting for finite-valued time series. The data are generated by an unknown probability distribution over the space of all one-way infinite sequences. This measure is known to belong to a given set C, but the latter is completely arbitrary (possibly uncountably infinite, with no structure given). Performance is measured by asymptotic average log loss. In this work it is shown that the minimax asymptotic performance is always attainable, and that it is attained by a convex combination of countably many measures from the set C (a Bayesian mixture). This was previously known only for the case when the best achievable asymptotic error is 0. It also contrasts with previous results showing that, in the non-realizable case, all Bayesian mixtures may be suboptimal while some other predictor achieves the optimal performance.
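
    As a concrete illustration of the performance criterion, the following sketch (illustrative assumptions: a three-element Bernoulli class standing in for C, and a true source outside it) compares the mixture's average log loss with the best average log loss attainable by any single measure in the class.

```python
import numpy as np

# Sketch: asymptotic average log loss of a Bayesian mixture over a small
# class C, versus the best single measure in C, when the true source mu
# lies outside C (a simple non-realizable case).

rng = np.random.default_rng(1)
thetas = np.array([0.2, 0.5, 0.8])            # class C: Bernoulli(theta)
w = np.full(len(thetas), 1.0 / len(thetas))   # uniform prior over C

mu, n = 0.65, 5000                            # true source, sample size
x = (rng.random(n) < mu).astype(int)

loss = 0.0
for t in range(n):
    p1 = float(w @ thetas)                    # mixture prediction for x_t = 1
    loss -= np.log(p1 if x[t] == 1 else 1.0 - p1)
    lik = thetas if x[t] == 1 else 1.0 - thetas
    w = w * lik
    w /= w.sum()

# Best single measure in C in expectation (cross-entropy rate):
best = min(-(mu * np.log(th) + (1 - mu) * np.log(1 - th)) for th in thetas)

print(loss / n, best)                         # the two rates nearly coincide
```

    In this toy case the mixture already attains the best rate within C; the paper's point is that a suitably chosen countable mixture achieves the minimax rate even for an arbitrary, unstructured C.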

    Approximate Bayesian Model Selection with the Deviance Statistic

    Bayesian model selection poses two main challenges: the specification of parameter priors for all models, and the computation of the resulting Bayes factors between models. There is now a large literature on automatic and objective parameter priors in the linear model. One important class comprises $g$-priors, which were recently extended from linear to generalized linear models (GLMs). We show that the resulting Bayes factors can be approximated by test-based Bayes factors (Johnson [Scand. J. Stat. 35 (2008) 354-368]) using the deviance statistics of the models. To estimate the hyperparameter $g$, we propose empirical and fully Bayes approaches, and link the former to minimum Bayes factors and shrinkage estimates from the literature. Furthermore, we describe how to approximate the corresponding posterior distribution of the regression coefficients based on the standard GLM output. We illustrate the approach with the development of a clinical prediction model for 30-day survival in the GUSTO-I trial using logistic regression. (Comment: Published at http://dx.doi.org/10.1214/14-STS510 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).)
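
    As a rough numerical illustration, the sketch below computes a test-based Bayes factor from a deviance statistic. It assumes (as we read the setup) the form $\mathrm{TBF}(g) = (1+g)^{-d/2}\exp\{z\,g/(2(1+g))\}$ for a deviance statistic $z$ on $d$ degrees of freedom; maximising over $g$ then gives the empirical Bayes choice $\hat g = \max(z/d - 1, 0)$. The numbers in the example are made up.

```python
import math

def tbf(z, d, g):
    """Test-based Bayes factor (alternative vs. null) for a deviance
    statistic z on d degrees of freedom, given hyperparameter g.
    Assumed form: (1 + g)^(-d/2) * exp(z * g / (2 * (1 + g)))."""
    return (1.0 + g) ** (-d / 2.0) * math.exp(z * g / (2.0 * (1.0 + g)))

def tbf_empirical_bayes(z, d):
    """Empirical Bayes: maximise tbf over g; the stationarity condition
    gives g_hat = max(z/d - 1, 0)."""
    g_hat = max(z / d - 1.0, 0.0)
    return g_hat, tbf(z, d, g_hat)

# Hypothetical example: two nested GLMs whose deviances differ by
# z = 11.4 on d = 2 extra parameters.
z, d = 11.4, 2
g_hat, bf = tbf_empirical_bayes(z, d)
print(f"g_hat = {g_hat:.2f}, TBF = {bf:.1f}")  # evidence for the larger model
```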