31 research outputs found

Suboptimal behavior of Bayes and MDL in classification under misspecification

Author: A. Blumer
A. R. Barron
B. Clarke
C. S. Wallace
D. Blackwell
D. Heckerman
J. Quinlan
J. Rissanen
John Langford
K. Yamanishi
M. E. Tipping
M. Kearns
O. Bunke
P. D. Grünwald
P. Diaconis
Peter Grünwald
R. Meir
Publication venue: 'Springer Science and Business Media LLC'

Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for Repairing It

Author: Grünwald Peter
van Ommen Thijs
Publication date: 01/01/2017

We empirically show that Bayesian inference can be inconsistent under misspecification in simple linear regression problems, both in a model averaging/selection and in a Bayesian ridge regression setting. We use the standard linear model, which assumes homoskedasticity, whereas the data are heteroskedastic, and observe that the posterior puts its mass on ever more high-dimensional models as the sample size increases. To remedy the problem, we equip the likelihood in Bayes' theorem with an exponent called the learning rate, and we propose the Safe Bayesian method to learn the learning rate from the data. SafeBayes tends to select small learning rates as soon as the standard posterior is not 'cumulatively concentrated', and its results on our data are quite encouraging. Comment: 70 pages, 20 figures
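
The "learning rate" exponent described in the abstract can be written as a generalized posterior. The notation below is an assumption for illustration, not taken from the listing: $\theta$ is the model parameter, $z^n$ the data, $\pi$ the prior, and $\eta$ the learning rate.

```latex
\pi(\theta \mid z^n, \eta) \;\propto\; p(z^n \mid \theta)^{\eta}\, \pi(\theta),
\qquad \eta > 0 .
```

Setting $\eta = 1$ recovers the standard Bayesian posterior, while $\eta < 1$ downweights the likelihood relative to the prior.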

arXiv.org e-Print Archive

Crossref

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for Repairing It

Author: Grünwald P.
van Ommen T.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2017

International Migration, Integration and Social Cohesion online publications

Suboptimal Behavior of Bayes and MDL in Classification Under Misspecification

Author: A. Blumer
A.R. Barron
D. Blackwell
D. Heckerman
J. Quinlan
J. Rissanen
J.M. Bernardo
K. Yamanishi
M. Kearns
M. Viswanathan
M.E. Tipping
O. Bunke
P. Diaconis
P.D. Grünwald
R. Meir
T.M. Cover
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

Crossref

MDL Convergence Speed for Bernoulli Sequences

Author: A. K. Zvonkin
A. R. Barron
B. S. Clarke
J. J. Rissanen
Jan Poland
L. A. Levin
M. Hutter
Marcus Hutter
P. Gács
P. M. Vitányi
R. J. Solomonoff
V. G. Vovk
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

The Minimum Description Length principle for online sequence estimation/prediction in a proper learning setup is studied. If the underlying model class is discrete, then the total expected square loss is a particularly interesting performance measure: (a) this quantity is finitely bounded, implying convergence with probability one, and (b) it additionally specifies the convergence speed. For MDL, in general one can only have loss bounds which are finite but exponentially larger than those for Bayes mixtures. We show that this is even the case if the model class contains only Bernoulli distributions. We derive a new upper bound on the prediction error for countable Bernoulli classes. This implies a small bound (comparable to the one for Bayes mixtures) for certain important model classes. We discuss the application to Machine Learning tasks such as classification and hypothesis testing, and generalization to countable classes of i.i.d. models. Comment: 28 pages

arXiv.org e-Print Archive

CiteSeerX

Crossref

The Australian National University

Hokkaido University Collection of Scholarly and Academic Papers

Inconsistency of Bayesian inference for misspecified linear models, and a proposal for repairing it

Author: Grünwald P.D. (Peter)
Ommen M. (Thijs) van
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2017

We empirically show that Bayesian inference can be inconsistent under misspecification in simple linear regression problems, both in a model averaging/selection and in a Bayesian ridge regression setting. We use the standard linear model, which assumes homoskedasticity, whereas the data are heteroskedastic (though, significantly, there are no outliers). As sample size increases, the posterior puts its mass on worse and worse models of ever higher dimension. This is caused by hypercompression, the phenomenon that the posterior puts its mass on distributions that have much larger KL divergence from the ground truth than their average, i.e. the Bayes predictive distribution. To remedy the problem, we equip the likelihood in Bayes' theorem with an exponent called the learning rate, and we propose the SafeBayesian method to learn the learning rate from the data. SafeBayes tends to select small learning rates, and regularizes more, as soon as hypercompression takes place. Its results on our data are quite encouraging.
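
To illustrate why a small learning rate "regularizes more" in the ridge setting, here is a minimal sketch of a Gaussian posterior with the likelihood raised to a power `eta`. It is not the authors' SafeBayes procedure: `eta` is supplied by hand rather than learned from the data, and the function name, the fixed noise variance `sigma2`, and the prior variance `tau2` are assumptions made for illustration.

```python
import numpy as np

def tempered_ridge_posterior(X, y, eta=1.0, sigma2=1.0, tau2=1.0):
    """Posterior mean and covariance of the weights under an eta-tempered likelihood.

    Assumed model (hypothetical, for illustration only):
      y | X, beta ~ N(X beta, sigma2 * I), with sigma2 fixed and known
      beta ~ N(0, tau2 * I)
    Raising the Gaussian likelihood to the power eta scales its precision by eta.
    """
    d = X.shape[1]
    precision = eta * (X.T @ X) / sigma2 + np.eye(d) / tau2
    cov = np.linalg.inv(precision)
    mean = cov @ (eta * (X.T @ y) / sigma2)
    return mean, cov
```

Under these assumptions the posterior mean equals the ridge estimate with penalty `sigma2 / (eta * tau2)`, so decreasing `eta` shrinks the weights more strongly and widens the posterior.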

CWI's Institutional Repository

Fast rates in statistical and online learning

Author: Grünwald Peter D.
Mehta Nishant A.
Reid Mark D.
van Erven Tim
Williamson Robert C.
Publication date: 01/01/2015

The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning: a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most of these conditions are special cases of a single, unifying condition, that comes in two forms: the central condition for 'proper' learning algorithms that always output a hypothesis in the given model, and stochastic mixability for online algorithms that may make predictions outside of the model. We show that under surprisingly weak assumptions both conditions are, in a certain sense, equivalent. The central condition has a re-interpretation in terms of convexity of a set of pseudoprobabilities, linking it to density estimation under misspecification. For bounded losses, we show how the central condition enables a direct proof of fast rates and we prove its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, both of which have played a central role in obtaining fast rates in statistical learning. Yet, while the Bernstein condition is two-sided, the central condition is one-sided, making it more suitable to deal with unbounded losses. In its stochastic mixability form, our condition generalizes both a stochastic exp-concavity condition identified by Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying conditions thus provide a substantial step towards a characterization of fast rates in statistical learning, similar to how classical mixability characterizes constant regret in the sequential prediction with expert advice setting. Comment: 69 pages, 3 figures

arXiv.org e-Print Archive

CWI's Institutional Repository

Leiden University Scholary Publications

Asymptotics of Discrete MDL for Online Prediction

Author: Hutter Marcus
Poland Jan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

Minimum Description Length (MDL) is an important principle for induction and prediction, with strong relations to optimal Bayesian learning. This paper deals with learning non-i.i.d. processes by means of two-part MDL, where the underlying model class is countable. We consider the online learning framework, i.e. observations come in one by one, and the predictor is allowed to update his state of mind after each time step. We identify two ways of predicting by MDL for this setup, namely a static and a dynamic one. (A third variant, hybrid MDL, will turn out inferior.) We will prove that under the only assumption that the data is generated by a distribution contained in the model class, the MDL predictions converge to the true values almost surely. This is accomplished by proving finite bounds on the quadratic, the Hellinger, and the Kullback-Leibler loss of the MDL learner, which are however exponentially worse than for Bayesian prediction. We demonstrate that these bounds are sharp, even for model classes containing only Bernoulli distributions. We show how these bounds imply regret bounds for arbitrary loss functions. Our results apply to a wide range of setups, namely sequence prediction, pattern classification, regression, and universal induction in the sense of Algorithmic Information Theory among others. Comment: 34 pages
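
The contrast between Bayes-mixture prediction and MDL prediction mentioned in the abstract can be sketched for a small Bernoulli class. This is an illustrative reading, not the paper's own formulation: it assumes a finite class of Bernoulli parameters `thetas` with prior weights `weights`, and takes "static MDL" to mean predicting with the single model that maximizes prior weight times likelihood. The function name is hypothetical.

```python
def bayes_and_mdl_predictions(bits, thetas, weights):
    """Return (Bayes mixture, static MDL) probabilities that the next bit is 1."""
    ones = sum(bits)
    zeros = len(bits) - ones
    # Joint weight w_i * P_theta_i(bits) for each Bernoulli model in the class.
    joint = [w * t**ones * (1 - t)**zeros for t, w in zip(thetas, weights)]
    # Bayes mixture: posterior-weighted average of the models' predictions.
    bayes = sum(j * t for j, t in zip(joint, thetas)) / sum(joint)
    # Static two-part MDL (assumed reading): predict with the single model that
    # maximizes weight * likelihood, i.e. minimizes -log w_i - log P_theta_i(bits).
    best = max(range(len(thetas)), key=lambda i: joint[i])
    return bayes, thetas[best]
```

For example, `bayes_and_mdl_predictions([1, 1, 1, 0], [0.25, 0.5, 0.75], [1/3, 1/3, 1/3])` makes MDL commit to the parameter 0.75, whereas the mixture hedges across all three models.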

arXiv.org e-Print Archive

CiteSeerX

Crossref

The Australian National University

Hokkaido University Collection of Scholarly and Academic Papers