76 research outputs found
A tutorial introduction to the minimum description length principle
This tutorial provides an overview of and introduction to Rissanen's Minimum
Description Length (MDL) Principle. The first chapter provides a conceptual,
entirely non-technical introduction to the subject. It serves as a basis for
the technical introduction given in the second chapter, in which all the ideas
of the first chapter are made mathematically precise. The main ideas are
discussed in great conceptual and technical detail. This tutorial is an
extended version of the first two chapters of the collection "Advances in
Minimum Description Length: Theory and Application" (edited by P. Grunwald, I. J. Myung and M. Pitt, to be published by the MIT Press, Spring 2005). Comment: 80 pages, 5 figures; report with 2 chapters.
Minimum Description Length Revisited
This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL was originally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since 2007, the last time an extensive overview was written. These include new methods for model selection and averaging and hypothesis testing, as well as the first completely general definition of MDL estimators. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC vs BIC and cross-validation vs Bayes, can, to a large extent, be viewed from a unified perspective.
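As a concrete illustration of the two-part code idea behind MDL (this sketch is not taken from the overview itself), the following minimal example selects a polynomial degree by minimizing a crude description length: a BIC-style (k/2) log n parameter cost standing in for more refined MDL codes, plus the Gaussian negative log-likelihood of the residuals. The data, model family, and function names are illustrative assumptions.

```python
import numpy as np

def two_part_codelength(y, y_hat, k):
    """Crude two-part MDL score in nats: a BIC-style (k/2) log n model cost
    plus the Gaussian negative log-likelihood of the residuals (dropping
    constants that are identical across models)."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    data_cost = 0.5 * n * np.log(rss / n)   # code length for the data given the model
    model_cost = 0.5 * k * np.log(n)        # code length for the k fitted parameters
    return data_cost + model_cost

def select_degree(x, y, max_degree=8):
    """Pick the polynomial degree whose fit minimizes total description length."""
    scores = [
        two_part_codelength(y, np.polyval(np.polyfit(x, y, d), x), d + 1)
        for d in range(max_degree + 1)
    ]
    return int(np.argmin(scores))

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(0.0, 0.1, x.size)  # true degree: 2
print(select_degree(x, y))
```

The penalty term keeps the selector from always preferring the highest degree: extra coefficients must buy enough residual reduction to pay for their own code length.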
Minimum Description Length Model Selection - Problems and Extensions
The thesis treats a number of open problems in Minimum Description Length model selection, especially prediction problems. It is shown how techniques from the "Prediction with Expert Advice" literature can be used to improve model selection performance, which is particularly useful in nonparametric settings
The minimum description length principle
The pdf file in the repository consists only of the preface, foreword and chapter 1; I am not allowed by the publisher to put the remainder of this book on the web.
If you are a member of the CWI evaluation committee and you read this: you are of course entitled to access the full book. If you would like to see it, please contact CWI (or, even easier, contact me directly), and we will be happy to give you a copy of the book for free.
Almost the Best of Three Worlds: Risk, Consistency and Optional Stopping for the Switch Criterion in Nested Model Selection
We study the switch distribution, introduced by Van Erven et al. (2012),
applied to model selection and subsequent estimation. While switching was known
to be strongly consistent, here we show that it achieves minimax optimal
parametric risk rates up to a factor when comparing two nested
exponential families, partially confirming a conjecture by Lauritzen (2012) and
Cavanaugh (2012) that switching behaves asymptotically like the Hannan-Quinn
criterion. Moreover, like Bayes factor model selection but unlike standard
significance testing, when one of the models represents a simple hypothesis,
the switch criterion defines a robust null hypothesis test, meaning that its
Type-I error probability can be bounded irrespective of the stopping rule.
Hence, switching is consistent, insensitive to optional stopping and almost
minimax risk optimal, showing that, Yang's (2005) impossibility result
notwithstanding, it is possible to `almost' combine the strengths of AIC and
Bayes factor model selection. Comment: To appear in Statistica Sinica.
Catching Up Faster by Switching Sooner: A Prequential Solution to the AIC-BIC Dilemma
Bayesian model averaging, model selection and its approximations such as BIC
are generally statistically consistent, but sometimes achieve slower rates of
convergence than other methods such as AIC and leave-one-out cross-validation.
On the other hand, these other methods can be inconsistent. We identify the
"catch-up phenomenon" as a novel explanation for the slow convergence of
Bayesian methods. Based on this analysis we define the switch distribution, a
modification of the Bayesian marginal distribution. We show that, under broad
conditions, model selection and prediction based on the switch distribution
are both consistent and achieve optimal convergence rates, thereby resolving the
AIC-BIC dilemma. The method is practical; we give an efficient implementation.
The switch distribution has a data compression interpretation, and can thus be
viewed as a "prequential" or MDL method; yet it is different from the MDL
methods that are usually considered in the literature. We compare the switch
distribution to Bayes factor model selection and leave-one-out
cross-validation. Comment: A preliminary version of a part of this paper appeared at the NIPS 2007 conference.
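The catch-up phenomenon can be illustrated with a minimal prequential sketch (this is not the switch distribution itself, only an illustration of sequential log-loss accounting, with an illustrative coin-flip setup): a simple fair-coin predictor is compared against a Laplace-smoothed frequency estimate on data from a biased coin. The more complex model pays an initial cost while learning its parameter, then catches up and overtakes the simple one.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
xs = rng.random(n) < 0.6  # flips of a coin with heads probability 0.6

loss_simple = 0.0   # accumulated one-step-ahead log loss of the fair-coin model
loss_complex = 0.0  # same for the model that learns the bias sequentially
heads = 0
gap = []            # cumulative loss difference: simple minus complex
for t, x in enumerate(xs):
    p_simple = 0.5
    p_complex = (heads + 1) / (t + 2)  # Laplace rule of succession
    loss_simple += -np.log(p_simple if x else 1.0 - p_simple)
    loss_complex += -np.log(p_complex if x else 1.0 - p_complex)
    heads += int(x)
    gap.append(loss_simple - loss_complex)

# Early on the learning model tends to lag (gap near or below zero);
# with enough data it predicts better and the gap turns positive.
print(gap[-1])
```

In the Bayesian marginal distribution the simple model's early lead keeps suppressing the complex model's posterior weight well after the complex model has started predicting better; the switch distribution is designed to transfer weight sooner.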