
    Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity

    The relationship between the Bayesian approach and the minimum description length (MDL) approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random relative to every contemplated hypothesis, and these hypotheses are in turn random relative to the (universal) prior. The ideal principle states that the prior probability associated with a hypothesis should be given by the algorithmic universal probability, and that the hypothesis minimizing the sum of the negative log universal probability of the model and the negative log probability of the data given the model should be selected. If we restrict the model class to finite sets, application of the ideal principle reduces to Kolmogorov's minimal sufficient statistic. In general, we show that data compression is almost always the best strategy, both in hypothesis identification and in prediction. (Comment: 35 pages, LaTeX. Submitted to IEEE Trans. Inform. Theory.)
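
    In symbols (an editorial gloss using the abstract's own definitions, with m the universal distribution and K prefix Kolmogorov complexity), ideal MDL selects

```latex
H_{\mathrm{MDL}}
  \;=\; \arg\min_{H}\bigl(-\log \mathbf{m}(H) \,-\, \log \Pr(D \mid H)\bigr)
  \;\approx\; \arg\min_{H}\bigl(K(H) + K(D \mid H)\bigr).
```

    The Fundamental Inequality delimits when the approximation step, and hence the principle, is valid.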

    Minimum description complexity

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002, by Soosan Beheshti. Includes bibliographical references (p. 136-140). The classical problem of model selection among parametric model sets is considered. The goal is to choose the model set which best represents the observed data. The critical task is the choice of a criterion for comparing model sets. Pioneering information-theoretic approaches to this problem are the Akaike information criterion (AIC) and the various forms of minimum description length (MDL). The prior assumption in these methods is that the unknown true model is a member of all the competing sets. We introduce a new method of model selection: minimum description complexity (MDC). The approach is motivated by the Kullback-Leibler information distance, and it suggests choosing the model set for which the model-set relative entropy is minimal. We provide a probabilistic method of MDC estimation for a class of parametric model sets. The key factor in this calculation is our prior assumption: unlike the existing methods, no assumption that the true model is a member of the competing model sets is needed. The main strength of the MDC calculation lies in its method of extracting information from the observed data. Results exhibit the advantages of MDC over MDL and AIC, both theoretically and practically; it is shown that, under particular conditions, AIC is a special case of MDC. Application of MDC to system identification and signal denoising is investigated. The proposed method answers the challenging question of quality evaluation in identification of stable LTI systems under a fair prior assumption on the unmodeled dynamics. MDC also provides a new solution to a class of denoising problems, and we establish its theoretical advantages over existing thresholding denoising methods.
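
    For orientation, here is a minimal sketch of the two classical criteria the thesis argues against, AIC and a two-part-code (BIC-like) form of MDL, scored on nested polynomial model sets; the MDC computation itself follows the thesis and is not reproduced here. The data and model sets are illustrative only.

```python
import numpy as np

def max_log_likelihood(y, y_hat):
    """Maximized Gaussian log-likelihood, using the MLE of the noise variance."""
    n = len(y)
    sigma2 = np.mean((y - y_hat) ** 2)
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

def aic(y, y_hat, k):
    # Akaike information criterion: 2k - 2 ln L_hat (smaller is better).
    return 2 * k - 2 * max_log_likelihood(y, y_hat)

def two_part_mdl(y, y_hat, k):
    # Two-part-code MDL in its common asymptotic (BIC-like) form:
    # code length for the data given the model plus (k/2) ln n for parameters.
    return -max_log_likelihood(y, y_hat) + 0.5 * k * np.log(len(y))

# Score polynomial model sets of increasing order against noisy data.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 100)
y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.1 * rng.standard_normal(x.size)
for k in range(1, 7):  # k = number of free coefficients
    y_hat = np.polyval(np.polyfit(x, y, deg=k - 1), x)
    print(f"k={k}: AIC={aic(y, y_hat, k):8.1f}  MDL={two_part_mdl(y, y_hat, k):8.1f}")
```

    Both criteria trade goodness of fit against a parameter-count penalty; MDC's departure, per the abstract, is that it drops the assumption that the true model lies in any of the competing sets.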

    A Short Introduction to Model Selection, Kolmogorov Complexity and Minimum Description Length (MDL)

    The concept of overfitting in model selection is explained and demonstrated with an example. After providing some background on information theory and Kolmogorov complexity, we give a short explanation of minimum description length and error minimization. We conclude with a discussion of the typical features of overfitting in model selection. (Comment: 20 pages; Chapter 1 of The Paradox of Overfitting, Master's thesis, Rijksuniversiteit Groningen, 200)
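
    A minimal illustration of the phenomenon the chapter demonstrates (a generic polynomial-regression example, not necessarily the thesis's own): training error falls monotonically with model complexity while error on held-out data eventually rises.

```python
import numpy as np

def true_f(x):
    return np.sin(3 * x)

# Fit polynomials of increasing degree to a small noisy sample and compare
# training error with error on an independent test sample.
rng = np.random.default_rng(1)
x_train = np.sort(rng.uniform(-1, 1, 20))
x_test = np.sort(rng.uniform(-1, 1, 200))
y_train = true_f(x_train) + 0.2 * rng.standard_normal(x_train.size)
y_test = true_f(x_test) + 0.2 * rng.standard_normal(x_test.size)

for degree in (1, 3, 5, 9, 12):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```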

    Data complexity in machine learning

    We investigate the role of data complexity in the context of binary classification problems. The universal data complexity is defined for a data set as the Kolmogorov complexity of the mapping enforced by the data set. It is closely related to several existing principles used in machine learning, such as Occam's razor, the minimum description length, and the Bayesian approach. The data complexity can also be defined based on a learning model, which is more realistic for applications. We demonstrate the application of data complexity in two learning problems, data decomposition and data pruning. In data decomposition, we illustrate that a data set is best approximated by its principal subsets, which are Pareto-optimal with respect to complexity and set size. In data pruning, we show that outliers usually have high complexity contributions, and we propose methods for estimating the complexity contribution. Since in practice we have to approximate the ideal data complexity measures, we also discuss the impact of such approximations.
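
    Kolmogorov complexity is uncomputable, so any practical use of these ideas needs a computable proxy. The sketch below uses zlib-compressed length as one such stand-in (my substitution, not the paper's learning-model-based estimator) to score each example's complexity contribution; the single flipped "outlier" bit should tend to receive the highest score, up to the coarse byte granularity of zlib.

```python
import zlib

def compressed_size(s):
    """Computable stand-in for Kolmogorov complexity: zlib-compressed length."""
    return len(zlib.compress(s.encode(), level=9))

def complexity_contribution(labels, i):
    # Increase in compressed description length caused by including example i;
    # a crude proxy for the paper's learning-model-based complexity contribution.
    full = "".join(map(str, labels))
    held_out = "".join(map(str, labels[:i] + labels[i + 1:]))
    return compressed_size(full) - compressed_size(held_out)

# A long, regular label sequence with a single flipped "outlier" bit.
labels = [0, 1] * 500
labels[637] = 0  # breaks the alternating pattern at one position
scores = [complexity_contribution(labels, i) for i in range(len(labels))]
print("highest-contribution index:", max(range(len(labels)), key=scores.__getitem__))
```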

    A Stochastic Complexity Perspective of Induction in Economics and Inference in Dynamics

    Rissanen's fertile and pioneering minimum description length principle (MDL) has been viewed from the standpoints of statistical estimation theory and information theory, as stochastic complexity theory (i.e., a computable approximation to Kolmogorov complexity), as Solomonoff's recursion-theoretic induction principle, or as analogous to Kolmogorov's sufficient statistics. All these interpretations, and many more, are valid, interesting and fertile. In this paper I view it from two points of view: those of an algorithmic economist and of a dynamical systems theorist. From these vantage points I suggest, first, a recasting of Jevons's sceptical vision of induction in the light of MDL, and second, a complexity interpretation of an undecidable question in dynamics.
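
    For readers unfamiliar with the term: stochastic complexity is commonly formalized via Rissanen's normalized maximum likelihood (NML) distribution, one concrete sense in which it computably approximates Kolmogorov complexity (a standard formulation, not spelled out in the abstract itself):

```latex
\mathrm{SC}(x^{n} \mid \mathcal{M})
  \;=\; -\log \frac{p\bigl(x^{n} \mid \hat{\theta}(x^{n})\bigr)}
                   {\sum_{y^{n}} p\bigl(y^{n} \mid \hat{\theta}(y^{n})\bigr)},
```

    where the log of the denominator is the parametric complexity of the model class.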

Implementation of linear minimum area enclosing triangle algorithm

    This article has been made available through the Brunel Open Access Publishing Fund. An algorithm that computes the minimum area triangle enclosing a convex polygon in linear time already exists in the literature. The paper describing that algorithm also proves that the solution is optimal and that no lower-complexity sequential algorithm can exist. However, only a high-level description of the algorithm was provided, making the implementation difficult to reproduce. The present note aims to contribute to the field by providing a detailed, easy-to-implement and reproducible description of the algorithm, together with a benchmark comprising 10,000 variable-sized, randomly generated convex polygons for illustrating the linearity of the algorithm.
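
    The benchmark setup is easy to reproduce in outline. A sketch, assuming some implementation min_enclosing_triangle(polygon) of the linear-time algorithm (a hypothetical name, not from the paper) and using scipy to generate test polygons: roughly constant time per vertex as the polygon size grows is consistent with the claimed linearity.

```python
import time
import numpy as np
from scipy.spatial import ConvexHull

def random_convex_polygon(n_points, rng):
    """Convex hull of random points; vertices returned in counter-clockwise order."""
    pts = rng.standard_normal((n_points, 2))
    return pts[ConvexHull(pts).vertices]

def benchmark(min_enclosing_triangle, sizes, trials=20, seed=0):
    # min_enclosing_triangle is the implementation under test (hypothetical
    # name). If the time per polygon grows roughly linearly in the vertex
    # count, the measurements are consistent with O(n) behavior.
    rng = np.random.default_rng(seed)
    for n in sizes:
        polys = [random_convex_polygon(n, rng) for _ in range(trials)]
        start = time.perf_counter()
        for poly in polys:
            min_enclosing_triangle(poly)
        elapsed = (time.perf_counter() - start) / trials
        print(f"~{n} input points: {elapsed * 1e3:.2f} ms/polygon")
```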