4,955 research outputs found

    Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity

    Get PDF
    The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random, relative to every contemplated hypothesis and also these hypotheses are random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and the sum of the log universal probability of the model plus the log of the probability of the data given the model should be minimized. If we restrict the model class to the finite sets then application of the ideal principle turns into Kolmogorov's minimal sufficient statistic. In general we show that data compression is almost always the best strategy, both in hypothesis identification and prediction.Comment: 35 pages, Latex. Submitted IEEE Trans. Inform. Theor

    Absolutely No Free Lunches!

    Get PDF
    This paper is concerned with learners who aim to learn patterns in infinite binary sequences: shown longer and longer initial segments of a binary sequence, they either attempt to predict whether the next bit will be a 0 or will be a 1 or they issue forecast probabilities for these events. Several variants of this problem are considered. In each case, a no-free-lunch result of the following form is established: the problem of learning is a formidably difficult one, in that no matter what method is pursued, failure is incomparably more common that success; and difficult choices must be faced in choosing a method of learning, since no approach dominates all others in its range of success. In the simplest case, the comparison of the set of situations in which a method fails and the set of situations in which it succeeds is a matter of cardinality (countable vs. uncountable); in other cases, it is a topological matter (meagre vs. co-meagre) or a hybrid computational-topological matter (effectively meagre vs. effectively co-meagre)

    On an open problem in classification of languages

    Get PDF

    Learning Recursive Functions Refutably

    Get PDF
    Learning of recursive functions refutably means that for every recursive function, the learning machine has either to learn this function or to refute it, i.e., to signal that it is not able to learn it. Three modi of making precise the notion of refuting are considered. We show that the corresponding types of learning refutably are of strictly increasing power, where already the most stringent of them turns out to be of remarkable topological and algorithmical richness. All these types are closed under union, though in different strengths. Also, these types are shown to be different with respect to their intrinsic complexity; two of them do not contain function classes that are “most difficult” to learn, while the third one does. Moreover, we present characterizations for these types of learning refutably. Some of these characterizations make clear where the refuting ability of the corresponding learning machines comes from and how it can be realized, in general. For learning with anomalies refutably, we show that several results from standard learning without refutation stand refutably. Then we derive hierarchies for refutable learning. Finally, we show that stricter refutability constraints cannot be traded for more liberal learning criteria

    On Learning of Functions Refutably

    Get PDF
    Learning of recursive functions refutably informally means that for every recursive function, the learning machine has either to learn this function or to refute it, that is to signal that it is not able to learn it. Three modi of making precise the notion of refuting are considered. We show that the corresponding types of learning refutably are of strictly increasing power, where already the most stringent of them turns out to be of remarkable topological and algorithmical richness. Furthermore, all these types are closed under union, though in different strengths. Also, these types are shown to be different with respect to their intrinsic complexity; two of them do not contain function classes that are “most difficult” to learn, while the third one does. Moreover, we present several characterizations for these types of learning refutably. Some of these characterizations make clear where the refuting ability of the corresponding learning machines comes from and how it can be realized, in general.For learning with anomalies refutably, we show that several results from standard learning without refutation stand refutably. From this we derive some hierarchies for refutable learning. Finally, we prove that in general one cannot trade stricter refutability constraints for more liberal learning criteria

    Inductive reasoning and Kolmogorov complexity

    Get PDF
    AbstractReasoning to obtain the “truth” about reality, from external data, is an important, controversial, and complicated issue in man's effort to understand nature. (Yet, today, we try to make machines do this.) There have been old useful principles, new exciting models, and intricate theories scattered in vastly different areas including philosophy of science, statistics, computer science, and psychology. We focus on inductive reasoning in correspondence with ideas of R. J. Solomonoff. While his proposals result in perfect procedures, they involve the noncomputable notion of Kolmogorov complexity. In this paper we develop the thesis that Solomonoff's method is fundamental in the sense that many other induction principles can be viewed as particular ways to obtain computable approximations to it. We demonstrate this explicitly in the cases of Gold's paradigm for inductive inference, Rissanen's minimum description length (MDL) principle, Fisher's maximum likelihood principle, and Jaynes' maximum entropy principle. We present several new theorems and derivations to this effect. We also delimit what can be learned and what cannot be learned in terms of Kolmogorov complexity, and we describe an experiment in machine learning of handwritten characters. We also give an application of Kolmogorov complexity in Valiant style learning, where we want to learn a concept probably approximately correct in feasible time and examples

    Applying MDL to Learning Best Model Granularity

    Get PDF
    The Minimum Description Length (MDL) principle is solidly based on a provably ideal method of inference using Kolmogorov complexity. We test how the theory behaves in practice on a general problem in model selection: that of learning the best model granularity. The performance of a model depends critically on the granularity, for example the choice of precision of the parameters. Too high precision generally involves modeling of accidental noise and too low precision may lead to confusion of models that should be distinguished. This precision is often determined ad hoc. In MDL the best model is the one that most compresses a two-part code of the data set: this embodies ``Occam's Razor.'' In two quite different experimental settings the theoretical value determined using MDL coincides with the best value found experimentally. In the first experiment the task is to recognize isolated handwritten characters in one subject's handwriting, irrespective of size and orientation. Based on a new modification of elastic matching, using multiple prototypes per character, the optimal prediction rate is predicted for the learned parameter (length of sampling interval) considered most likely by MDL, which is shown to coincide with the best value found experimentally. In the second experiment the task is to model a robot arm with two degrees of freedom using a three layer feed-forward neural network where we need to determine the number of nodes in the hidden layer giving best modeling performance. The optimal model (the one that extrapolizes best on unseen examples) is predicted for the number of nodes in the hidden layer considered most likely by MDL, which again is found to coincide with the best value found experimentally.Comment: LaTeX, 32 pages, 5 figures. Artificial Intelligence journal, To appea

    On the Impact of Forgetting on Learning Machines

    Get PDF
    People tend not to have perfect memories when it comes to learning, or to anything else for that matter. Most formal studies of learning, however, assume a perfect memory. Some approaches have restricted the number of items that could be retained. We introduce a complexity theoretic accounting of memory utilization by learning machines. In our new model, memory is measured in bits as a function of the size of the input. There is a hierarchy of learnability based on increasing memory allotment. The lower bound results are proved using an unusual combination of pumping and mutual recursion theorem arguments. For technical reasons, it was necessary to consider two types of memory: long and short term

    Causal inference using the algorithmic Markov condition

    Full text link
    Inferring the causal structure that links n observables is usually based upon detecting statistical dependences and choosing simple graphs that make the joint measure Markovian. Here we argue why causal inference is also possible when only single observations are present. We develop a theory how to generate causal graphs explaining similarities between single objects. To this end, we replace the notion of conditional stochastic independence in the causal Markov condition with the vanishing of conditional algorithmic mutual information and describe the corresponding causal inference rules. We explain why a consistent reformulation of causal inference in terms of algorithmic complexity implies a new inference principle that takes into account also the complexity of conditional probability densities, making it possible to select among Markov equivalent causal graphs. This insight provides a theoretical foundation of a heuristic principle proposed in earlier work. We also discuss how to replace Kolmogorov complexity with decidable complexity criteria. This can be seen as an algorithmic analog of replacing the empirically undecidable question of statistical independence with practical independence tests that are based on implicit or explicit assumptions on the underlying distribution.Comment: 16 figure
    corecore