
    Algorithmic Statistics

    While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set (or probability distribution) where the data sample typically came from. The statistical theory based on such relations between individual objects can be called algorithmic statistics, in contrast to classical statistical theory that deals with relations between probabilistic ensembles. We develop the algorithmic theory of statistic, sufficient statistic, and minimal sufficient statistic. This theory is based on two-part codes consisting of the code for the statistic (the model summarizing the regularity, the meaningful information, in the data) and the model-to-data code. In contrast to the situation in probabilistic statistical theory, the algorithmic relation of (minimal) sufficiency is an absolute relation between the individual model and the individual data sample. We distinguish implicit and explicit descriptions of the models. We give characterizations of algorithmic (Kolmogorov) minimal sufficient statistic for all data samples for both description modes--in the explicit mode under some constraints. We also strengthen and elaborate earlier results on the "Kolmogorov structure function" and "absolutely non-stochastic objects"--those rare objects for which the simplest models that summarize their relevant information (minimal sufficient statistics) are at least as complex as the objects themselves. We demonstrate a close relation between the probabilistic notions and the algorithmic ones.
    Comment: LaTeX, 22 pages, 1 figure, with correction to the published journal version
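
The two-part code described in this abstract can be illustrated with a small computable stand-in. The sketch below is not the paper's construction: true Kolmogorov complexity is uncomputable, so it assumes one fixed and very simple model class, the finite set of all binary strings of length n with exactly k ones. The function name `two_part_code_length` is hypothetical, and n is assumed to be known to the decoder.

```python
import math


def two_part_code_length(bits: str) -> float:
    """Bits needed to describe `bits` through a finite-set model.

    Model (the "statistic"): the set of all strings of length n with
    exactly k ones. This is an illustrative assumption, not the
    general model class of algorithmic statistics.
    """
    n = len(bits)
    k = bits.count("1")
    # Part 1: describe the model by naming k, one of n + 1 possible values.
    model_bits = math.log2(n + 1)
    # Part 2: model-to-data code, the index of the string inside the set,
    # which has C(n, k) members.
    data_bits = math.log2(math.comb(n, k))
    return model_bits + data_bits


if __name__ == "__main__":
    sparse = "0" * 90 + "1" * 10
    balanced = "01" * 50
    print(len(sparse), two_part_code_length(sparse))
    print(len(balanced), two_part_code_length(balanced))
```

For a sparse string the two-part code is shorter than the literal 100 bits. For the alternating string it is not, even though that string is highly regular, because this particular model class cannot express the regularity. A richer model class would be needed to capture it.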

    Dimension Extractors and Optimal Decompression

    A *dimension extractor* is an algorithm designed to increase the effective dimension -- i.e., the amount of computational randomness -- of an infinite binary sequence, in order to turn a "partially random" sequence into a "more random" sequence. Extractors are exhibited for various effective dimensions, including constructive, computable, space-bounded, time-bounded, and finite-state dimension. Using similar techniques, the Kucera-Gacs theorem is examined from the perspective of decompression, by showing that every infinite sequence S is Turing reducible to a Martin-Loef random sequence R such that the asymptotic number of bits of R needed to compute n bits of S, divided by n, is precisely the constructive dimension of S, which is shown to be the optimal ratio of query bits to computed bits achievable with Turing reductions. The extractors and decompressors that are developed lead directly to new characterizations of some effective dimensions in terms of optimal decompression by Turing reductions.
    Comment: This report was combined with a different conference paper "Every Sequence is Decompressible from a Random One" (cs.IT/0511074, at http://dx.doi.org/10.1007/11780342_17), and both titles were changed, with the conference paper incorporated as section 5 of this new combined paper. The combined paper was accepted to the journal Theory of Computing Systems, as part of a special issue of invited papers from the second conference on Computability in Europe, 200

    On Macroscopic Complexity and Perceptual Coding

    The theoretical limits of 'lossy' data compression algorithms are considered. The complexity of an object as seen by a macroscopic observer is the size of the perceptual code which discards all information that can be lost without altering the perception of the specified observer. The complexity of this macroscopically observed state is the simplest description of any microstate comprising that macrostate. Inference and pattern recognition based on macrostate rather than microstate complexities will take advantage of the complexity of the macroscopic observer to ignore irrelevant noise.

    Active Virtual Network Management Prediction: Complexity as a Framework for Prediction, Optimization, and Assurance

    Research into active networking has provided the incentive to re-visit what has traditionally been classified as distinct properties and characteristics of information transfer such as protocol versus service; at a more fundamental level this paper considers the blending of computation and communication by means of complexity. The specific service examined in this paper is network self-prediction enabled by Active Virtual Network Management Prediction. Computation/communication is analyzed via Kolmogorov Complexity. The result is a mechanism to understand and improve the performance of active networking and Active Virtual Network Management Prediction in particular. The Active Virtual Network Management Prediction mechanism allows information, in various states of algorithmic and static form, to be transported in the service of prediction for network management. The results are generally applicable to algorithmic transmission of information. Kolmogorov Complexity is used and experimentally validated as a theory describing the relationship among algorithmic compression, complexity, and prediction accuracy within an active network. Finally, the paper concludes with a complexity-based framework for Information Assurance that attempts to take a holistic view of vulnerability analysis.
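
The link between compression and complexity mentioned in this abstract can be sketched with an off-the-shelf compressor. This is an assumption made for illustration, not the measurement procedure used in the paper: Kolmogorov complexity is uncomputable, and the compressed size under zlib is only a crude upper-bound estimate. The helper name `compressed_bits` is hypothetical.

```python
import random
import zlib


def compressed_bits(data: bytes) -> int:
    """Size in bits of `data` after zlib compression at the highest level.

    Used here as a rough, computable upper-bound estimate of the
    complexity of `data`; it is not Kolmogorov complexity itself.
    """
    return 8 * len(zlib.compress(data, 9))


if __name__ == "__main__":
    regular = b"ab" * 500                   # highly predictable payload
    rng = random.Random(0)                  # fixed seed for repeatability
    noisy = bytes(rng.getrandbits(8) for _ in range(1000))
    print("regular:", compressed_bits(regular), "bits")
    print("noisy:  ", compressed_bits(noisy), "bits")
```

A payload that compresses well is one whose content is easier to predict, which is the intuition behind relating compression, complexity, and prediction accuracy.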

    Applying MDL to Learning Best Model Granularity

    The Minimum Description Length (MDL) principle is solidly based on a provably ideal method of inference using Kolmogorov complexity. We test how the theory behaves in practice on a general problem in model selection: that of learning the best model granularity. The performance of a model depends critically on the granularity, for example the choice of precision of the parameters. Too high precision generally involves modeling of accidental noise and too low precision may lead to confusion of models that should be distinguished. This precision is often determined ad hoc. In MDL the best model is the one that most compresses a two-part code of the data set: this embodies "Occam's Razor." In two quite different experimental settings the theoretical value determined using MDL coincides with the best value found experimentally. In the first experiment the task is to recognize isolated handwritten characters in one subject's handwriting, irrespective of size and orientation. Based on a new modification of elastic matching, using multiple prototypes per character, the optimal prediction rate is predicted for the learned parameter (length of sampling interval) considered most likely by MDL, which is shown to coincide with the best value found experimentally. In the second experiment the task is to model a robot arm with two degrees of freedom using a three layer feed-forward neural network where we need to determine the number of nodes in the hidden layer giving best modeling performance. The optimal model (the one that extrapolates best on unseen examples) is predicted for the number of nodes in the hidden layer considered most likely by MDL, which again is found to coincide with the best value found experimentally.
    Comment: LaTeX, 32 pages, 5 figures. Artificial Intelligence journal, To appear
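
The idea of choosing model granularity by minimizing a two-part code can be sketched on a toy problem. The example below is not one of the paper's experiments (handwriting recognition and robot-arm modeling). It assumes a much simpler setting, picking the number of equal-width histogram bins for data in [0, 1), and uses a crude cost of half of log2(n) bits per free parameter. The function names and the cost formula are illustrative assumptions.

```python
import math


def description_length(data, bins):
    """Crude two-part code length, in bits, for a histogram with `bins` bins.

    Assumes every value lies in [0, 1). The data cost is measured
    relative to the uniform density, so it is zero for one bin and
    negative when the histogram fits better than uniform.
    """
    n = len(data)
    counts = [0] * bins
    for x in data:
        counts[min(int(x * bins), bins - 1)] += 1
    # Part 1: model cost, (bins - 1) free parameters at 0.5 * log2(n) bits each.
    model_bits = 0.5 * (bins - 1) * math.log2(n)
    # Part 2: data cost under the fitted histogram density c * bins / n.
    data_bits = -sum(c * math.log2(c * bins / n) for c in counts if c > 0)
    return model_bits + data_bits


def best_granularity(data, candidates):
    """Return the candidate bin count with the shortest two-part code."""
    return min(candidates, key=lambda m: description_length(data, m))


if __name__ == "__main__":
    # 900 points spread over [0, 0.5) and 100 points over [0.5, 1).
    two_level = [0.5 * (i + 0.5) / 900 for i in range(900)]
    two_level += [0.5 + 0.5 * (i + 0.5) / 100 for i in range(100)]
    print(best_granularity(two_level, [1, 2, 4, 8, 16]))
```

Too few bins miss the step in the density, while more bins than needed pay for parameters that do not shorten the data part. This mirrors the trade-off between too low and too high precision described in the abstract.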