
    Independent minimum length programs to translate between given strings

    A string p is called a program to compute y given x if U(p,x)=y, where U denotes a universal programming language. The Kolmogorov complexity K(y|x) of y relative to x is defined as the minimum length of a program to compute y given x. Let K(x) denote K(x|empty string) (the Kolmogorov complexity of x) and let I(x:y)=K(x)+K(y)−K(〈x,y〉) (the amount of mutual information in x,y). In the present paper, we answer in the negative the following question posed in Bennett et al., IEEE Trans. Inform. Theory 44 (4) (1998) 1407–1423: is it true that for any strings x,y there are independent minimum length programs p,q to translate between x,y? That is, is it true that for any x,y there are p,q such that U(p,x)=y, U(q,y)=x, the length of p is K(y|x), the length of q is K(x|y), and I(p:q)=0 (where the last three equalities hold up to an additive O(log(K(x|y)+K(y|x))) term)?

    Normalized Information Distance

    The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and is thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation. Comment: 33 pages, 12 figures, pdf. In: Information Theory and Statistical Learning, Eds. M. Dehmer, F. Emmert-Streib, Springer-Verlag, New York, to appear.
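
As an illustration of the compression-based realization described above, the sketch below computes the normalized compression distance, NCD(x,y) = (C(xy) − min(C(x),C(y))) / max(C(x),C(y)), where the compressed length C stands in for Kolmogorov complexity. The choice of zlib as the compressor and the names `compressed_size` and `ncd` are assumptions for illustration only; the chapter may use other compressors and refinements.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of zlib's output at the highest compression level.

    This is the computable proxy C(.) used in place of K(.)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 mean the compressor finds x and y highly similar; values
    near 1 mean it finds little shared structure. Real compressors can push
    the result slightly above 1."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 20
    noise = os.urandom(len(text))
    print(ncd(text, text))   # expected to be near 0
    print(ncd(text, noise))  # expected to be near 1
```

For clustering, one would compute `ncd` for every pair of objects and hand the resulting distance matrix to any standard clustering routine.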

    Kolmogorov's Structure Functions and Model Selection

    In 1974 Kolmogorov proposed a non-probabilistic approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal (Kolmogorov) complexity. The "structure function" of the given data expresses the relation between the complexity level constraint on a model class and the least log-cardinality of a model in the class containing the data. We show that the structure function determines all stochastic properties of the data: for every constrained model class it determines the individual best-fitting model in the class, irrespective of whether the "true" model is in the model class considered or not. In this setting, this happens with certainty, rather than with high probability as in the classical case. We precisely quantify the goodness-of-fit of an individual model with respect to individual data. We show that, within the obvious constraints, every graph is realized by the structure function of some data. We determine the (un)computability properties of the various functions contemplated and of the "algorithmic minimal sufficient statistic." Comment: 25 pages LaTeX, 5 figures. In part in Proc. 47th IEEE FOCS; this final version (more explanations, cosmetic modifications) to appear in IEEE Trans. Inform. Theory.
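
Written out in the abstract's own terms (data x a finite binary string, models S finite sets of binary strings, complexity level α), the structure function can be sketched as below. The notation h_x(α) is an assumption for illustration, and the paper's precise statement may carry additive terms.

```latex
h_x(\alpha) \;=\; \min_{S}\,\bigl\{\, \log_2 |S| \;:\; x \in S,\ K(S) \le \alpha \,\bigr\}
```

Loosening the constraint (raising α) admits more models, so h_x(α) is nonincreasing in α; once α is roughly K(x), the singleton model S = {x} is admitted and h_x(α) drops to 0.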

    Causal inference using the algorithmic Markov condition

    Inferring the causal structure that links n observables is usually based upon detecting statistical dependences and choosing simple graphs that make the joint measure Markovian. Here we argue that causal inference is also possible when only single observations are present. We develop a theory of how to generate causal graphs explaining similarities between single objects. To this end, we replace the notion of conditional stochastic independence in the causal Markov condition with the vanishing of conditional algorithmic mutual information and describe the corresponding causal inference rules. We explain why a consistent reformulation of causal inference in terms of algorithmic complexity implies a new inference principle that also takes into account the complexity of conditional probability densities, making it possible to select among Markov equivalent causal graphs. This insight provides a theoretical foundation for a heuristic principle proposed in earlier work. We also discuss how to replace Kolmogorov complexity with decidable complexity criteria. This can be seen as an algorithmic analog of replacing the empirically undecidable question of statistical independence with practical independence tests that are based on implicit or explicit assumptions on the underlying distribution. Comment: 16 figures
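
To make the replacement concrete: conditional algorithmic mutual information extends I(x:y)=K(x)+K(y)−K(〈x,y〉) by conditioning every term, and the Markov condition then asks that it vanish for each node given its parents. The sketch below assumes a node x_j with parents pa_j and non-descendants nd_j, with equalities up to additive terms (marked by the + over the equals sign); the paper's exact formulation, for instance how the parents are encoded in the condition, may differ.

```latex
I(x : y \mid z) \;=\; K(x \mid z) + K(y \mid z) - K(x, y \mid z),
\qquad
I\bigl(x_j : nd_j \mid pa_j\bigr) \;\stackrel{+}{=}\; 0 \quad \text{for every node } j .
```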

    Relating and contrasting plain and prefix Kolmogorov complexity

    In [3] a short proof is given that some strings have maximal plain Kolmogorov complexity but not maximal prefix-free complexity. The proof uses Levin's symmetry of information, Levin's formula relating plain and prefix complexity, and Gács' theorem that the complexity of the complexity given the string can be high. We argue that the proof technique and results mentioned above are useful to simplify existing proofs and to solve open questions. We present a short proof of Solovay's result [21] relating plain and prefix complexity: $K(x) = C(x) + CC(x) + O(CCC(x))$ and $C(x) = K(x) - KK(x) + O(KKK(x))$ (here $CC(x)$ denotes $C(C(x))$, etc.). We show that there exists a sequence $\omega$ such that $\liminf_n \bigl(C(\omega_1\dots\omega_n) - C(n)\bigr)$ is infinite and $\liminf_n \bigl(K(\omega_1\dots\omega_n) - K(n)\bigr)$ is finite, i.e., the infinitely often C-trivial reals are not the same as the infinitely often K-trivial reals (answering [1, Question 1]). Solovay showed that for infinitely many $x$ we have $|x| - C(x) \le O(1)$ and $|x| + K(|x|) - K(x) \ge \log^{(2)} |x| - O(\log^{(3)} |x|)$ (here $|x|$ denotes the length of $x$, $\log^{(2)} = \log\log$, etc.). We show that this result holds for prefixes of some 2-random sequences. Finally, we generalize our proof technique and show that no monotone relation exists between expectation-bounded and probability-bounded randomness deficiency (answering [6, Question 1]). Comment: 20 pages, 1 figure

    Strongly universal string hashing is fast

    We present fast strongly universal string hashing families: they can process data at a rate of 0.2 CPU cycles per byte. Perhaps surprisingly, we find that these families, though they require a large buffer of random numbers, are often faster than popular hash functions with weaker theoretical guarantees. Moreover, conventional wisdom holds that hash functions with fewer multiplications are faster. Yet we find that such functions are not always faster, because of operation pipelining. We present experimental results on several processors, including low-powered processors. Our tests include hash functions designed for processors with the Carry-Less Multiplication (CLMUL) instruction set. We also prove, using accessible proofs, the strong universality of our families. Comment: Software is available at http://code.google.com/p/variablelengthstringhashing/ and https://github.com/lemire/StronglyUniversalStringHashin
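
The abstract does not spell out the families, so the following is only a hedged illustration of a multilinear hash of the general kind studied in this line of work: for 32-bit characters s_1, ..., s_n and random 64-bit keys m_1, ..., m_{n+1} (the "large buffer of random numbers"), compute (m_1 + Σ m_{i+1}·s_i) mod 2^64 and keep the top 32 bits. The helper names `random_keys` and `multilinear_hash` are hypothetical. This pure-Python version shows the arithmetic only; the speeds reported depend on compiled code, pipelining and instructions such as CLMUL.

```python
import secrets

MASK64 = (1 << 64) - 1  # reduce modulo 2**64


def random_keys(n: int) -> list[int]:
    """Hypothetical helper: n + 1 random 64-bit keys for strings of up to n characters."""
    return [secrets.randbits(64) for _ in range(n + 1)]


def multilinear_hash(chars: list[int], keys: list[int]) -> int:
    """Hash a string of 32-bit characters to a 32-bit value.

    Computes (m_1 + sum_i m_{i+1} * s_i) mod 2**64, then keeps the top 32 bits."""
    if len(keys) < len(chars) + 1:
        raise ValueError("need one more key than characters")
    acc = keys[0]
    for m, s in zip(keys[1:], chars):
        if not 0 <= s < 1 << 32:
            raise ValueError("characters must be 32-bit unsigned integers")
        acc = (acc + m * s) & MASK64  # multiply-accumulate with wraparound
    return acc >> 32  # the high 32 bits are the hash value


if __name__ == "__main__":
    keys = random_keys(4)
    print(multilinear_hash([72, 101, 108, 108], keys))
```

One key per character is what makes the buffer of random numbers large: the key table must be at least as long as the longest string to be hashed.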