Independent minimum length programs to translate between given strings
A string p is called a program to compute y given x if U(p,x)=y, where U denotes a universal programming language. The Kolmogorov complexity K(y|x) of y relative to x is defined as the minimum length of a program to compute y given x. Let K(x) denote K(x|λ), where λ is the empty string (the Kolmogorov complexity of x), and let I(x:y)=K(x)+K(y)−K(〈x,y〉) (the amount of mutual information in x and y). In the present paper, we answer in the negative the following question, posed in Bennett et al., IEEE Trans. Inform. Theory 44 (4) (1998) 1407–1423: is it true that for any strings x, y there are independent minimum-length programs p, q to translate between x and y? That is, is it true that for any x, y there are p, q such that U(p,x)=y, U(q,y)=x, the length of p is K(y|x), the length of q is K(x|y), and I(p:q)=0 (where the last three equalities hold up to an additive O(log(K(x|y)+K(y|x))) term)?
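The quantities above are uncomputable, but compressed length gives a rough, computable stand-in. A minimal sketch, assuming zlib's compressed size approximates K and plain concatenation x+y approximates the pair 〈x,y〉 (both crude assumptions; the helper names are illustrative):

```python
import zlib

def c(data: bytes) -> int:
    # Compressed length in bytes: a crude, computable proxy for K(data).
    return len(zlib.compress(data, 9))

def approx_mutual_information(x: bytes, y: bytes) -> int:
    # I(x:y) = K(x) + K(y) - K(<x,y>), with concatenation standing in
    # for the pair <x,y> and compressed length standing in for K.
    return c(x) + c(y) - c(x + y)

x = b"abracadabra" * 40
print(approx_mutual_information(x, x))                 # large: x shares everything with itself
print(approx_mutual_information(x, bytes(range(256)))) # near zero: unrelated strings
```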
Normalized Information Distance
The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and is thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation.

Comment: 33 pages, 12 figures, pdf; Normalized information distance, in: Information Theory and Statistical Learning, Eds. M. Dehmer, F. Emmert-Streib, Springer-Verlag, New York, to appear.
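The compression-based realization mentioned above is the normalized compression distance (NCD), which replaces the Kolmogorov complexity K with the compressed length C produced by a real compressor. A minimal sketch using zlib (applications often use stronger compressors such as bzip2, and the choice of compressor matters):

```python
import zlib

def clen(data: bytes) -> int:
    # Compressed length: a computable approximation of Kolmogorov complexity.
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    # Values near 0 mean similar objects; values near 1 mean dissimilar.
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

print(ncd(b"the quick brown fox" * 20, b"the quick brown fox" * 20))  # near 0
print(ncd(b"the quick brown fox" * 20, bytes(range(256)) * 2))        # near 1
```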
Kolmogorov's Structure Functions and Model Selection
In 1974 Kolmogorov proposed a non-probabilistic approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal (Kolmogorov) complexity. The "structure function" of the given data expresses the relation between the complexity-level constraint on a model class and the least log-cardinality of a model in the class containing the data. We show that the structure function determines all stochastic properties of the data: for every constrained model class it determines the individual best-fitting model in the class, irrespective of whether the "true" model is in the model class considered or not. In this setting, this happens with certainty, rather than with high probability as in the classical case. We precisely quantify the goodness-of-fit of an individual model with respect to individual data. We show that, within the obvious constraints, every graph is realized by the structure function of some data. We determine the (un)computability properties of the various functions contemplated and of the "algorithmic minimal sufficient statistic."

Comment: 25 pages, LaTeX, 5 figures. In part in Proc. 47th IEEE FOCS; this final version (more explanations, cosmetic modifications) to appear in IEEE Trans. Inform. Theory.
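For reference, a sketch of the definition as it is usually written in the structure-function literature (the symbols h_x, S, and α follow that convention; they are not fixed in this abstract):

```latex
% Kolmogorov's structure function of a string x: the least log-cardinality
% of a finite set (model) S containing x, over all S of complexity at most alpha.
h_x(\alpha) = \min_{S} \{ \log_2 |S| : x \in S,\ K(S) \le \alpha \}
% A model S witnessing h_x(alpha) with K(S) + log_2 |S| close to K(x) is a
% best fit at that complexity level: the two-part description
% (S, index of x within S) is then nearly as short as the best one-part description.
```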
Causal inference using the algorithmic Markov condition
Inferring the causal structure that links n observables is usually based upon detecting statistical dependences and choosing simple graphs that make the joint measure Markovian. Here we argue why causal inference is also possible when only single observations are present.
We develop a theory of how to generate causal graphs explaining similarities between single objects. To this end, we replace the notion of conditional stochastic independence in the causal Markov condition with the vanishing of conditional algorithmic mutual information, and describe the corresponding causal inference rules.
We explain why a consistent reformulation of causal inference in terms of algorithmic complexity implies a new inference principle that also takes into account the complexity of conditional probability densities, making it possible to select among Markov-equivalent causal graphs. This insight provides a theoretical foundation for a heuristic principle proposed in earlier work.
We also discuss how to replace Kolmogorov complexity with decidable complexity criteria. This can be seen as an algorithmic analog of replacing the empirically undecidable question of statistical independence with practical independence tests that are based on implicit or explicit assumptions on the underlying distribution.

Comment: 16 figures.
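In the spirit of the last paragraph, one decidable stand-in replaces K with compressed length. A rough sketch, assuming K(x|z) can be approximated by C(zx) − C(z) and the pair 〈x,y〉 by concatenation (all names and choices here are illustrative, not the paper's):

```python
import zlib

def clen(data: bytes) -> int:
    # Compressed length as a decidable stand-in for Kolmogorov complexity.
    return len(zlib.compress(data, 9))

def cond_clen(x: bytes, z: bytes) -> int:
    # Approximate K(x|z) by C(zx) - C(z): the extra compressed bits
    # needed for x once z has already been compressed.
    return clen(z + x) - clen(z)

def cond_mutual_information(x: bytes, y: bytes, z: bytes) -> int:
    # I(x : y | z) = K(x|z) + K(y|z) - K(<x,y>|z); the algorithmic Markov
    # condition asks this to (nearly) vanish when z screens x off from y.
    return cond_clen(x, z) + cond_clen(y, z) - cond_clen(x + y, z)
```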
Relating and contrasting plain and prefix Kolmogorov complexity
In [3], a short proof is given that some strings have maximal plain Kolmogorov complexity but not maximal prefix-free complexity. The proof uses Levin's symmetry of information, Levin's formula relating plain and prefix complexity, and Gacs' theorem that the complexity of the complexity of a string, given the string itself, can be high. We argue that the proof technique and the results mentioned above are useful for simplifying existing proofs and solving open questions.
We present a short proof of Solovay's result [21] relating plain and prefix complexity: K(x) = C(x) + CC(x) + O(CCC(x)) and C(x) = K(x) − KK(x) + O(KKK(x)) (here CC(x) denotes C(C(x)), etc.).
We show that there exist ω such that liminf_n C(ω_1...ω_n) − C(n) is infinite and liminf_n K(ω_1...ω_n) − K(n) is finite, i.e. the infinitely often C-trivial reals are not the same as the infinitely often K-trivial reals (answering [1, Question 1]).
Solovay showed that for infinitely many x we have |x| − C(x) ≤ O(1) and |x| + K(|x|) − K(x) ≥ log^(2)|x| − O(log^(3)|x|) (here |x| denotes the length of x and log^(2) denotes log log, etc.). We show that this result holds for prefixes of some 2-random sequences.
Finally, we generalize our proof technique and show that no monotone relation exists between expectation- and probability-bounded randomness deficiency (answering [6, Question 1]).

Comment: 20 pages, 1 figure.
Strongly universal string hashing is fast
We present fast strongly universal string hashing families: they can process data at a rate of 0.2 CPU cycles per byte. Perhaps surprisingly, we find that these families, though they require a large buffer of random numbers, are often faster than popular hash functions with weaker theoretical guarantees. Moreover, conventional wisdom holds that hash functions with fewer multiplications are faster; yet we find that they may fail to be faster due to operation pipelining. We present experimental results on several processors, including low-powered processors. Our tests include hash functions designed for processors with the Carry-Less Multiplication (CLMUL) instruction set. We also prove, using accessible proofs, the strong universality of our families.

Comment: Software is available at http://code.google.com/p/variablelengthstringhashing/ and https://github.com/lemire/StronglyUniversalStringHashing
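For context, the classic strongly universal family for fixed-length strings is multilinear hashing over a prime field: draw random coefficients once, then take an inner product with the input words. A minimal sketch (the paper's fast variants rely on word-level and carry-less-multiplication tricks not shown here; the modulus and interface below are illustrative choices):

```python
import random

P = (1 << 61) - 1  # Mersenne prime modulus (an illustrative choice)

def make_multilinear_hash(n, seed=None):
    # Draw n + 1 random coefficients from Z_P. For any two distinct
    # length-n inputs (with words < P), the pair of hash values is uniform
    # over Z_P^2, which is exactly strong universality (pairwise independence).
    rng = random.Random(seed)
    coeffs = [rng.randrange(P) for _ in range(n + 1)]

    def h(words):
        assert len(words) == n, "the family is defined for fixed-length input"
        acc = coeffs[0]
        for m, w in zip(coeffs[1:], words):
            acc = (acc + m * w) % P
        return acc

    return h

h = make_multilinear_hash(4, seed=1)
print(h([0x64656d6f, 0x68617368, 0x74657374, 0x21]))  # hash four 32-bit words
```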