772 research outputs found
Shannon Information and Kolmogorov Complexity
We compare the elementary theories of Shannon information and Kolmogorov
complexity, the extent to which they have a common purpose, and where they are
fundamentally different. We discuss and relate the basic notions of both
theories: Shannon entropy versus Kolmogorov complexity, the relation of both to
universal coding, Shannon mutual information versus Kolmogorov (`algorithmic')
mutual information, probabilistic sufficient statistic versus algorithmic
sufficient statistic (related to lossy compression in the Shannon theory versus
meaningful information in the Kolmogorov theory), and rate distortion theory
versus Kolmogorov's structure function. Part of the material has appeared in
print before, scattered through various publications, but this is the first
comprehensive systematic comparison. The last mentioned relations are new.Comment: Survey, LaTeX 54 pages, 3 figures, Submitted to IEEE Trans
Information Theor
The minimum description length principle
The pdf file in the repository consists only if the preface, foreword and chapter 1; I am not allowed by the publisher to put the remainder of this book on the web.
If you are a member of the CWI evaluation committee and yu read this: you are of course entitled to access the full book. If you would like to see it, please contact CWI (or, even easier, contact me directly), and we will be happy to give you a copy of the book for free
Information Symmetries in Irreversible Processes
We study dynamical reversibility in stationary stochastic processes from an
information theoretic perspective. Extending earlier work on the reversibility
of Markov chains, we focus on finitary processes with arbitrarily long
conditional correlations. In particular, we examine stationary processes
represented or generated by edge-emitting, finite-state hidden Markov models.
Surprisingly, we find pervasive temporal asymmetries in the statistics of such
stationary processes with the consequence that the computational resources
necessary to generate a process in the forward and reverse temporal directions
are generally not the same. In fact, an exhaustive survey indicates that most
stationary processes are irreversible. We study the ensuing relations between
model topology in different representations, the process's statistical
properties, and its reversibility in detail. A process's temporal asymmetry is
efficiently captured using two canonical unifilar representations of the
generating model, the forward-time and reverse-time epsilon-machines. We
analyze example irreversible processes whose epsilon-machine presentations
change size under time reversal, including one which has a finite number of
recurrent causal states in one direction, but an infinite number in the
opposite. From the forward-time and reverse-time epsilon-machines, we are able
to construct a symmetrized, but nonunifilar, generator of a process---the
bidirectional machine. Using the bidirectional machine, we show how to directly
calculate a process's fundamental information properties, many of which are
otherwise only poorly approximated via process samples. The tools we introduce
and the insights we offer provide a better understanding of the many facets of
reversibility and irreversibility in stochastic processes.Comment: 32 pages, 17 figures, 2 tables;
http://csc.ucdavis.edu/~cmg/compmech/pubs/pratisp2.ht
Descriptive Complexity Approaches to Inductive Inference
We present a critical review of descriptive complexity approaches to inductive inference. Inductive inference is defined as any process by which a model of the world is formed from observations. The descriptive complexity approach is a formalization of Occam\u27s razor: choose the simplest model consistent with the data. Descriptive complexity as defined by Kolmogorov, Chaitin and Solomonoff is presented as a generalization of Shannon\u27s entropy. We discuss its relationship with randomness and present examples. However, a major result of the theory is negative: descriptive complexity is uncomputable.
Rissanen\u27s minimum description length (MDL) principle is presented as a restricted form of the descriptive complexity which avoids the uncomputability problem. We demonstrate the effectiveness of MDL through its application to AR processes. Lastly, we present and discuss LeClerc\u27s application of MDL to the problem of image segmentation
A decomposition method for global evaluation of Shannon entropy and local estimations of algorithmic complexity
We investigate the properties of a Block Decomposition Method (BDM), which extends the power of a Coding Theorem Method (CTM) that approximates local estimations of algorithmic complexity based on Solomonoff–Levin’s theory of algorithmic probability providing a closer connection to algorithmic complexity than previous attempts based on statistical regularities such as popular lossless compression schemes. The strategy behind BDM is to find small computer programs that produce the components of a larger, decomposed object. The set of short computer programs can then be artfully arranged in sequence so as to produce the original object. We show that the method provides efficient estimations of algorithmic complexity but that it performs like Shannon entropy when it loses accuracy. We estimate errors and study the behaviour of BDM for different boundary conditions, all of which are compared and assessed in detail. The measure may be adapted for use with more multi-dimensional objects than strings, objects such as arrays and tensors. To test the measure we demonstrate the power of CTM on low algorithmic-randomness objects that are assigned maximal entropy (e.g., π) but whose numerical approximations are closer to the theoretical low algorithmic-randomness expectation. We also test the measure on larger objects including dual, isomorphic and cospectral graphs for which we know that algorithmic randomness is low. We also release implementations of the methods in most major programming languages—Wolfram Language (Mathematica), Matlab, R, Perl, Python, Pascal, C++, and Haskell—and an online algorithmic complexity calculator.Swedish Research Council (Vetenskapsrådet
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
The relationship between the Bayesian approach and the minimum description
length approach is established. We sharpen and clarify the general modeling
principles MDL and MML, abstracted as the ideal MDL principle and defined from
Bayes's rule by means of Kolmogorov complexity. The basic condition under which
the ideal principle should be applied is encapsulated as the Fundamental
Inequality, which in broad terms states that the principle is valid when the
data are random, relative to every contemplated hypothesis and also these
hypotheses are random relative to the (universal) prior. Basically, the ideal
principle states that the prior probability associated with the hypothesis
should be given by the algorithmic universal probability, and the sum of the
log universal probability of the model plus the log of the probability of the
data given the model should be minimized. If we restrict the model class to the
finite sets then application of the ideal principle turns into Kolmogorov's
minimal sufficient statistic. In general we show that data compression is
almost always the best strategy, both in hypothesis identification and
prediction.Comment: 35 pages, Latex. Submitted IEEE Trans. Inform. Theor
A decomposition method for global evaluation of Shannon entropy and local estimations of algorithmic complexity
We investigate the properties of a Block Decomposition Method (BDM), which extends the power of a Coding Theorem Method (CTM) that approximates local estimations of algorithmic complexity based on Solomonoff–Levin’s theory of algorithmic probability providing a closer connection to algorithmic complexity than previous attempts based on statistical regularities such as popular lossless compression schemes. The strategy behind BDM is to find small computer programs that produce the components of a larger, decomposed object. The set of short computer programs can then be artfully arranged in sequence so as to produce the original object. We show that the method provides efficient estimations of algorithmic complexity but that it performs like Shannon entropy when it loses accuracy. We estimate errors and study the behaviour of BDM for different boundary conditions, all of which are compared and assessed in detail. The measure may be adapted for use with more multi-dimensional objects than strings, objects such as arrays and tensors. To test the measure we demonstrate the power of CTM on low algorithmic-randomness objects that are assigned maximal entropy (e.g., π) but whose numerical approximations are closer to the theoretical low algorithmic-randomness expectation. We also test the measure on larger objects including dual, isomorphic and cospectral graphs for which we know that algorithmic randomness is low. We also release implementations of the methods in most major programming languages—Wolfram Language (Mathematica), Matlab, R, Perl, Python, Pascal, C++, and Haskell—and an online algorithmic complexity calculator.Swedish Research Council (Vetenskapsrådet
- …