### Shannon Information and Kolmogorov Complexity

We compare the elementary theories of Shannon information and Kolmogorov
complexity, the extent to which they have a common purpose, and where they are
fundamentally different. We discuss and relate the basic notions of both
theories: Shannon entropy versus Kolmogorov complexity, the relation of both to
universal coding, Shannon mutual information versus Kolmogorov ('algorithmic')
mutual information, probabilistic sufficient statistic versus algorithmic
sufficient statistic (related to lossy compression in the Shannon theory versus
meaningful information in the Kolmogorov theory), and rate distortion theory
versus Kolmogorov's structure function. Part of the material has appeared in
print before, scattered through various publications, but this is the first
comprehensive systematic comparison. The last-mentioned relations are new.

Comment: Survey, LaTeX 54 pages, 3 figures, Submitted to IEEE Trans
Information Theory
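
As a rough illustration of the entropy-versus-complexity contrast described in this abstract, the sketch below compares the empirical Shannon entropy of two byte strings with their zlib-compressed length. Kolmogorov complexity is uncomputable, so the compressed length is used here only as a crude upper-bound proxy; that substitution, the helper names, and the two test strings are assumptions made for illustration and are not taken from the paper.

```python
import math
import random
import zlib
from collections import Counter


def empirical_entropy_bits(data: bytes) -> float:
    """Shannon entropy, in bits per symbol, of the empirical byte distribution."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def compressed_bits(data: bytes) -> int:
    """Bit length of the zlib encoding: an assumed, crude stand-in for K(data)."""
    return 8 * len(zlib.compress(data, 9))


n = 4096
periodic = b"ab" * (n // 2)                          # "abab...": highly regular
rng = random.Random(0)                               # fixed seed for repeatability
noisy = bytes(rng.choice(b"ab") for _ in range(n))   # fair coin flips over {a, b}

for name, data in (("periodic", periodic), ("noisy", noisy)):
    print(name,
          "entropy/symbol:", round(empirical_entropy_bits(data), 4),
          "compressed bits:", compressed_bits(data))
```

Both strings have symbol frequencies close to one half, so their per-symbol empirical entropy is close to 1 bit, yet the periodic string compresses far better. The symbol-frequency entropy describes the source statistics, while the description length depends on the regularity of the individual string.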

### Horizon-Independent Optimal Prediction with Log-Loss in Exponential Families

We study online learning under logarithmic loss with regular parametric
models. Hedayati and Bartlett (2012b) showed that a Bayesian prediction
strategy with Jeffreys prior and sequential normalized maximum likelihood
(SNML) coincide and are optimal if and only if the latter is exchangeable, and
if and only if the optimal strategy can be calculated without knowing the time
horizon in advance. They posed the question of which families have
exchangeable SNML strategies. This paper fully answers this open problem for
one-dimensional exponential families. The exchangeability can happen only for
three classes of natural exponential family distributions, namely the Gaussian,
Gamma, and the Tweedie exponential family of order 3/2.

Keywords: SNML Exchangeability, Exponential Family, Online Learning, Logarithmic Loss,
Bayesian Strategy, Jeffreys Prior, Fisher Information

Comment: 23 pages

### Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory

We describe and develop a close relationship between two problems that have
customarily been regarded as distinct: that of maximizing entropy, and that of
minimizing worst-case expected loss. Using a formulation grounded in the
equilibrium theory of zero-sum games between Decision Maker and
Nature, these two problems are shown to be dual to each other, the solution
to each providing that to the other. Although Topsøe described this connection
for the Shannon entropy over 20 years ago, it does not appear to be widely
known even in that important special case. We here generalize this theory to
apply to arbitrary decision problems and loss functions. We indicate how an
appropriate generalized definition of entropy can be associated with such a
problem, and we show that, subject to certain regularity conditions, the
above-mentioned duality continues to apply in this extended context.
This simultaneously provides a possible rationale for maximizing entropy and
a tool for finding robust Bayes acts. We also describe the essential identity
between the problem of maximizing entropy and that of minimizing a related
discrepancy or divergence between distributions. This leads to an extension, to
arbitrary discrepancies, of a well-known minimax theorem for the case of
Kullback-Leibler divergence (the "redundancy-capacity theorem" of information
theory). For the important case of families of distributions having certain
mean values specified, we develop simple sufficient conditions and methods for
identifying the desired solutions.

Comment: Published by the Institute of Mathematical Statistics
(http://www.imstat.org) in the Annals of Statistics
(http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000055
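
The duality described in this abstract can be checked numerically on a toy case. The sketch below takes the family of all distributions on {0, 1, 2} with a fixed mean, finds the maximum-entropy member by grid search, and compares its entropy with the worst-case expected log loss incurred when that member is used as the Decision Maker's act. The outcome space, the mean value `MU`, the grid search, and all function names are assumptions chosen for illustration; this is not the method developed in the paper.

```python
import math

MU = 0.6  # hypothetical mean constraint on outcomes {0, 1, 2}


def member(t: float) -> tuple:
    """Distribution (p0, p1, p2) with mean MU, parametrised by t = P(X = 2)."""
    return (1.0 - MU + t, MU - 2.0 * t, t)


# Range of t for which all three probabilities are non-negative.
T_MIN, T_MAX = max(0.0, MU - 1.0), MU / 2.0


def entropy(p) -> float:
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def expected_log_loss(p, q) -> float:
    """Expected loss -log q(X) when X is distributed according to p."""
    return -sum(x * math.log(y) for x, y in zip(p, q) if x > 0.0)


def worst_case_loss(q) -> float:
    """Expected loss is linear in p, so the maximum over the family
    (a line segment) is attained at one of its two endpoints."""
    return max(expected_log_loss(member(t), q) for t in (T_MIN, T_MAX))


# Grid search over the interior of the segment for the maximum-entropy member.
steps = 30000
grid = [T_MIN + (T_MAX - T_MIN) * k / steps for k in range(1, steps)]
t_star = max(grid, key=lambda t: entropy(member(t)))
p_star = member(t_star)

uniform = (1 / 3, 1 / 3, 1 / 3)
print("max entropy:            ", entropy(p_star))
print("worst-case loss, p_star:", worst_case_loss(p_star))
print("worst-case loss, uniform:", worst_case_loss(uniform))
```

In this toy family the maximum-entropy distribution equalises the expected log loss across the family, so its worst-case loss matches its entropy up to grid error, while the uniform act does strictly worse.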

### A tutorial introduction to the minimum description length principle

This tutorial provides an overview of and introduction to Rissanen's Minimum
Description Length (MDL) Principle. The first chapter provides a conceptual,
entirely non-technical introduction to the subject. It serves as a basis for
the technical introduction given in the second chapter, in which all the ideas
of the first chapter are made mathematically precise. The main ideas are
discussed in great conceptual and technical detail. This tutorial is an
extended version of the first two chapters of the collection "Advances in
Minimum Description Length: Theory and Application" (edited by P. Grünwald, I.J.
Myung and M. Pitt, to be published by the MIT Press, Spring 2005).

Comment: 80 pages 5 figures Report with 2 chapters

### A Recurrent Network that performs a Context-Sensitive Prediction Task

We address the problem of processing a context-sensitive language with a recurrent neural network (RN). So far, the language processing capabilities of RNs have only been investigated for regular and context-free languages. We present an extremely simple RN with only one parameter z for its two hidden nodes that can perform a prediction task on sequences of symbols from the language {(ba^k)^n | k >= 0, n > 0}, a language that is context-sensitive but not context-free. The input to the RN consists of any string of the language, one symbol at a time. The network should then, at all times, predict the symbol that should follow. This means that the network must be able to count the number of a's in the first subsequence and to retain this number for future use. We present a value for the parameter z for which our RN can solve the task for k = 1 up to k = 120. As we do not give any method to find a good value for z, this does not say anything about the learning capabilities of our network. It does, however, show that context-sensitive information (the count of a's) can be represented by the network; we analyse in detail how this is done. Hence our work shows that, at least from a representational point of view, connectionist architectures can handle more complex formal languages than was previously known.
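
To make the prediction task concrete, the sketch below generates strings of the language and predicts the next symbol with an explicit counter. It is a symbolic reference predictor, not the recurrent network from the abstract: the function names are hypothetical, and end-of-string is deliberately ignored as a possible prediction. It only illustrates what information (the count k from the first block) has to be stored and reused.

```python
def generate(k: int, n: int) -> str:
    """Return the string (b a^k)^n, for k >= 0 and n > 0."""
    return ("b" + "a" * k) * n


def predict_next(prefix: str) -> set:
    """Symbols that may follow `prefix`, assuming it is a prefix of some
    string in the language. End-of-string is not modelled (an assumption)."""
    if not prefix:
        return {"b"}                      # every string starts with b
    second_b = prefix.find("b", 1)
    if second_b == -1:
        return {"a", "b"}                 # still inside the first block: k unknown
    k = second_b - 1                      # number of a's in the first block
    since_last_b = len(prefix) - 1 - prefix.rfind("b")
    return {"a"} if since_last_b < k else {"b"}


s = generate(3, 4)                        # "baaabaaabaaabaaa"
for i in range(len(s)):
    print(repr(s[:i]), "->", sorted(predict_next(s[:i])), "actual:", s[i])
```

Once the second `b` has been read, the prediction is deterministic. Before that point the count is not yet fixed, so both symbols remain possible.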
