
    Algorithmic Statistics

    While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set (or probability distribution) where the data sample typically came from. The statistical theory based on such relations between individual objects can be called algorithmic statistics, in contrast to classical statistical theory that deals with relations between probabilistic ensembles. We develop the algorithmic theory of statistic, sufficient statistic, and minimal sufficient statistic. This theory is based on two-part codes consisting of the code for the statistic (the model summarizing the regularity, the meaningful information, in the data) and the model-to-data code. In contrast to the situation in probabilistic statistical theory, the algorithmic relation of (minimal) sufficiency is an absolute relation between the individual model and the individual data sample. We distinguish implicit and explicit descriptions of the models. We give characterizations of algorithmic (Kolmogorov) minimal sufficient statistic for all data samples for both description modes (in the explicit mode under some constraints). We also strengthen and elaborate earlier results on the "Kolmogorov structure function" and "absolutely non-stochastic objects": those rare objects for which the simplest models that summarize their relevant information (minimal sufficient statistics) are at least as complex as the objects themselves. We demonstrate a close relation between the probabilistic notions and the algorithmic ones. Comment: LaTeX, 22 pages, 1 figure, with correction to the published journal version.
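    As a quick illustration in standard algorithmic-statistics notation (a hedged sketch, not quoted from the paper): a finite set $S$ containing $x$ yields the two-part description length $K(S) + \log_2 |S|$, where $K(S)$ is the code for the statistic and $\log_2 |S|$ the model-to-data code. $S$ is an algorithmic sufficient statistic for $x$ when this total matches the complexity of the data up to a constant, $K(S) + \log_2 |S| = K(x) + O(1)$ with $x \in S$, and a minimal sufficient statistic is a sufficient statistic of smallest $K(S)$.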

    Algorithmic statistics revisited

    The mission of statistics is to provide adequate statistical hypotheses (models) for observed data. But what is an "adequate" model? To answer this question, one needs the notions of algorithmic information theory. It turns out that for every data string $x$ one can naturally define a "stochasticity profile", a curve that represents a trade-off between the complexity of a model and its adequacy. This curve has four different equivalent definitions in terms of (1) randomness deficiency, (2) minimal description length, (3) position in the lists of simple strings, and (4) Kolmogorov complexity with decompression time bounded by the busy beaver function. We present a survey of the corresponding definitions and results relating them to each other.
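    For orientation, the randomness-deficiency version of this curve can be sketched as follows (standard notation; the survey's exact conventions may differ): $h_x(\alpha) = \min \{\, d(x \mid A) : A \ni x,\ C(A) \le \alpha \,\}$, where $d(x \mid A) = \log_2 |A| - C(x \mid A)$ is the randomness deficiency of $x$ in the finite set $A$. The curve records the best adequacy (smallest deficiency) achievable by models of complexity at most $\alpha$.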

    Algorithmic statistics: forty years later

    Algorithmic statistics has two different (and almost orthogonal) motivations. From the philosophical point of view, it tries to formalize how statistics works and why some statistical models are better than others. After this notion of a "good model" is introduced, a natural question arises: is it possible that for some piece of data there is no good model? If yes, how often do these bad ("non-stochastic") data appear "in real life"? Another, more technical motivation comes from algorithmic information theory. In this theory a notion of complexity of a finite object (= the amount of information in this object) is introduced; it assigns to every object some number, called its algorithmic complexity (or Kolmogorov complexity). Algorithmic statistics provides a more fine-grained classification: for each finite object some curve is defined that characterizes its behavior. It turns out that several different definitions give (approximately) the same curve. In this survey we try to provide an exposition of the main results in the field (including full proofs for the most important ones), as well as some historical comments. We assume that the reader is familiar with the main notions of algorithmic information (Kolmogorov complexity) theory. Comment: Missing proofs added.

    On Algorithmic Statistics for space-bounded algorithms

    Algorithmic statistics studies explanations of observed data that are good in the algorithmic sense: an explanation should be simple, i.e., have small Kolmogorov complexity, and capture all the algorithmically discoverable regularities in the data. However, this idea cannot be used in practice because Kolmogorov complexity is not computable. In this paper we develop algorithmic statistics using space-bounded Kolmogorov complexity. We prove an analogue of one of the main results of 'classic' algorithmic statistics (about the connection between optimality and randomness deficiencies). The main tool of our proof is the Nisan-Wigderson generator. Comment: accepted to the CSR 2017 conference.
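    For background, a standard way to define space-bounded Kolmogorov complexity (a hedged sketch; the paper's exact resource-bounded variant may differ) is $C^{s}(x) = \min \{\, |p| : U(p) = x \text{ using at most } s(|x|) \text{ space} \,\}$ for a fixed universal machine $U$ and space bound $s$. The optimality-versus-deficiency connection mentioned above is then studied with plain complexity $C$ replaced by $C^{s}$.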

    Algorithmic statistics, prediction and machine learning

    Algorithmic statistics considers the following problem: given a binary string $x$ (e.g., some experimental data), find a "good" explanation of this data. It uses algorithmic information theory to define formally what a good explanation is. In this paper we extend this framework in two directions. First, the explanations are not only interesting in themselves but are also used for prediction: we want to know what kind of data we may reasonably expect in similar situations (repeating the same experiment). We show that some kind of hierarchy can be constructed both in terms of algorithmic statistics and using the notion of a priori probability, and these two approaches turn out to be equivalent. Second, a more realistic approach, going back to machine learning theory, assumes that we have not a single data string $x$ but a set of "positive examples" $x_1,\ldots,x_l$ that all belong to some unknown set $A$, a property that we want to learn. We want this set $A$ to contain all positive examples and to be as small and simple as possible. We show how algorithmic statistics can be extended to cover this situation. Comment: 22 pages.
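    A hedged sketch of the selection criterion this setup suggests (standard two-part coding applied to a sample, not a verbatim statement from the paper): among finite sets $A$ with $x_1,\ldots,x_l \in A$, prefer one minimizing $C(A) + l \cdot \log_2 |A|$, where $C(A)$ charges for the complexity of the model and $l \cdot \log_2 |A|$ for specifying the $l$ examples inside it.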

    Predictions and algorithmic statistics for infinite sequences

    Consider the following prediction problem. Assume that there is a black box that produces bits according to some unknown computable distribution on the binary tree. We know the first $n$ bits $x_1 x_2 \ldots x_n$. We want to know the probability of the event that the next bit is equal to $1$. Solomonoff suggested using the universal semimeasure $m$ for solving this task. He proved that for every computable distribution $P$ and for every $b \in \{0,1\}$ the following holds: $\sum_{n=1}^{\infty}\sum_{x: l(x)=n} P(x) (P(b \mid x) - m(b \mid x))^2 < \infty$. However, Solomonoff's method has a negative aspect: Hutter and Muchnik proved that there are a universal semimeasure $m$, a computable distribution $P$, and a random (in the Martin-Löf sense) sequence $x_1 x_2 \ldots$ such that $P(x_{n+1} \mid x_1 \ldots x_n) - m(x_{n+1} \mid x_1 \ldots x_n) \nrightarrow 0$ as $n \to \infty$. We suggest a new way of prediction: for every finite string $x$ we predict the next bit according to the best (in some sense) distribution for $x$. We prove a result similar to Solomonoff's theorem for our way of prediction, and we show that our method of prediction does not have the negative aspect of Solomonoff's method described above. Comment: 12 pages.
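    As a toy illustration of the mixture idea behind the universal semimeasure (a hypothetical Python sketch, not the paper's construction: $m$ mixes over all lower semicomputable semimeasures and is not computable, so a small, assumed class of Bernoulli models with a fixed prior stands in for it):

        # Toy analogue of semimeasure-based prediction: a Bayesian mixture over
        # a small, assumed class of Bernoulli(p) models with uniform prior weights.
        from fractions import Fraction

        MODELS = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]  # assumed biases
        PRIOR = [Fraction(1, 3)] * 3                                # uniform prior

        def seq_prob(p, bits):
            # Probability that Bernoulli(p) emits the given bit string.
            prob = Fraction(1)
            for b in bits:
                prob *= p if b == 1 else 1 - p
            return prob

        def predict_one(bits):
            # Mixture probability that the next bit is 1, given the observed prefix.
            num = sum(w * seq_prob(p, bits + [1]) for w, p in zip(PRIOR, MODELS))
            den = sum(w * seq_prob(p, bits) for w, p in zip(PRIOR, MODELS))
            return num / den

        if __name__ == "__main__":
            prefix = [1, 1, 0, 1, 1, 1]
            print(float(predict_one(prefix)))  # leans toward the high-bias model

    The mixture plays the role of $m(b \mid x)$ in the inequality above; the genuine universal semimeasure replaces the finite class by all lower semicomputable semimeasures with weights decaying (roughly) exponentially in their complexity.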

    Stochasticity in Algorithmic Statistics for Polynomial Time

    A fundamental notion in Algorithmic Statistics is that of a stochastic object, i.e., an object having a simple plausible explanation. Informally, a probability distribution is a plausible explanation for x if it looks likely that x was drawn at random with respect to that distribution. In this paper, we suggest three definitions of a plausible statistical hypothesis for Algorithmic Statistics with polynomial time bounds, called acceptability, plausibility and optimality. Roughly speaking, a probability distribution m is called an acceptable explanation for x if x possesses all properties decidable by short programs in a short time and shared by almost all objects (with respect to m). Plausibility is a similar notion; however, this time we require x to possess all properties T decidable even by long programs in a short time and shared by almost all objects. To compensate for the increase in program length, we strengthen the notion of 'almost all': the longer the program recognizing the property is, the more objects must share the property. Finally, a probability distribution m is called an optimal explanation for x if m(x) is large. Almost all our results hold under some plausible complexity-theoretic assumptions. Our main result states that for acceptability and plausibility there are infinitely many non-stochastic objects, i.e., objects that do not have simple plausible (acceptable) explanations. Using the same techniques, we show that the distinguishing complexity of a string x can be super-logarithmically less than the conditional complexity of x with condition r for almost all r (for polynomial-time-bounded programs). Finally, we study relationships between the introduced notions.
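    For comparison with the unbounded setting (a hedged background formula, not one of the paper's time-bounded definitions), classical algorithmic statistics formalizes "x looks like it was drawn from P" by requiring the randomness deficiency $d(x \mid P) = -\log_2 P(x) - C(x \mid P)$ to be small; acceptability, plausibility and optimality above can be read as polynomial-time analogues of this requirement.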

    Effective complexity of stationary process realizations

    The concept of the effective complexity of an object, as the minimal description length of its regularities, was initiated by Gell-Mann and Lloyd. The regularities are modeled by means of ensembles, that is, probability distributions on finite binary strings. In our previous paper we proposed a definition of effective complexity in precise terms of algorithmic information theory. Here we investigate the effective complexity of binary strings generated by stationary, in general noncomputable, processes. We show that under not too strong conditions, long typical process realizations are effectively simple. Our results become most transparent in the context of coarse effective complexity, a modification of the original notion of effective complexity that uses fewer parameters in its definition. A similar modification of the related concept of sophistication has been suggested by Antunes and Fortnow. Comment: 14 pages, no figures.
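    One natural way to make the definition precise (a hedged sketch; the exact typicality constraints in the authors' formalization may differ): the effective complexity of $x$ at tolerance $\delta$ is $\min \{\, K(E) : E \text{ a computable ensemble with } K(E) - \log_2 E(x) \le K(x) + \delta \,\}$, i.e., the complexity of the simplest ensemble whose total two-part description of $x$ is within $\delta$ of optimal.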

    An Extended Coding Theorem with Application to Quantum Complexities

    This paper introduces a new inequality in algorithmic information theory that can be seen as an extended coding theorem. This inequality has applications to new bounds between quantum complexity measures. Comment: 18 pages, 4 figures.