A Computable Measure of Algorithmic Probability by Finite Approximations with an Application to Integer Sequences
Given the widespread use of lossless compression algorithms to approximate
algorithmic (Kolmogorov-Chaitin) complexity, and that lossless compression
algorithms fall short of characterizing patterns other than statistical ones, performing
no differently from entropy estimations, here we explore an alternative and
complementary approach. We study formal properties of a Levin-inspired measure m
calculated from the output distribution of small Turing machines. We
introduce and justify finite approximations m_k that have been used in some
applications as an alternative to lossless compression algorithms for
approximating algorithmic (Kolmogorov-Chaitin) complexity. We provide proofs of
the relevant properties of both m and m_k, and compare them to Levin's
Universal Distribution. We provide error estimates of m_k with respect to
m. Finally, we present an application to integer sequences from the Online
Encyclopedia of Integer Sequences which suggests that our AP-based measures may
characterize non-statistical patterns, and we report interesting correlations
with the textual, function and program description lengths of these sequences.
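
For context (a standard statement, not taken from the abstract above): algorithmic probability and Kolmogorov-Chaitin complexity are linked through Levin's Universal Distribution and the algorithmic Coding Theorem, where U is a prefix-free universal Turing machine, p ranges over its halting programs, and |p| is program length in bits:

    \[ m(s) \;=\; \sum_{p \,:\, U(p) = s} 2^{-|p|}, \qquad K(s) \;=\; -\log_2 m(s) + O(1). \]

The sum is uncomputable; as the abstract notes, the computable approximations studied in the paper are instead calculated from the output distribution of small Turing machines.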
The Missing Mass Problem
We give tight lower and upper bounds on the expected missing mass for
distributions over finite and countably infinite spaces. An essential
characterization of the extremal distributions is given. We also provide an
extension to totally bounded metric spaces that may be of independent interest.
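
As a concrete anchor for the quantity being bounded (a minimal sketch, not from the paper; the function name and the example distribution are illustrative): after n i.i.d. draws from a discrete distribution p, the expected missing mass is E[M_n] = sum_i p_i (1 - p_i)^n, which the snippet below evaluates exactly for a finite distribution.

    import numpy as np

    def expected_missing_mass(p, n):
        # Exact expected missing mass E[M_n] = sum_i p_i * (1 - p_i)**n for a
        # finite distribution p observed through n i.i.d. samples.
        p = np.asarray(p, dtype=float)
        return float(np.sum(p * (1.0 - p) ** n))

    # Illustrative example: a geometric-like distribution over 20 symbols, n = 50 draws.
    p = np.array([2.0 ** -(k + 1) for k in range(20)])
    p /= p.sum()  # normalize to a probability vector
    print(expected_missing_mass(p, 50))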
Bayesian Entropy Estimation for Countable Discrete Distributions
We consider the problem of estimating Shannon's entropy H from discrete
data, in cases where the number of possible symbols is unknown or even
countably infinite. The Pitman-Yor process, a generalization of the Dirichlet
process, provides a tractable prior distribution over the space of countably
infinite discrete distributions, and has found major applications in Bayesian
non-parametric statistics and machine learning. Here we show that it also
provides a natural family of priors for Bayesian entropy estimation, due to the
fact that moments of the induced posterior distribution over H can be
computed analytically. We derive formulas for the posterior mean (Bayes' least
squares estimate) and variance under Dirichlet and Pitman-Yor process priors.
Moreover, we show that a fixed Dirichlet or Pitman-Yor process prior implies a
narrow prior distribution over H, meaning the prior strongly determines the
entropy estimate in the under-sampled regime. We derive a family of continuous
mixing measures such that the resulting mixture of Pitman-Yor processes
produces an approximately flat prior over H. We show that the resulting
Pitman-Yor Mixture (PYM) entropy estimator is consistent for a large class of
distributions. We explore the theoretical properties of the resulting
estimator, and show that it performs well both in simulation and in application
to real data.
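
For a concrete special case (a minimal sketch assuming a fixed finite alphabet and a symmetric Dirichlet prior; this is the classical Wolpert-Wolf posterior-mean formula, not the paper's Pitman-Yor mixture estimator, and the function name is illustrative):

    import numpy as np
    from scipy.special import digamma

    def dirichlet_posterior_mean_entropy(counts, alpha=1.0):
        # Posterior mean of Shannon entropy (in nats) given symbol counts, under a
        # symmetric Dirichlet(alpha) prior on a FINITE alphabet (Wolpert-Wolf
        # formula); a simplification of the countably-infinite Pitman-Yor setting.
        counts = np.asarray(counts, dtype=float)
        post = counts + alpha                 # posterior Dirichlet parameters
        total = post.sum()                    # posterior concentration
        return digamma(total + 1.0) - np.sum((post / total) * digamma(post + 1.0))

    # Illustrative counts from 100 draws over a 10-symbol alphabet.
    counts = [30, 20, 15, 10, 8, 7, 5, 3, 1, 1]
    print(dirichlet_posterior_mean_entropy(counts, alpha=0.5))

With alpha fixed, the prior largely pins down the estimate when counts are sparse, which is the under-sampled-regime effect the abstract describes and the motivation for mixing over priors.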
Rare Probability Estimation under Regularly Varying Heavy Tails
This paper studies the problem of estimating the probability of symbols that have occurred very rarely, in samples drawn independently from an unknown, possibly infinite, discrete distribution. In particular, we study the multiplicative consistency of estimators, defined as the ratio of the estimate to the true quantity converging to one. We first show that the classical Good-Turing estimator is not universally consistent in this sense, despite enjoying favorable additive properties. We then use Karamata's theory of regular variation to prove that regularly varying heavy tails are sufficient for consistency. At the core of this result is a multiplicative concentration that we establish both by extending the McAllester-Ortiz additive concentration for the missing mass to all rare probabilities and by exploiting regular variation. We also derive a family of estimators which, in addition to being consistent, address some of the shortcomings of the Good-Turing estimator. For example, they perform smoothing implicitly and have the absolute discounting structure of many heuristic algorithms. This also establishes a discrete parallel to extreme value theory, and many of the techniques therein can be adapted to the framework that we set forth.
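
For reference, the classical Good-Turing estimator discussed above assigns total probability (r + 1) N_{r+1} / n to the set of symbols seen exactly r times, with r = 0 giving the missing-mass estimate N_1 / n (a minimal sketch of that classical rule; the function name is illustrative, and the paper's own consistent estimators are not reproduced here):

    from collections import Counter

    def good_turing_mass(sample, r):
        # Classical Good-Turing estimate of the total probability of all symbols
        # seen exactly r times: (r + 1) * N_{r+1} / n, where N_j is the number of
        # distinct symbols observed j times and n is the sample size.
        # r = 0 gives the missing-mass estimate N_1 / n.
        n = len(sample)
        counts_of_counts = Counter(Counter(sample).values())
        return (r + 1) * counts_of_counts.get(r + 1, 0) / n

    # Illustrative example: estimated probability that the next draw is unseen.
    sample = list("abracadabra")
    print(good_turing_mass(sample, 0))  # N_1 / n = 2 / 11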