Bayesian Entropy Estimation for Countable Discrete Distributions

Archer, Evan; Park, Il Memming; Pillow, Jonathan

Authors: Evan Archer, Il Memming Park, Jonathan Pillow
Publication date: 9 April 2014

Abstract

We consider the problem of estimating Shannon's entropy H from discrete data, in cases where the number of possible symbols is unknown or even countably infinite. The Pitman-Yor process, a generalization of the Dirichlet process, provides a tractable prior distribution over the space of countably infinite discrete distributions, and has found major applications in Bayesian non-parametric statistics and machine learning. Here we show that it also provides a natural family of priors for Bayesian entropy estimation, because the moments of the induced posterior distribution over H can be computed analytically. We derive formulas for the posterior mean (Bayes' least squares estimate) and variance under Dirichlet and Pitman-Yor process priors. Moreover, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior distribution over H, meaning the prior strongly determines the entropy estimate in the under-sampled regime. We derive a family of continuous mixing measures such that the resulting mixture of Pitman-Yor processes produces an approximately flat prior over H. We show that the resulting Pitman-Yor Mixture (PYM) entropy estimator is consistent for a large class of distributions. We explore the theoretical properties of the resulting estimator, and show that it performs well both in simulation and in application to real data.

Comment: 38 pages LaTeX. Revised and resubmitted to JML
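
The abstract states that posterior moments of H are available in closed form. The following is a minimal Python sketch of the posterior mean only, under stated assumptions. It uses the standard Pitman-Yor posterior representation (given counts n_1, ..., n_K summing to N, the weights of the K observed symbols and the leftover mass are jointly Dirichlet(n_1 − d, ..., n_K − d, α + Kd), and the leftover mass is spread by a fresh PY(d, α + Kd) process) together with the prior mean E[H | d, α] = ψ(α + 1) − ψ(1 − d). Combining the two gives

E[H | counts] = ψ(α + N + 1) − (α + Kd)/(α + N) · ψ(1 − d) − (1/(α + N)) · Σ_i (n_i − d) ψ(n_i − d + 1),

with ψ the digamma function and H in nats. This expression is our own derivation from those assumptions and should be checked against the paper's formulas before use. The helpers digamma and pym_posterior_mean_entropy are hypothetical names, and digamma is a hand-rolled approximation rather than a library call. Setting d = 0 gives the Dirichlet process case.

```python
import math


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0 (hand-rolled: recurrence + asymptotic series)."""
    if x <= 0.0:
        raise ValueError("this sketch only handles x > 0")
    acc = 0.0
    # psi(x) = psi(x + 1) - 1/x: shift x up until the asymptotic series is accurate.
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6) + 1/(240x^8)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252 - inv2 / 240)))
    return acc + math.log(x) - 0.5 / x - series


def pym_posterior_mean_entropy(counts, d: float, alpha: float) -> float:
    """Posterior mean of Shannon entropy (nats) under a PY(d, alpha) prior.

    counts: occurrence counts of the distinct observed symbols.
    d = 0 gives the Dirichlet process case.
    """
    if not (0.0 <= d < 1.0) or alpha <= -d:
        raise ValueError("need 0 <= d < 1 and alpha > -d")
    counts = [n for n in counts if n > 0]  # unobserved symbols carry no count
    n_total = sum(counts)
    k = len(counts)
    if n_total == 0:
        # No data: prior mean E[H | d, alpha] = psi(alpha + 1) - psi(1 - d).
        return digamma(alpha + 1.0) - digamma(1.0 - d)
    a = alpha + n_total
    return (digamma(a + 1.0)
            - (alpha + k * d) / a * digamma(1.0 - d)
            - sum((n - d) * digamma(n - d + 1.0) for n in counts) / a)


if __name__ == "__main__":
    # Three distinct symbols seen 3, 1 and 1 times.
    print(pym_posterior_mean_entropy([3, 1, 1], d=0.1, alpha=1.0))
```

With few observations the result depends heavily on the chosen d and α, which is the prior-dominance problem the abstract describes; the PYM estimator addresses it by mixing over Pitman-Yor priors rather than fixing one.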

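The narrow-prior claim can be checked numerically in the Dirichlet process case (d = 0). The sketch below draws random distributions from DP(α) by truncated stick-breaking and records the entropy of each draw. This is our own illustration, not the paper's experiment; the helper names, the truncation tolerance and the sample sizes are arbitrary choices. For integer α the prior mean ψ(α + 1) − ψ(1) equals the α-th harmonic number, which gives an exact reference for the Monte Carlo mean. If the claim holds, the spread of H under a fixed α should be small compared with how far the mean moves as α changes, so choosing α largely fixes the entropy estimate before any data arrive.

```python
import math
import random
import statistics


def sample_dp_entropy(alpha: float, rng: random.Random, tol: float = 1e-8) -> float:
    """Entropy (nats) of one distribution drawn from DP(alpha) via stick-breaking.

    Sticks are broken until the unassigned mass falls below `tol`; that
    leftover mass is dropped, which biases the entropy down negligibly.
    """
    remaining = 1.0
    h = 0.0
    while remaining > tol:
        v = rng.betavariate(1.0, alpha)  # stick-breaking weight V_k ~ Beta(1, alpha)
        p = remaining * v                # probability assigned to the k-th atom
        if p > 0.0:
            h -= p * math.log(p)
        remaining *= 1.0 - v
    return h


def dp_prior_mean_entropy(alpha: int) -> float:
    """psi(alpha + 1) - psi(1), which equals the alpha-th harmonic number for integer alpha."""
    return sum(1.0 / k for k in range(1, alpha + 1))


def summarize(alpha: int, n_samples: int = 200, seed: int = 0):
    """Monte Carlo mean and standard deviation of H under a fixed DP(alpha) prior."""
    rng = random.Random(seed)
    draws = [sample_dp_entropy(alpha, rng) for _ in range(n_samples)]
    return statistics.mean(draws), statistics.stdev(draws)


if __name__ == "__main__":
    for alpha in (1, 10, 100):
        mean, sd = summarize(alpha)
        print(f"alpha={alpha:>3}  exact mean={dp_prior_mean_entropy(alpha):.3f}  "
              f"MC mean={mean:.3f}  MC sd={sd:.3f}")
```

Stick-breaking is used here only because it is simple to write with the standard library; any exact DP sampler would serve.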