    A Generalization of the Dirichlet Distribution

    This paper discusses a generalization of the Dirichlet distribution, the 'hyperdirichlet', in which various types of incomplete observations may be incorporated. It is conjugate to the multinomial distribution when some observations are censored or grouped. The hyperdirichlet R package is introduced and examples given. A number of statistical tests are performed on the example datasets, which are drawn from diverse disciplines including sports statistics, the sociology of climate change, and psephology.
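A minimal sketch of why grouped observations call for a larger family than the Dirichlet: an observation known only to fall in a set G of categories contributes a factor (sum of p_i over G) to the likelihood, which is not a product of powers of single p_i. The Python function below evaluates the resulting unnormalized log posterior. The function name, argument layout, and the choice of Python are assumptions made for illustration; this is not the hyperdirichlet R package's API.

```python
import math

def log_unnormalized_posterior(p, alpha, counts, grouped):
    """Log of prod_i p_i^(alpha_i - 1 + n_i) * prod_G (sum_{i in G} p_i)^(m_G).

    p       : probabilities, all positive, summing to 1
    alpha   : Dirichlet prior parameters (hypothetical input format)
    counts  : fully observed counts n_i, one per category
    grouped : list of (indices, m) pairs; m observations known only to
              lie in the categories listed in `indices`
    """
    # Ordinary Dirichlet-multinomial part: powers of individual p_i.
    logp = sum((a - 1 + n) * math.log(pi)
               for pi, a, n in zip(p, alpha, counts))
    # Grouped (censored) part: powers of sums of p_i.
    for indices, m in grouped:
        logp += m * math.log(sum(p[i] for i in indices))
    return logp

# Two observations known only to be in category 0 or 1.
value = log_unnormalized_posterior(
    [0.5, 0.3, 0.2], [1.0, 1.0, 1.0], [0, 0, 0], [((0, 1), 2)]
)
```

With an empty `grouped` list the expression reduces to the usual Dirichlet posterior kernel.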

    Bayesian Entropy Estimation for Countable Discrete Distributions

    We consider the problem of estimating Shannon's entropy H from discrete data, in cases where the number of possible symbols is unknown or even countably infinite. The Pitman-Yor process, a generalization of the Dirichlet process, provides a tractable prior distribution over the space of countably infinite discrete distributions, and has found major applications in Bayesian non-parametric statistics and machine learning. Here we show that it also provides a natural family of priors for Bayesian entropy estimation, due to the fact that moments of the induced posterior distribution over H can be computed analytically. We derive formulas for the posterior mean (Bayes' least squares estimate) and variance under Dirichlet and Pitman-Yor process priors. Moreover, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior distribution over H, meaning the prior strongly determines the entropy estimate in the under-sampled regime. We derive a family of continuous mixing measures such that the resulting mixture of Pitman-Yor processes produces an approximately flat prior over H. We show that the resulting Pitman-Yor Mixture (PYM) entropy estimator is consistent for a large class of distributions. We explore the theoretical properties of the resulting estimator, and show that it performs well both in simulation and in application to real data. Comment: 38 pages LaTeX. Revised and resubmitted to JML
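To illustrate what an analytically computable posterior mean of H looks like, here is a sketch for the simpler finite case: a symmetric Dirichlet prior over a known number K of symbols, where the standard closed form is E[H | n] = psi(A + 1) - sum_i (a_i / A) psi(a_i + 1), with a_i = n_i + alpha, A the sum of the a_i, and psi the digamma function (entropy in nats). This is not the paper's PYM estimator, which handles unknown or infinite alphabets; the function names and the pure-Python digamma approximation are assumptions made for this example.

```python
import math

def digamma(x):
    """Digamma function for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:            # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def dirichlet_posterior_mean_entropy(counts, alpha):
    """Posterior mean of Shannon entropy (nats) under a symmetric Dirichlet(alpha) prior.

    counts : observed counts, one entry per symbol (K = len(counts) is assumed known)
    alpha  : positive concentration parameter per symbol
    """
    post = [n + alpha for n in counts]   # posterior Dirichlet parameters
    total = sum(post)
    return digamma(total + 1) - sum(a / total * digamma(a + 1) for a in post)

# No data, two symbols, uniform prior on the simplex.
prior_mean = dirichlet_posterior_mean_entropy([0, 0], 1.0)
```

With no data the result is the prior mean of H, so varying `alpha` here is a quick way to see how strongly a fixed prior pins down the estimate when samples are scarce.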