Bayesian Entropy Estimation for Countable Discrete Distributions

Author: Archer Evan
Park Il Memming
Pillow Jonathan
Publication date: 09/04/2014

We consider the problem of estimating Shannon's entropy $H$ from discrete data, in cases where the number of possible symbols is unknown or even countably infinite. The Pitman-Yor process, a generalization of the Dirichlet process, provides a tractable prior distribution over the space of countably infinite discrete distributions, and has found major applications in Bayesian non-parametric statistics and machine learning. Here we show that it also provides a natural family of priors for Bayesian entropy estimation, due to the fact that moments of the induced posterior distribution over $H$ can be computed analytically. We derive formulas for the posterior mean (Bayes' least squares estimate) and variance under Dirichlet and Pitman-Yor process priors. Moreover, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior distribution over $H$, meaning the prior strongly determines the entropy estimate in the under-sampled regime. We derive a family of continuous mixing measures such that the resulting mixture of Pitman-Yor processes produces an approximately flat prior over $H$. We show that the resulting Pitman-Yor Mixture (PYM) entropy estimator is consistent for a large class of distributions. We explore the theoretical properties of the resulting estimator, and show that it performs well both in simulation and in application to real data.

Comment: 38 pages LaTeX. Revised and resubmitted to JML
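As a rough illustration of why the prior matters in the under-sampled regime, the sketch below compares the plug-in entropy estimate with the posterior mean of $H$ under a symmetric Dirichlet prior on a known, finite alphabet. This is the simpler finite-alphabet case, not the PYM estimator described in the abstract; the function names, the alphabet size, and the concentration parameter `alpha` are assumptions chosen for the example. The closed form used is the standard one, $E[H \mid n] = \psi(N + K\alpha + 1) - \sum_i \frac{n_i + \alpha}{N + K\alpha}\,\psi(n_i + \alpha + 1)$, in nats.

```python
import math


def digamma(x):
    """Digamma function for x > 0: upward recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def plugin_entropy(counts):
    """Maximum-likelihood ("plug-in") entropy estimate in nats."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def dirichlet_posterior_mean_entropy(counts, alpha):
    """Posterior mean of H (nats) under a symmetric Dirichlet(alpha) prior.

    Assumes a known, finite alphabet: len(counts) is the number of symbols,
    including symbols that were never observed (count 0).
    """
    big_a = sum(counts) + len(counts) * alpha
    weighted = sum((n + alpha) / big_a * digamma(n + alpha + 1) for n in counts)
    return digamma(big_a + 1) - weighted


# Hypothetical under-sampled data: 5 observations spread over 100 symbols.
counts = [1] * 5 + [0] * 95
print("plug-in:", plugin_entropy(counts))
print("Dirichlet posterior mean:", dirichlet_posterior_mean_entropy(counts, 1.0))
```

With so few observations the two numbers differ substantially, and the Dirichlet estimate is driven mostly by `alpha` and the assumed alphabet size rather than by the data, which is the sensitivity the abstract points to.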

arXiv.org e-Print Archive

CiteSeerX

MPG.PuRe

The Role of Beliefs in Inference for Rational Expectations Models

Author: Bruce N. Lehmann

This paper discusses inference for rational expectations models estimated via minimum distance methods by characterizing the probability beliefs regarding the data generating process (DGP) that are compatible with given moment conditions. The null hypothesis is taken to be rational expectations and the alternative hypothesis to be distorted beliefs. This distorted beliefs alternative is analyzed from the perspective of a hypothetical semiparametric Bayesian who believes the model and uses it to learn about the DGP. This interpretation provides a different perspective on estimates, test statistics, and confidence regions in large samples, particularly regarding the economic significance of rejections in rational expectations models.

Research Papers in Economics

On choosing and bounding probability metrics

Author: Aldous
Barron
Bernardo
Borovkov
Cam
Chung
Cover
Csiszar
Diaconis
Diaconis
Diaconis
Dudley
Dudley
Hartigan
Huber
Ibragimov
Jacod
Kakutani
Kolmogorov
Kuipers
Kullback
Kullback
LeCam
LeCam
Lehmann
Liese
Lindsay
Lindvall
Linnik
Lukacs
Lévy
Mathai
Nummelin
Orey
Petrov
Prokhorov
Rachev
Reiss
Rosenthal
Shannon
Strassen
Su
Su
Szulga
Tierney
Tierney
Williams
Zolotarev
Publication date: 01/01/2002

When studying convergence of measures, an important issue is the choice of probability metric. In this review, we provide a summary and some new results concerning bounds among ten important probability metrics/distances that are used by statisticians and probabilists. We focus on these metrics because they are either well-known, commonly used, or admit practical bounding techniques. We summarize these relationships in a handy reference diagram, and also give examples to show how rates of convergence can depend on the metric chosen.

Comment: To appear, International Statistical Review. Related work at http://www.math.hmc.edu/~su/papers.htm
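To make the idea of bounds among metrics concrete, the sketch below numerically checks two standard inequalities on a pair of small discrete distributions: $d_H^2/2 \le d_{TV} \le d_H$ and Pinsker's inequality $d_{TV} \le \sqrt{d_{KL}/2}$. The choice of these three distances, the Hellinger convention $d_H = \big(\sum_i (\sqrt{p_i} - \sqrt{q_i})^2\big)^{1/2}$ (no factor of 1/2), the function names, and the example distributions are all assumptions made for illustration; the abstract does not list the ten metrics it covers.

```python
import math


def total_variation(p, q):
    """Total variation distance: half the L1 distance between the pmfs."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def hellinger(p, q):
    """Hellinger distance, using the convention without the 1/2 factor."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; assumes q > 0 where p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


# Hypothetical example distributions on three symbols.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

tv = total_variation(p, q)
dh = hellinger(p, q)
kl = kl_divergence(p, q)

print("d_H^2 / 2 <= d_TV <= d_H :", dh ** 2 / 2 <= tv <= dh)
print("d_TV <= sqrt(KL / 2)     :", tv <= math.sqrt(kl / 2))
```

Because the bounds are not symmetric, a sequence can converge at different rates depending on which distance is used, which is the point the abstract makes about the choice of metric.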

arXiv.org e-Print Archive

CiteSeerX

Scholarship@Claremont

Crossref

Contributions to the understanding of Bayesian consistency.

Author: Antonio Lijoi
Igor Prünster
Stephen G. Walker

Consistency of Bayesian nonparametric procedures has been the focus of a considerable amount of research. Here we deal with strong consistency for Bayesian density estimation. An awkward consequence of inconsistency is pointed out. We investigate reasons for inconsistency and precisely identify the notion of “data tracking”. Specific examples in which this phenomenon cannot occur are discussed. When it can happen, we show how and where things can go wrong, in particular the type of sets where the posterior can put mass.

Keywords: Bayesian consistency; Density estimation; Hellinger distance; Weak neighborhood

Research Papers in Economics

Asymptotics of Discrete MDL for Online Prediction

Author: Hutter Marcus
Poland Jan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

Minimum Description Length (MDL) is an important principle for induction and prediction, with strong relations to optimal Bayesian learning. This paper deals with learning non-i.i.d. processes by means of two-part MDL, where the underlying model class is countable. We consider the online learning framework, i.e. observations come in one by one, and the predictor is allowed to update his state of mind after each time step. We identify two ways of predicting by MDL for this setup, namely a static and a dynamic one. (A third variant, hybrid MDL, will turn out inferior.) We will prove that under the only assumption that the data is generated by a distribution contained in the model class, the MDL predictions converge to the true values almost surely. This is accomplished by proving finite bounds on the quadratic, the Hellinger, and the Kullback-Leibler loss of the MDL learner, which are however exponentially worse than for Bayesian prediction. We demonstrate that these bounds are sharp, even for model classes containing only Bernoulli distributions. We show how these bounds imply regret bounds for arbitrary loss functions. Our results apply to a wide range of setups, namely sequence prediction, pattern classification, regression, and universal induction in the sense of Algorithmic Information Theory, among others.

Comment: 34 page
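A minimal sketch of the static variant of two-part MDL prediction, assuming a small finite class of i.i.d. Bernoulli models (the abstract treats countable classes and non-i.i.d. processes). The predictor selects the single model whose two-part code length, $-\log_2 w_\nu - \log_2 \nu(x_{<t})$, is shortest and predicts with that model alone. The function name, the parameter grid `thetas`, and the prior weights are hypothetical.

```python
import math


def static_mdl_predict(thetas, weights, bits):
    """Return P(next bit = 1) under the model with the shortest two-part code.

    thetas  : Bernoulli parameters of the candidate models, each in (0, 1)
    weights : prior weights w_nu of the models (positive, summing to at most 1)
    bits    : observed binary sequence so far (list of 0/1)
    """
    ones = sum(bits)
    zeros = len(bits) - ones

    def code_length(theta, weight):
        # Bits to describe the model, plus bits to describe the data given it.
        model_bits = -math.log2(weight)
        data_bits = -ones * math.log2(theta) - zeros * math.log2(1.0 - theta)
        return model_bits + data_bits

    best_theta, _ = min(zip(thetas, weights), key=lambda tw: code_length(*tw))
    return best_theta


# Hypothetical model class and data.
thetas = [0.2, 0.5, 0.8]
weights = [0.25, 0.5, 0.25]
print(static_mdl_predict(thetas, weights, [1, 1, 1, 0, 1, 1]))
```

Unlike a Bayes mixture, this predictor commits to one model at each step, so its prediction can jump when the selected model changes.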

arXiv.org e-Print Archive

CiteSeerX

Crossref

The Australian National University

Hokkaido University Collection of Scholarly and Academic Papers

Bayesian entropy estimators for spike trains

Author: Archer Evan
Park Il Memming
Pillow Jonathan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Il Memming Park and Jonathan Pillow are with the Institute for Neuroscience and Department of Psychology, The University of Texas at Austin, TX 78712, USA. Evan Archer is with the Institute for Computational and Engineering Sciences, The University of Texas at Austin, TX 78712, USA. Jonathan Pillow is with the Division of Statistics and Scientific Computation, The University of Texas at Austin, Austin, TX 78712, USA.

Poster presentation: Information theoretic quantities have played a central role in neuroscience for quantifying neural codes [1]. Entropy and mutual information can be used to measure the maximum encoding capacity of a neuron, quantify the amount of noise, spatial and temporal functional dependence, learning process, and provide a fundamental limit for neural coding. Unfortunately, estimating entropy or mutual information is notoriously difficult, especially when the number of observations N is less than the number of possible symbols K [2]. For neural spike trains, this is often the case due to the combinatorial nature of the symbols: for n simultaneously recorded neurons on m time bins, the number of possible symbols is K = 2^{n+m}. Therefore, the question is how to extrapolate when you may have a severely under-sampled distribution. Here we describe a couple of recent advances in Bayesian entropy estimation for spike trains. Our approach follows that of Nemenman et al. [2], who formulated a Bayesian entropy estimator using a mixture-of-Dirichlet prior over the space of discrete distributions on K bins. We extend this approach to formulate two Bayesian estimators with different strategies to deal with severe under-sampling. For the first estimator, we design a novel mixture prior over countable distributions using the Pitman-Yor (PY) process [3]. The PY process is useful when the number of parameters is unknown a priori, and as a result finds many applications in Bayesian nonparametrics. The PY process can model the heavy, power-law distributed tails which often occur in neural data. To reduce the bias of the estimator we analytically derive a set of mixing weights so that the resulting improper prior over entropy is approximately flat. We consider the posterior over entropy given a dataset (which contains some observed number of words but an unknown number of unobserved words), and show that the posterior mean can be efficiently computed via a simple numerical integral. The second estimator incorporates prior knowledge about the spike trains. We use a simple Bernoulli process as a parametric model of the spike trains, and use a Dirichlet process to allow arbitrary deviation from the Bernoulli process. Under this model, very sparse spike trains are a priori orders of magnitude more likely than those with many spikes. Both estimators are computationally efficient and statistically consistent. We applied these estimators to spike trains from the early visual system to quantify neural coding.

Springer - Publisher Connector

PubMed Central

Texas ScholarWorks

Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet

Author: Hutter Marcus
Publication date: 01/01/2002

Various optimality properties of universal sequence predictors based on Bayes-mixtures in general, and Solomonoff's prediction scheme in particular, will be studied. The probability of observing $x_t$ at time $t$, given past observations $x_1...x_{t-1}$, can be computed with the chain rule if the true generating distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. If $\mu$ is unknown, but known to belong to a countable or continuous class $\mathcal{M}$, one can base one's prediction on the Bayes-mixture $\xi$ defined as a $w_\nu$-weighted sum or integral of distributions $\nu\in\mathcal{M}$. The cumulative expected loss of the Bayes-optimal universal prediction scheme based on $\xi$ is shown to be close to the loss of the Bayes-optimal, but infeasible prediction scheme based on $\mu$. We show that the bounds are tight and that no other predictor can lead to significantly smaller bounds. Furthermore, for various performance measures, we show Pareto-optimality of $\xi$ and give an Occam's razor argument that the choice $w_\nu\sim 2^{-K(\nu)}$ for the weights is optimal, where $K(\nu)$ is the length of the shortest program describing $\nu$. The results are applied to games of chance, defined as a sequence of bets, observations, and rewards. The prediction schemes (and bounds) are compared to the popular predictors based on expert advice. Extensions to infinite alphabets, partial, delayed and probabilistic prediction, classification, and more active systems are briefly discussed.

Comment: 34 page
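The Bayes-mixture prediction described in the abstract can be sketched for a toy case. The example assumes a finite class $\mathcal{M}$ of i.i.d. Bernoulli models with hand-picked weights $w_\nu$, rather than the countable or continuous classes and the $2^{-K(\nu)}$ weights the paper studies; the function name and the numbers are hypothetical. The predictive probability is $\xi(x_t \mid x_{<t}) = \sum_\nu w_\nu(x_{<t})\,\nu(x_t)$, where $w_\nu(x_{<t})$ is the posterior weight of $\nu$ after the observations so far.

```python
def bayes_mixture_predict(thetas, weights, bits):
    """Return (P(next bit = 1), posterior weights) for a Bernoulli mixture.

    thetas  : Bernoulli parameters of the models nu in the (finite) class
    weights : prior weights w_nu, assumed positive and summing to 1
    bits    : observed binary sequence so far (list of 0/1)
    """
    posterior = list(weights)
    for bit in bits:
        # Bayes update: multiply each weight by the model's likelihood of the bit.
        posterior = [
            w * (theta if bit == 1 else 1.0 - theta)
            for w, theta in zip(posterior, thetas)
        ]
        total = sum(posterior)
        posterior = [w / total for w in posterior]
    # The mixture's prediction is the posterior-weighted average of the models.
    p_one = sum(w * theta for w, theta in zip(posterior, thetas))
    return p_one, posterior


# Hypothetical model class; the data look as if drawn from theta = 0.8.
thetas = [0.2, 0.5, 0.8]
weights = [1.0 / 3, 1.0 / 3, 1.0 / 3]
p_one, posterior = bayes_mixture_predict(thetas, weights, [1, 1, 0, 1, 1, 1, 1, 0, 1, 1])
print(p_one, posterior)
```

As more data arrive, the posterior weight concentrates on the model closest to the generating distribution, so the mixture's predictions approach those of the informed predictor based on $\mu$.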

arXiv.org e-Print Archive

CiteSeerX