1,731 research outputs found

    Bayesian Entropy Estimation for Countable Discrete Distributions

    We consider the problem of estimating Shannon's entropy $H$ from discrete data, in cases where the number of possible symbols is unknown or even countably infinite. The Pitman-Yor process, a generalization of the Dirichlet process, provides a tractable prior distribution over the space of countably infinite discrete distributions, and has found major applications in Bayesian non-parametric statistics and machine learning. Here we show that it also provides a natural family of priors for Bayesian entropy estimation, because moments of the induced posterior distribution over $H$ can be computed analytically. We derive formulas for the posterior mean (Bayes' least squares estimate) and variance under Dirichlet and Pitman-Yor process priors. Moreover, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior distribution over $H$, meaning the prior strongly determines the entropy estimate in the under-sampled regime. We derive a family of continuous mixing measures such that the resulting mixture of Pitman-Yor processes produces an approximately flat prior over $H$. We show that the resulting Pitman-Yor Mixture (PYM) entropy estimator is consistent for a large class of distributions. We explore the theoretical properties of the resulting estimator, and show that it performs well both in simulation and in application to real data. Comment: 38 pages LaTeX. Revised and resubmitted to JML

    The Role of Beliefs in Inference for Rational Expectations Models

    This paper discusses inference for rational expectations models estimated via minimum distance methods by characterizing the probability beliefs regarding the data generating process (DGP) that are compatible with given moment conditions. The null hypothesis is taken to be rational expectations and the alternative hypothesis to be distorted beliefs. This distorted beliefs alternative is analyzed from the perspective of a hypothetical semiparametric Bayesian who believes the model and uses it to learn about the DGP. This interpretation provides a different perspective on estimates, test statistics, and confidence regions in large samples, particularly regarding the economic significance of rejections in rational expectations models.

    On choosing and bounding probability metrics

    When studying convergence of measures, an important issue is the choice of probability metric. In this review, we provide a summary and some new results concerning bounds among ten important probability metrics/distances that are used by statisticians and probabilists. We focus on these metrics because they are either well-known, commonly used, or admit practical bounding techniques. We summarize these relationships in a handy reference diagram, and also give examples to show how rates of convergence can depend on the metric chosen. Comment: To appear, International Statistical Review. Related work at http://www.math.hmc.edu/~su/papers.htm
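To make the idea of bounds between probability metrics concrete, here is a minimal sketch for two metrics chosen purely for illustration: total variation and Hellinger distance on a finite sample space. The function names and the example distributions are assumptions, not taken from the review. With the normalizations used in the code (total variation as half the L1 distance, Hellinger distance with maximum value sqrt(2)), the standard inequality d_H^2 / 2 <= d_TV <= d_H holds; other normalizations change the constants.

```python
import math

def total_variation(p, q):
    """Total variation distance: half the L1 distance between two pmfs."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def hellinger(p, q):
    """Hellinger distance, normalized so that its maximum value is sqrt(2)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

# Two illustrative distributions on a two-point space (hypothetical values).
p = [0.5, 0.5]
q = [0.9, 0.1]

d_tv = total_variation(p, q)
d_h = hellinger(p, q)

# Under these conventions the two metrics bound each other:
#   d_h**2 / 2 <= d_tv <= d_h
print(d_tv, d_h, d_h ** 2 / 2 <= d_tv <= d_h)
```

Because each metric is bounded by a function of the other, convergence in one implies convergence in the other, although the implied rates can differ.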

    Contributions to the understanding of Bayesian consistency.

    Consistency of Bayesian nonparametric procedures has been the focus of a considerable amount of research. Here we deal with strong consistency for Bayesian density estimation. An awkward consequence of inconsistency is pointed out. We investigate reasons for inconsistency and precisely identify the notion of “data tracking”. Specific examples in which this phenomenon cannot occur are discussed. When it can happen, we show how and where things can go wrong, in particular the type of sets where the posterior can put mass. Keywords: Bayesian consistency; Density estimation; Hellinger distance; Weak neighborhood

    Asymptotics of Discrete MDL for Online Prediction

    Minimum Description Length (MDL) is an important principle for induction and prediction, with strong relations to optimal Bayesian learning. This paper deals with learning non-i.i.d. processes by means of two-part MDL, where the underlying model class is countable. We consider the online learning framework, i.e. observations come in one by one, and the predictor is allowed to update his state of mind after each time step. We identify two ways of predicting by MDL for this setup, namely a static and a dynamic one. (A third variant, hybrid MDL, will turn out inferior.) We will prove that under the only assumption that the data is generated by a distribution contained in the model class, the MDL predictions converge to the true values almost surely. This is accomplished by proving finite bounds on the quadratic, the Hellinger, and the Kullback-Leibler loss of the MDL learner, which are however exponentially worse than for Bayesian prediction. We demonstrate that these bounds are sharp, even for model classes containing only Bernoulli distributions. We show how these bounds imply regret bounds for arbitrary loss functions. Our results apply to a wide range of setups, namely sequence prediction, pattern classification, regression, and universal induction in the sense of Algorithmic Information Theory among others. Comment: 34 page

    Bayesian entropy estimators for spike trains

    Il Memming Park and Jonathan Pillow are with the Institute for Neuroscience and Department of Psychology, The University of Texas at Austin, TX 78712, USA; Evan Archer is with the Institute for Computational and Engineering Sciences, The University of Texas at Austin, TX 78712, USA; Jonathan Pillow is with the Division of Statistics and Scientific Computation, The University of Texas at Austin, Austin, TX 78712, USA.
    Poster presentation: Information theoretic quantities have played a central role in neuroscience for quantifying neural codes [1]. Entropy and mutual information can be used to measure the maximum encoding capacity of a neuron, quantify the amount of noise, spatial and temporal functional dependence, learning process, and provide a fundamental limit for neural coding. Unfortunately, estimating entropy or mutual information is notoriously difficult, especially when the number of observations $N$ is less than the number of possible symbols $K$ [2]. For neural spike trains, this is often the case due to the combinatorial nature of the symbols: for $n$ simultaneously recorded neurons on $m$ time bins, the number of possible symbols is $K = 2^{n+m}$. Therefore, the question is how to extrapolate when you may have a severely under-sampled distribution. Here we describe a couple of recent advances in Bayesian entropy estimation for spike trains. Our approach follows that of Nemenman et al. [2], who formulated a Bayesian entropy estimator using a mixture-of-Dirichlet prior over the space of discrete distributions on $K$ bins. We extend this approach to formulate two Bayesian estimators with different strategies to deal with severe under-sampling. For the first estimator, we design a novel mixture prior over countable distributions using the Pitman-Yor (PY) process [3]. The PY process is useful when the number of parameters is unknown a priori, and as a result finds many applications in Bayesian nonparametrics. The PY process can model the heavy, power-law distributed tails which often occur in neural data. To reduce the bias of the estimator we analytically derive a set of mixing weights so that the resulting improper prior over entropy is approximately flat. We consider the posterior over entropy given a dataset (which contains some observed number of words but an unknown number of unobserved words), and show that the posterior mean can be efficiently computed via a simple numerical integral. The second estimator incorporates prior knowledge about the spike trains. We use a simple Bernoulli process as a parametric model of the spike trains, and use a Dirichlet process to allow arbitrary deviation from the Bernoulli process. Under this model, very sparse spike trains are a priori orders of magnitude more likely than those with many spikes. Both estimators are computationally efficient and statistically consistent. We applied those estimators to spike trains from the early visual system to quantify neural coding.
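As a point of reference for the under-sampling problem described in this abstract, the following minimal sketch compares the plug-in (maximum likelihood) entropy estimate with the posterior mean of entropy under a single fixed symmetric Dirichlet prior on K bins. This is not the PY-mixture or Bernoulli-centered estimator from the poster; it only illustrates the fixed-Dirichlet building block that mixture-of-Dirichlet approaches average over. The function names, the hand-rolled digamma approximation, the example counts, and the choice alpha = 1 are all assumptions made for illustration. Entropies are in nats.

```python
import math

def digamma(x):
    """Digamma function for x > 0 via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        # Shift the argument upward using psi(x) = psi(x + 1) - 1/x.
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def plugin_entropy(counts):
    """Entropy of the empirical distribution (requires at least one observation)."""
    n_total = sum(counts)
    return -sum((c / n_total) * math.log(c / n_total) for c in counts if c > 0)

def dirichlet_posterior_mean_entropy(counts, alpha):
    """Posterior mean of entropy under a symmetric Dirichlet(alpha) prior on len(counts) bins."""
    total = sum(counts) + len(counts) * alpha
    weighted = sum(((c + alpha) / total) * digamma(c + alpha + 1.0) for c in counts)
    return digamma(total + 1.0) - weighted

# Hypothetical under-sampled data: N = 2 observations spread over K = 8 possible symbols.
counts = [1, 1, 0, 0, 0, 0, 0, 0]
print(plugin_entropy(counts))
print(dirichlet_posterior_mean_entropy(counts, alpha=1.0))
```

With so few observations the posterior mean is driven largely by the prior (here it sits well above the plug-in value), which is the sensitivity that motivates mixing over priors so that the implied prior over entropy is approximately flat.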

    Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet

    Various optimality properties of universal sequence predictors based on Bayes-mixtures in general, and Solomonoff's prediction scheme in particular, will be studied. The probability of observing $x_t$ at time $t$, given past observations $x_1...x_{t-1}$, can be computed with the chain rule if the true generating distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. If $\mu$ is unknown, but known to belong to a countable or continuous class $\mathcal{M}$, one can base one's prediction on the Bayes-mixture $\xi$ defined as a $w_\nu$-weighted sum or integral of distributions $\nu\in\mathcal{M}$. The cumulative expected loss of the Bayes-optimal universal prediction scheme based on $\xi$ is shown to be close to the loss of the Bayes-optimal, but infeasible prediction scheme based on $\mu$. We show that the bounds are tight and that no other predictor can lead to significantly smaller bounds. Furthermore, for various performance measures, we show Pareto-optimality of $\xi$ and give an Occam's razor argument that the choice $w_\nu\sim 2^{-K(\nu)}$ for the weights is optimal, where $K(\nu)$ is the length of the shortest program describing $\nu$. The results are applied to games of chance, defined as a sequence of bets, observations, and rewards. The prediction schemes (and bounds) are compared to the popular predictors based on expert advice. Extensions to infinite alphabets, partial, delayed and probabilistic prediction, classification, and more active systems are briefly discussed. Comment: 34 page
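The Bayes-mixture predictor described in this abstract can be illustrated with a toy sketch. The assumptions here are strong and not from the paper: the model class is a small finite set of Bernoulli distributions, the prior weights are uniform rather than complexity-based, and the data are a fixed hypothetical binary sequence. The mixture predicts the next symbol as the weighted average of the class members' predictions, and the weights are updated by Bayes' rule after each observation.

```python
def mixture_prediction(thetas, weights):
    """Mixture probability that the next bit is 1: sum of w_nu * nu(1)."""
    return sum(w * t for w, t in zip(weights, thetas))

def update_weights(thetas, weights, bit):
    """Bayes update: reweight each model by the likelihood it gave the observed bit."""
    likelihoods = [t if bit == 1 else 1.0 - t for t in thetas]
    unnormalized = [w * l for w, l in zip(weights, likelihoods)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

# Hypothetical model class: Bernoulli(theta) for theta in {0.1, ..., 0.9}, uniform prior.
thetas = [i / 10 for i in range(1, 10)]
weights = [1.0 / len(thetas)] * len(thetas)

# Hypothetical data: 90% ones, so the class member theta = 0.9 fits best.
sequence = ([1] * 9 + [0]) * 20

for bit in sequence:
    weights = update_weights(thetas, weights, bit)

prediction = mixture_prediction(thetas, weights)
print(prediction)
```

In this toy setting the posterior weight concentrates on the class member that best matches the data, so the mixture's prediction approaches that member's prediction without knowing in advance which member is the true one.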