1,731 research outputs found
Bayesian Entropy Estimation for Countable Discrete Distributions
We consider the problem of estimating Shannon's entropy from discrete
data, in cases where the number of possible symbols is unknown or even
countably infinite. The Pitman-Yor process, a generalization of the Dirichlet
process, provides a tractable prior distribution over the space of countably
infinite discrete distributions, and has found major applications in Bayesian
non-parametric statistics and machine learning. Here we show that it also
provides a natural family of priors for Bayesian entropy estimation, due to the
fact that moments of the induced posterior distribution over entropy can be
computed analytically. We derive formulas for the posterior mean (Bayes' least
squares estimate) and variance under Dirichlet and Pitman-Yor process priors.
Moreover, we show that a fixed Dirichlet or Pitman-Yor process prior implies a
narrow prior distribution over entropy, meaning the prior strongly determines the
entropy estimate in the under-sampled regime. We derive a family of continuous
mixing measures such that the resulting mixture of Pitman-Yor processes
produces an approximately flat prior over entropy. We show that the resulting
Pitman-Yor Mixture (PYM) entropy estimator is consistent for a large class of
distributions. We explore the theoretical properties of the resulting
estimator, and show that it performs well both in simulation and in application
to real data. Comment: 38 pages LaTeX. Revised and resubmitted to JML
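To make the under-sampling issue concrete, here is a minimal sketch that compares the plug-in (maximum likelihood) entropy estimate with the posterior mean of entropy under a fixed symmetric Dirichlet prior on a finite alphabet. This is not the PYM estimator from the abstract: the finite alphabet, the concentration parameter `alpha`, the hand-rolled `digamma` helper, and all function names are assumptions made for illustration. Entropy is in nats.

```python
import math


def digamma(x: float) -> float:
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:  # shift upward using psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Asymptotic expansion: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result


def plugin_entropy(counts):
    """Entropy of the empirical distribution (biased downward when under-sampled)."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


def dirichlet_posterior_mean_entropy(counts, alpha=1.0):
    """Posterior mean of entropy under a symmetric Dirichlet(alpha) prior.

    Hypothetical helper: `counts` must list every symbol of the (assumed finite)
    alphabet, including symbols with zero observations.
    """
    post = [c + alpha for c in counts]  # posterior Dirichlet parameters
    total = sum(post)
    return digamma(total + 1.0) - sum(a / total * digamma(a + 1.0) for a in post)


if __name__ == "__main__":
    counts = [3, 1, 0, 0, 0, 0, 0, 0]  # 4 samples on an assumed 8-symbol alphabet
    print("plug-in:", plugin_entropy(counts))
    for alpha in (0.01, 1.0, 10.0):
        print("alpha =", alpha, "->", dirichlet_posterior_mean_entropy(counts, alpha))
```

With so few samples the Dirichlet estimate moves substantially as `alpha` changes, which illustrates the abstract's point that a fixed prior strongly determines the estimate in the under-sampled regime.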
The Role of Beliefs in Inference for Rational Expectations Models
This paper discusses inference for rational expectations models estimated via minimum distance methods by characterizing the probability beliefs regarding the data generating process (DGP) that are compatible with given moment conditions. The null hypothesis is taken to be rational expectations and the alternative hypothesis to be distorted beliefs. This distorted beliefs alternative is analyzed from the perspective of a hypothetical semiparametric Bayesian who believes the model and uses it to learn about the DGP. This interpretation provides a different perspective on estimates, test statistics, and confidence regions in large samples, particularly regarding the economic significance of rejections in rational expectations models.
On choosing and bounding probability metrics
When studying convergence of measures, an important issue is the choice of
probability metric. In this review, we provide a summary and some new results
concerning bounds among ten important probability metrics/distances that are
used by statisticians and probabilists. We focus on these metrics because they
are either well-known, commonly used, or admit practical bounding techniques.
We summarize these relationships in a handy reference diagram, and also give
examples to show how rates of convergence can depend on the metric chosen. Comment: To appear, International Statistical Review. Related work at
http://www.math.hmc.edu/~su/papers.htm
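As a small illustration of what bounds among metrics look like, the sketch below computes three common quantities for two discrete distributions and checks two standard relationships numerically. The definitions are assumptions chosen for this example (total variation with the factor 1/2, Hellinger distance without a normalizing factor, Kullback-Leibler divergence in nats); other normalizations change the constants in the bounds.

```python
import math


def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def hellinger(p, q):
    """Hellinger distance, unnormalized: sqrt(sum (sqrt(p_i) - sqrt(q_i))^2)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; assumes q_i > 0 where p_i > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.9, 0.1]
    tv, dh, kl = total_variation(p, q), hellinger(p, q), kl_divergence(p, q)
    # Under these definitions: d_H^2 / 2 <= d_TV <= d_H
    print(dh ** 2 / 2 <= tv <= dh)
    # Pinsker's inequality: d_TV <= sqrt(KL / 2)
    print(tv <= math.sqrt(kl / 2))
```

Because the three quantities differ in size for the same pair of distributions, a sequence can converge at different rates depending on which one is used to measure it.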
Contributions to the understanding of Bayesian consistency.
Consistency of Bayesian nonparametric procedures has been the focus of a considerable amount of research. Here we deal with strong consistency for Bayesian density estimation. An awkward consequence of inconsistency is pointed out. We investigate reasons for inconsistency and precisely identify the notion of "data tracking". Specific examples in which this phenomenon cannot occur are discussed. When it can happen, we show how and where things can go wrong, in particular the type of sets where the posterior can put mass. Keywords: Bayesian consistency; Density estimation; Hellinger distance; Weak neighborhood
Asymptotics of Discrete MDL for Online Prediction
Minimum Description Length (MDL) is an important principle for induction and
prediction, with strong relations to optimal Bayesian learning. This paper
deals with learning non-i.i.d. processes by means of two-part MDL, where the
underlying model class is countable. We consider the online learning framework,
i.e. observations come in one by one, and the predictor is allowed to update
his state of mind after each time step. We identify two ways of predicting by
MDL for this setup, namely a static and a dynamic one. (A third variant,
hybrid MDL, will turn out inferior.) We will prove that under the only
assumption that the data is generated by a distribution contained in the model
class, the MDL predictions converge to the true values almost surely. This is
accomplished by proving finite bounds on the quadratic, the Hellinger, and the
Kullback-Leibler loss of the MDL learner, which are however exponentially worse
than for Bayesian prediction. We demonstrate that these bounds are sharp, even
for model classes containing only Bernoulli distributions. We show how these
bounds imply regret bounds for arbitrary loss functions. Our results apply to a
wide range of setups, namely sequence prediction, pattern classification,
regression, and universal induction in the sense of Algorithmic Information
Theory, among others. Comment: 34 page
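The idea of two-part MDL prediction over a countable class can be sketched on a tiny class of Bernoulli models. The code below is only an illustration in the spirit of the static variant: it selects the model with the shortest two-part code (model cost plus data cost) and predicts with that model. The finite class, the prior weights, and the function name are assumptions for this example, and the paper's exact definitions of the static and dynamic predictors may differ.

```python
import math


def static_mdl_select(thetas, weights, data):
    """Pick the Bernoulli parameter minimizing -log2 w(theta) - log2 P(data | theta).

    Hypothetical helper: every theta is assumed to lie strictly in (0, 1) and
    every weight to be positive, so all code lengths are finite.
    """
    ones = sum(data)
    zeros = len(data) - ones
    best_theta, best_bits = None, math.inf
    for theta, w in zip(thetas, weights):
        likelihood = theta ** ones * (1.0 - theta) ** zeros
        bits = -math.log2(w) - math.log2(likelihood)  # two-part code length
        if bits < best_bits:
            best_theta, best_bits = theta, bits
    return best_theta, best_bits


if __name__ == "__main__":
    thetas = [0.2, 0.5, 0.8]
    weights = [1 / 3, 1 / 3, 1 / 3]
    theta, bits = static_mdl_select(thetas, weights, [1, 1, 0, 1])
    # The selected model's own probability of a 1 is the prediction for the next bit.
    print("selected theta:", theta, "code length (bits):", bits)
```

In the online setting the selection is simply repeated after each new observation, so the selected model can change from one time step to the next.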
Bayesian entropy estimators for spike trains
Il Memming Park and Jonathan Pillow are with the Institute for Neuroscience and Department of Psychology, The University of Texas at Austin, TX 78712, USA -- Evan Archer is with the Institute for Computational and Engineering Sciences, The University of Texas at Austin, TX 78712, USA -- Jonathan Pillow is with the Division of Statistics and Scientific Computation, The University of Texas at Austin, Austin, TX 78712, USA
Poster presentation:
Information theoretic quantities have played a central role in neuroscience for quantifying neural codes [1]. Entropy and mutual information can be used to measure the maximum encoding capacity of a neuron, to quantify the amount of noise, spatial and temporal functional dependence, and the learning process, and to provide a fundamental limit for neural coding. Unfortunately, estimating entropy or mutual information is notoriously difficult, especially when the number of observations N is less than the number of possible symbols K [2]. For neural spike trains, this is often the case due to the combinatorial nature of the symbols: for n simultaneously recorded neurons on m time bins, the number of possible symbols is K = 2^(n+m). Therefore, the question is how to extrapolate from a severely under-sampled distribution.
Here we describe a couple of recent advances in Bayesian entropy estimation for spike trains. Our approach follows that of Nemenman et al. [2], who formulated a Bayesian entropy estimator using a mixture-of-Dirichlet prior over the space of discrete distributions on K bins. We extend this approach to formulate two Bayesian estimators with different strategies to deal with severe under-sampling.
For the first estimator, we design a novel mixture prior over countable distributions using the Pitman-Yor (PY) process [3]. The PY process is useful when the number of parameters is unknown a priori, and as a result finds many applications in Bayesian nonparametrics. The PY process can model the heavy, power-law distributed tails that often occur in neural data. To reduce the bias of the estimator, we analytically derive a set of mixing weights so that the resulting improper prior over entropy is approximately flat. We consider the posterior over entropy given a dataset (which contains some observed number of words but an unknown number of unobserved words), and show that the posterior mean can be efficiently computed via a simple numerical integral.
The second estimator incorporates prior knowledge about the spike trains. We use a simple Bernoulli process as a parametric model of the spike trains, and use a Dirichlet process to allow arbitrary deviation from the Bernoulli process. Under this model, very sparse spike trains are a priori orders of magnitude more likely than those with many spikes. Both estimators are computationally efficient and statistically consistent. We applied these estimators to spike trains from the early visual system to quantify neural coding.
Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet
Various optimality properties of universal sequence predictors based on
Bayes-mixtures in general, and Solomonoff's prediction scheme in particular,
will be studied. The probability of observing $x_t$ at time $t$, given past
observations $x_1...x_{t-1}$, can be computed with the chain rule if the true
generating distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. If $\mu$
is unknown, but known to belong to a countable or continuous class $\mathcal{M}$,
one can base one's prediction on the Bayes-mixture $\xi$ defined as a
$w_\nu$-weighted sum or integral of distributions $\nu\in\mathcal{M}$. The cumulative
expected loss of the Bayes-optimal universal prediction scheme based on $\xi$
is shown to be close to the loss of the Bayes-optimal, but infeasible
prediction scheme based on $\mu$. We show that the bounds are tight and that no
other predictor can lead to significantly smaller bounds. Furthermore, for
various performance measures, we show Pareto-optimality of $\xi$ and give an
Occam's razor argument that the choice $w_\nu\sim 2^{-K(\nu)}$ for the weights
is optimal, where $K(\nu)$ is the length of the shortest program describing
$\nu$. The results are applied to games of chance, defined as a sequence of
bets, observations, and rewards. The prediction schemes (and bounds) are
compared to the popular predictors based on expert advice. Extensions to
infinite alphabets, partial, delayed and probabilistic prediction,
classification, and more active systems are briefly discussed. Comment: 34 page
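A Bayes-mixture predictor of the kind described here can be sketched for a small class of Bernoulli distributions. The mixture is a weighted sum of the class members, and the predictive probability of the next symbol is the posterior-weighted average of each member's prediction. The finite class, the prior weights, and the function name below are assumptions for illustration, not the paper's construction.

```python
def bayes_mixture_predict(thetas, weights, data):
    """Return (P(next symbol = 1 | data), posterior weights) for a Bernoulli class.

    Hypothetical helper: `thetas` are the candidate success probabilities and
    `weights` are their prior weights (positive, summing to 1). At least one
    candidate must assign positive probability to the observed data.
    """
    posterior = list(weights)
    for x in data:
        # Multiply each weight by the probability its model assigns to x ...
        posterior = [w * (t if x == 1 else 1.0 - t) for w, t in zip(posterior, thetas)]
        # ... then renormalize, which is the chain rule applied to the mixture.
        norm = sum(posterior)
        posterior = [w / norm for w in posterior]
    prediction = sum(w * t for w, t in zip(posterior, thetas))
    return prediction, posterior


if __name__ == "__main__":
    thetas = [0.2, 0.8]
    weights = [0.5, 0.5]
    for seen in ([], [1], [1, 1], [1, 1, 1]):
        p_one, post = bayes_mixture_predict(thetas, weights, seen)
        print(seen, "-> P(next = 1) =", p_one, "posterior =", post)
```

As more ones are observed, the posterior weight concentrates on the candidate closest to the data-generating distribution, so the mixture's prediction approaches that candidate's prediction.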