263 research outputs found
A Bernstein-Von Mises Theorem for discrete probability distributions
We investigate the asymptotic normality of the posterior distribution in the
discrete setting, when model dimension increases with sample size. We consider
a probability mass function $\theta_0$ on $\mathbbm{N}\setminus \{0\}$ and a
sequence of truncation levels $(k_n)_n$ satisfying $k_n^3 \leq n \inf_{i \leq k_n} \theta_0(i)$.
Let $\hat{\theta}_n$ denote the maximum likelihood estimate of $(\theta_0(i))_{i \leq k_n}$
and let $\Delta_n(\theta_0)$ denote the $k_n$-dimensional vector whose $i$-th
coordinate is defined by $\sqrt{n}(\hat{\theta}_n(i)-\theta_0(i))$ for $1 \leq i \leq k_n$.
We check that under mild conditions on $\theta_0$ and on the sequence of prior
probabilities on the $k_n$-dimensional simplices, after centering and rescaling,
the variation distance between the posterior distribution recentered around
$\theta_0$ and rescaled by $\sqrt{n}$, and the $k_n$-dimensional Gaussian
distribution $\mathcal{N}(\Delta_n(\theta_0), I^{-1}(\theta_0))$, converges in
probability to $0$.
This theorem can be used to prove the asymptotic normality of Bayesian
estimators of Shannon and R\'{e}nyi entropies. The proofs are based on
concentration inequalities for centered and non-centered Chi-square (Pearson)
statistics. The latter allow us to establish posterior concentration rates with
respect to the Fisher distance rather than the Hellinger distance, as is
commonplace in non-parametric Bayesian statistics.
Comment: Published at http://dx.doi.org/10.1214/08-EJS262 in the Electronic
Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
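To make the object of the theorem concrete, here is a minimal sketch of a Bayesian Shannon-entropy estimator on a fixed truncated support. The symmetric Dirichlet prior with illustrative parameter alpha=0.5 and the function name are assumptions for illustration; the paper allows more general priors on the $k_n$-dimensional simplices.

```python
import numpy as np

def bayes_shannon_entropy(counts, alpha=0.5, n_draws=10_000, seed=0):
    """Posterior mean and spread of the Shannon entropy under a
    symmetric Dirichlet(alpha) prior on a fixed truncated support.
    alpha=0.5 is an illustrative assumption, not the paper's prior."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    # Dirichlet posterior draws of the probability vector theta
    post = rng.dirichlet(counts + alpha, size=n_draws)
    # plug-in Shannon entropy of each posterior draw
    h = -np.sum(post * np.log(post), axis=1)
    return h.mean(), h.std()

# observed counts over a truncated alphabet {1, ..., 5}
print(bayes_shannon_entropy([40, 25, 15, 12, 8]))
```

The posterior spread returned here is the kind of fluctuation whose Gaussian limit the theorem describes.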
Sparse adaptive Dirichlet-multinomial-like processes
Online estimation and modelling of i.i.d. data for short
sequences over large or complex "alphabets" is a ubiquitous
(sub)problem in machine learning, information theory, data
compression, statistical language processing, and document
analysis. The Dirichlet-multinomial distribution (also called
the Polya urn scheme) and extensions thereof are widely applied
for online i.i.d. estimation. However, good a-priori choices for
the parameters in this regime are difficult to obtain. I
derive an optimal adaptive choice for the main parameter via
tight, data-dependent redundancy bounds for a related model. The
1-line recommendation is to set the 'total mass' = 'precision' =
'concentration' parameter to m/(2 ln((n+1)/m)), where n
is the (past) sample size and m the number of different symbols
observed (so far). The resulting estimator is simple, online,
fast, and its experimental performance is superb.
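A minimal sketch of this recommendation follows. The function names are illustrative, and the escape-style split of the total mass over unseen symbols is an assumption that follows the Dirichlet-multinomial-like construction only in spirit:

```python
import math
from collections import Counter

def adaptive_total_mass(n, m):
    """The paper's 1-line recommendation: total mass = precision =
    concentration = m / (2 ln((n+1)/m)), with n the past sample size
    and m the number of distinct symbols observed so far."""
    if m == 0:
        return 1.0  # no data yet: fallback default (assumption)
    return m / (2.0 * math.log((n + 1) / m))

def predictive_prob(counts, n, symbol, num_unseen):
    """Hedged sketch of a Dirichlet-multinomial-like predictor:
    observed symbols get mass proportional to their counts, while the
    adaptive total mass is reserved for novel symbols and split
    uniformly among num_unseen candidates (assumption: this escape
    mechanism matches the paper only in spirit)."""
    beta = adaptive_total_mass(n, len(counts))
    if symbol in counts:
        return counts[symbol] / (n + beta)
    return beta / ((n + beta) * num_unseen)

# usage: sequential prediction over a small observed sample
counts = Counter("abracada")
print(predictive_prob(counts, n=8, symbol="a", num_unseen=22))
print(predictive_prob(counts, n=8, symbol="z", num_unseen=22))
```

Note that m <= n always holds, so the logarithm in the denominator is strictly positive whenever data has been seen.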
Analysis of pivot sampling in dual-pivot Quicksort: A holistic analysis of Yaroslavskiy's partitioning scheme
The new dual-pivot Quicksort by Vladimir Yaroslavskiy, used in Oracle's Java runtime library since version 7, features intriguing asymmetries. They make a basic variant of this algorithm use fewer comparisons than classic single-pivot Quicksort. In this paper, we extend the analysis to the case where the two pivots are chosen as fixed order statistics of a random sample. Surprisingly, dual-pivot Quicksort then needs more comparisons than a corresponding version of classic Quicksort, so it is clear that counting comparisons is not sufficient to explain the running-time advantages observed for Yaroslavskiy's algorithm in practice. Consequently, we take a more holistic approach and also give the precise leading term of the average number of swaps, the number of executed Java Bytecode instructions, and the number of scanned elements, a new simple cost measure that approximates I/O costs in the memory hierarchy. We determine optimal order statistics for each of the cost measures. It turns out that the asymmetries in Yaroslavskiy's algorithm render pivots with a systematic skew more efficient than the symmetric choice. Moreover, we finally have a convincing explanation for the success of Yaroslavskiy's algorithm in practice: compared with corresponding versions of classic single-pivot Quicksort, dual-pivot Quicksort needs significantly fewer I/Os, both with and without pivot sampling.
The final publication is available at Springer via http://dx.doi.org/10.1007/s00453-015-0041-7.
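For reference, here is a minimal sketch of Yaroslavskiy-style dual-pivot partitioning: two pivots p <= q split the array into three regions (< p, between p and q, > q). This is a simplified rendition without the pivot sampling, systematic skew, or low-level tuning analysed in the paper:

```python
def dual_pivot_quicksort(a, lo=0, hi=None):
    """In-place dual-pivot Quicksort on a[lo..hi] (illustrative sketch)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    if a[lo] > a[hi]:                        # ensure pivot p <= pivot q
        a[lo], a[hi] = a[hi], a[lo]
    p, q = a[lo], a[hi]
    lt, gt, i = lo + 1, hi - 1, lo + 1       # lt/gt bound the three regions
    while i <= gt:
        if a[i] < p:                         # element belongs left of p
            a[i], a[lt] = a[lt], a[i]
            lt += 1
        elif a[i] > q:                       # element belongs right of q
            while a[gt] > q and i < gt:
                gt -= 1
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
            if a[i] < p:                     # swapped-in element may be small
                a[i], a[lt] = a[lt], a[i]
                lt += 1
        i += 1
    lt -= 1
    gt += 1
    a[lo], a[lt] = a[lt], a[lo]              # move p into final position
    a[hi], a[gt] = a[gt], a[hi]              # move q into final position
    dual_pivot_quicksort(a, lo, lt - 1)      # recurse on elements < p
    dual_pivot_quicksort(a, lt + 1, gt - 1)  # recurse on middle region
    dual_pivot_quicksort(a, gt + 1, hi)      # recurse on elements > q

data = [3, 7, 1, 9, 4, 9, 2]
dual_pivot_quicksort(data)
print(data)  # [1, 2, 3, 4, 7, 9, 9]
```

The asymmetry the paper analyses is visible in the loop: elements larger than q trigger an extra inner scan from the right, so the costs of the three regions are not interchangeable.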
- …