2 research outputs found

Phase Transitions for the Uniform Distribution in the PML Problem and its Bethe Approximation

Authors: Chan Chun Lam; Fernandes Winston; Kashyap Navin; Krishnapur Manjunath
Publication date: 02/06/2015

The pattern maximum likelihood (PML) estimate, introduced by Orlitsky et al., is an estimate of the multiset of probabilities in an unknown probability distribution $\mathbf{p}$, the estimate being obtained from $n$ i.i.d. samples drawn from $\mathbf{p}$. The PML estimate involves solving a difficult optimization problem over the set of all probability mass functions (pmfs) of finite support. In this paper, we describe an interesting phase transition phenomenon in the PML estimate: at a certain sharp threshold, the uniform distribution goes from being a local maximum to being a local minimum for the optimization problem in the estimate. We go on to consider the question of whether a similar phase transition phenomenon also exists in the Bethe approximation of the PML estimate, the latter being an approximation method with origins in statistical physics. We show that the answer to this question is a qualified "Yes". Our analysis involves the computation of the mean and variance of the $(i,j)$th entry, $a_{i,j}$, in a random $k \times k$ non-negative integer matrix $A$ with row and column sums all equal to $M$, drawn according to a distribution that assigns to $A$ a probability proportional to $\prod_{i,j} \frac{(M-a_{i,j})!}{a_{i,j}!}$.
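To make the random-matrix model in the last sentence concrete, here is a minimal brute-force sketch in Python. It is an illustration only, not the paper's method: it enumerates every k x k non-negative integer matrix with all row and column sums equal to M, weights each by the product of (M - a_ij)! / a_ij!, and returns the exact mean and variance of one entry. The function names `compositions`, `weight`, and `entry_moments` are hypothetical, and the enumeration is only feasible for very small k and M.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def weight(matrix, M):
    """Unnormalized probability: product over entries of (M - a)! / a!."""
    w = Fraction(1)
    for row in matrix:
        for a in row:
            w *= Fraction(factorial(M - a), factorial(a))
    return w


def entry_moments(k, M, i=0, j=0):
    """Exact mean and variance of entry (i, j) by exhaustive enumeration."""
    rows = list(compositions(M, k))  # every admissible row (sums to M)
    total = first = second = Fraction(0)
    for matrix in product(rows, repeat=k):
        # Keep only matrices whose column sums are also all equal to M.
        if any(sum(col) != M for col in zip(*matrix)):
            continue
        w = weight(matrix, M)
        total += w
        first += w * matrix[i][j]
        second += w * matrix[i][j] ** 2
    mean = first / total
    return mean, second / total - mean ** 2


if __name__ == "__main__":
    print(entry_moments(2, 2))
```

Because both the weight and the constraint set are unchanged when columns are permuted, every entry in a row has the same mean, so the mean must equal M/k. For k = 2, M = 2 the three admissible matrices all have weight 1, giving mean 1 and variance 2/3.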

arXiv.org e-Print Archive

Profile Entropy: A Fundamental Measure for the Learnability and Compressibility of Discrete Distributions

Authors: Hao Yi; Orlitsky Alon
Publication date: 26/02/2020

The profile of a sample is the multiset of its symbol frequencies. We show that for samples of discrete distributions, profile entropy is a fundamental measure unifying the concepts of estimation, inference, and compression. Specifically, profile entropy a) determines the speed of estimating the distribution relative to the best natural estimator; b) characterizes the rate of inferring all symmetric properties compared with the best estimator over any label-invariant distribution collection; c) serves as the limit of profile compression, for which we derive optimal near-linear-time block and sequential algorithms. To further our understanding of profile entropy, we investigate its attributes, provide algorithms for approximating its value, and determine its magnitude for numerous structural distribution families. Comment: 56 pages
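The definition in the first sentence of the abstract can be illustrated with a short Python sketch. It only computes the profile of a sample and does not attempt the paper's profile-entropy algorithms. The helper names `profile` and `profile_counts` are hypothetical.

```python
from collections import Counter


def profile(sample):
    """Return the profile: the multiset of symbol frequencies, as a sorted list."""
    return sorted(Counter(sample).values())


def profile_counts(sample):
    """Compact form: map each frequency to the number of symbols having it."""
    return dict(Counter(Counter(sample).values()))


if __name__ == "__main__":
    print(profile("abracadabra"))         # [1, 1, 2, 2, 5]
    print(profile_counts("abracadabra"))  # {5: 1, 2: 2, 1: 2}
```

The profile discards symbol labels, so relabeling the symbols of a sample leaves it unchanged: "abracadabra" and "xyzxwxvxyzx" have the same profile.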

arXiv.org e-Print Archive