Universal Covertness for Discrete Memoryless Sources
Consider a sequence $X^n$ of length $n$ emitted by a Discrete Memoryless Source (DMS) with unknown distribution $p_X$. The objective is to construct a lossless source code that maps $X^n$ to a sequence $Y^m$ of length $m$ that is indistinguishable, in terms of Kullback-Leibler divergence, from a sequence emitted by another DMS with known distribution $p_Y$. The main result is the existence of a coding scheme that performs this task with an optimal ratio $m/n$ equal to $H(p_X)/H(p_Y)$, the ratio of the Shannon entropies of the two distributions, as $n$ goes to infinity. The coding scheme overcomes the challenges created by the lack of knowledge about $p_X$ by relying on a sufficiently fine estimation of $p_X$, followed by an appropriately designed type-based source coding that jointly performs source resolvability and universal lossless source coding. The result recovers and extends previous results that either assume $p_X$ or $p_Y$ uniform, or $p_X$ known. The price paid for these generalizations is the use of common randomness with vanishing rate, whose length roughly scales as the square root of $n$. By allowing common randomness strictly larger than the square root of $n$ but still negligible compared to $n$, a constructive low-complexity encoding and decoding counterpart to the main result is also provided for binary sources by means of polar codes.
Comment: 36 pages, 2 figures
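The optimal rate $H(p_X)/H(p_Y)$ is easy to compute for concrete distributions. A minimal sketch (the example distributions are illustrative, not from the paper):

```python
from math import log2

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(x * log2(x) for x in p if x > 0)

# Hypothetical example: a biased binary source p_X and a known
# ternary target DMS p_Y (values chosen only for illustration).
p_X = [0.9, 0.1]
p_Y = [0.5, 0.25, 0.25]

# Optimal output-to-input length ratio m/n -> H(p_X)/H(p_Y)
ratio = entropy(p_X) / entropy(p_Y)
```

Here $H(p_Y) = 1.5$ bits, so the covert encoding of the low-entropy binary source needs only about 0.31 output symbols per input symbol.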
Estimation of KL Divergence: Optimal Minimax Rate
The problem of estimating the Kullback-Leibler divergence $D(P\|Q)$ between two unknown distributions $P$ and $Q$ is studied, under the assumption that the alphabet size $k$ of the distributions can scale to infinity. The estimation is based on $m$ independent samples drawn from $P$ and $n$ independent samples drawn from $Q$. It is first shown that there does not exist any consistent estimator that guarantees asymptotically small worst-case quadratic risk over the set of all pairs of distributions. A restricted set that contains pairs of distributions with density ratio bounded by a function $f(k)$ is further considered. An augmented plug-in estimator is proposed, and its worst-case quadratic risk is shown to be within a constant factor of $\bigl(\frac{k}{m}+\frac{kf(k)}{n}\bigr)^2+\frac{\log^2 f(k)}{m}+\frac{f(k)}{n}$, if $m$ and $n$ exceed a constant factor of $k$ and $kf(k)$, respectively. Moreover, the minimax quadratic risk is characterized to be within a constant factor of $\bigl(\frac{k}{m\log k}+\frac{kf(k)}{n\log k}\bigr)^2+\frac{\log^2 f(k)}{m}+\frac{f(k)}{n}$, if $m$ and $n$ exceed a constant factor of $k/\log k$ and $kf(k)/\log k$, respectively. The lower bound on the minimax quadratic risk is characterized by employing a generalized Le Cam's method. A minimax optimal estimator is then constructed by employing both the polynomial approximation and the plug-in approaches.
Comment: IEEE Transactions on Information Theory
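A plug-in style KL estimator of the kind discussed above can be sketched in a few lines. This is a minimal illustration assuming the "augmentation" means add-one smoothing of the $Q$ counts (so the estimator never divides by an empty bin); the paper's exact construction may differ:

```python
from collections import Counter
from math import log

def augmented_plugin_kl(xs, ys, k):
    """Plug-in estimate of D(P||Q) over alphabet {0,...,k-1},
    from samples xs ~ P and ys ~ Q.  The Q counts are augmented
    by +1 (an assumption about the estimator's form) so that
    every symbol has positive estimated Q-probability."""
    m, n = len(xs), len(ys)
    cp, cq = Counter(xs), Counter(ys)
    est = 0.0
    for a in range(k):
        p_hat = cp[a] / m              # empirical P
        q_hat = (cq[a] + 1) / (n + k)  # add-one ("augmented") Q
        if p_hat > 0:
            est += p_hat * log(p_hat / q_hat)
    return est
```

As the abstract notes, such a plug-in estimator is rate-optimal only up to the $\log k$ factors; closing that gap requires the polynomial-approximation techniques of the paper.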
Instance Based Approximations to Profile Maximum Likelihood
In this paper we provide a new efficient algorithm for approximately
computing the profile maximum likelihood (PML) distribution, a prominent
quantity in symmetric property estimation. We provide an algorithm which
matches the previous best known efficient algorithms for computing approximate
PML distributions and improves when the number of distinct observed frequencies
in the given instance is small. We achieve this result by exploiting new
sparsity structure in approximate PML distributions and providing a new matrix
rounding algorithm, of independent interest. Leveraging this result, we obtain
the first provable computationally efficient implementation of PseudoPML, a
general framework for estimating a broad class of symmetric properties.
Additionally, we obtain efficient PML-based estimators for distributions with
small profile entropy, a natural instance-based complexity measure. Further, we
provide a simpler and more practical PseudoPML implementation that matches the
best-known theoretical guarantees of such an estimator and evaluate this method
empirically.
Comment: Accepted at Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020)
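The profile of a sample (its "frequency of frequencies") is the only statistic these methods consume, and it is cheap to compute. A minimal sketch, with illustrative names not taken from the paper:

```python
from collections import Counter

def profile(sample):
    """Return the profile of a sample: a map sending each
    multiplicity j to the number of distinct symbols that appear
    exactly j times.  Symmetric properties (entropy, support
    size, ...) depend on the sample only through this statistic."""
    multiplicities = Counter(sample).values()
    return dict(Counter(multiplicities))

# e.g. "abracadabra" has 5 a's, 2 b's, 2 r's, 1 c, and 1 d,
# so its profile maps 5 -> 1 symbol, 2 -> 2 symbols, 1 -> 2 symbols.
```

The sparsity exploited by the algorithm above corresponds to this dictionary having few keys, i.e. few distinct observed frequencies.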
Efficient Profile Maximum Likelihood for Universal Symmetric Property Estimation
Estimating symmetric properties of a distribution, e.g. support size, coverage, entropy, and distance to uniformity, is among the most fundamental problems in algorithmic statistics. While each of these properties has been studied extensively and separate optimal estimators are known for each, in striking recent work, Acharya et al. 2016 showed that there is a single estimator that is competitive for all symmetric properties. This work proved that computing the distribution that approximately maximizes \emph{profile maximum likelihood (PML)}, i.e. the probability of the observed frequency of frequencies, and returning the value of the property on this distribution is sample competitive with respect to a broad class of estimators of symmetric properties. Further, they showed that even computing an approximation of the PML suffices to achieve such a universal plug-in estimator. Unfortunately, prior to this work there was no known polynomial time algorithm to compute an approximate PML, and it was open to obtain a polynomial time universal plug-in estimator through the use of approximate PML. In this paper we provide an algorithm that, given $n$ samples from a distribution, computes an approximate PML distribution up to a multiplicative error of $\exp(n^{2/3}\,\mathrm{poly}\log(n))$ in time nearly linear in $n$ (the number of samples). Generalizing work of Acharya et al. 2016 on the utility of approximate PML, we show that our algorithm provides a nearly linear time universal plug-in estimator for all symmetric functions up to accuracy $\epsilon = \Omega(n^{-0.166})$. Further, we show how to extend our work to provide efficient polynomial-time algorithms for computing a $d$-dimensional generalization of PML (for constant $d$) that allows for universal plug-in estimation of symmetric relationships between distributions.
Comment: 68 pages
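The universal plug-in recipe described in this abstract — fit a distribution to the observed sample, then evaluate the target symmetric property on the fitted distribution — can be sketched as follows. The PML optimization itself is the hard part; this sketch substitutes the naive empirical distribution for it, so the names and structure are illustrative only:

```python
from collections import Counter
from math import log2

def fitted_distribution(sample):
    """Stand-in for an (approximate) PML distribution: here simply
    the empirical distribution.  The papers above instead fit a
    distribution (approximately) maximizing the profile likelihood,
    which is what makes the plug-in estimator sample-competitive."""
    n = len(sample)
    return [c / n for c in Counter(sample).values()]

def plug_in(sample, symmetric_property):
    """Universal plug-in estimator: evaluate the property on the
    fitted distribution, whatever that property is."""
    return symmetric_property(fitted_distribution(sample))

def shannon_entropy(p):
    return -sum(x * log2(x) for x in p if x > 0)

def support_size(p):
    return sum(1 for x in p if x > 0)
```

The point of the universality results is that a single fitted distribution serves every symmetric property passed to `plug_in`, e.g. `plug_in(sample, shannon_entropy)` and `plug_in(sample, support_size)`, rather than requiring a separately designed estimator for each.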