Bias Correction with Jackknife, Bootstrap, and Taylor Series
We analyze bias correction methods using jackknife, bootstrap, and Taylor
series. We focus on the binomial model, and consider the problem of bias
correction for estimating $f(p)$, where the function $f$ is arbitrary. We
characterize the supremum norm of the bias of general jackknife and bootstrap
estimators for any continuous function, and demonstrate that in the delete-$d$
jackknife, different values of $d$ may lead to drastically different behaviors
in jackknife. We show that in the binomial model, iterating the bootstrap bias
correction infinitely many times may lead to divergence of bias and variance,
and demonstrate that the bias properties of the bootstrap bias corrected
estimator after a given number of rounds are of the same order as those of the
corresponding jackknife estimator if a bounded coefficients condition is
satisfied. Comment: to appear in IEEE Transactions on Information Theory.
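For reference, the classical delete-1 jackknife bias correction (a textbook formula, not necessarily the exact construction analyzed above) takes the form
\[
\hat{\theta}_{\mathrm{jack}} \;=\; n\,\hat{\theta}_n \;-\; \frac{n-1}{n}\sum_{i=1}^{n} \hat{\theta}_{(-i)},
\]
where $\hat{\theta}_n$ is the estimator computed on all $n$ samples and $\hat{\theta}_{(-i)}$ is the same estimator with the $i$-th sample removed; the combination cancels the $O(1/n)$ term of the bias expansion. Delete-$d$ and iterated-bootstrap corrections generalize this idea.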
Minimax Estimation of the $L_1$ Distance
We consider the problem of estimating the $L_1$ distance between two discrete
probability measures $P$ and $Q$ from empirical data in a nonasymptotic and
large alphabet setting. When $Q$ is known and one obtains $n$ samples from $P$,
we show that for every $Q$, the minimax rate-optimal estimator with $n$ samples
achieves performance comparable to that of the maximum likelihood estimator
(MLE) with $n \ln n$ samples. When both $P$ and $Q$ are unknown, we construct
minimax rate-optimal estimators whose worst case performance is essentially
that of the known-$Q$ case with $Q$ being uniform, implying that a uniform $Q$
is essentially the most difficult case. The \emph{effective sample size
enlargement} phenomenon, identified in Jiao \emph{et al.} (2015), holds both in
the known-$Q$ case for every $Q$ and in the unknown-$Q$ case. However, the
construction of optimal estimators for the $L_1$ distance requires new
techniques and insights beyond the approximation-based method of functional
estimation in Jiao \emph{et al.} (2015). Comment: to appear in IEEE Transactions
on Information Theory.
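For concreteness, the functional being estimated here (reading the title as the $L_1$ distance) and the natural plug-in are
\[
\|P - Q\|_1 \;=\; \sum_{i=1}^{S} |p_i - q_i|, \qquad \hat{p}_i \;=\; \frac{N_i}{n},
\]
where $S$ is the alphabet size and $N_i$ is the number of occurrences of symbol $i$ among the $n$ samples; the MLE plugs the empirical frequencies $\hat{p}_i$ (and $\hat{q}_i$, when $Q$ is unknown) directly into the functional, whereas the minimax rate-optimal constructions go beyond this plug-in.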
Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance
We present \emph{Local Moment Matching (LMM)}, a unified methodology for
symmetric functional estimation and distribution estimation under Wasserstein
distance. We construct an efficiently computable estimator that achieves the
minimax rates in estimating the distribution up to permutation, and show that
the plug-in approach of our unlabeled distribution estimator is "universal" in
estimating symmetric functionals of discrete distributions. Instead of doing
best polynomial approximation explicitly as in the existing literature on
functional estimation, the plug-in approach conducts polynomial approximation
implicitly and attains the optimal sample complexity for the entropy, power sum
and support size functionals.
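The symmetric functionals referred to above all take the additive form
\[
F(P) \;=\; \sum_{i=1}^{S} f(p_i),
\]
for example $f(p) = -p\ln p$ (Shannon entropy), $f(p) = p^{\alpha}$ (power sums), and $f(p) = \mathbf{1}\{p > 0\}$ (support size). Such an $F$ depends on $P$ only through the multiset of its probabilities, which is why a distribution estimate that is accurate up to permutation suffices for the plug-in approach.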
On Estimation of $L_r$-Norms in Gaussian White Noise Models
We provide a complete picture of asymptotically minimax estimation of
$L_r$-norms (for any $r$) of the mean in the Gaussian white noise model over
Nikolskii-Besov spaces. In this regard, we complement the work of Lepski,
Nemirovski and Spokoiny (1999), who considered the cases of non-even $r$ (with
poly-logarithmic gap between upper and lower bounds) and even $r$ (with
asymptotically sharp upper and lower bounds) over H\"{o}lder spaces. We
additionally consider the case of asymptotically adaptive minimax estimation
and demonstrate a difference between even and non-even in terms of an
investigator's ability to produce asymptotically adaptive minimax estimators
without paying a penalty. Comment: To appear in Probability Theory and Related Fields.
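A standard formulation of this setting (the exact normalization in the paper may differ): one observes the Gaussian white noise model
\[
dY_t \;=\; f(t)\,dt + \frac{1}{\sqrt{n}}\,dW_t, \qquad t \in [0,1],
\]
and seeks to estimate the functional $\|f\|_r = \big(\int_0^1 |f(t)|^r\,dt\big)^{1/r}$ of the unknown mean function $f$, with minimax risk measured over a Nikolskii--Besov smoothness ball.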
Minimax Estimation of Discrete Distributions under $\ell_1$ Loss
We analyze the problem of discrete distribution estimation under
$\ell_1$ loss. We provide non-asymptotic upper and lower bounds on the maximum risk of
the empirical distribution (the maximum likelihood estimator), and the minimax
risk in regimes where the alphabet size $S$ may grow with the number of
observations $n$. We show that among distributions with bounded entropy $H$,
the asymptotic maximum risk for the empirical distribution is $2H/\ln n$, while
the asymptotic minimax risk is $H/\ln n$. Moreover, we show that a
hard-thresholding estimator oblivious to the unknown upper bound $H$ is
asymptotically minimax. However, if we constrain the estimates to lie in the
simplex of probability distributions, then the asymptotic minimax risk is again
$2H/\ln n$. We draw connections between our work and the literature on density
estimation, entropy estimation, total variation distance ($\ell_1$ divergence)
estimation, joint distribution estimation in stochastic processes, normal mean
estimation, and adaptive estimation.
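In the notation above (assuming the $\ell_1$ reading of the loss), the loss and the empirical distribution are
\[
\ell_1(\hat{P}, P) \;=\; \|\hat{P} - P\|_1 \;=\; \sum_{i=1}^{S} |\hat{p}_i - p_i|, \qquad \hat{p}^{\mathrm{MLE}}_i \;=\; \frac{N_i}{n},
\]
where $N_i$ counts the occurrences of symbol $i$ among the $n$ observations. A hard-thresholding estimator that zeroes out small empirical frequencies (one natural reading of the estimator described above) need not sum to one, which is what distinguishes it from estimators constrained to the probability simplex.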
The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal
We analyze the Kozachenko--Leonenko (KL) nearest neighbor estimator for the
differential entropy. We obtain the first uniform upper bound on its
performance over H\"older balls on a torus without assuming any conditions on
how close the density could be to zero. Accompanying a new minimax lower
bound over the H\"older ball, we show that the KL estimator is achieving the
minimax rates up to logarithmic factors without cognizance of the smoothness
parameter $s$ of the H\"older ball for $s \in (0,2]$ and arbitrary dimension
$d$, rendering it the first estimator that provably satisfies this property.
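One common form of the Kozachenko--Leonenko estimator (with a single nearest neighbor; exact conventions for the radius and additive constants vary across the literature) is
\[
\hat{h}_{\mathrm{KL}} \;=\; \frac{d}{n}\sum_{i=1}^{n} \ln \rho_i \;+\; \ln V_d \;+\; \psi(n) \;-\; \psi(1),
\]
where $\rho_i$ is the distance from sample $X_i$ to its nearest neighbor among the other samples, $V_d$ is the volume of the unit ball in $\mathbb{R}^d$, and $\psi$ is the digamma function.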
Minimax Rate-Optimal Estimation of Divergences between Discrete Distributions
We study the minimax estimation of $\alpha$-divergences between discrete
distributions for integer $\alpha$, which include the Kullback--Leibler
divergence and the $\chi^2$-divergences as special examples. Dropping the usual
theoretical tricks to acquire independence, we construct the first minimax
rate-optimal estimator which does not require any Poissonization, sample
splitting, or explicit construction of approximating polynomials. The estimator
uses a hybrid approach which solves a problem-independent linear program based
on moment matching in the non-smooth regime, and applies a problem-dependent
bias-corrected plug-in estimator in the smooth regime, with a soft decision
boundary between these regimes. Comment: This version has been significantly revised.
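For reference, the Kullback--Leibler divergence mentioned above is
\[
D(P\|Q) \;=\; \sum_{i=1}^{S} p_i \ln \frac{p_i}{q_i},
\]
whose plug-in estimation is delicate precisely because symbols with small probabilities (the non-smooth regime) contribute terms that are hard to estimate from empirical frequencies; this is the regime handled by the moment-matching linear program described above, while the bias-corrected plug-in is used where the empirical frequencies are large.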
Does Dirichlet Prior Smoothing Solve the Shannon Entropy Estimation Problem?
The Dirichlet prior is widely used in estimating discrete distributions and
functionals of discrete distributions. In terms of Shannon entropy estimation,
one approach is to plug in the Dirichlet prior smoothed distribution into the
entropy functional, while the other one is to calculate the Bayes estimator for
entropy under the Dirichlet prior for squared error, which is the conditional
expectation. We show that in general they do \emph{not} improve over the
maximum likelihood estimator, which plugs in the empirical distribution into
the entropy functional. No matter how we tune the parameters in the Dirichlet
prior, this approach cannot achieve the minimax rates in entropy estimation, as
recently characterized by Jiao, Venkat, Han, and Weissman, and Wu and Yang. The
performance of the minimax rate-optimal estimator with $n$ samples is
essentially \emph{at least} as good as that of the Dirichlet smoothed entropy
estimators with $n \ln n$ samples.
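A minimal sketch of the two approaches, assuming a symmetric Dirichlet prior with parameter $a > 0$: the smoothed plug-in estimator is
\[
\hat{p}^{(a)}_i \;=\; \frac{N_i + a}{n + Sa}, \qquad \hat{H}_{\mathrm{plug}} \;=\; -\sum_{i=1}^{S} \hat{p}^{(a)}_i \ln \hat{p}^{(a)}_i,
\]
while the Bayes estimator under squared error is the posterior mean $\mathbb{E}\!\left[H(P) \mid N_1, \dots, N_S\right]$, where $N_i$ is the count of symbol $i$ and $S$ is the alphabet size.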
We harness the theory of approximation using positive linear operators for
analyzing the bias of plug-in estimators for general functionals under
arbitrary statistical models, thereby further consolidating the interplay
between these two fields, which was thoroughly developed and exploited by Jiao,
Venkat, Han, and Weissman. We establish new results in approximation theory,
and apply them to analyze the bias of the Dirichlet prior smoothed plug-in
entropy estimator. This interplay between bias analysis and approximation
theory is of relevance and consequence far beyond the specific problem setting
in this paper. Comment: 27 pages, 1 figure, published in IEEE Transactions on
Information Theory, merged with https://arxiv.org/abs/1406.695
Adaptive Estimation of Shannon Entropy
We consider estimating the Shannon entropy $H(P)$ of a discrete distribution $P$
from $n$ i.i.d. samples. Recently, Jiao, Venkat, Han, and Weissman, and Wu and
Yang constructed approximation theoretic estimators that achieve the minimax
rates in estimating entropy. Their estimators are consistent given
$n \gg S/\ln S$ samples, where $S$ is the alphabet size, which is the best
possible sample complexity. In contrast, the Maximum Likelihood Estimator
(MLE), which is the empirical entropy, requires $n \gg S$ samples.
In the present paper we significantly refine the minimax results of existing
work. To alleviate the pessimism of minimaxity, we adopt the adaptive
estimation framework, and show that the minimax rate-optimal estimator in Jiao,
Venkat, Han, and Weissman achieves the minimax rates simultaneously over a
nested sequence of subsets of distributions, without knowing the alphabet
size $S$ or which subset $P$ lies in. In other words, their estimator is
adaptive with respect to this nested sequence of the parameter space, which is
characterized by the entropy of the distribution. We also characterize the
maximum risk of the MLE over this nested sequence, and show, for every subset
in the sequence, that the performance of the minimax rate-optimal estimator
with $n$ samples is essentially that of the MLE with $n \ln n$ samples, thereby
further substantiating the generality of the phenomenon identified by Jiao,
Venkat, Han, and Weissman.
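The MLE referred to here is the plug-in of the empirical frequencies into the entropy functional:
\[
\hat{H}_{\mathrm{MLE}} \;=\; -\sum_{i=1}^{S} \frac{N_i}{n} \ln \frac{N_i}{n},
\]
with $N_i$ the count of symbol $i$ among the $n$ samples (terms with $N_i = 0$ contribute zero). The adaptivity statement above says that, over each entropy-characterized subset, the minimax rate-optimal estimator with $n$ samples matches what this plug-in achieves with roughly $n \ln n$ samples.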
Generalizations of Maximal Inequalities to Arbitrary Selection Rules
We present a generalization of the maximal inequalities that upper bound the
expectation of the maximum of $n$ jointly distributed random variables. We
control the expectation of a randomly selected random variable from $n$ jointly
distributed random variables, and present bounds that are at least as tight as
the classical maximal inequalities, and much tighter when the distribution of
the selection index is nearly deterministic. A new family of information theoretic
measures is introduced in the process, which may be of independent interest.
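The classical baseline being generalized is the maximal inequality for sub-Gaussian variables: if $X_1, \dots, X_n$ are zero-mean and $\sigma$-sub-Gaussian (not necessarily independent), then
\[
\mathbb{E}\Big[\max_{1 \le i \le n} X_i\Big] \;\le\; \sigma \sqrt{2 \ln n}.
\]
The bounds described above replace the maximum with $X_T$ for a possibly random selection index $T$, recovering the $\sqrt{2 \ln n}$ factor in the worst case and improving on it when the distribution of $T$ is nearly deterministic.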