Bias Correction with Jackknife, Bootstrap, and Taylor Series
We analyze bias correction methods using jackknife, bootstrap, and Taylor
series. We focus on the binomial model, and consider the problem of bias
correction for estimating $f(p)$, where the function $f$ is arbitrary. We
characterize the supremum norm of the bias of general jackknife and bootstrap
estimators for any continuous function, and demonstrate that in the delete-$d$
jackknife, different values of $d$ may lead to drastically different behaviors
in jackknife. We show that in the binomial model, iterating the bootstrap bias
correction infinitely many times may lead to divergence of bias and variance,
and demonstrate that the bias properties of the bootstrap bias corrected
estimator after a given number of rounds are of the same order as those of the
corresponding jackknife estimator if a bounded coefficients condition is
satisfied. Comment: to appear in IEEE Transactions on Information Theory.
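For reference, the classical delete-1 jackknife bias correction (a textbook formula, not necessarily the exact construction analyzed above) takes the form
\[
\hat{\theta}_{\mathrm{jack}} \;=\; n\,\hat{\theta}_n \;-\; \frac{n-1}{n}\sum_{i=1}^{n} \hat{\theta}_{(-i)},
\]
where $\hat{\theta}_n$ is the estimator computed on all $n$ samples and $\hat{\theta}_{(-i)}$ is the same estimator with the $i$-th sample removed; the combination cancels the $O(1/n)$ term of the bias expansion. Delete-$d$ and iterated-bootstrap corrections generalize this idea.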
Minimax Estimation of the $L_1$ Distance
We consider the problem of estimating the $L_1$ distance between two discrete
probability measures $P$ and $Q$ from empirical data in a nonasymptotic and
large alphabet setting. When $Q$ is known and one obtains $n$ samples from $P$,
we show that for every $Q$, the minimax rate-optimal estimator with $n$ samples
achieves performance comparable to that of the maximum likelihood estimator
(MLE) with $n \ln n$ samples. When both $P$ and $Q$ are unknown, we construct
minimax rate-optimal estimators whose worst case performance is essentially
that of the known-$Q$ case with $Q$ being uniform, implying that a uniform $Q$
is essentially the most difficult case. The \emph{effective sample size
enlargement} phenomenon, identified in Jiao \emph{et al.} (2015), holds both in
the known-$Q$ case for every $Q$ and in the unknown-$Q$ case. However, the
construction of optimal estimators for the $L_1$ distance requires new
techniques and insights beyond the approximation-based method of functional
estimation in Jiao \emph{et al.} (2015). Comment: to appear in IEEE Transactions
on Information Theory.
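For concreteness, the functional being estimated here (reading the title as the $L_1$ distance) and the natural plug-in are
\[
\|P - Q\|_1 \;=\; \sum_{i=1}^{S} |p_i - q_i|, \qquad \hat{p}_i \;=\; \frac{N_i}{n},
\]
where $S$ is the alphabet size and $N_i$ is the number of occurrences of symbol $i$ among the $n$ samples; the MLE plugs the empirical frequencies $\hat{p}_i$ (and $\hat{q}_i$, when $Q$ is unknown) directly into the functional, whereas the minimax rate-optimal constructions go beyond this plug-in.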
Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance
We present \emph{Local Moment Matching (LMM)}, a unified methodology for
symmetric functional estimation and distribution estimation under Wasserstein
distance. We construct an efficiently computable estimator that achieves the
minimax rates in estimating the distribution up to permutation, and show that
the plug-in approach of our unlabeled distribution estimator is "universal" in
estimating symmetric functionals of discrete distributions. Instead of doing
best polynomial approximation explicitly as in the existing literature on
functional estimation, the plug-in approach conducts polynomial approximation
implicitly and attains the optimal sample complexity for the entropy, power sum
and support size functionals.
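The symmetric functionals referred to above all take the additive form
\[
F(P) \;=\; \sum_{i=1}^{S} f(p_i),
\]
for example $f(p) = -p\ln p$ (Shannon entropy), $f(p) = p^{\alpha}$ (power sums), and $f(p) = \mathbf{1}\{p > 0\}$ (support size). Such an $F$ depends on $P$ only through the multiset of its probabilities, which is why a distribution estimate that is accurate up to permutation suffices for the plug-in approach.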
On Estimation of $L_r$-Norms in Gaussian White Noise Models
We provide a complete picture of asymptotically minimax estimation of
$L_r$-norms (for any $r$) of the mean in the Gaussian white noise model over
Nikolskii-Besov spaces. In this regard, we complement the work of Lepski,
Nemirovski and Spokoiny (1999), who considered the cases of non-even $r$ (with
poly-logarithmic gap between upper and lower bounds) and even $r$ (with
asymptotically sharp upper and lower bounds) over H\"{o}lder spaces. We
additionally consider the case of asymptotically adaptive minimax estimation
and demonstrate a difference between even and non-even in terms of an
investigator's ability to produce asymptotically adaptive minimax estimators
without paying a penalty. Comment: To appear in Probability Theory and Related Fields.
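A standard formulation of this setting (the exact normalization in the paper may differ): one observes the Gaussian white noise model
\[
dY_t \;=\; f(t)\,dt + \frac{1}{\sqrt{n}}\,dW_t, \qquad t \in [0,1],
\]
and seeks to estimate the functional $\|f\|_r = \big(\int_0^1 |f(t)|^r\,dt\big)^{1/r}$ of the unknown mean function $f$, with minimax risk measured over a Nikolskii--Besov smoothness ball.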
Minimax Estimation of Discrete Distributions under $\ell_1$ Loss
We analyze the problem of discrete distribution estimation under
$\ell_1$ loss. We provide non-asymptotic upper and lower bounds on the maximum risk of
the empirical distribution (the maximum likelihood estimator), and the minimax
risk in regimes where the alphabet size $S$ may grow with the number of
observations $n$. We show that among distributions with bounded entropy $H$,
the asymptotic maximum risk for the empirical distribution is $2H/\ln n$, while
the asymptotic minimax risk is $H/\ln n$. Moreover, we show that a
hard-thresholding estimator oblivious to the unknown upper bound $H$ is
asymptotically minimax. However, if we constrain the estimates to lie in the
simplex of probability distributions, then the asymptotic minimax risk is again
$2H/\ln n$. We draw connections between our work and the literature on density
estimation, entropy estimation, total variation distance ($\ell_1$ divergence)
estimation, joint distribution estimation in stochastic processes, normal mean
estimation, and adaptive estimation.
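In the notation above (assuming the $\ell_1$ reading of the loss), the loss and the empirical distribution are
\[
\ell_1(\hat{P}, P) \;=\; \|\hat{P} - P\|_1 \;=\; \sum_{i=1}^{S} |\hat{p}_i - p_i|, \qquad \hat{p}^{\mathrm{MLE}}_i \;=\; \frac{N_i}{n},
\]
where $N_i$ counts the occurrences of symbol $i$ among the $n$ observations. A hard-thresholding estimator that zeroes out small empirical frequencies (one natural reading of the estimator described above) need not sum to one, which is what distinguishes it from estimators constrained to the probability simplex.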
The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal
We analyze the Kozachenko--Leonenko (KL) nearest neighbor estimator for the
differential entropy. We obtain the first uniform upper bound on its
performance over H\"older balls on a torus without assuming any conditions on
how close the density could be to zero. Accompanying a new minimax lower
bound over the H\"older ball, we show that the KL estimator is achieving the
minimax rates up to logarithmic factors without cognizance of the smoothness
parameter $s$ of the H\"older ball for $s \in (0,2]$ and arbitrary dimension
$d$, rendering it the first estimator that provably satisfies this property.
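One common form of the Kozachenko--Leonenko estimator (with a single nearest neighbor; exact conventions for the radius and additive constants vary across the literature) is
\[
\hat{h}_{\mathrm{KL}} \;=\; \frac{d}{n}\sum_{i=1}^{n} \ln \rho_i \;+\; \ln V_d \;+\; \psi(n) \;-\; \psi(1),
\]
where $\rho_i$ is the distance from sample $X_i$ to its nearest neighbor among the other samples, $V_d$ is the volume of the unit ball in $\mathbb{R}^d$, and $\psi$ is the digamma function.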
Minimax Rate-Optimal Estimation of Divergences between Discrete Distributions
We study the minimax estimation of $\alpha$-divergences between discrete
distributions for integer $\alpha$, which include the Kullback--Leibler
divergence and the $\chi^2$-divergences as special examples. Dropping the usual
theoretical tricks to acquire independence, we construct the first minimax
rate-optimal estimator which does not require any Poissonization, sample
splitting, or explicit construction of approximating polynomials. The estimator
uses a hybrid approach which solves a problem-independent linear program based
on moment matching in the non-smooth regime, and applies a problem-dependent
bias-corrected plug-in estimator in the smooth regime, with a soft decision
boundary between these regimes. Comment: This version has been significantly revised.
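For reference, the Kullback--Leibler divergence mentioned above is
\[
D(P\|Q) \;=\; \sum_{i=1}^{S} p_i \ln \frac{p_i}{q_i},
\]
whose plug-in estimation is delicate precisely because symbols with small probabilities (the non-smooth regime) contribute terms that are hard to estimate from empirical frequencies; this is the regime handled by the moment-matching linear program described above, while the bias-corrected plug-in is used where the empirical frequencies are large.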
Does Dirichlet Prior Smoothing Solve the Shannon Entropy Estimation Problem?
The Dirichlet prior is widely used in estimating discrete distributions and
functionals of discrete distributions. In terms of Shannon entropy estimation,
one approach is to plug in the Dirichlet prior smoothed distribution into the
entropy functional, while the other one is to calculate the Bayes estimator for
entropy under the Dirichlet prior for squared error, which is the conditional
expectation. We show that in general they do \emph{not} improve over the
maximum likelihood estimator, which plugs in the empirical distribution into
the entropy functional. No matter how we tune the parameters in the Dirichlet
prior, this approach cannot achieve the minimax rates in entropy estimation, as
recently characterized by Jiao, Venkat, Han, and Weissman, and Wu and Yang. The
performance of the minimax rate-optimal estimator with $n$ samples is
essentially \emph{at least} as good as that of the Dirichlet smoothed entropy
estimators with $n \ln n$ samples.
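A minimal sketch of the two approaches, assuming a symmetric Dirichlet prior with parameter $a > 0$: the smoothed plug-in estimator is
\[
\hat{p}^{(a)}_i \;=\; \frac{N_i + a}{n + Sa}, \qquad \hat{H}_{\mathrm{plug}} \;=\; -\sum_{i=1}^{S} \hat{p}^{(a)}_i \ln \hat{p}^{(a)}_i,
\]
while the Bayes estimator under squared error is the posterior mean $\mathbb{E}\!\left[H(P) \mid N_1, \dots, N_S\right]$, where $N_i$ is the count of symbol $i$ and $S$ is the alphabet size.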
We harness the theory of approximation using positive linear operators for
analyzing the bias of plug-in estimators for general functionals under
arbitrary statistical models, thereby further consolidating the interplay
between these two fields, which was thoroughly developed and exploited by Jiao,
Venkat, Han, and Weissman. We establish new results in approximation theory,
and apply them to analyze the bias of the Dirichlet prior smoothed plug-in
entropy estimator. This interplay between bias analysis and approximation
theory is of relevance and consequence far beyond the specific problem setting
in this paper. Comment: 27 pages, 1 figure, published in IEEE Transactions on
Information Theory, merged with https://arxiv.org/abs/1406.695
Adaptive Estimation of Shannon Entropy
We consider estimating the Shannon entropy $H(P)$ of a discrete distribution $P$
from $n$ i.i.d. samples. Recently, Jiao, Venkat, Han, and Weissman, and Wu and
Yang constructed approximation theoretic estimators that achieve the minimax
rates in estimating entropy. Their estimators are consistent given
$n \gg S/\ln S$ samples, where $S$ is the alphabet size, which is the best
possible sample complexity. In contrast, the Maximum Likelihood Estimator
(MLE), which is the empirical entropy, requires $n \gg S$ samples.
In the present paper we significantly refine the minimax results of existing
work. To alleviate the pessimism of minimaxity, we adopt the adaptive
estimation framework, and show that the minimax rate-optimal estimator in Jiao,
Venkat, Han, and Weissman achieves the minimax rates simultaneously over a
nested sequence of subsets of distributions, without knowing the alphabet
size $S$ or which subset $P$ lies in. In other words, their estimator is
adaptive with respect to this nested sequence of the parameter space, which is
characterized by the entropy of the distribution. We also characterize the
maximum risk of the MLE over this nested sequence, and show, for every subset
in the sequence, that the performance of the minimax rate-optimal estimator
with $n$ samples is essentially that of the MLE with $n \ln n$ samples, thereby
further substantiating the generality of the phenomenon identified by Jiao,
Venkat, Han, and Weissman.
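The MLE referred to here is the plug-in of the empirical frequencies into the entropy functional:
\[
\hat{H}_{\mathrm{MLE}} \;=\; -\sum_{i=1}^{S} \frac{N_i}{n} \ln \frac{N_i}{n},
\]
with $N_i$ the count of symbol $i$ among the $n$ samples (terms with $N_i = 0$ contribute zero). The adaptivity statement above says that, over each entropy-characterized subset, the minimax rate-optimal estimator with $n$ samples matches what this plug-in achieves with roughly $n \ln n$ samples.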
Generalizations of Maximal Inequalities to Arbitrary Selection Rules
We present a generalization of the maximal inequalities that upper bound the
expectation of the maximum of $n$ jointly distributed random variables. We
control the expectation of a randomly selected random variable from $n$ jointly
distributed random variables, and present bounds that are at least as tight as
the classical maximal inequalities, and much tighter when the distribution of
the selection index is nearly deterministic. A new family of information theoretic
measures is introduced in the process, which may be of independent interest.
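The classical baseline being generalized is the maximal inequality for sub-Gaussian variables: if $X_1, \dots, X_n$ are zero-mean and $\sigma$-sub-Gaussian (not necessarily independent), then
\[
\mathbb{E}\Big[\max_{1 \le i \le n} X_i\Big] \;\le\; \sigma \sqrt{2 \ln n}.
\]
The bounds described above replace the maximum with $X_T$ for a possibly random selection index $T$, recovering the $\sqrt{2 \ln n}$ factor in the worst case and improving on it when the distribution of $T$ is nearly deterministic.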