
    Bias Correction with Jackknife, Bootstrap, and Taylor Series

    We analyze bias correction methods using the jackknife, the bootstrap, and Taylor series. We focus on the binomial model and consider the problem of bias correction for estimating $f(p)$, where $f \in C[0,1]$ is arbitrary. We characterize the supremum norm of the bias of general jackknife and bootstrap estimators for any continuous function, and demonstrate that in the delete-$d$ jackknife, different values of $d$ may lead to drastically different behaviors of the jackknife. We show that in the binomial model, iterating the bootstrap bias correction infinitely many times may lead to divergence of bias and variance, and demonstrate that the bias properties of the bootstrap bias-corrected estimator after $r-1$ rounds are of the same order as those of the $r$-jackknife estimator if a bounded coefficients condition is satisfied. Comment: to appear in IEEE Transactions on Information Theory
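
    A minimal sketch (not the paper's analysis) of the classical delete-1 jackknife bias correction in the binomial model, with the illustrative choice $f(p) = p(1-p)$; the function names and parameter values below are hypothetical and chosen only to make the bias removal visible in a Monte Carlo check.

```python
# A sketch of delete-1 jackknife bias correction for estimating f(p) from
# X ~ Binomial(n, p).  Illustrative only: f(p) = p(1-p) is a hypothetical
# choice (quadratic, so the delete-1 jackknife removes the bias exactly).
import numpy as np

def plugin(f, k, n):
    """Plug-in (MLE) estimator f(X/n) given k successes out of n trials."""
    return f(k / n)

def jackknife(f, k, n):
    """Delete-1 jackknife bias-corrected estimator.
    For Bernoulli data with k successes, the leave-one-out mean equals
    (k-1)/(n-1) when a success is deleted and k/(n-1) when a failure is
    deleted, so the leave-one-out average has a closed form."""
    loo_avg = (k / n) * f((k - 1) / (n - 1)) + ((n - k) / n) * f(k / (n - 1))
    return n * f(k / n) - (n - 1) * loo_avg

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = lambda p: p * (1 - p)
    n, p, reps = 50, 0.3, 200_000
    ks = rng.binomial(n, p, size=reps)
    print("true f(p)      :", f(p))
    print("plug-in mean   :", np.mean(plugin(f, ks, n)))     # biased by -p(1-p)/n
    print("jackknife mean :", np.mean(jackknife(f, ks, n)))  # bias removed for quadratic f
```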

    Minimax Estimation of Discrete Distributions under $\ell_1$ Loss

    We analyze the problem of discrete distribution estimation under $\ell_1$ loss. We provide non-asymptotic upper and lower bounds on the maximum risk of the empirical distribution (the maximum likelihood estimator), and on the minimax risk, in regimes where the alphabet size $S$ may grow with the number of observations $n$. We show that among distributions with bounded entropy $H$, the asymptotic maximum risk for the empirical distribution is $2H/\ln n$, while the asymptotic minimax risk is $H/\ln n$. Moreover, we show that a hard-thresholding estimator oblivious to the unknown upper bound $H$ is asymptotically minimax. However, if we constrain the estimates to lie in the simplex of probability distributions, then the asymptotic minimax risk is again $2H/\ln n$. We draw connections between our work and the literature on density estimation, entropy estimation, total variation distance ($\ell_1$ divergence) estimation, joint distribution estimation in stochastic processes, normal mean estimation, and adaptive estimation.
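
    An illustration only (not from the paper): a Monte Carlo estimate of the $\ell_1$ risk of the empirical distribution, compared against the elementary per-coordinate variance bound $\mathbb{E}\|\hat P - P\|_1 \le \sum_i \sqrt{p_i(1-p_i)/n}$. The uniform distribution and the sample sizes below are hypothetical choices.

```python
# Monte Carlo l1 risk of the empirical distribution vs. the elementary bound
# sum_i sqrt(p_i (1 - p_i) / n).  Illustrative parameters only.
import numpy as np

def l1_risk_empirical(p, n, reps=5000, rng=None):
    """Monte Carlo estimate of E ||P_hat - P||_1 for the empirical distribution."""
    if rng is None:
        rng = np.random.default_rng(0)
    counts = rng.multinomial(n, p, size=reps)      # reps independent samples of size n
    return np.abs(counts / n - p).sum(axis=1).mean()

if __name__ == "__main__":
    S, n = 1000, 5000
    p = np.full(S, 1.0 / S)                        # uniform distribution on S symbols
    risk = l1_risk_empirical(p, n)
    bound = np.sqrt(p * (1 - p) / n).sum()         # per-coordinate variance bound
    print(f"Monte Carlo l1 risk: {risk:.4f}, elementary upper bound: {bound:.4f}")
```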

    Generalizations of Maximal Inequalities to Arbitrary Selection Rules

    We present a generalization of the maximal inequalities that upper bound the expectation of the maximum of $n$ jointly distributed random variables. We control the expectation of a randomly selected random variable from $n$ jointly distributed random variables, and present bounds that are at least as tight as the classical maximal inequalities, and much tighter when the distribution of the selection index is near deterministic. A new family of information-theoretic measures is introduced in the process, which may be of independent interest.
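
    A small numerical illustration (not the paper's bound): when the selection index $T$ is nearly deterministic, $\mathbb{E}[X_T]$ falls far below $\mathbb{E}[\max_i X_i]$, which is the regime where bounds depending on the selection rule can be much tighter than the classical maximal inequality. The selection rule and parameters below are hypothetical.

```python
# E[X_T] for a near-deterministic selection rule vs. E[max_i X_i] for i.i.d.
# standard Gaussians.  Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, reps, eps = 1000, 5000, 0.01             # eps: probability of selecting the argmax
X = rng.standard_normal((reps, n))           # n jointly distributed (here i.i.d.) Gaussians

e_max = X.max(axis=1).mean()                 # E[max_i X_i], roughly sqrt(2 ln n)
pick_max = rng.random(reps) < eps            # selection rule: argmax w.p. eps, index 0 otherwise
selected = np.where(pick_max, X.argmax(axis=1), 0)
e_sel = X[np.arange(reps), selected].mean()  # E[X_T] under this near-deterministic rule

print(f"E[max_i X_i]     ~ {e_max:.3f}  (sqrt(2 ln n) = {np.sqrt(2 * np.log(n)):.3f})")
print(f"E[X_T], eps={eps} ~ {e_sel:.3f}")
```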

    Minimax Estimation of the $L_1$ Distance

    We consider the problem of estimating the $L_1$ distance between two discrete probability measures $P$ and $Q$ from empirical data in a nonasymptotic and large alphabet setting. When $Q$ is known and one obtains $n$ samples from $P$, we show that for every $Q$, the minimax rate-optimal estimator with $n$ samples achieves performance comparable to that of the maximum likelihood estimator (MLE) with $n\ln n$ samples. When both $P$ and $Q$ are unknown, we construct minimax rate-optimal estimators whose worst case performance is essentially that of the known $Q$ case with $Q$ being uniform, implying that $Q$ being uniform is essentially the most difficult case. The \emph{effective sample size enlargement} phenomenon, identified in Jiao \emph{et al.} (2015), holds both in the known $Q$ case for every $Q$ and in the $Q$ unknown case. However, the construction of optimal estimators for $\|P-Q\|_1$ requires new techniques and insights beyond the approximation-based method of functional estimation in Jiao \emph{et al.} (2015). Comment: to appear in IEEE Transactions on Information Theory
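
    A baseline sketch only, not the paper's minimax-optimal construction: the plug-in (MLE) estimate of $\|P - Q\|_1$ when $Q$ is known and $n$ samples from $P$ are observed. The alphabet size, sample size, and choice of $Q$ below are hypothetical.

```python
# Plug-in estimate of ||P - Q||_1 with Q known, as a naive baseline.
import numpy as np

def plugin_l1(samples, Q):
    """||P_hat - Q||_1, where P_hat is the empirical distribution of `samples`
    over the alphabet {0, ..., len(Q) - 1}."""
    S = len(Q)
    counts = np.bincount(samples, minlength=S)
    P_hat = counts / counts.sum()
    return np.abs(P_hat - Q).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, n = 500, 2000
    P = rng.dirichlet(np.ones(S))          # unknown distribution generating the samples
    Q = np.full(S, 1.0 / S)                # known reference distribution
    samples = rng.choice(S, size=n, p=P)
    print("plug-in estimate :", plugin_l1(samples, Q))
    print("true ||P - Q||_1 :", np.abs(P - Q).sum())
```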

    On Estimation of $L_r$-Norms in Gaussian White Noise Models

    We provide a complete picture of asymptotically minimax estimation of $L_r$-norms (for any $r \ge 1$) of the mean in the Gaussian white noise model over Nikolskii-Besov spaces. In this regard, we complement the work of Lepski, Nemirovski and Spokoiny (1999), who considered the cases of $r=1$ (with a poly-logarithmic gap between upper and lower bounds) and $r$ even (with asymptotically sharp upper and lower bounds) over H\"older spaces. We additionally consider the case of asymptotically adaptive minimax estimation and demonstrate a difference between even and non-even $r$ in terms of an investigator's ability to produce asymptotically adaptive minimax estimators without paying a penalty. Comment: To appear in Probability Theory and Related Fields

    The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal

    We analyze the Kozachenko--Leonenko (KL) nearest neighbor estimator for the differential entropy. We obtain the first uniform upper bound on its performance over H\"older balls on a torus without assuming any conditions on how close the density can be to zero. Combined with a new minimax lower bound over the H\"older ball, this shows that the KL estimator achieves the minimax rates up to logarithmic factors without cognizance of the smoothness parameter $s$ of the H\"older ball for $s \in (0,2]$ and arbitrary dimension $d$, rendering it the first estimator that provably satisfies this property.
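
    A minimal sketch of one standard form of the 1-nearest-neighbor Kozachenko--Leonenko estimator in Euclidean space; the paper studies the estimator on a torus over H\"older balls, which this illustration does not reproduce, and the Gaussian test distribution below is a hypothetical choice.

```python
# 1-NN Kozachenko--Leonenko differential entropy estimator (Euclidean version):
#   H_hat = ln(n - 1) + gamma + ln V_d + (d/n) * sum_i ln rho_i,
# where rho_i is the distance from X_i to its nearest neighbor, V_d is the
# volume of the d-dimensional unit ball, and gamma is the Euler-Mascheroni constant.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gammaln

def kl_entropy(X):
    """KL nearest-neighbor estimate of differential entropy (in nats)
    from an (n, d) array of samples."""
    n, d = X.shape
    tree = cKDTree(X)
    rho = tree.query(X, k=2)[0][:, 1]            # k=2: the closest point is the sample itself
    log_Vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return np.log(n - 1) + np.euler_gamma + log_Vd + d * np.mean(np.log(rho))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 10_000, 2
    X = rng.standard_normal((n, d))              # true entropy: (d/2) * ln(2 * pi * e)
    print("KL estimate  :", kl_entropy(X))
    print("true entropy :", 0.5 * d * np.log(2 * np.pi * np.e))
```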

    Local moment matching: A unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance

    We present \emph{Local Moment Matching (LMM)}, a unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance. We construct an efficiently computable estimator that achieves the minimax rates in estimating the distribution up to permutation, and show that the plug-in approach of our unlabeled distribution estimator is "universal" in estimating symmetric functionals of discrete distributions. Instead of doing best polynomial approximation explicitly as in the existing literature on functional estimation, the plug-in approach conducts polynomial approximation implicitly and attains the optimal sample complexity for the entropy, power sum, and support size functionals.

    Minimax Rate-Optimal Estimation of Divergences between Discrete Distributions

    We study the minimax estimation of $\alpha$-divergences between discrete distributions for integer $\alpha \ge 1$, which include the Kullback--Leibler divergence and the $\chi^2$-divergence as special examples. Dropping the usual theoretical tricks to acquire independence, we construct the first minimax rate-optimal estimator which does not require any Poissonization, sample splitting, or explicit construction of approximating polynomials. The estimator uses a hybrid approach which solves a problem-independent linear program based on moment matching in the non-smooth regime, and applies a problem-dependent bias-corrected plug-in estimator in the smooth regime, with a soft decision boundary between these regimes. Comment: This version has been significantly revised
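
    A naive baseline sketch (emphatically not the hybrid estimator described above): the plug-in estimate of the KL divergence from two empirical distributions. It is undefined whenever a symbol has positive empirical mass under $P$ but zero under $Q$, which is one reason the small-count (non-smooth) regime requires more careful treatment. All parameter choices below are illustrative.

```python
# Plug-in D(P_hat || Q_hat) as a naive baseline for discrete KL divergence estimation.
import numpy as np

def plugin_kl(samples_p, samples_q, S):
    """Plug-in KL divergence over the alphabet {0, ..., S-1}; returns inf if some
    symbol has positive empirical P-mass but zero empirical Q-mass."""
    p_hat = np.bincount(samples_p, minlength=S) / len(samples_p)
    q_hat = np.bincount(samples_q, minlength=S) / len(samples_q)
    if np.any((p_hat > 0) & (q_hat == 0)):
        return np.inf
    mask = p_hat > 0
    return np.sum(p_hat[mask] * np.log(p_hat[mask] / q_hat[mask]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, n = 100, 10_000
    P, Q = rng.dirichlet(np.ones(S)), rng.dirichlet(np.ones(S))
    est = plugin_kl(rng.choice(S, n, p=P), rng.choice(S, n, p=Q), S)
    print("plug-in KL:", est, " true KL:", np.sum(P * np.log(P / Q)))
```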

    Does Dirichlet Prior Smoothing Solve the Shannon Entropy Estimation Problem?

    The Dirichlet prior is widely used in estimating discrete distributions and functionals of discrete distributions. In terms of Shannon entropy estimation, one approach is to plug the Dirichlet prior smoothed distribution into the entropy functional, while the other is to calculate the Bayes estimator for entropy under the Dirichlet prior for squared error loss, which is the conditional expectation. We show that in general they do \emph{not} improve over the maximum likelihood estimator, which plugs the empirical distribution into the entropy functional. No matter how we tune the parameters in the Dirichlet prior, this approach cannot achieve the minimax rates in entropy estimation, as recently characterized by Jiao, Venkat, Han, and Weissman, and Wu and Yang. The performance of the minimax rate-optimal estimator with $n$ samples is essentially \emph{at least} as good as that of the Dirichlet smoothed entropy estimators with $n\ln n$ samples. We harness the theory of approximation using positive linear operators for analyzing the bias of plug-in estimators for general functionals under arbitrary statistical models, thereby further consolidating the interplay between these two fields, which was thoroughly developed and exploited by Jiao, Venkat, Han, and Weissman. We establish new results in approximation theory, and apply them to analyze the bias of the Dirichlet prior smoothed plug-in entropy estimator. This interplay between bias analysis and approximation theory is of relevance and consequence far beyond the specific problem setting in this paper. Comment: 27 pages, 1 figure, published in IEEE Transactions on Information Theory, merged with https://arxiv.org/abs/1406.695
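
    A minimal sketch of the two Dirichlet-based estimators discussed above alongside the plug-in MLE; the symmetric prior parameter `a`, the alphabet size, and the sample size are illustrative choices, not values recommended by the paper.

```python
# Empirical (MLE) entropy, Dirichlet-smoothed plug-in entropy, and the Bayes
# posterior-mean entropy under a symmetric Dirichlet(a, ..., a) prior.
import numpy as np
from scipy.special import digamma

def entropy_mle(counts):
    """Plug-in (empirical) entropy in nats."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def entropy_dirichlet_plugin(counts, a):
    """Plug in the Dirichlet-smoothed distribution (n_i + a) / (n + S * a)."""
    p = (counts + a) / (counts.sum() + a * len(counts))
    return -np.sum(p * np.log(p))

def entropy_dirichlet_bayes(counts, a):
    """Posterior mean of H under a Dirichlet(a, ..., a) prior (squared-error Bayes rule):
    psi(alpha_0 + 1) - sum_i (alpha_i / alpha_0) * psi(alpha_i + 1)."""
    alpha = counts + a
    a0 = alpha.sum()
    return digamma(a0 + 1) - np.sum((alpha / a0) * digamma(alpha + 1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, n = 2000, 500                              # large alphabet, few samples
    P = rng.dirichlet(np.ones(S))
    counts = rng.multinomial(n, P)
    true_H = -np.sum(P[P > 0] * np.log(P[P > 0]))
    print("true H               :", true_H)
    print("MLE plug-in          :", entropy_mle(counts))
    print("Dirichlet plug-in    :", entropy_dirichlet_plugin(counts, a=1.0))
    print("Dirichlet Bayes mean :", entropy_dirichlet_bayes(counts, a=1.0))
```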

    Adaptive Estimation of Shannon Entropy

    We consider estimating the Shannon entropy of a discrete distribution $P$ from $n$ i.i.d. samples. Recently, Jiao, Venkat, Han, and Weissman, and Wu and Yang constructed approximation theoretic estimators that achieve the minimax $L_2$ rates in estimating entropy. Their estimators are consistent given $n \gg \frac{S}{\ln S}$ samples, where $S$ is the alphabet size, and this is the best possible sample complexity. In contrast, the Maximum Likelihood Estimator (MLE), which is the empirical entropy, requires $n \gg S$ samples. In the present paper we significantly refine the minimax results of existing work. To alleviate the pessimism of minimaxity, we adopt the adaptive estimation framework, and show that the minimax rate-optimal estimator in Jiao, Venkat, Han, and Weissman achieves the minimax rates simultaneously over a nested sequence of subsets of distributions $P$, without knowing the alphabet size $S$ or which subset $P$ lies in. In other words, their estimator is adaptive with respect to this nested sequence of the parameter space, which is characterized by the entropy of the distribution. We also characterize the maximum risk of the MLE over this nested sequence, and show, for every subset in the sequence, that the performance of the minimax rate-optimal estimator with $n$ samples is essentially that of the MLE with $n\ln n$ samples, thereby further substantiating the generality of the phenomenon identified by Jiao, Venkat, Han, and Weissman.