
    Contraction of Locally Differentially Private Mechanisms

    We investigate the contraction properties of locally differentially private mechanisms. More specifically, we derive tight upper bounds on the divergence between the output distributions $P\mathsf{K}$ and $Q\mathsf{K}$ of an $\varepsilon$-LDP mechanism $\mathsf{K}$ in terms of a divergence between the corresponding input distributions $P$ and $Q$. Our first main technical result presents a sharp upper bound on the $\chi^2$-divergence $\chi^2(P\mathsf{K}\|Q\mathsf{K})$ in terms of $\chi^2(P\|Q)$ and $\varepsilon$. We also show that the same result holds for a large family of divergences, including KL-divergence and squared Hellinger distance. The second main technical result gives an upper bound on $\chi^2(P\mathsf{K}\|Q\mathsf{K})$ in terms of the total variation distance $\mathsf{TV}(P,Q)$ and $\varepsilon$. We then utilize these bounds to establish locally private versions of the van Trees inequality, Le Cam's method, Assouad's method, and the mutual information method, which are powerful tools for bounding minimax estimation risks. These results are shown to lead to better privacy analyses than the state of the art in several statistical problems, such as entropy and discrete distribution estimation, non-parametric density estimation, and hypothesis testing.
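
    As a hedged illustration of the kind of contraction these bounds quantify, the Python sketch below builds a $k$-ary randomized response channel (a standard $\varepsilon$-LDP mechanism, chosen here for concreteness rather than taken from the paper), pushes two input distributions $P$ and $Q$ through it, and compares the empirical ratio $\chi^2(P\mathsf{K}\|Q\mathsf{K})/\chi^2(P\|Q)$ against the factor $\left(\frac{e^\varepsilon-1}{e^\varepsilon+1}\right)^2$; both the channel and that comparison constant are illustrative assumptions, not a restatement of the paper's bound.

        import numpy as np

        def randomized_response(k, eps):
            """k-ary randomized response: report the true symbol with probability
            e^eps / (e^eps + k - 1), otherwise a uniformly chosen other symbol.
            This channel satisfies eps-LDP."""
            K = np.full((k, k), 1.0 / (np.exp(eps) + k - 1))
            np.fill_diagonal(K, np.exp(eps) / (np.exp(eps) + k - 1))
            return K  # rows: inputs x, columns: outputs y

        def chi2_divergence(p, q):
            """chi^2(p || q) = sum_i (p_i - q_i)^2 / q_i (q assumed to have full support)."""
            return np.sum((p - q) ** 2 / q)

        rng = np.random.default_rng(0)
        k, eps = 5, 1.0
        K = randomized_response(k, eps)

        # Two arbitrary full-support input distributions on {1, ..., k}.
        P = rng.dirichlet(np.ones(k))
        Q = rng.dirichlet(np.ones(k))

        ratio = chi2_divergence(P @ K, Q @ K) / chi2_divergence(P, Q)
        factor = ((np.exp(eps) - 1) / (np.exp(eps) + 1)) ** 2  # illustrative comparison constant
        print(f"chi^2 contraction ratio: {ratio:.4f}   comparison factor: {factor:.4f}")

    For randomized response the printed ratio falls below the comparison factor and shrinks as $\varepsilon$ decreases, which is the qualitative behaviour such contraction bounds make precise.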

    Minimax Estimation of Kernel Mean Embeddings

    In this paper, we study the minimax estimation of the Bochner integral $\mu_k(P):=\int_{\mathcal{X}} k(\cdot,x)\,dP(x)$, also called the kernel mean embedding, based on random samples drawn i.i.d. from $P$, where $k:\mathcal{X}\times\mathcal{X}\rightarrow\mathbb{R}$ is a positive definite kernel. Various estimators $\hat{\theta}_n$ of $\mu_k(P)$ (including the empirical estimator) have been studied in the literature, all of which satisfy $\bigl\|\hat{\theta}_n-\mu_k(P)\bigr\|_{\mathcal{H}_k}=O_P(n^{-1/2})$, with $\mathcal{H}_k$ being the reproducing kernel Hilbert space induced by $k$. The main contribution of the paper is in showing that the above-mentioned rate of $n^{-1/2}$ is minimax in the $\|\cdot\|_{\mathcal{H}_k}$ and $\|\cdot\|_{L^2(\mathbb{R}^d)}$ norms over the class of discrete measures and the class of measures with an infinitely differentiable density, with $k$ being a continuous translation-invariant kernel on $\mathbb{R}^d$. The interesting aspect of this result is that the minimax rate is independent of the smoothness of the kernel and of the density of $P$ (if it exists). This result has practical consequences in statistical applications, as the mean embedding has been widely employed in non-parametric hypothesis testing, density estimation, causal inference, and feature selection, through its relation to the energy distance (and distance covariance).
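
    As a hedged numerical companion, the Python sketch below computes the empirical embedding $\hat{\theta}_n=\frac{1}{n}\sum_{i=1}^n k(\cdot,X_i)$ for a Gaussian kernel with a Gaussian $P$, choices made here only so that $\|\hat{\theta}_n-\mu_k(P)\|_{\mathcal{H}_k}$ has a closed form, and checks that the error decays roughly like $n^{-1/2}$; the kernel, bandwidth, and distribution are illustrative assumptions, not the measure classes treated in the paper.

        import numpy as np

        # Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 h^2)) on R^d and P = N(0, s^2 I_d).
        # Under these illustrative choices the squared RKHS error of the empirical embedding,
        #   ||theta_hat_n - mu_k(P)||_{H_k}^2
        #     = (1/n^2) sum_{i,j} k(X_i, X_j) - (2/n) sum_i E_Y[k(X_i, Y)] + E_{Y,Y'}[k(Y, Y')],
        # has closed-form second and third terms.
        d, h, s = 3, 1.0, 1.0
        rng = np.random.default_rng(1)

        def rkhs_error(X):
            sq_norms = np.sum(X**2, axis=1)
            sq = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
            term1 = np.exp(-np.maximum(sq, 0.0) / (2 * h**2)).mean()
            # E_Y[k(x, Y)] = (h^2 / (h^2 + s^2))^{d/2} exp(-||x||^2 / (2 (h^2 + s^2)))
            term2 = ((h**2 / (h**2 + s**2)) ** (d / 2)
                     * np.exp(-sq_norms / (2 * (h**2 + s**2)))).mean()
            # E_{Y,Y'}[k(Y, Y')] = (h^2 / (h^2 + 2 s^2))^{d/2}
            term3 = (h**2 / (h**2 + 2 * s**2)) ** (d / 2)
            return np.sqrt(max(term1 - 2 * term2 + term3, 0.0))

        for n in [100, 400, 1600, 3200]:
            errs = [rkhs_error(rng.normal(0.0, s, size=(n, d))) for _ in range(20)]
            print(f"n={n:5d}  mean RKHS error={np.mean(errs):.4f}  "
                  f"sqrt(n) * error={np.sqrt(n) * np.mean(errs):.3f}")

    The $\sqrt{n}\,\times$ error column staying roughly constant across $n$ is the $O_P(n^{-1/2})$ behaviour referred to above.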

    Convergence of Smoothed Empirical Measures with Applications to Entropy Estimation

    This paper studies convergence of empirical measures smoothed by a Gaussian kernel. Specifically, consider approximating $P\ast\mathcal{N}_\sigma$, for $\mathcal{N}_\sigma\triangleq\mathcal{N}(0,\sigma^2 \mathrm{I}_d)$, by $\hat{P}_n\ast\mathcal{N}_\sigma$, where $\hat{P}_n$ is the empirical measure, under different statistical distances. The convergence is examined in terms of the Wasserstein distance, total variation (TV), Kullback-Leibler (KL) divergence, and $\chi^2$-divergence. We show that the approximation error under the TV distance and 1-Wasserstein distance ($\mathsf{W}_1$) converges at rate $e^{O(d)}n^{-\frac{1}{2}}$, in remarkable contrast to the typical $n^{-\frac{1}{d}}$ rate for unsmoothed $\mathsf{W}_1$ (when $d\ge 3$). For the KL divergence, squared 2-Wasserstein distance ($\mathsf{W}_2^2$), and $\chi^2$-divergence, the convergence rate is $e^{O(d)}n^{-1}$, but only if $P$ achieves finite input-output $\chi^2$ mutual information across the additive white Gaussian noise channel. If the latter condition is not met, the rate changes to $\omega(n^{-1})$ for the KL divergence and $\mathsf{W}_2^2$, while the $\chi^2$-divergence becomes infinite: a curious dichotomy. As a main application, we consider estimating the differential entropy $h(P\ast\mathcal{N}_\sigma)$ in the high-dimensional regime. The distribution $P$ is unknown, but $n$ i.i.d. samples from it are available. We first show that any good estimator of $h(P\ast\mathcal{N}_\sigma)$ must have sample complexity that is exponential in $d$. Using the empirical approximation results, we then show that the absolute-error risk of the plug-in estimator converges at the parametric rate $e^{O(d)}n^{-\frac{1}{2}}$, thus establishing the minimax rate-optimality of the plug-in estimator. Numerical results that demonstrate a significant empirical superiority of the plug-in approach over general-purpose differential entropy estimators are provided.
    Comment: arXiv admin note: substantial text overlap with arXiv:1810.1158
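
    The plug-in idea admits a short numerical sketch: form the Gaussian mixture $\hat{P}_n\ast\mathcal{N}_\sigma$ and estimate its differential entropy by Monte Carlo. In the Python sketch below, $P$ is taken to be Gaussian purely so that $h(P\ast\mathcal{N}_\sigma)$ has a closed form to compare against; that choice, and the Monte Carlo evaluation of the mixture entropy, are illustrative assumptions rather than the paper's experimental setup.

        import numpy as np
        from scipy.special import logsumexp

        rng = np.random.default_rng(2)
        d, sigma, s_p = 2, 1.0, 1.0   # dimension, smoothing noise std, std of the (assumed Gaussian) P
        n, m = 1000, 5000             # number of data samples and Monte Carlo samples

        X = rng.normal(0.0, s_p, size=(n, d))   # i.i.d. samples from P

        def log_mixture_density(Y, X, sigma):
            """Log density of P_hat_n * N_sigma: an equal-weight Gaussian mixture centered at the X_i."""
            d = X.shape[1]
            sq = (np.sum(Y**2, axis=1)[:, None] + np.sum(X**2, axis=1)[None, :]
                  - 2.0 * (Y @ X.T))
            log_comp = -np.maximum(sq, 0.0) / (2 * sigma**2) - 0.5 * d * np.log(2 * np.pi * sigma**2)
            return logsumexp(log_comp, axis=1) - np.log(X.shape[0])

        # Monte Carlo estimate of h(P_hat_n * N_sigma): sample Y = X_I + Z, I uniform, Z ~ N_sigma.
        idx = rng.integers(0, n, size=m)
        Y = X[idx] + rng.normal(0.0, sigma, size=(m, d))
        h_plugin = -np.mean(log_mixture_density(Y, X, sigma))

        # Closed-form reference for Gaussian P: P * N_sigma = N(0, (s_p^2 + sigma^2) I_d).
        h_true = 0.5 * d * np.log(2 * np.pi * np.e * (s_p**2 + sigma**2))
        print(f"plug-in estimate: {h_plugin:.4f}   true h(P * N_sigma): {h_true:.4f}")

    The gap between the two printed numbers mixes the statistical error of the plug-in, which the paper bounds at rate $e^{O(d)}n^{-\frac{1}{2}}$, with the separate Monte Carlo error of evaluating the mixture entropy.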