
Minimax Estimation of Kernel Mean Embeddings

Abstract

In this paper, we study the minimax estimation of the Bochner integral $\mu_k(P):=\int_{\mathcal{X}} k(\cdot,x)\,dP(x)$, also called the kernel mean embedding, based on random samples drawn i.i.d. from $P$, where $k:\mathcal{X}\times\mathcal{X}\rightarrow\mathbb{R}$ is a positive definite kernel. Various estimators $\hat{\theta}_n$ of $\mu_k(P)$ (including the empirical estimator) have been studied in the literature, all of which satisfy $\|\hat{\theta}_n-\mu_k(P)\|_{\mathcal{H}_k}=O_P(n^{-1/2})$, with $\mathcal{H}_k$ being the reproducing kernel Hilbert space induced by $k$. The main contribution of the paper is to show that the above rate of $n^{-1/2}$ is minimax in the $\|\cdot\|_{\mathcal{H}_k}$ and $\|\cdot\|_{L^2(\mathbb{R}^d)}$ norms over the class of discrete measures and the class of measures with an infinitely differentiable density, with $k$ being a continuous translation-invariant kernel on $\mathbb{R}^d$. The interesting aspect of this result is that the minimax rate is independent of the smoothness of the kernel and of the density of $P$ (if it exists). This result has practical consequences in statistical applications, as the mean embedding has been widely employed in non-parametric hypothesis testing, density estimation, causal inference and feature selection, through its relation to energy distance (and distance covariance).
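
As an illustration of the $O_P(n^{-1/2})$ behaviour discussed above, the short sketch below (not part of the paper) computes the RKHS error of the empirical embedding $\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^n k(\cdot, X_i)$ for a Gaussian kernel on $\mathbb{R}$ with $P = N(0,1)$, a setting in which $\langle k(\cdot,x),\mu_k(P)\rangle$ and $\|\mu_k(P)\|_{\mathcal{H}_k}^2$ have closed forms. The kernel, bandwidth and distribution are illustrative choices made for this sketch, not those of the paper.

    # Minimal numerical sketch (illustrative assumptions throughout): empirical kernel
    # mean embedding with a Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 S^2)) on R
    # and P = N(0, 1), for which the population quantities below are available in
    # closed form. The printed sqrt(n) * error stays roughly constant, matching the
    # O_P(n^{-1/2}) rate mentioned in the abstract.
    import numpy as np

    S = 1.0  # kernel bandwidth (arbitrary illustrative choice)

    def k(x, y):
        # Gaussian kernel matrix for 1-d sample arrays x and y
        return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2 * S ** 2))

    def cross_term(x):
        # <k(., x), mu_k(P)> = E_{Y~N(0,1)} k(x, Y)
        #                    = S / sqrt(S^2 + 1) * exp(-x^2 / (2 (S^2 + 1)))
        return S / np.sqrt(S ** 2 + 1) * np.exp(-x ** 2 / (2 * (S ** 2 + 1)))

    # ||mu_k(P)||^2 = E_{X,Y~N(0,1)} k(X, Y) = S / sqrt(S^2 + 2)
    MU_NORM_SQ = S / np.sqrt(S ** 2 + 2)

    def rkhs_error(x):
        # ||mu_hat_n - mu_k(P)||_{H_k}, expanded via the reproducing property:
        # mean_{i,j} k(X_i, X_j) - 2 mean_i <k(., X_i), mu_k(P)> + ||mu_k(P)||^2
        sq = k(x, x).mean() - 2 * cross_term(x).mean() + MU_NORM_SQ
        return np.sqrt(max(sq, 0.0))

    rng = np.random.default_rng(0)
    for n in (50, 200, 800, 3200):
        x = rng.standard_normal(n)
        err = rkhs_error(x)
        print(f"n = {n:5d}   error = {err:.4f}   sqrt(n) * error = {np.sqrt(n) * err:.3f}")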
