154 research outputs found

    Kernel Distribution Embeddings: Universal Kernels, Characteristic Kernels and Kernel Metrics on Distributions

    Kernel mean embeddings have recently attracted the attention of the machine learning community. They map measures $\mu$ from some set $M$ to functions in a reproducing kernel Hilbert space (RKHS) with kernel $k$. The RKHS distance of two mapped measures is a semi-metric $d_k$ over $M$. We study three questions. (I) For a given kernel, what sets $M$ can be embedded? (II) When is the embedding injective over $M$ (in which case $d_k$ is a metric)? (III) How does the $d_k$-induced topology compare to other topologies on $M$? The existing machine learning literature has addressed these questions in cases where $M$ is (a subset of) the finite regular Borel measures. We unify, improve and generalise those results. Our approach naturally leads to continuous and possibly even injective embeddings of (Schwartz-) distributions, i.e., generalised measures, but the reader is free to focus on measures only. In particular, we systemise and extend various (partly known) equivalences between different notions of universal, characteristic and strictly positive definite kernels, and show that on an underlying locally compact Hausdorff space, $d_k$ metrises the weak convergence of probability measures if and only if $k$ is continuous and characteristic.
    Comment: Old and longer version of the JMLR paper with same title (published 2018). Please start with the JMLR version. 55 pages (33 pages main text, 22 pages appendix), 2 tables, 1 figure (in appendix)
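
    As a rough illustration of the semi-metric $d_k$ described in this abstract, the sketch below computes the RKHS distance between the embeddings of two empirical measures. It is not taken from the paper: the Gaussian kernel, the `bandwidth` parameter, and the function names `gaussian_kernel` and `rkhs_distance` are all assumptions chosen for the example, and the estimate is the simple plug-in (biased) one.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gram matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2)).

    The Gaussian kernel is an assumption made for this sketch.
    """
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def rkhs_distance(x, y, bandwidth=1.0):
    """Plug-in estimate of d_k between the empirical measures of x and y.

    Uses ||mu_x - mu_y||^2 = <mu_x, mu_x> + <mu_y, mu_y> - 2 <mu_x, mu_y>,
    where each inner product is the mean of the corresponding Gram matrix.
    """
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return float(np.sqrt(max(k_xx + k_yy - 2.0 * k_xy, 0.0)))

# Usage: two samples of shape (n_points, n_dims) drawn from shifted Gaussians.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = rng.normal(loc=3.0, size=(200, 2))
print(rkhs_distance(x, x), rkhs_distance(x, y))
```

    The distance of a sample to itself is zero by construction, while samples from clearly different distributions give a strictly positive value. Whether $d_k$ separates all measures in $M$ depends on the kernel being characteristic, which is question (II) of the abstract.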

    A Primer on Reproducing Kernel Hilbert Spaces

    Reproducing kernel Hilbert spaces are elucidated without assuming prior familiarity with Hilbert spaces. Compared with extant pedagogic material, greater care is placed on motivating the definition of reproducing kernel Hilbert spaces and explaining when and why these spaces are efficacious. The novel viewpoint is that reproducing kernel Hilbert space theory studies extrinsic geometry, associating with each geometric configuration a canonical overdetermined coordinate system. This coordinate system varies continuously with changing geometric configurations, making it well-suited for studying problems whose solutions also vary continuously with changing geometry. This primer can also serve as an introduction to infinite-dimensional linear algebra because reproducing kernel Hilbert spaces have more properties in common with Euclidean spaces than do more general Hilbert spaces.
    Comment: Revised version submitted to Foundations and Trends in Signal Processing

    Fast Two-Sample Testing with Analytic Representations of Probability Measures

    We propose a class of nonparametric two-sample tests with a cost linear in the sample size. Two tests are given, both based on an ensemble of distances between analytic functions representing each of the distributions. The first test uses smoothed empirical characteristic functions to represent the distributions, the second uses distribution embeddings in a reproducing kernel Hilbert space. Analyticity implies that differences in the distributions may be detected almost surely at a finite number of randomly chosen locations/frequencies. The new tests are consistent against a larger class of alternatives than the previous linear-time tests based on the (non-smoothed) empirical characteristic functions, while being much faster than the current state-of-the-art quadratic-time kernel-based or energy distance-based tests. Experiments on artificial benchmarks and on challenging real-world testing problems demonstrate that our tests give a better power/time tradeoff than competing approaches, and in some cases, better outright power than even the most expensive quadratic-time tests. This performance advantage is retained even in high dimensions, and in cases where the difference in distributions is not observable with low order statistics.

    Minimax Estimation of Kernel Mean Embeddings

    In this paper, we study the minimax estimation of the Bochner integral $\mu_k(P):=\int_{\mathcal{X}} k(\cdot,x)\,dP(x)$, also called the kernel mean embedding, based on random samples drawn i.i.d. from $P$, where $k:\mathcal{X}\times\mathcal{X}\rightarrow\mathbb{R}$ is a positive definite kernel. Various estimators $\hat{\theta}_n$ of $\mu_k(P)$ (including the empirical estimator) have been studied in the literature, all of which satisfy $\bigl\| \hat{\theta}_n-\mu_k(P)\bigr\|_{\mathcal{H}_k}=O_P(n^{-1/2})$, with $\mathcal{H}_k$ being the reproducing kernel Hilbert space induced by $k$. The main contribution of the paper is to show that this rate of $n^{-1/2}$ is minimax in the $\|\cdot\|_{\mathcal{H}_k}$ and $\|\cdot\|_{L^2(\mathbb{R}^d)}$ norms over the class of discrete measures and the class of measures that have an infinitely differentiable density, with $k$ being a continuous translation-invariant kernel on $\mathbb{R}^d$. The interesting aspect of this result is that the minimax rate is independent of the smoothness of the kernel and the density of $P$ (if it exists). This result has practical consequences in statistical applications, as the mean embedding has been widely employed in non-parametric hypothesis testing, density estimation, causal inference and feature selection, through its relation to energy distance (and distance covariance).
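
    To make the $O_P(n^{-1/2})$ rate of the empirical estimator concrete, the sketch below measures $\|\hat{\theta}_n-\mu_k(P)\|_{\mathcal{H}_k}$ in one setting where the true embedding has a closed form. The setting is an assumption made for illustration and does not come from the paper: $P$ is the standard normal on $\mathbb{R}$, $k$ is a Gaussian kernel with a hypothetical `bandwidth` parameter, and `embedding_error` is a name invented here. It only illustrates the upper rate, not the minimax lower bound that is the paper's contribution.

```python
import numpy as np

def embedding_error(samples, bandwidth=1.0):
    """RKHS norm of (empirical embedding - true embedding) for P = N(0, 1).

    Assumes k(t, x) = exp(-(t - x)^2 / (2 * bandwidth^2)). For this pair,
    Gaussian convolution gives the closed forms
      mu_k(P)(t)         = sqrt(s2 / (s2 + 1)) * exp(-t^2 / (2 * (s2 + 1)))
      <mu_k(P), mu_k(P)> = sqrt(s2 / (s2 + 2))
    with s2 = bandwidth^2.
    """
    x = np.asarray(samples, dtype=float)
    s2 = bandwidth**2
    # <theta_n, theta_n>: mean of the Gram matrix of the sample.
    gram = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * s2))
    # <theta_n, mu_k(P)>: mean of the true embedding evaluated at the sample.
    mu_at_x = np.sqrt(s2 / (s2 + 1.0)) * np.exp(-x**2 / (2.0 * (s2 + 1.0)))
    mu_norm_sq = np.sqrt(s2 / (s2 + 2.0))
    err_sq = gram.mean() - 2.0 * mu_at_x.mean() + mu_norm_sq
    return float(np.sqrt(max(err_sq, 0.0)))

# Usage: the rescaled error err * sqrt(n) should stay of the same order
# as n grows, which is what an n^{-1/2} rate looks like in practice.
rng = np.random.default_rng(0)
for n in (100, 400, 1600):
    err = embedding_error(rng.standard_normal(n))
    print(n, err, err * np.sqrt(n))
```

    The empirical estimator here is $\hat{\theta}_n = \frac{1}{n}\sum_{i=1}^n k(\cdot, x_i)$, so every inner product reduces to an average of kernel evaluations.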