21 research outputs found

    A Kernel Test for Three-Variable Interactions

    We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.

    A One-Sample Test for Normality with Kernel Methods

    We propose a new one-sample test for normality in a Reproducing Kernel Hilbert Space (RKHS). Namely, we test the null hypothesis of belonging to a given family of Gaussian distributions. Hence our procedure may be applied either to test data for normality or to test parameters (mean and covariance) if data are assumed Gaussian. Our test is based on the same principle as the Maximum Mean Discrepancy (MMD), which is usually used for two-sample tests such as homogeneity or independence testing. Our method makes use of a special kind of parametric bootstrap (typical of goodness-of-fit tests) which is computationally more efficient than standard parametric bootstrap. Moreover, an upper bound for the Type-II error highlights the dependence on influential quantities. Experiments illustrate the practical improvement offered by our test in high-dimensional settings where common normality tests are known to fail. We also consider an application to covariance rank selection through a sequential procedure.
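As a point of reference for the MMD principle mentioned in this abstract, the following is a minimal sketch of the standard biased two-sample estimate of the squared MMD with a Gaussian kernel. It is not the one-sample normality test or the parametric bootstrap described above; the function names `gaussian_gram` and `mmd2_biased` and the `bandwidth` parameter are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def gaussian_gram(X, Y, bandwidth=1.0):
    """Gram matrix of the Gaussian kernel exp(-||x - y||^2 / (2 h^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_biased(X, Y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples.

    X and Y are arrays of shape (n, d) and (m, d).
    """
    k_xx = gaussian_gram(X, X, bandwidth)
    k_yy = gaussian_gram(Y, Y, bandwidth)
    k_xy = gaussian_gram(X, Y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    Y = rng.normal(size=(200, 5))          # same distribution as X
    Z = rng.normal(loc=2.0, size=(200, 5))  # shifted distribution
    print("same distribution:   ", mmd2_biased(X, Y))
    print("shifted distribution:", mmd2_biased(X, Z))
```

The estimate is small when both samples come from the same distribution and grows as the distributions separate. Turning it into a test additionally requires a null distribution or threshold, which this sketch does not attempt.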

    Kernel Belief Propagation

    We propose a nonparametric generalization of belief propagation, Kernel Belief Propagation (KBP), for pairwise Markov random fields. Messages are represented as functions in a reproducing kernel Hilbert space (RKHS), and message updates are simple linear operations in the RKHS. KBP makes none of the assumptions commonly required in classical BP algorithms: the variables need not arise from a finite domain or a Gaussian distribution, nor must their relations take any particular parametric form. Rather, the relations between variables are represented implicitly, and are learned nonparametrically from training data. KBP has the advantage that it may be used on any domain where kernels are defined ($\mathbb{R}^d$, strings, groups), even where explicit parametric models are not known, or closed form expressions for the BP updates do not exist. The computational cost of message updates in KBP is polynomial in the training data size. We also propose a constant time approximate message update procedure by representing messages using a small number of basis functions. In experiments, we apply KBP to image denoising, depth prediction from still images, and protein configuration prediction: KBP is faster than competing classical and nonparametric approaches (by orders of magnitude, in some cases), while providing significantly more accurate results.

    Kernel Distribution Embeddings: Universal Kernels, Characteristic Kernels and Kernel Metrics on Distributions

    Kernel mean embeddings have recently attracted the attention of the machine learning community. They map measures $\mu$ from some set $M$ to functions in a reproducing kernel Hilbert space (RKHS) with kernel $k$. The RKHS distance of two mapped measures is a semi-metric $d_k$ over $M$. We study three questions. (I) For a given kernel, what sets $M$ can be embedded? (II) When is the embedding injective over $M$ (in which case $d_k$ is a metric)? (III) How does the $d_k$-induced topology compare to other topologies on $M$? The existing machine learning literature has addressed these questions in cases where $M$ is (a subset of) the finite regular Borel measures. We unify, improve and generalise those results. Our approach naturally leads to continuous and possibly even injective embeddings of (Schwartz-) distributions, i.e., generalised measures, but the reader is free to focus on measures only. In particular, we systemise and extend various (partly known) equivalences between different notions of universal, characteristic and strictly positive definite kernels, and show that on an underlying locally compact Hausdorff space, $d_k$ metrises the weak convergence of probability measures if and only if $k$ is continuous and characteristic. Comment: Old and longer version of the JMLR paper with same title (published 2018). Please start with the JMLR version. 55 pages (33 pages main text, 22 pages appendix), 2 tables, 1 figure (in appendix).
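For readers who want the semi-metric from this abstract written out, a sketch of the usual definition follows. It assumes finite measures $\mu, \nu$ and a bounded, measurable, real-valued kernel $k$ so that the integrals exist; the symbols $\Phi_k$ (the embedding) and $\mathcal{H}_k$ (the RKHS of $k$) are notation introduced here for illustration and do not appear in the abstract.

```latex
\Phi_k(\mu) = \int k(\cdot, x)\, \mathrm{d}\mu(x) \in \mathcal{H}_k,
\qquad
d_k(\mu, \nu) = \bigl\lVert \Phi_k(\mu) - \Phi_k(\nu) \bigr\rVert_{\mathcal{H}_k},
\\[4pt]
d_k(\mu, \nu)^2
  = \iint k(x, y)\, \mathrm{d}(\mu - \nu)(x)\, \mathrm{d}(\mu - \nu)(y).
```

The second line follows from the reproducing property. Under this definition, $d_k$ is a metric exactly when $\Phi_k$ is injective on $M$, which is question (II) above.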

    Non-commutative harmonic analysis in multi-object tracking

    Simultaneously tracking $n$ targets in space involves two closely coupled tasks: estimating the current positions $x_1, x_2, \ldots, x_n$ of their tracks, and estimating the assignment $\sigma\colon \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$ of targets to tracks. While the former is often a relatively straightforward extension of the single target case, the latter, called identity management or data association, is a fundamentally combinatorial problem, which is harder to fit in a computationally efficient probabilistic framework. Identity management is difficult because the number of possible assignments grows with $n!$. This means that for $n$ greater than about 10 or 12, representing the distribution $p(\sigma)$ explicitly as an array of $n!$ numbers is generally not possible. In this chapter we discuss a solution to this problem based on the generalisation of harmonic analysis to non-commutative groups, specifically, in our case, the group of permutations. According to this theory, the Fourier transform of $p$ takes the form $\hat{p}(\lambda) = \sum_{\sigma \in S_n} p(\sigma)\, \rho_\lambda(\sigma)$, where $S_n$ denotes the group of permutations of $n$ objects, $\lambda$ is a combinatorial object called an integer partition, and $\rho_\lambda$ is a special matrix-valued function called a representation. These terms are defined in our short primer on representation theory in Section 13.2. What is important to note is that, since $\rho_\lambda$ is matrix-valued, each Fourier component $\hat{p}(\lambda)$ is a matrix, not just a scalar. Apart from this surprising feature, non-commutative Fourier transforms are very similar to their familiar commutative counterparts. In particular, we argue that there is a well-defined sense in which some of the $\hat{p}(\lambda)$ matrices are the 'low-frequency' components of $p$, and approximating $p$ with this subset of components is optimal.
    A large part of this chapter is focused on how to define such a notion of 'frequency', and how to find the corresponding Fourier components. We describe two seemingly very different approaches to answering this question, and find, reassuringly, that they give exactly the same answer. Of course, in addition to a compact way of representing $p$, efficient inference also demands fast algorithms for updating $p$ with observations. Section 13.6 gives an overview of the fast Fourier methods that are employed for this purpose.
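To make the matrix-valued Fourier transform in this abstract concrete, here is a small brute-force sketch for $n = 3$. It evaluates the sum over all six permutations directly, so it illustrates the definition only and is not the fast Fourier method the chapter refers to. The representation labels (`"trivial"`, `"sign"`, `"standard"`), the choice of orthonormal basis, and all function names are assumptions made for this example.

```python
import itertools
import numpy as np

PERMS = list(itertools.permutations(range(3)))


def perm_matrix(sigma):
    """Permutation matrix P with P @ e_i = e_{sigma(i)}."""
    n = len(sigma)
    P = np.zeros((n, n))
    for i, s in enumerate(sigma):
        P[s, i] = 1.0
    return P


# Orthonormal basis of the plane orthogonal to (1, 1, 1). Restricting the
# permutation matrices to this plane gives the 2-dimensional irreducible
# representation of S_3.
_B = np.array([[1.0, -1.0, 0.0], [1.0, 1.0, -2.0]]).T
_B = _B / np.linalg.norm(_B, axis=0)


def irreps(sigma):
    """The three irreducible representations of S_3 evaluated at sigma."""
    P = perm_matrix(sigma)
    return {
        "trivial": np.array([[1.0]]),
        "sign": np.array([[np.rint(np.linalg.det(P))]]),
        "standard": _B.T @ P @ _B,
    }


def fourier_transform(p):
    """p_hat(lambda) = sum over sigma of p(sigma) * rho_lambda(sigma)."""
    names = ("trivial", "sign", "standard")
    return {name: sum(p[s] * irreps(s)[name] for s in PERMS) for name in names}


def inverse_fourier_transform(p_hat):
    """Recover p from all Fourier components.

    Uses p(sigma) = (1/|G|) * sum_lambda d_lambda * tr(rho(sigma)^T p_hat),
    valid here because every rho_lambda(sigma) above is an orthogonal matrix.
    """
    p = {}
    for s in PERMS:
        total = 0.0
        for name, rho in irreps(s).items():
            total += rho.shape[0] * np.trace(rho.T @ p_hat[name])
        p[s] = total / len(PERMS)
    return p


if __name__ == "__main__":
    uniform = {s: 1.0 / len(PERMS) for s in PERMS}
    for name, mat in fourier_transform(uniform).items():
        print(name, np.round(mat, 6).tolist())
```

For the uniform distribution only the trivial component is non-zero, which matches the intuition that it carries no information beyond total mass. Dropping components before inverting yields a compressed approximation of `p`, which is the kind of truncation the abstract alludes to.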