A Kernel Test for Three-Variable Interactions
We introduce kernel nonparametric tests for Lancaster three-variable
interaction and for total independence, using embeddings of signed measures
into a reproducing kernel Hilbert space. The resulting test statistics are
straightforward to compute, and are used in powerful interaction tests, which
are consistent against all alternatives for a large family of reproducing
kernels. We show the Lancaster test to be sensitive to cases where two
independent causes individually have weak influence on a third dependent
variable, but their combined effect has a strong influence. This makes the
Lancaster test especially suited to finding structure in directed graphical
models, where it outperforms competing nonparametric tests in detecting such
V-structures.
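A hedged sketch of how such a statistic can be computed: one common empirical form of a kernel Lancaster interaction statistic is the entrywise product of the three doubly centred Gram matrices, averaged over index pairs. The kernel choice, bandwidth and the XOR-style example below are illustrative, and the permutation threshold a real test would need is omitted.

```python
import numpy as np

def gaussian_gram(x, sigma=1.0):
    # Gaussian (RBF) Gram matrix for a 1-D sample x.
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2 * sigma**2))

def lancaster_statistic(x, y, z, sigma=1.0):
    # Entrywise product of doubly centred Gram matrices,
    # averaged over all index pairs (nonnegative by the
    # Schur product theorem, since each factor is PSD).
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n   # centring matrix
    K = H @ gaussian_gram(x, sigma) @ H
    L = H @ gaussian_gram(y, sigma) @ H
    M = H @ gaussian_gram(z, sigma) @ H
    return (K * L * M).sum() / n**2

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = rng.standard_normal(200)
z = np.sign(x * y)  # V-structure: z depends on (x, y) jointly,
                    # yet is pairwise independent of each of them
print(lancaster_statistic(x, y, z))
```

The `z = sign(x * y)` construction mirrors the abstract's motivating case: each cause alone carries no information about z, but together they determine it.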
A One-Sample Test for Normality with Kernel Methods
We propose a new one-sample test for normality in a Reproducing Kernel
Hilbert Space (RKHS). Namely, we test the null-hypothesis of belonging to a
given family of Gaussian distributions. Hence our procedure may be applied
either to test data for normality or to test parameters (mean and covariance)
if data are assumed Gaussian. Our test is based on the same principle as the
MMD (Maximum Mean Discrepancy) which is usually used for two-sample tests such
as homogeneity or independence testing. Our method makes use of a special kind
of parametric bootstrap (typical of goodness-of-fit tests) which is
computationally more efficient than standard parametric bootstrap. Moreover, an
upper bound for the Type-II error highlights the dependence on influential
quantities. Experiments illustrate the practical improvement allowed by our
test in high-dimensional settings where common normality tests are known to
fail. We also consider an application to covariance rank selection through a
sequential procedure.
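For readers unfamiliar with the underlying principle, here is a minimal sketch of the standard unbiased MMD² estimator with a Gaussian kernel. This is not the paper's one-sample statistic (which tests against a Gaussian family and uses a parametric bootstrap); comparing a sample against reference draws from the hypothesised distribution merely illustrates the idea, and the bandwidth and sample sizes are arbitrary choices.

```python
import numpy as np

def mmd2_unbiased(x, y, sigma=1.0):
    # Unbiased estimate of squared MMD between 1-D samples x and y
    # under a Gaussian kernel; diagonal terms are excluded from the
    # within-sample averages.
    def gram(a, b):
        d = a[:, None] - b[None, :]
        return np.exp(-d**2 / (2 * sigma**2))
    m, n = len(x), len(y)
    Kxx, Kyy, Kxy = gram(x, x), gram(y, y), gram(x, y)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(1)
sample = rng.standard_normal(500)         # data consistent with N(0, 1)
reference = rng.standard_normal(500)      # draws from the null N(0, 1)
shifted = rng.standard_normal(500) + 1.0  # data violating the null
print(mmd2_unbiased(sample, reference))   # near zero under the null
print(mmd2_unbiased(shifted, reference))  # clearly positive
```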
Kernel Belief Propagation
We propose a nonparametric generalization of belief propagation, Kernel
Belief Propagation (KBP), for pairwise Markov random fields. Messages are
represented as functions in a reproducing kernel Hilbert space (RKHS), and
message updates are simple linear operations in the RKHS. KBP makes none of the
assumptions commonly required in classical BP algorithms: the variables need
not arise from a finite domain or a Gaussian distribution, nor must their
relations take any particular parametric form. Rather, the relations between
variables are represented implicitly, and are learned nonparametrically from
training data. KBP has the advantage that it may be used on any domain where
kernels are defined (R^d, strings, groups), even where explicit parametric
models are not known, or closed form expressions for the BP updates do not
exist. The computational cost of message updates in KBP is polynomial in the
training data size. We also propose a constant time approximate message update
procedure by representing messages using a small number of basis functions. In
experiments, we apply KBP to image denoising, depth prediction from still
images, and protein configuration prediction: KBP is faster than competing
classical and nonparametric approaches (by orders of magnitude, in some cases),
while providing significantly more accurate results.
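For contrast with the RKHS message updates described above, the classical discrete-variable sum-product update that KBP generalises can be sketched as follows. This is a minimal illustration of ordinary belief propagation, not of the kernel version; the potentials and message values are made-up toy numbers.

```python
import numpy as np

def message_update(psi_st, psi_t, incoming):
    # Classical sum-product message t -> s on a pairwise MRF with
    # discrete variables: sum over x_t of the pairwise potential
    # psi_st(x_s, x_t) times the node potential psi_t(x_t) times the
    # product of messages into t from neighbours other than s.
    prod = psi_t.copy()
    for m in incoming:
        prod *= m
    msg = psi_st @ prod
    return msg / msg.sum()  # normalise for numerical stability

# Toy chain u - t - s with binary variables.
psi_st = np.array([[1.0, 0.2],
                   [0.2, 1.0]])       # pairwise potential on (x_s, x_t)
psi_t = np.array([0.6, 0.4])          # node potential at t
m_ut = np.array([0.7, 0.3])           # message from u into t
print(message_update(psi_st, psi_t, [m_ut]))
```

KBP replaces the explicit sum over x_t with a linear operation on an RKHS representation of the message, which is what removes the finite-domain and parametric-form assumptions.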
Kernel Distribution Embeddings: Universal Kernels, Characteristic Kernels and Kernel Metrics on Distributions
Kernel mean embeddings have recently attracted the attention of the machine
learning community. They map measures from some set M to functions in a
reproducing kernel Hilbert space (RKHS) with kernel k. The RKHS distance of
two mapped measures is a semi-metric d_k over M. We study three questions.
(I) For a given kernel, what sets M can be embedded? (II) When is the
embedding injective over M (in which case d_k is a metric)? (III) How does
the d_k-induced topology compare to other topologies on M? The existing
machine learning literature has addressed these questions in cases where M is
(a subset of) the finite regular Borel measures. We unify, improve and
generalise those results. Our approach naturally leads to continuous and
possibly even injective embeddings of (Schwartz-) distributions, i.e.,
generalised measures, but the reader is free to focus on measures only. In
particular, we systemise and extend various (partly known) equivalences between
different notions of universal, characteristic and strictly positive definite
kernels, and show that on an underlying locally compact Hausdorff space, d_k
metrises the weak convergence of probability measures if and only if k is
continuous and characteristic.
Comment: Old and longer version of the JMLR paper with same title (published
2018). Please start with the JMLR version. 55 pages (33 pages main text, 22
pages appendix), 2 tables, 1 figure (in appendix).
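The role of a characteristic kernel can be illustrated numerically. In the sketch below (an illustrative construction, not from the paper), a linear kernel, which is not characteristic, sees only the mean and so assigns near-zero distance to two different mean-zero distributions, while a Gaussian kernel, which is characteristic on R, separates them. The estimator, sample sizes and distributions are arbitrary choices.

```python
import numpy as np

def mmd2_biased(x, y, kernel):
    # Biased (V-statistic) estimate of the squared RKHS distance
    # d_k(P, Q)^2 between the empirical measures of samples x and y.
    Kxx = kernel(x[:, None], x[None, :])
    Kyy = kernel(y[:, None], y[None, :])
    Kxy = kernel(x[:, None], y[None, :])
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

linear = lambda a, b: a * b                       # not characteristic
gaussian = lambda a, b: np.exp(-(a - b)**2 / 2)   # characteristic on R

rng = np.random.default_rng(2)
p = rng.standard_normal(2000)                 # N(0, 1)
q = rng.choice([-2.0, 2.0], size=2000)        # two-point law, also mean zero
print(mmd2_biased(p, q, linear))    # ~0: the linear kernel only compares means
print(mmd2_biased(p, q, gaussian))  # clearly positive: P and Q are separated
```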
Non-commutative harmonic analysis in multi-object tracking
Simultaneously tracking n targets in space involves two closely coupled tasks: estimating the current positions x_1, x_2, ..., x_n of their tracks, and estimating the assignment σ: {1, 2, ..., n} → {1, 2, ..., n} of targets to tracks. While the former is often a relatively
straightforward extension of the single-target case, the latter, called identity management or data association, is a fundamentally combinatorial problem, which is harder to fit into a computationally efficient probabilistic framework.
Identity management is difficult because the number of possible assignments grows as n!. This means that for n greater than about 10 or 12, representing the distribution
p(σ) explicitly as an array of n! numbers is generally not possible. In this chapter we discuss a solution to this problem based on the generalisation of harmonic analysis to non-commutative groups, specifically, in our case, the group of permutations. According to this theory, the Fourier transform of p takes the form
p̂(λ) = Σ_{σ ∈ S_n} p(σ) ρ_λ(σ)
where S_n denotes the group of permutations of n objects, λ is a combinatorial object called an integer partition, and ρ_λ is a special matrix-valued function called a representation. These terms are defined in our short primer on representation theory in Section 13.2. What is important to note is that, since ρ_λ is matrix-valued, each Fourier component
p̂(λ) is a matrix, not just a scalar. Apart from this surprising feature, non-commutative Fourier transforms are very similar to their familiar commutative counterparts.
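The matrix-valued transform can be made concrete with a small sketch. As a stand-in for the irreducible representations ρ_λ of Section 13.2, the code below uses the ordinary permutation matrices (a reducible representation); under this choice the Fourier component of p over S_3 is exactly the matrix of first-order marginals P(σ(i) = j), the natural summary for identity management.

```python
import itertools
import numpy as np

def perm_matrix(sigma):
    # Permutation matrix for sigma: maps basis vector e_i to e_sigma(i).
    # Matrix-valued and multiplicative, so it is a representation of S_n.
    n = len(sigma)
    P = np.zeros((n, n))
    for i, j in enumerate(sigma):
        P[j, i] = 1.0
    return P

def fourier_component(p, n):
    # p-hat = sum over sigma of p(sigma) * rho(sigma): one matrix-valued
    # Fourier component of a distribution p over S_n.
    return sum(p[s] * perm_matrix(s)
               for s in itertools.permutations(range(n)))

n = 3
perms = list(itertools.permutations(range(n)))
p = {s: 1.0 / len(perms) for s in perms}  # uniform distribution over S_3
print(fourier_component(p, n))  # every entry 1/3: each assignment equally likely
```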
In particular, we argue that there is a well-defined sense in which some of the p̂(λ) matrices are the ‘low-frequency’ components of p, and approximating p with this subset of components is optimal. A large part of this chapter is focused on how to define such a notion of ‘frequency’, and how to find the corresponding Fourier components. We describe two seemingly very different approaches to answering this question, and find, reassuringly, that they give exactly the same answer.
Of course, in addition to a compact way of representing p, efficient inference also demands fast algorithms for updating p with observations. Section 13.6 gives an overview
of the fast Fourier methods that are employed for this purpose.