Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering
Recently, the Frank-Wolfe optimization algorithm was suggested as a procedure
to obtain adaptive quadrature rules for integrals of functions in a reproducing
kernel Hilbert space (RKHS) with a potentially faster rate of convergence than
Monte Carlo integration (and "kernel herding" was shown to be a special case of
this procedure). In this paper, we propose to replace the random sampling step
in a particle filter by Frank-Wolfe optimization. By optimizing the position of
the particles, we can obtain better accuracy than random or quasi-Monte Carlo
sampling. In applications where the evaluation of the emission probabilities is
expensive (such as in robot localization), the additional computational cost to
generate the particles through optimization can be justified. Experiments on
standard synthetic examples as well as on a robot localization task indeed
indicate an improvement in accuracy over random and quasi-Monte Carlo sampling.
Comment: In Proceedings of the 18th International Conference on Artificial
Intelligence and Statistics (AISTATS), May 2015, San Diego, United States.
JMLR Workshop and Conference Proceedings, vol. 38.
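As context for the Frank-Wolfe quadrature step described above, the following is a minimal sketch of kernel herding (the special case the authors mention): each new particle maximizes the estimated kernel mean minus the average kernel evaluated at the particles chosen so far. The Gaussian kernel, candidate grid, and mixture target are illustrative assumptions, not the paper's setup.

    import numpy as np

    def gaussian_kernel(x, y, bandwidth=0.5):
        # Gaussian RBF kernel matrix between two 1-D point sets.
        return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2 * bandwidth ** 2))

    rng = np.random.default_rng(0)
    # Monte Carlo estimate of the kernel mean mu_p(x) = E_p[k(x, X)] for a
    # two-component Gaussian mixture (a stand-in for the filtering distribution).
    target = np.concatenate([rng.normal(-2, 0.5, 5000), rng.normal(1, 0.8, 5000)])
    candidates = np.linspace(-5, 5, 1001)
    mu_p = gaussian_kernel(candidates, target).mean(axis=1)

    particles = []
    for n in range(50):
        # Herding step with uniform weights: maximize
        # mu_p(x) - (1/(n+1)) * sum_i k(x, x_i) over the candidate grid.
        scores = mu_p.copy()
        if particles:
            scores -= gaussian_kernel(candidates, np.array(particles)).sum(axis=1) / (n + 1)
        particles.append(candidates[np.argmax(scores)])

    print(np.round(particles[:10], 2))  # deterministic "super-samples"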
Herding as a Learning System with Edge-of-Chaos Dynamics
Herding defines a deterministic dynamical system at the edge of chaos. It
generates a sequence of model states and parameters by alternating parameter
perturbations with state maximizations, where the sequence of states can be
interpreted as "samples" from an associated MRF model. Herding differs from
maximum likelihood estimation in that the sequence of parameters does not
converge to a fixed point and differs from an MCMC posterior sampling approach
in that the sequence of states is generated deterministically. Herding may be
interpreted as a "perturb-and-MAP" method where the parameter perturbations are
generated using a deterministic nonlinear dynamical system rather than randomly
from a Gumbel distribution. This chapter studies the distinct statistical
characteristics of the herding algorithm and shows that the fast convergence
rate of the controlled moments may be attributed to edge-of-chaos dynamics. The
herding algorithm can also be generalized to models with latent variables and
to a discriminative learning setting. The perceptron cycling theorem ensures
that the fast moment-matching property is preserved in the more general
framework.
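To make the alternating updates concrete, a minimal sketch of the herding recursion follows: s_{t+1} = argmax_s <w_t, phi(s)> (state maximization) followed by w_{t+1} = w_t + phi_bar - phi(s_{t+1}) (parameter perturbation), here for a toy fully observed binary model with brute-force maximization. The feature map and target moments phi_bar are illustrative assumptions.

    import itertools
    import numpy as np

    def phi(s):
        # Illustrative feature map for a 3-bit state: three biases plus one pairwise term.
        return np.array([s[0], s[1], s[2], s[0] * s[1]], dtype=float)

    # Assumed target moments E_p[phi(S)] that the herded states should match.
    phi_bar = np.array([0.7, 0.4, 0.5, 0.3])

    states = [np.array(s) for s in itertools.product([0, 1], repeat=3)]
    w = phi_bar.copy()          # a common initialization: w_0 = phi_bar
    samples = []

    for t in range(2000):
        s = max(states, key=lambda s: w @ phi(s))   # state maximization
        samples.append(s)
        w += phi_bar - phi(s)                       # parameter perturbation

    # The empirical moments of the deterministic "samples" approach phi_bar
    # at a fast O(1/T) rate, the property attributed to edge-of-chaos dynamics.
    print(np.round(np.mean([phi(s) for s in samples], axis=0), 3))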
Bayesian posterior approximation via greedy particle optimization
In Bayesian inference, the posterior distributions are difficult to obtain
analytically for complex models such as neural networks. Variational inference
usually uses a parametric distribution for approximation, from which we can
easily draw samples. Recently, discrete approximation by particles has
attracted attention because of its high expressiveness. An example is Stein
variational gradient descent (SVGD), which iteratively optimizes particles.
Although SVGD has been shown empirically to be computationally efficient, its
theoretical properties have not yet been clarified, and no finite-sample bound
on its convergence rate is known. Another example is the Stein points (SP)
method, which minimizes the kernelized Stein discrepancy directly. Although a
finite-sample bound is guaranteed theoretically, SP is empirically
computationally inefficient, especially in high-dimensional problems. In this
paper, we propose a novel method, maximum mean discrepancy minimization by the
Frank-Wolfe algorithm (MMD-FW), which minimizes the MMD in a greedy way via
the FW algorithm. Our method is empirically computationally efficient, and we
show that its finite-sample convergence bound is linear in finite dimensions.
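A minimal sketch of such a greedy FW loop follows, assuming a Gaussian kernel, a fixed candidate grid as the FW vertex set, and a sample-based target (none of which come from the paper): each iteration picks the candidate with the most negative gradient <mu_n - mu_p, phi(x)> and mixes it in with an exact line-search step size, so the atoms accumulate nonuniform weights.

    import numpy as np

    def k(x, y, bw=0.5):
        # Gaussian kernel matrix between 1-D point sets; k(x, x) = 1.
        x, y = np.atleast_1d(x), np.atleast_1d(y)
        return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2 * bw ** 2))

    rng = np.random.default_rng(1)
    target = rng.normal(0.0, 1.0, 5000)       # samples defining the target mean mu_p
    cand = np.linspace(-4, 4, 801)            # candidate atoms for the FW oracle
    mu_p_cand = k(cand, target).mean(axis=1)  # mu_p evaluated on the candidates

    atoms, weights = [cand[np.argmax(mu_p_cand)]], np.array([1.0])
    for _ in range(29):
        A = np.array(atoms)
        # Linear minimization oracle: pick x minimizing <mu_n - mu_p, phi(x)>.
        mu_n_cand = k(cand, A) @ weights
        x = cand[np.argmin(mu_n_cand - mu_p_cand)]
        # RKHS inner products for the exact line search along phi(x) - mu_n.
        nn = weights @ k(A, A) @ weights           # <mu_n, mu_n>
        nx = weights @ k(A, x)[:, 0]               # <mu_n, phi(x)>
        px = k(x, target).mean()                   # <mu_p, phi(x)>
        pn = weights @ k(A, target).mean(axis=1)   # <mu_p, mu_n>
        gamma = (px - pn - nx + nn) / max(1.0 - 2 * nx + nn, 1e-12)
        gamma = float(np.clip(gamma, 0.0, 1.0))
        weights = np.append((1 - gamma) * weights, gamma)
        atoms.append(x)

    print(np.round(atoms[:8], 2), np.round(weights[:8], 3))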
Sparse solutions of the kernel herding algorithm by improved gradient approximation
The kernel herding algorithm is used to construct quadrature rules in a
reproducing kernel Hilbert space (RKHS). While the computational efficiency of
the algorithm and stability of the output quadrature formulas are advantages of
this method, the convergence speed of the integration error for a given number
of nodes is slow compared to that of other quadrature methods. In this paper,
we propose a modified kernel herding algorithm, whose framework was introduced
in a previous study, and aim to obtain sparser solutions while preserving the
advantages of standard kernel herding. In the proposed algorithm, the negative
gradient is approximated by several vertex directions, and the current solution
is updated by moving in the approximate descent direction in each iteration. We
show that the convergence speed of the integration error is directly determined
by the cosine of the angle between the negative gradient and approximate
gradient. Based on this, we propose new gradient approximation algorithms and
analyze them theoretically, including a convergence analysis. In
numerical experiments, we confirm the effectiveness of the proposed algorithms
in terms of sparsity of nodes and computational efficiency. Moreover, we
provide a new theoretical analysis of the kernel quadrature rules with
fully-corrective weights, which realizes faster convergence speeds than those
of previous studies.
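For the fully-corrective weights mentioned at the end, the nodes are kept but the quadrature weights are re-solved to best match the kernel mean: minimizing ||sum_i w_i phi(x_i) - mu_p||_H^2 over w gives w = K^{-1} z, where K is the Gram matrix of the nodes and z_i = mu_p(x_i). A minimal numpy sketch under illustrative assumptions (Gaussian kernel, nodes taken as given, Monte Carlo estimate of mu_p):

    import numpy as np

    def k(x, y, bw=0.5):
        # Gaussian kernel matrix between 1-D point sets.
        x, y = np.atleast_1d(x), np.atleast_1d(y)
        return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2 * bw ** 2))

    rng = np.random.default_rng(2)
    target = rng.normal(0.0, 1.0, 5000)            # samples standing in for p
    nodes = np.array([-1.8, -0.9, 0.0, 0.9, 1.8])  # e.g., output of (sparse) herding

    K = k(nodes, nodes)                    # Gram matrix K_ij = k(x_i, x_j)
    z = k(nodes, target).mean(axis=1)      # z_i = mu_p(x_i), Monte Carlo estimate
    w = np.linalg.solve(K + 1e-10 * np.eye(len(nodes)), z)  # fully-corrective weights

    # Resulting quadrature rule: E_p[f] is approximated by sum_i w_i f(x_i).
    f = lambda x: x ** 2
    print(w @ f(nodes), target.var())      # estimate vs. Monte Carlo reference

The small ridge term is only a numerical safeguard; with well-separated nodes the plain solve also works.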