Projected support points: a new method for high-dimensional data reduction
In an era where big and high-dimensional data is readily available, data
scientists are inevitably faced with the challenge of reducing this data for
expensive downstream computation or analysis. To this end, we present here a
new method for reducing high-dimensional big data into a representative point
set, called projected support points (PSPs). A key ingredient in our method is
the so-called sparsity-inducing (SpIn) kernel, which encourages the
preservation of low-dimensional features when reducing high-dimensional data.
We begin by introducing a unifying theoretical framework for data reduction,
connecting PSPs with fundamental sampling principles from experimental design
and Quasi-Monte Carlo. Through this framework, we then derive sparsity
conditions under which the curse-of-dimensionality in data reduction can be
lifted for our method. Next, we propose two algorithms for one-shot and
sequential reduction via PSPs, both of which exploit big data subsampling and
majorization-minimization for efficient optimization. Finally, we demonstrate
the practical usefulness of PSPs in two real-world applications, the first for
data reduction in kernel learning, and the second for reducing Markov Chain
Monte Carlo (MCMC) chains.
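To make the reduction idea concrete: projected support points minimize a kernel discrepancy between the reduced set and the full data. The sketch below is a minimal stand-in, not the paper's method; it uses a plain Gaussian kernel in place of the SpIn kernel and stochastic gradient descent on the squared maximum mean discrepancy (MMD) in place of majorization-minimization, but it shows the same subsample-and-optimize pattern. The kernel bandwidth and step size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(X, Y, sigma=1.0):
    # Gaussian kernel matrix between the rows of X and Y
    d2 = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd_grad(Y, X, sigma=1.0):
    # Gradient of the squared MMD between point sets Y and X with
    # respect to Y (the X-vs-X term is constant and drops out).
    m, n = len(Y), len(X)
    g_yy = np.einsum('ij,ijd->id', rbf(Y, Y, sigma),
                     Y[:, None] - Y[None]) / sigma**2
    g_yx = np.einsum('ij,ijd->id', rbf(Y, X, sigma),
                     Y[:, None] - X[None]) / sigma**2
    return -2 * g_yy / m**2 + 2 * g_yx / (m * n)

X_big = rng.standard_t(df=5, size=(100_000, 8))   # the "big" data set
Y = X_big[rng.choice(len(X_big), 50, replace=False)].copy()
for step in range(200):
    # big-data subsampling: each step sees only a random batch
    batch = X_big[rng.choice(len(X_big), 1000, replace=False)]
    Y -= 10.0 * mmd_grad(Y, batch, sigma=3.0)     # step size is illustrative
print(Y.shape)  # (50, 8): a 50-point summary of 100,000 points
```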
Optimally-Weighted Herding is Bayesian Quadrature
Herding and kernel herding are deterministic methods of choosing samples
which summarise a probability distribution. A related task is choosing samples
for estimating integrals using Bayesian quadrature. We show that the criterion
minimised when selecting samples in kernel herding is equivalent to the
posterior variance in Bayesian quadrature. We then show that sequential
Bayesian quadrature can be viewed as a weighted version of kernel herding which
achieves performance superior to any other weighted herding method. We
demonstrate empirically a rate of convergence faster than O(1/N). Our results
also imply an upper bound on the empirical error of the Bayesian quadrature
estimate.
Comment: Accepted as an oral presentation at Uncertainty in Artificial Intelligence 2012. Updated to fix several typos.
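The equivalence claimed here has a direct computational payoff: for any chosen sample set, the Bayesian quadrature weights w = K⁻¹z (with z the kernel mean of the target distribution evaluated at the samples) minimize the same criterion that herding greedily decreases, so reweighting typically reduces integration error. A minimal sketch under assumed choices (Gaussian kernel, a target represented by a large reference sample, plain subsampling instead of herding to pick the points):

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(X, Y, sigma=1.0):
    d2 = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

# Target distribution p, represented by a large sample, and an integrand f.
P = rng.normal(size=(20_000, 2))
f = lambda X: np.sin(X).sum(axis=1)
truth = f(P).mean()                        # reference value of the integral

# A small sample set (herding would pick these greedily; we subsample).
S = P[rng.choice(len(P), 20, replace=False)]

# Uniform weights (plain herding) vs. Bayesian quadrature weights
# w = K_SS^{-1} z, where z_i = E_p[k(x, s_i)] is the kernel mean at s_i.
K_SS = rbf(S, S) + 1e-8 * np.eye(len(S))   # jitter for numerical stability
z = rbf(P, S).mean(axis=0)
w_bq = np.linalg.solve(K_SS, z)

print("uniform weights:", abs(f(S).mean() - truth))
print("BQ weights:     ", abs(w_bq @ f(S) - truth))  # typically smaller
```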
Predicting the Future Behavior of a Time-Varying Probability Distribution
We study the problem of predicting the future, though only in the
probabilistic sense of estimating a future state of a time-varying probability
distribution. This is not only an interesting academic problem, but solving
this extrapolation problem also has many practical applications, e.g., for
training classifiers that have to operate under time-varying conditions. Our
main contribution is a method for predicting the next step of the time-varying
distribution from a given sequence of sample sets from earlier time steps. For
this we rely on two recent machine learning techniques: embedding probability
distributions into a reproducing kernel Hilbert space, and learning operators
by vector-valued regression. We illustrate the working principles and the
practical usefulness of our method by experiments on synthetic and real data.
We also highlight an exemplary application: training a classifier in a domain
adaptation setting without having access to examples from the test-time
distribution at training time.
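A minimal sketch of the two ingredients, under assumptions not stated in the abstract (a Gaussian kernel, a fixed landmark grid as a finite-dimensional stand-in for the RKHS embedding, and plain ridge regression as the vector-valued regressor): each sample set is summarized by its empirical kernel mean evaluated on the landmarks, a linear operator mapping consecutive embeddings is fit, and the operator is applied once to extrapolate one step ahead.

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf(X, Y, sigma=1.0):
    d2 = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

# A drifting distribution: N(0.1 * t, 1), observed as a sample set per step.
sample_sets = [rng.normal(0.1 * t, 1.0, size=(200, 1)) for t in range(20)]

# Finite-dimensional surrogate of the kernel mean embedding: evaluate
# mu_t(z) = (1/n) sum_i k(x_i, z) on a fixed grid of landmarks z.
Z = np.linspace(-3.0, 6.0, 50)[:, None]
M = np.stack([rbf(S, Z).mean(axis=0) for S in sample_sets])  # shape (20, 50)

# Vector-valued ridge regression for the transition mu_t -> mu_{t+1}.
X_in, Y_out = M[:-1], M[1:]
A = np.linalg.solve(X_in.T @ X_in + 1e-3 * np.eye(50), X_in.T @ Y_out)

mu_pred = M[-1] @ A                  # predicted embedding for time step 20
fresh = rng.normal(0.1 * 20, 1.0, size=(200, 1))
mu_true = rbf(fresh, Z).mean(axis=0)
print(np.abs(mu_pred - mu_true).max())   # small if the drift is captured
```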
Reduced-Set Kernel Principal Components Analysis for Improving the Training and Execution Speed of Kernel Machines
This paper presents a practical, and theoretically well-founded, approach to
improve the speed of kernel manifold learning algorithms relying on spectral
decomposition. Utilizing recent insights in kernel smoothing and learning with
integral operators, we propose Reduced Set KPCA (RSKPCA), which also suggests
an easy-to-implement method to remove or replace samples with minimal effect on
the empirical operator. A simple data point selection procedure is given to
generate a substitute density for the data, with accuracy governed by a
user-tunable parameter. The effect of the approximation on the quality of the
KPCA solution, in terms of spectral and operator errors, can be shown directly
in terms of the density estimate error and as a function of this parameter. We
show in experiments that RSKPCA can improve both training and evaluation time
of KPCA by up to an order of magnitude, and compares favorably to the
widely-used Nyström and density-weighted Nyström methods.
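The speed argument is easy to see in code. The sketch below is not the paper's RSKPCA selection procedure; it stands in a uniformly subsampled reduced set to show where the savings come from: training eigendecomposes an m×m Gram matrix instead of n×n, and each projection costs m kernel evaluations instead of n. The bandwidth and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(X, Y, sigma=1.0):
    d2 = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

def kpca_fit(X, n_components, sigma=1.0):
    # Eigendecompose the centered Gram matrix of the (reduced) set X.
    n = len(X)
    K = rbf(X, X, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(H @ K @ H)
    vals = vals[::-1][:n_components]
    vecs = vecs[:, ::-1][:, :n_components]
    alpha = vecs / np.sqrt(np.maximum(vals, 1e-12))  # dual coefficients
    return X, alpha, K

def kpca_project(model, Xnew, sigma=1.0):
    X, alpha, K = model
    k = rbf(Xnew, X, sigma)
    # center test kernels consistently with the training Gram matrix
    kc = k - k.mean(1, keepdims=True) - K.mean(0) + K.mean()
    return kc @ alpha

X = rng.normal(size=(2000, 10))
# Full KPCA would use all n = 2000 points (an O(n^3) eigendecomposition,
# n kernel evaluations per projection). A reduced set of m = 200 points
# cuts both costs; the paper selects this set carefully, we subsample.
reduced = X[rng.choice(len(X), 200, replace=False)]
model = kpca_fit(reduced, n_components=5, sigma=3.0)
print(kpca_project(model, X[:3], sigma=3.0).shape)   # (3, 5)
```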
Kernel Mean Embedding of Distributions: A Review and Beyond
A Hilbert space embedding of a distribution---in short, a kernel mean
embedding---has recently emerged as a powerful tool for machine learning and
inference. The basic idea behind this framework is to map distributions into a
reproducing kernel Hilbert space (RKHS) in which the whole arsenal of kernel
methods can be extended to probability measures. It can be viewed as a
generalization of the original "feature map" common to support vector machines
(SVMs) and other kernel methods. While initially closely associated with the
latter, it has meanwhile found application in fields ranging from kernel
machines and probabilistic modeling to statistical inference, causal discovery,
and deep learning. The goal of this survey is to give a comprehensive review of
existing work and recent advances in this research area, and to discuss the
most challenging issues and open problems that could lead to new research
directions. The survey begins with a brief introduction to the RKHS and
positive definite kernels, which form the backbone of this survey, followed by
a thorough discussion of the Hilbert space embedding of marginal distributions,
theoretical guarantees, and a review of its applications. The embedding of
distributions enables us to apply RKHS methods to probability measures, which
prompts a wide range of applications such as kernel two-sample testing,
independence testing, and learning on distributional data. Next, we discuss the
Hilbert space embedding for conditional distributions, give theoretical
insights, and review some applications. The conditional mean embedding enables
us to perform sum, product, and Bayes' rules---which are ubiquitous in
graphical models, probabilistic inference, and reinforcement learning---in a
non-parametric way. We then discuss relationships between this framework and
other related areas. Lastly, we give some suggestions on future research
directions.
Comment: 147 pages; this is a version of the manuscript after the review process.
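The central object of the survey, the empirical kernel mean embedding, and the distance it induces (the maximum mean discrepancy behind kernel two-sample testing) fit in a few lines. A minimal sketch with an assumed Gaussian kernel and the simple biased estimator:

```python
import numpy as np

def rbf(X, Y, sigma=1.0):
    d2 = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    # Squared MMD: the RKHS distance between the empirical mean
    # embeddings of samples X and Y (biased V-statistic estimator).
    return (rbf(X, X, sigma).mean() - 2 * rbf(X, Y, sigma).mean()
            + rbf(Y, Y, sigma).mean())

rng = np.random.default_rng(4)
X = rng.normal(0, 1, size=(500, 1))
Y_same = rng.normal(0, 1, size=(500, 1))
Y_diff = rng.normal(1, 1, size=(500, 1))
print(mmd2(X, Y_same), mmd2(X, Y_diff))   # the second is much larger
```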
Filtering with State-Observation Examples via Kernel Monte Carlo Filter
This paper addresses the problem of filtering with a state-space model.
Standard approaches for filtering assume that a probabilistic model for
observations (i.e. the observation model) is given explicitly or at least
parametrically. We consider a setting where this assumption is not satisfied;
we assume that knowledge of the observation model is provided only by
examples of state-observation pairs. This setting is important and appears when
state variables are defined as quantities that are very different from the
observations. We propose the Kernel Monte Carlo Filter, a novel filtering method
that is focused on this setting. Our approach is based on the framework of
kernel mean embeddings, which enables nonparametric posterior inference using
the state-observation examples. The proposed method represents state
distributions as weighted samples, propagates these samples by sampling,
estimates the state posteriors by Kernel Bayes' Rule, and resamples by Kernel
Herding. In particular, the sampling and resampling procedures are novel in
being expressed using kernel mean embeddings, so we theoretically analyze their
behaviors. We reveal the following properties, which are similar to those of
corresponding procedures in particle methods: (1) the performance of sampling
can degrade if the effective sample size of a weighted sample is small; (2)
resampling improves the sampling performance by increasing the effective sample
size. We first demonstrate these theoretical findings by synthetic experiments.
Then we show the effectiveness of the proposed filter by artificial and real
data experiments, which include vision-based mobile robot localization.
Comment: 56 pages, 25 figures; Final version (accepted to Neural Computation).
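The two theoretical findings are easy to instrument. Below is a minimal sketch (assumed Gaussian kernel; the Kernel Bayes' Rule step itself is omitted) of the effective sample size of a weighted sample and of resampling by kernel herding from the weighted kernel mean embedding:

```python
import numpy as np

rng = np.random.default_rng(5)

def rbf(X, Y, sigma=1.0):
    d2 = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

def ess(w):
    # effective sample size of a normalized weighted sample
    return 1.0 / np.sum(w**2)

def herding_resample(X, w, m, sigma=1.0):
    # Greedily pick m points whose empirical kernel mean tracks the
    # weighted embedding mu = sum_i w_i k(., x_i); repeats are allowed,
    # as in multinomial resampling.
    K = rbf(X, X, sigma)
    mu = K @ w
    chosen, sum_k = [], np.zeros(len(X))
    for t in range(1, m + 1):
        j = int(np.argmax(mu - sum_k / t))
        chosen.append(j)
        sum_k += K[:, j]
    return X[chosen]

X = rng.normal(size=(1000, 2))
w = rng.exponential(size=1000)
w /= w.sum()
print(ess(w))                     # well below 1000 for skewed weights
Xr = herding_resample(X, w, 200)  # 200 equally-weighted representatives
print(Xr.shape)                   # (200, 2)
```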
Learning to Herd Agents Amongst Obstacles: Training Robust Shepherding Behaviors using Deep Reinforcement Learning
The robotic shepherding problem considers the control and navigation of a group
of coherent agents (e.g., a flock of birds or a fleet of drones) through the
motion of an external robot, called the shepherd. Machine learning based methods
have successfully solved this problem in an empty environment with no
obstacles. Rule-based methods, on the other hand, can handle more complex
scenarios in which environments are cluttered with obstacles and allow multiple
shepherds to work collaboratively. However, these rule-based methods are
fragile due to the difficulty in defining a comprehensive set of rules that can
handle all possible cases. To overcome these limitations, we propose the first
known learning-based method that can herd agents amongst obstacles. By using
deep reinforcement learning techniques combined with probabilistic
roadmaps, we train a shepherding model using noisy but controlled environmental
and behavioral parameters. Our experimental results show that the proposed
method is robust, namely, it is insensitive to uncertainties originating
from both the environmental and behavioral models. Consequently, the proposed
method achieves a higher success rate, shorter completion time, and shorter
path length than the rule-based behavioral methods. These advantages are
particularly prominent in more challenging scenarios involving more difficult
groups and strenuous passages.
Bayesian Learning of Conditional Kernel Mean Embeddings for Automatic Likelihood-Free Inference
In likelihood-free settings where likelihood evaluations are intractable,
approximate Bayesian computation (ABC) addresses the formidable inference task
of discovering plausible parameters of simulation programs that explain the
observations. However, ABC methods demand large numbers of simulation calls.
Critically, the hyperparameters that determine measures of simulation
discrepancy balance inference accuracy against sample efficiency, yet are difficult
to tune. In this paper, we present kernel embedding likelihood-free inference
(KELFI), a holistic framework that automatically learns model hyperparameters
to improve inference accuracy given a limited simulation budget. By leveraging
likelihood smoothness with conditional mean embeddings, we nonparametrically
approximate likelihoods and posteriors as surrogate densities and sample from
closed-form posterior mean embeddings, whose hyperparameters are learned by
maximizing an approximate marginal likelihood. Our modular framework
demonstrates improved accuracy and efficiency on challenging inference problems
in ecology.
Comment: To appear in the Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) 2019, Naha, Okinawa, Japan.
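KELFI's closed-form embeddings and marginal-likelihood learning are beyond a short sketch, but the core surrogate idea can be illustrated. Below, a hypothetical Gaussian toy simulator and a kernel smoother over simulation pairs (theta_j, x_j) yield a surrogate likelihood; the scales eps (data discrepancy) and h (smoothing over parameters) are exactly the kind of hyperparameters the paper learns automatically rather than hand-fixing as done here.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical simulator: x ~ N(theta, 1); observation from theta* = 1.5.
y_obs = 1.5 + rng.normal()
thetas = rng.uniform(-3, 3, size=300)   # parameters drawn from the prior
xs = thetas + rng.normal(size=300)      # one simulation per parameter

def surrogate_likelihood(theta_grid, eps=0.3, h=0.2):
    # Kernel smoother for L(theta) ~ p(y_obs | theta) from the pairs
    # (theta_j, x_j): eps scales the simulation-observation discrepancy,
    # h smooths across parameter space.
    k_eps = np.exp(-0.5 * ((y_obs - xs) / eps) ** 2)
    k_h = np.exp(-0.5 * ((theta_grid[:, None] - thetas[None, :]) / h) ** 2)
    return (k_h * k_eps).sum(axis=1) / (k_h.sum(axis=1) + 1e-12)

grid = np.linspace(-3, 3, 200)
L = surrogate_likelihood(grid)
post = L / (L.sum() * (grid[1] - grid[0]))  # flat prior, grid-normalized
print(grid[np.argmax(post)])                # close to the true 1.5
```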
Model Selection for Simulator-based Statistical Models: A Kernel Approach
We propose a novel approach to model selection for simulator-based
statistical models. The proposed approach defines a mixture of candidate
models, and then iteratively updates the weight coefficients for those models
as well as the parameters in each model simultaneously; this is done by
recursively applying Bayes' rule, using the recently proposed kernel recursive
ABC algorithm. The practical advantage of the method is that it can be used
even when a modeler lacks appropriate prior knowledge about the parameters in
each model. We demonstrate the effectiveness of the proposed approach with a
number of experiments, including model selection for dynamical systems in
ecology and epidemiology.
Comment: 32 pages.
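As a schematic of the recursive reweighting only (the paper's kernel recursive ABC also updates each model's parameters via kernel Bayes' rule, which is not reproduced here), consider two hypothetical fixed simulators scored repeatedly by an ABC-style kernel similarity to the observation:

```python
import numpy as np

rng = np.random.default_rng(7)

y_obs = 2.0
simulators = [lambda r: r.normal(0.0, 1.0),   # candidate model 0
              lambda r: r.normal(2.0, 1.0)]   # candidate model 1 (true)

w = np.array([0.5, 0.5])   # mixture weights over the candidate models
eps = 0.5                  # kernel bandwidth on the data discrepancy
for step in range(10):
    # Score each model by the mean kernel similarity of fresh
    # simulations to the observation, then reweight recursively.
    scores = np.array([
        np.mean([np.exp(-0.5 * ((y_obs - sim(rng)) / eps) ** 2)
                 for _ in range(200)])
        for sim in simulators
    ])
    w = w * scores
    w /= w.sum()
print(w)   # mass concentrates on model 1
```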
Improved Coresets for Kernel Density Estimates
We study the construction of coresets for kernel density estimates. That is,
we show how to approximate the kernel density estimate described by a large
point set with another kernel density estimate with a much smaller point set.
For characteristic kernels (including Gaussian and Laplace kernels), our
approximation preserves the L∞ error between kernel density estimates to
within error ε, with coreset size O(1/ε²) that depends on no other aspects of
the data, including the dimension, the diameter of the point set, or the
bandwidth of the kernel, on which other approximations commonly depend. When
the dimension is unrestricted, we show this bound is tight for these kernels
as well as a much broader set.
This work provides a careful analysis of the iterative Frank-Wolfe algorithm
adapted to this context, an algorithm called kernel herding. This
analysis unites a broad line of work that spans statistics, machine learning,
and geometry.
When the dimension is constant, we demonstrate much tighter bounds on the
size of the coreset specifically for Gaussian kernels, showing that it is
bounded by the size of the coreset for axis-aligned rectangles. Currently the
best known constructive bound is O((1/ε) log^d (1/ε)), and non-constructively,
this can be improved by a factor of √(log(1/ε)). This improves the best
constant-dimension bounds polynomially for dimension d ≥ 3.
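A minimal sketch of the Frank-Wolfe/kernel-herding construction and the L∞ quality measure it controls, with an assumed Gaussian kernel and the error checked on random evaluation points:

```python
import numpy as np

rng = np.random.default_rng(8)

def rbf(X, Y, sigma=1.0):
    d2 = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

def herd_coreset(X, m, sigma=1.0):
    # Frank-Wolfe / kernel herding: greedily add the point that best
    # realigns the coreset KDE with the full-data KDE.
    K = rbf(X, X, sigma)
    mu = K.mean(axis=0)
    chosen, sum_k = [], np.zeros(len(X))
    for t in range(1, m + 1):
        j = int(np.argmax(mu - sum_k / t))
        chosen.append(j)
        sum_k += K[:, j]
    return X[chosen]

X = rng.normal(size=(2000, 2))
Q = rng.normal(size=(500, 2))              # query points for the error
kde_full = rbf(Q, X).mean(axis=1)
for m in (10, 40, 160):
    S = herd_coreset(X, m)
    kde_core = rbf(Q, S).mean(axis=1)
    print(m, np.abs(kde_full - kde_core).max())  # L-inf error shrinks
```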
- …