Positive Definite Kernels in Machine Learning
This survey is an introduction to positive definite kernels and the set of
methods they have inspired in the machine learning literature, namely kernel
methods. We first discuss some properties of positive definite kernels as well
as reproducing kernel Hibert spaces, the natural extension of the set of
functions associated with a kernel defined
on a space . We discuss at length the construction of kernel
functions that take advantage of well-known statistical models. We provide an
overview of numerous data-analysis methods which take advantage of reproducing
kernel Hilbert spaces and discuss the idea of combining several kernels to
improve the performance on certain tasks. We also provide a short cookbook of
different kernels which are particularly useful for certain data-types such as
images, graphs or speech segments.
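As a purely illustrative sketch (not taken from the survey), the snippet below builds the Gram matrix of a Gaussian RBF kernel, a standard positive definite kernel, and forms a convex combination of two such kernels, echoing the idea of combining several kernels; the function names, bandwidths and weights are assumptions chosen for the example.

    # Illustrative only: Gaussian RBF Gram matrix and a convex combination of
    # two positive definite kernels (which is again positive definite).
    import numpy as np

    def rbf_gram(X, sigma=1.0):
        """Gram matrix K[i, j] = exp(-||X[i] - X[j]||^2 / (2 sigma^2))."""
        sq = np.sum(X**2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        return np.exp(-d2 / (2.0 * sigma**2))

    X = np.random.default_rng(0).normal(size=(5, 3))
    K = 0.7 * rbf_gram(X, sigma=1.0) + 0.3 * rbf_gram(X, sigma=5.0)  # combined kernel
    print(np.all(np.linalg.eigvalsh(K) >= -1e-10))  # positive semidefinite check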
A Smoothed Dual Approach for Variational Wasserstein Problems
Variational problems that involve Wasserstein distances have been recently
proposed to summarize and learn from probability measures. Despite being
conceptually simple, such problems are computationally challenging because they
involve minimizing over quantities (Wasserstein distances) that are themselves
hard to compute. We show that the dual formulation of Wasserstein variational
problems introduced recently by Carlier et al. (2014) can be regularized using
an entropic smoothing, which leads to smooth, differentiable, convex
optimization problems that are simpler to implement and numerically more
stable. We illustrate the versatility of this approach by applying it to the
computation of Wasserstein barycenters and gradient flows of spatial
regularization functionals.
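For illustration only, the sketch below shows the generic entropic-smoothing device behind such approaches: standard Sinkhorn scaling iterations for the entropy-regularized optimal transport cost between two histograms. It is not the paper's specific dual algorithm for variational Wasserstein problems; the histograms, cost matrix and regularization strength are assumptions.

    # Illustrative only: Sinkhorn iterations for entropy-regularized optimal
    # transport between histograms a and b under ground cost C.
    import numpy as np

    def sinkhorn(a, b, C, eps=0.1, n_iter=500):
        K = np.exp(-C / eps)               # Gibbs kernel
        u = np.ones_like(a)
        for _ in range(n_iter):
            v = b / (K.T @ u)              # alternating scaling updates
            u = a / (K @ v)
        P = u[:, None] * K * v[None, :]    # regularized transport plan
        return np.sum(P * C)               # smoothed transport cost

    a = np.array([0.5, 0.5]); b = np.array([0.25, 0.75])
    C = np.array([[0.0, 1.0], [1.0, 0.0]])
    print(sinkhorn(a, b, C))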
Autoregressive Kernels For Time Series
We propose in this work a new family of kernels for variable-length time
series. Our work builds upon the vector autoregressive (VAR) model for
multivariate stochastic processes: given a multivariate time series x, we
consider the likelihood function p_{\theta}(x) of different parameters \theta
in the VAR model as features to describe x. To compare two time series x and
x', we form the product of their features p_{\theta}(x) p_{\theta}(x') which is
integrated out w.r.t. \theta using a matrix normal-inverse Wishart prior. Among
other properties, this kernel can be easily computed when the dimension d of
the time series is much larger than the lengths of the considered time series x
and x'. It can also be generalized to time series taking values in arbitrary
state spaces, as long as the state space itself is endowed with a kernel
\kappa. In that case, the kernel between x and x' is a function of the Gram
matrices produced by \kappa on observations and subsequences of observations
enumerated in x and x'. We describe a computationally efficient implementation
of this generalization that uses low-rank matrix factorization techniques.
These kernels are compared to other known kernels using a set of benchmark
classification tasks carried out with support vector machines.
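As a rough illustration of the underlying idea, the sketch below approximates a kernel of the form E_\theta[p_\theta(x) p_\theta(x')] by Monte Carlo sampling under a simplified VAR(1) model with Gaussian noise and a Gaussian prior on the transition matrix. This replaces the paper's matrix normal-inverse Wishart prior and closed-form integral with crude assumptions made only for the example.

    # Illustrative only: Monte Carlo approximation of an autoregressive kernel
    # E_theta[p_theta(x) p_theta(y)] under a simplified VAR(1) model.
    import numpy as np

    def var1_loglik(x, A, sigma=1.0):
        """Log-likelihood of series x (T x d) under x_t = A x_{t-1} + N(0, sigma^2 I)."""
        resid = x[1:] - x[:-1] @ A.T
        d = x.shape[1]
        return (-0.5 * np.sum(resid**2) / sigma**2
                - 0.5 * (len(x) - 1) * d * np.log(2 * np.pi * sigma**2))

    def ar_kernel(x, y, n_samples=2000, tau=0.5, seed=0):
        """Average of p_theta(x) p_theta(y) over transition matrices sampled from a Gaussian prior."""
        rng = np.random.default_rng(seed)
        d = x.shape[1]
        vals = []
        for _ in range(n_samples):
            A = tau * rng.normal(size=(d, d))
            vals.append(np.exp(var1_loglik(x, A) + var1_loglik(y, A)))
        return float(np.mean(vals))

    rng = np.random.default_rng(1)
    x = rng.normal(size=(10, 2)); y = rng.normal(size=(12, 2))  # variable-length series
    print(ar_kernel(x, y))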