Autoregressive Kernels For Time Series
We propose in this work a new family of kernels for variable-length time
series. Our work builds upon the vector autoregressive (VAR) model for
multivariate stochastic processes: given a multivariate time series x, we
consider the likelihood function p_{\theta}(x) of different parameters \theta
in the VAR model as features to describe x. To compare two time series x and
x', we form the product of their features p_{\theta}(x) p_{\theta}(x') which is
integrated out w.r.t \theta using a matrix normal-inverse Wishart prior. Among
other properties, this kernel can be easily computed when the dimension d of
the time series is much larger than the lengths of the considered time series x
and x'. It can also be generalized to time series taking values in arbitrary
state spaces, as long as the state space itself is endowed with a kernel
\kappa. In that case, the kernel between x and x' is a function of the Gram
matrices produced by \kappa on observations and subsequences of observations
enumerated in x and x'. We describe a computationally efficient implementation
of this generalization that uses low-rank matrix factorization techniques.
These kernels are compared to other known kernels using a set of benchmark
classification tasks carried out with support vector machines.
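The paper integrates the likelihood product in closed form using a matrix normal-inverse Wishart prior. Purely as an illustration of the underlying idea, the sketch below approximates the kernel by Monte Carlo for a univariate AR(1) model with a Gaussian prior over the coefficient and fixed noise variance; all function names and the choice of prior are assumptions for this sketch, not the authors' construction.

```python
import numpy as np

def ar1_log_likelihood(x, theta, sigma2=1.0):
    """Gaussian log-likelihood of the AR(1) model x_t = theta * x_{t-1} + eps_t."""
    resid = x[1:] - theta * x[:-1]
    n = len(resid)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * np.sum(resid ** 2) / sigma2

def mc_ar_kernel(x, xp, n_samples=5000, prior_std=1.0, seed=0):
    """Monte Carlo estimate of k(x, x') = E_theta[p_theta(x) p_theta(x')].

    A simple N(0, prior_std^2) prior on theta stands in for the paper's
    matrix normal-inverse Wishart prior (illustrative choice only).
    The two series may have different lengths.
    """
    rng = np.random.default_rng(seed)
    thetas = rng.normal(0.0, prior_std, n_samples)
    logs = np.array([ar1_log_likelihood(x, t) + ar1_log_likelihood(xp, t)
                     for t in thetas])
    m = logs.max()  # log-mean-exp for numerical stability
    return np.exp(m) * np.mean(np.exp(logs - m))
```

Because the same prior samples score both series, the estimate is symmetric in x and x' by construction, mirroring the symmetry of the exact kernel.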
Kernel Distribution Embeddings: Universal Kernels, Characteristic Kernels and Kernel Metrics on Distributions
Kernel mean embeddings have recently attracted the attention of the machine
learning community. They map measures \mu from some set M to functions in a
reproducing kernel Hilbert space (RKHS) with kernel k. The RKHS distance of
two mapped measures is a semi-metric d_k over M. We study three questions.
(I) For a given kernel, what sets M can be embedded? (II) When is the
embedding injective over M (in which case d_k is a metric)? (III) How does
the d_k-induced topology compare to other topologies on M? The existing
machine learning literature has addressed these questions in cases where M is
(a subset of) the finite regular Borel measures. We unify, improve and
generalise those results. Our approach naturally leads to continuous and
possibly even injective embeddings of (Schwartz-) distributions, i.e.,
generalised measures, but the reader is free to focus on measures only. In
particular, we systemise and extend various (partly known) equivalences between
different notions of universal, characteristic and strictly positive definite
kernels, and show that, on an underlying locally compact Hausdorff space, d_k
metrises the weak convergence of probability measures if and only if k is
continuous and characteristic.
Comment: Old and longer version of the JMLR paper with the same title (published
2018). Please start with the JMLR version. 55 pages (33 pages main text, 22
pages appendix), 2 tables, 1 figure (in appendix).
New improvements in the use of dependence measures for sensitivity analysis and screening
Physical phenomena are commonly modeled by numerical simulators. Such codes can take as input a high number of uncertain parameters, and it is important to identify their influences via a global sensitivity analysis (GSA). However, these codes can be time consuming, which precludes a GSA based on the classical Sobol' indices, as these require too many simulations. This is especially true when the number of inputs is large. To address this limitation, we consider recent advances in dependence measures, focusing on the distance correlation and the Hilbert-Schmidt independence criterion (HSIC). Our objective is to study these indices and use them for a screening purpose. Numerical tests reveal some differences between dependence measures and classical Sobol' indices, and preliminary answers to "What sensitivity indices to what situation?" are derived. Then, two approaches are proposed to use the dependence measures for a screening purpose. The first one directly uses these indices with independence tests; asymptotic tests and their spectral extensions exist and are detailed. For higher accuracy in the presence of small samples, we propose a non-asymptotic version based on bootstrap sampling. The second approach is based on a linear model associating two simulations, which explains their output difference as a weighted sum of their input differences. From this, a bootstrap method is proposed for the selection of the influential inputs. We also propose a heuristic approach for the calibration of the HSIC Lasso method. Numerical experiments are performed and show the potential of these approaches for screening when many inputs are not influential.
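The first screening approach pairs a dependence measure with an independence test. As a minimal sketch of that idea (not the authors' implementation, and omitting the spectral and linear-model variants), the code below computes the standard biased V-statistic estimate of HSIC with Gaussian kernels and a median-heuristic bandwidth, then obtains a permutation-based p-value in the spirit of the bootstrap test described above; all names are illustrative.

```python
import numpy as np

def gaussian_gram(v, bandwidth=None):
    """Gram matrix of a Gaussian kernel on scalar samples.

    Uses the common median heuristic for the bandwidth when none is given.
    """
    d2 = (v[:, None] - v[None, :]) ** 2
    if bandwidth is None:
        med = np.median(d2[d2 > 0])
        bandwidth = np.sqrt(med / 2) if med > 0 else 1.0
    return np.exp(-d2 / (2 * bandwidth ** 2))

def hsic(x, y):
    """Biased V-statistic estimate: HSIC = trace(K H L H) / n^2,
    where H = I - (1/n) 1 1^T centres the Gram matrices."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = gaussian_gram(x), gaussian_gram(y)
    return np.trace(K @ H @ L @ H) / n ** 2

def hsic_perm_test(x, y, n_perm=500, seed=0):
    """Permutation p-value: permuting y destroys any dependence on x,
    so the permuted HSIC values approximate the null distribution."""
    rng = np.random.default_rng(seed)
    obs = hsic(x, y)
    null = np.array([hsic(x, rng.permutation(y)) for _ in range(n_perm)])
    return obs, np.mean(null >= obs)
```

For screening, one would run such a test between each uncertain input and the simulator output, and retain the inputs whose p-values fall below a chosen threshold.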
A Kernel Two-Sample Test
We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
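The quadratic-time statistic mentioned above is straightforward to compute from three Gram matrices. The sketch below implements the standard unbiased estimator of MMD^2 with a Gaussian kernel; the fixed bandwidth and function names are illustrative choices, not prescribed by the paper.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between rows of a and rows of b."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased quadratic-time estimate of MMD^2 between samples X and Y.

    Diagonal terms of the within-sample Gram matrices are excluded,
    which makes the estimator unbiased (it can therefore be negative
    when the two distributions coincide).
    """
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()
```

Samples drawn from the same distribution give an estimate near zero, while a shift in distribution inflates it; a test threshold can then be set from the large-deviation bounds or the asymptotic null distribution described in the abstract.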