Kernel Distribution Embeddings: Universal Kernels, Characteristic Kernels and Kernel Metrics on Distributions
Kernel mean embeddings have recently attracted the attention of the machine learning community. They map measures from some set M to functions in a reproducing kernel Hilbert space (RKHS) with kernel k. The RKHS distance of two mapped measures is a semi-metric d_k over M. We study three questions. (I) For a given kernel, what sets M can be embedded? (II) When is the embedding injective over M (in which case d_k is a metric)? (III) How does the d_k-induced topology compare to other topologies on M? The existing machine learning literature has addressed these questions in cases where M is (a subset of) the finite regular Borel measures. We unify, improve and generalise those results. Our approach naturally leads to continuous and possibly even injective embeddings of (Schwartz-) distributions, i.e., generalised measures, but the reader is free to focus on measures only. In particular, we systemise and extend various (partly known) equivalences between different notions of universal, characteristic and strictly positive definite kernels, and show that on an underlying locally compact Hausdorff space, d_k metrises the weak convergence of probability measures if and only if k is continuous and characteristic.
Comment: Old and longer version of the JMLR paper with the same title (published 2018). Please start with the JMLR version. 55 pages (33 pages main text, 22 pages appendix), 2 tables, 1 figure (in appendix).
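As a concrete illustration of the kernel metric d_k (a standard sample estimator, not code from the paper): the sketch below estimates d_k between two empirical measures using a Gaussian kernel, which is continuous and characteristic on R^d. The function names and the bandwidth value are illustrative.

```python
# Minimal sketch: d_k(P, Q) = ||mu_P - mu_Q||_RKHS estimated from samples,
# i.e. the (biased) maximum mean discrepancy with a Gaussian kernel.
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel matrix k(x_i, y_j); characteristic on R^d."""
    d = x[:, None, :] - y[None, :, :]
    return np.exp(-np.sum(d ** 2, axis=-1) / (2.0 * bandwidth ** 2))

def mmd(X, Y, bandwidth=1.0):
    """Biased estimate of the kernel metric d_k between two samples."""
    kxx = gaussian_kernel(X, X, bandwidth).mean()
    kyy = gaussian_kernel(Y, Y, bandwidth).mean()
    kxy = gaussian_kernel(X, Y, bandwidth).mean()
    return np.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 1))   # sample from P
Y = rng.normal(0.5, 1.0, size=(500, 1))   # sample from Q
print(mmd(X, Y))  # strictly positive: the kernel is characteristic
```

Because the kernel is characteristic, d_k(P, Q) = 0 only when P = Q, so the estimate above separates the two shifted Gaussians as the samples grow.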
Classification with unknown class-conditional label noise on non-compact feature spaces
We investigate the problem of classification in the presence of unknown
class-conditional label noise in which the labels observed by the learner have
been corrupted with some unknown class dependent probability. In order to
obtain finite sample rates, previous approaches to classification with unknown
class-conditional label noise have required that the regression function is
close to its extrema on sets of large measure. We shall consider this problem
in the setting of non-compact metric spaces, where the regression function need
not attain its extrema.
In this setting we determine the minimax optimal learning rates (up to
logarithmic factors). The rate displays interesting threshold behaviour: When
the regression function approaches its extrema at a sufficient rate, the
optimal learning rates are of the same order as those obtained in the
label-noise free setting. If the regression function approaches its extrema
more gradually, then classification performance necessarily degrades. In
addition, we present an adaptive algorithm which attains these rates without
prior knowledge of either the distributional parameters or the local density.
This identifies for the first time a scenario in which finite sample rates are
achievable in the label noise setting, but they differ from the optimal rates
without label noise.
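To make the noise model concrete, here is a minimal sketch (illustrative, not from the paper) of class-conditional label noise: each observed label is flipped independently with a probability that depends only on the true class. The flip rates rho_0 and rho_1 are assumed values which, in the paper's setting, the learner does not observe.

```python
# Sketch of class-conditional label noise: labels are corrupted with a
# class-dependent flip probability; rho_0, rho_1 are assumed (unknown to
# the learner) noise rates for true classes 0 and 1 respectively.
import numpy as np

def corrupt_labels(y, rho_0=0.2, rho_1=0.4, rng=None):
    """Flip label 0 -> 1 with prob rho_0 and label 1 -> 0 with prob rho_1."""
    rng = rng or np.random.default_rng()
    flip_prob = np.where(y == 0, rho_0, rho_1)   # per-example flip rate
    return np.where(rng.random(len(y)) < flip_prob, 1 - y, y)

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=10)
y_observed = corrupt_labels(y_true, rng=rng)
print(y_true, y_observed, sep="\n")
```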
Competing with stationary prediction strategies
In this paper we introduce the class of stationary prediction strategies and
construct a prediction algorithm that asymptotically performs as well as the
best continuous stationary strategy. We make mild compactness assumptions but
no stochastic assumptions about the environment. In particular, no assumption
of stationarity is made about the environment, and the stationarity of the
considered strategies only means that they do not depend explicitly on time; we
argue that it is natural to consider only stationary strategies even for highly
non-stationary environments.
Comment: 20 pages.
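As a loose illustration of competing with stationary strategies, the sketch below runs a standard exponentially weighted average forecaster over a small finite pool of stationary strategies, each of which maps the current signal to a prediction with no explicit time dependence. This is not the paper's construction (which competes with a continuous class of strategies); the learning rate eta and the toy strategy pool are assumptions.

```python
# Sketch: exponential weights over a finite pool of stationary strategies
# under square loss on [0, 1]. A strategy is any map signal -> prediction.
import numpy as np

def exp_weights(signals, outcomes, strategies, eta=2.0):
    """Predict with the weighted mean of the strategies' predictions."""
    w = np.ones(len(strategies))
    preds = []
    for x, y in zip(signals, outcomes):
        p = np.array([s(x) for s in strategies])  # stationary: x only, no t
        preds.append(w @ p / w.sum())             # predict, then observe y
        w *= np.exp(-eta * (p - y) ** 2)          # reweight by square loss
    return np.array(preds)

strategies = [lambda x: 0.5, lambda x: x, lambda x: x ** 2]  # toy pool
rng = np.random.default_rng(2)
signals = rng.random(100)
outcomes = np.clip(signals + 0.1 * rng.standard_normal(100), 0.0, 1.0)
preds = exp_weights(signals, outcomes, strategies)
print(np.mean((preds - outcomes) ** 2))  # close to the best pool member
```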
Optimal rates of convergence for persistence diagrams in Topological Data Analysis
Computational topology has recently seen important developments toward data analysis, giving birth to the field of topological data analysis.
Topological persistence, or persistent homology, appears as a fundamental tool
in this field. In this paper, we study topological persistence in general
metric spaces from a statistical point of view. We show that persistent homology fits naturally into general statistical frameworks and that persistence diagrams can be used as statistics with interesting convergence
properties. Some numerical experiments are performed in various contexts to
illustrate our results.
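As a concrete example of a persistence diagram used as a statistic, here is a minimal self-contained sketch (not the paper's code) computing the degree-0 persistence diagram of the Vietoris-Rips filtration of a point cloud: every connected component is born at scale 0 and dies at the edge length that merges it into an older component, so the diagram can be read off a minimum spanning tree with union-find.

```python
# Sketch: degree-0 persistence of the Vietoris-Rips filtration. Components
# are born at 0 and die at the merging edge length, i.e. at MST edge weights.
import numpy as np
from itertools import combinations

def persistence_diagram_0d(points):
    n = len(points)
    edges = sorted(
        (np.linalg.norm(points[i] - points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))
    def find(a):                          # union-find with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    diagram = []
    for w, i, j in edges:                 # edges in increasing length
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            diagram.append((0.0, w))      # a component born at 0 dies at w
    diagram.append((0.0, np.inf))         # one component never dies
    return diagram

rng = np.random.default_rng(3)
cloud = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
print(persistence_diagram_0d(cloud)[-3:])  # one long-lived pair per cluster
```

On the two-cluster cloud above, all but one finite death occurs at a small scale, while the last finite pair dies only near the inter-cluster distance: the diagram detects the two clusters.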