Bandwidth selection for kernel estimation in mixed multi-dimensional spaces
Kernel estimation techniques, such as mean shift, suffer from one major
drawback: the kernel bandwidth selection. The bandwidth can be fixed for the
whole data set or can vary at each point. Automatic bandwidth selection becomes
a real challenge in the case of multidimensional heterogeneous features. This paper
presents a solution to this problem. It is an extension of \cite{Comaniciu03a}
which was based on the fundamental property of normal distributions regarding
the bias of the normalized density gradient. The selection is done iteratively
for each feature type, by looking for the stability of local bandwidth
estimates across a predefined range of bandwidths. A pseudo balloon mean shift
filtering and partitioning are introduced. The validity of the method is
demonstrated in the context of color image segmentation based on a
5-dimensional space.
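To make the role of the bandwidth concrete, here is a minimal sketch of fixed-bandwidth Gaussian mean shift in one dimension. It is not the paper's iterative selection scheme or its pseudo balloon variant; the function names `mean_shift_mode` and `count_modes`, the toy data, and the tolerances are all assumptions made for illustration.

```python
import math

def mean_shift_mode(x0, data, bandwidth, tol=1e-6, max_iter=500):
    """Follow Gaussian-kernel mean shift steps from x0 until the step is below tol."""
    x = x0
    for _ in range(max_iter):
        # Gaussian weight of every sample relative to the current position.
        weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data]
        new_x = sum(w * xi for w, xi in zip(weights, data)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x

def count_modes(data, bandwidth, merge_tol=1e-2):
    """Run mean shift from every sample and count the distinct end points."""
    modes = []
    for xi in data:
        m = mean_shift_mode(xi, data, bandwidth)
        if all(abs(m - other) > merge_tol for other in modes):
            modes.append(m)
    return len(modes)

# Hypothetical toy data: two well-separated groups of points.
data = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
modes_small_bandwidth = count_modes(data, 0.5)
modes_large_bandwidth = count_modes(data, 10.0)
```

On this toy data a small bandwidth keeps the two groups apart while a large one merges them into a single mode, which is the sensitivity that motivates automatic bandwidth selection.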
Interpretable statistics for complex modelling: quantile and topological learning
As the complexity of our data has increased exponentially over the last decades, so has our
need for interpretable features. This thesis revolves around two paradigms to approach
this quest for insights.
In the first part we focus on parametric models, where the problem of interpretability
can be seen as a “parametrization selection”. We introduce a quantile-centric
parametrization and we show the advantages of our proposal in the context of regression,
where it bridges the gap between classical generalized linear (mixed)
models and increasingly popular quantile methods.
The second part of the thesis, concerned with topological learning, tackles the
problem from a non-parametric perspective. As topology can be thought of as a way
of characterizing data in terms of their connectivity structure, it allows us to represent
complex and possibly high-dimensional data through a few features, such as the number of
connected components, loops and voids. We illustrate how the emerging branch of
statistics devoted to recovering topological structures in the data, Topological Data
Analysis, can be exploited both for exploratory and inferential purposes with a special
emphasis on kernels that preserve the topological information in the data.
Finally, we show with an application how these two approaches can borrow strength
from one another in the identification and description of brain activity through fMRI
data from the ABIDE project.
Persistence Flamelets: multiscale Persistent Homology for kernel density exploration
In recent years there has been noticeable interest in the study of the "shape
of data". Among the many ways a "shape" could be defined, topology is the most
general one, as it describes an object in terms of its connectivity structure:
connected components (topological features of dimension 0), cycles (features of
dimension 1) and so on. There is a growing number of techniques, generally
denoted as Topological Data Analysis, aimed at estimating topological
invariants of a fixed object; when we allow this object to change, however,
little has been done to investigate the evolution in its topology. In this work
we define the Persistence Flamelets, a multiscale version of one of the most
popular tools in TDA, the Persistence Landscape. We examine its theoretical
properties and we show how it could be used to gain insights into the bandwidth
parameter of KDEs.
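As a rough illustration of how topological features of a KDE change with the bandwidth, the sketch below computes the 0-dimensional persistence (connected components of superlevel sets) of a one-dimensional Gaussian KDE sampled on a grid. It does not compute a Persistence Landscape or the Persistence Flamelets themselves; the names `kde`, `peak_persistence`, `significant_peaks`, the grid, and the persistence threshold are assumptions introduced here.

```python
import math

def kde(x, data, h):
    """Gaussian kernel density estimate at x with a single bandwidth h."""
    c = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

def peak_persistence(values):
    """0-dimensional superlevel-set persistence of a sampled 1D function.

    Returns (birth, death) pairs: a peak is born at its height and dies at
    the level where its component merges into one with a higher peak.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    parent = {}  # union-find forest; each root is the highest sample of its component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        for j in (i - 1, i + 1):
            if j not in parent:
                continue
            a, b = find(i), find(j)
            if a == b:
                continue
            if values[a] < values[b]:
                a, b = b, a
            # The component with the lower peak (root b) dies at the current level.
            if values[b] > values[i]:
                pairs.append((values[b], values[i]))
            parent[b] = a
    # The global maximum never merges; pair it with the global minimum.
    pairs.append((values[order[0]], values[order[-1]]))
    return pairs

def significant_peaks(data, h, grid, min_persistence=1e-3):
    """Count KDE peaks whose persistence exceeds min_persistence."""
    values = [kde(x, data, h) for x in grid]
    return sum(1 for birth, death in peak_persistence(values)
               if birth - death > min_persistence)

# Hypothetical toy data and evaluation grid.
data = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
grid = [-2.0 + 0.05 * k for k in range(201)]
peaks_by_bandwidth = {h: significant_peaks(data, h, grid) for h in (0.5, 10.0)}
```

Tracking such persistence pairs over a range of bandwidths gives a simple view of which peaks survive smoothing, in the spirit of the multiscale analysis described above.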
One-Class Support Measure Machines for Group Anomaly Detection
We propose one-class support measure machines (OCSMMs) for group anomaly
detection, which aims at recognizing anomalous aggregate behaviors of data
points. The OCSMMs generalize well-known one-class support vector machines
(OCSVMs) to a space of probability measures. By formulating the problem as
quantile estimation on distributions, we can establish an interesting
connection to the OCSVMs and variable kernel density estimators (VKDEs) over
the input space on which the distributions are defined, bridging the gap
between large-margin methods and kernel density estimators. In particular, we
show that various types of VKDEs can be considered as solutions to a class of
regularization problems studied in this paper. Experiments on the Sloan Digital Sky
Survey dataset and the High Energy Particle Physics dataset demonstrate the
benefits of the proposed framework in real-world applications.
Comment: Conference on Uncertainty in Artificial Intelligence (UAI2013)
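For readers unfamiliar with variable kernel density estimators, the following is a minimal sketch of one common form, the sample-point estimator, in which each data point carries its own bandwidth. It is not the OCSMM formulation from the paper; the k-nearest-neighbour bandwidth rule and the names `knn_bandwidths` and `vkde` are assumptions chosen for illustration, and the rule presumes no duplicated points (a zero bandwidth would break the estimator).

```python
import math

def knn_bandwidths(data, k):
    """Per-point bandwidth: distance from each point to its k-th nearest neighbour."""
    bandwidths = []
    for xi in data:
        dists = sorted(abs(xi - xj) for xj in data)
        bandwidths.append(dists[k])  # dists[0] is the point's distance to itself
    return bandwidths

def vkde(x, data, bandwidths):
    """Sample-point variable KDE: average of Gaussians with per-point widths."""
    total = 0.0
    for xi, hi in zip(data, bandwidths):
        total += math.exp(-0.5 * ((x - xi) / hi) ** 2) / (hi * math.sqrt(2 * math.pi))
    return total / len(data)

# Hypothetical toy data: a dense group near 0 and two isolated points.
data = [0.0, 0.1, 0.2, 0.3, 5.0, 7.0]
bandwidths = knn_bandwidths(data, k=2)
density_dense = vkde(0.15, data, bandwidths)
density_sparse = vkde(6.0, data, bandwidths)
```

Points in the dense group receive narrow kernels and isolated points receive wide ones, so the estimate adapts its smoothing to the local sampling density.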