Supervised Learning with Indefinite Topological Kernels
Topological Data Analysis (TDA) is a recent and growing branch of statistics
devoted to the study of the shape of the data. In this work we investigate the
predictive power of TDA in the context of supervised learning. Since
topological summaries, most notably the persistence diagram, are typically
defined in complex spaces, we adopt a kernel approach to translate them into
more familiar vector spaces. We define a topological exponential kernel,
characterize it, and show that, despite not being positive semi-definite, it
can be used successfully in regression and classification tasks.
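The abstract does not spell out the construction, but a common way to build such a kernel is to exponentiate a distance between persistence diagrams; since diagram metrics are not Euclidean, the result need not be positive semi-definite. A minimal sketch, assuming a naive Hausdorff-style surrogate for the bottleneck distance (the distance choice and all names are illustrative, not the paper's):

```python
import math

def _diag_dist(p):
    # sup-norm distance from a diagram point (birth, death) to the diagonal
    return (p[1] - p[0]) / 2.0

def _point_dist(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def hausdorff_diagram_distance(D1, D2):
    # Naive symmetric distance: each point matches its nearest partner in the
    # other diagram, or the diagonal. A cheap surrogate for the bottleneck
    # distance, which would require an optimal matching.
    def one_sided(A, B):
        worst = 0.0
        for p in A:
            best = _diag_dist(p)
            for q in B:
                best = min(best, _point_dist(p, q))
            worst = max(worst, best)
        return worst
    return max(one_sided(D1, D2), one_sided(D2, D1))

def exponential_topological_kernel(D1, D2, sigma=1.0):
    # Exponentiating a non-Euclidean distance like this generally yields an
    # indefinite (non-PSD) kernel, consistent with the abstract's remark.
    return math.exp(-hausdorff_diagram_distance(D1, D2) / sigma)
```

Diagrams are given as lists of (birth, death) pairs; the kernel equals 1 on identical diagrams and decays with the distance between them.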
Geometric Inference on Kernel Density Estimates
We show that geometric inference for a point cloud can be carried out by
examining its kernel density estimate with a Gaussian kernel. This allows one
to consider kernel density estimates, which are robust to spatial noise,
subsampling, and approximate computation in comparison to raw point sets. This
is achieved by examining the sublevel sets of the kernel distance, which
isomorphically map to superlevel sets of the kernel density estimate. We prove
new properties about the kernel distance, demonstrating stability results and
allowing it to inherit reconstruction results from recent advances in
distance-based topological reconstruction. Moreover, we provide an algorithm to
estimate its topology using weighted Vietoris-Rips complexes.
Comment: To appear in SoCG 2015. 36 pages, 5 figures.
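The central objects can be sketched concretely: a Gaussian kernel density estimate, and the discrete superlevel set obtained by thresholding it, whose topology the abstract relates to sublevel sets of the kernel distance. A minimal 1-D sketch (the grid discretization, bandwidth, and names are illustrative, not from the paper):

```python
import math

def gaussian_kde(point_cloud, x, bandwidth=0.5):
    # Gaussian kernel density estimate at a query point x (1-D sketch).
    n = len(point_cloud)
    s = sum(math.exp(-((x - p) ** 2) / (2 * bandwidth ** 2))
            for p in point_cloud)
    return s / (n * bandwidth * math.sqrt(2 * math.pi))

def superlevel_set(point_cloud, grid, tau, bandwidth=0.5):
    # Grid points where the KDE exceeds tau: a discrete superlevel set.
    # Because the KDE averages over all points, it is far less sensitive to
    # spatial noise and subsampling than the raw point set itself.
    return [x for x in grid
            if gaussian_kde(point_cloud, x, bandwidth) > tau]
```

For a cloud with two clusters, a suitable threshold yields a superlevel set with two connected components, recovering the cloud's coarse topology.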
The persistence landscape and some of its properties
Persistence landscapes map persistence diagrams into a function space, which
may often be taken to be a Banach space or even a Hilbert space. In the latter
case, it is a feature map and there is an associated kernel. The main advantage
of this summary is that it allows one to apply tools from statistics and
machine learning. Furthermore, the mapping from persistence diagrams to
persistence landscapes is stable and invertible. We introduce a weighted
version of the persistence landscape and define a one-parameter family of
Poisson-weighted persistence landscape kernels that may be useful for learning.
We also demonstrate some additional properties of the persistence landscape.
First, the persistence landscape may be viewed as a tropical rational function.
Second, in many cases it is possible to exactly reconstruct all of the
component persistence diagrams from an average persistence landscape. It
follows that the persistence landscape kernel is characteristic for certain
generic empirical measures. Finally, the persistence landscape distance may be
arbitrarily small compared to the interleaving distance.
Comment: 18 pages, to appear in the Proceedings of the 2018 Abel Symposium.
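The landscape functions themselves are concrete enough to sketch: the k-th landscape evaluated at t is the k-th largest of the "tent" values min(t - b, d - t), floored at zero, over the points (b, d) of the diagram. A minimal Python version (function name illustrative):

```python
def landscape(diagram, k, t):
    # k-th persistence landscape function evaluated at t: the k-th largest
    # tent value min(t - b, d - t), floored at 0, over points (b, d).
    vals = sorted((max(0.0, min(t - b, d - t)) for (b, d) in diagram),
                  reverse=True)
    return vals[k - 1] if k <= len(vals) else 0.0
```

Because each landscape is a piecewise-linear function, the summary lands in a function space where averages, norms, and inner products (hence kernels) are available, which is the advantage the abstract emphasizes.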
Interpretable statistics for complex modelling: quantile and topological learning
As the complexity of our data has increased exponentially in recent decades, so has our
need for interpretable features. This thesis revolves around two paradigms to approach
this quest for insights.
In the first part we focus on parametric models, where the problem of interpretability
can be seen as a “parametrization selection”. We introduce a quantile-centric
parametrization and show the advantages of our proposal in the context of regression,
where it allows us to bridge the gap between classical generalized linear (mixed)
models and increasingly popular quantile methods.
The second part of the thesis, concerned with topological learning, tackles the
problem from a non-parametric perspective. As topology can be thought of as a way
of characterizing data in terms of their connectivity structure, it allows us to represent
complex and possibly high-dimensional data through a few features, such as the number of
connected components, loops and voids. We illustrate how the emerging branch of
statistics devoted to recovering topological structures in the data, Topological Data
Analysis, can be exploited for both exploratory and inferential purposes, with a special
emphasis on kernels that preserve the topological information in the data.
Finally, we show with an application how these two approaches can borrow strength
from one another in the identification and description of brain activity through fMRI
data from the ABIDE project.
The Christoffel-Darboux kernel for topological data analysis
Persistent homology has been widely used to study the topology of point
clouds in Euclidean space. Standard approaches are very sensitive to outliers,
and their computational complexity depends badly on the number of data points.
In this paper we introduce a novel persistence module for a point cloud using
the theory of Christoffel-Darboux kernels. This module is robust to
(statistical) outliers in the data, and can be computed in time linear in the
number of data points. We illustrate the benefits and limitations of our new
module with various numerical examples in low ambient dimensions. Our
work expands upon recent applications of Christoffel-Darboux kernels in the
context of statistical data analysis and geometric inference (Lasserre, Pauwels
and Putinar, 2022). There, these kernels are used to construct a polynomial
whose level sets capture the geometry of a point cloud in a precise sense. We
show that the persistent homology associated to the sublevel set filtration of
this polynomial is stable with respect to the Wasserstein distance. Moreover,
we show that the persistent homology of this filtration can be computed in
singly exponential time in the ambient dimension, using a recent algorithm
of Basu & Karisani (2022).
Comment: 22 pages, 11 figures, 1 table.
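The Christoffel-Darboux construction can be sketched in its simplest form: build the empirical moment matrix M of the cloud in a monomial basis v(x), and evaluate the polynomial q(x) = v(x)^T M^{-1} v(x), whose sublevel sets hug the data. A 1-D, degree-1 illustration (the regularization and all names are our own, not the paper's):

```python
def moment_matrix(points, degree=1):
    # Empirical moment matrix M = (1/N) * sum v(x) v(x)^T, with monomial
    # basis v(x) = (1, x, ..., x^degree), for a 1-D point cloud.
    dim = degree + 1
    n = len(points)
    M = [[0.0] * dim for _ in range(dim)]
    for x in points:
        v = [x ** i for i in range(dim)]
        for i in range(dim):
            for j in range(dim):
                M[i][j] += v[i] * v[j] / n
    return M

def inverse_christoffel(points, x, reg=1e-9):
    # q(x) = v(x)^T M^{-1} v(x): small near the data, large far from it,
    # so its sublevel sets capture the geometry of the cloud. For simplicity
    # we invert M only for degree = 1 (a 2x2 matrix), with a small ridge.
    M = moment_matrix(points, degree=1)
    a, b = M[0][0] + reg, M[0][1]
    c, d = M[1][0], M[1][1] + reg
    det = a * d - b * c
    Minv = [[d / det, -b / det], [-c / det, a / det]]
    v = [1.0, x]
    return sum(v[i] * Minv[i][j] * v[j]
               for i in range(2) for j in range(2))
```

Note that M is built in one linear pass over the data, which reflects the complexity advantage the abstract claims; filtering by the sublevel sets of q is then what feeds the persistence computation.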
Kernel-based methods for persistent homology and their applications to Alzheimer's Disease
Kernel-based methods are powerful tools that are widely applied across many fields of research. In recent years, methods from computational topology have emerged for characterizing the intrinsic geometry of data. Persistent homology is a central tool in topological data analysis, which allows one to capture the evolution of the topological features of the data. Persistence diagrams are a natural way to summarize these features, but they cannot be used directly in machine learning algorithms. To deal with this, we first analyse various recently developed kernel-based methods, then we propose and apply Variably Scaled Kernels (VSKs) in the persistence diagram framework. We then discuss the application of these kernels to medical imaging in the context of Alzheimer’s Disease classification. Taking into account the
cortical thickness measures on the cortical surface, we build
persistence diagrams for different MRI subjects and perform classification tests using the support vector machine classifier.
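Variably scaled kernels have a simple generic form: a scale function ψ appends an extra coordinate to each input, and the base kernel is evaluated on the augmented points. A hedged sketch (the Gaussian base and the example choice of ψ are illustrative; the abstract does not fix either):

```python
import math

def gaussian(u, v, eps=1.0):
    # Gaussian (RBF) base kernel on same-length tuples.
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-eps * d2)

def vsk(x, y, psi, base=gaussian):
    # Variably scaled kernel: evaluate the base kernel on points augmented
    # with an extra coordinate given by the user-chosen scale function psi.
    # In the persistence-diagram setting, psi could e.g. encode a point's
    # distance from the diagonal (an illustrative choice, not the thesis's).
    return base(tuple(x) + (psi(x),), tuple(y) + (psi(y),))
```

The resulting Gram matrix can be passed to a support vector machine with a precomputed kernel, matching the classification pipeline the abstract describes.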