Supervised Learning with Indefinite Topological Kernels
Topological Data Analysis (TDA) is a recent and growing branch of statistics
devoted to the study of the shape of data. In this work we investigate the
predictive power of TDA in the context of supervised learning. Since
topological summaries, most notably the persistence diagram, are typically
defined in complex spaces, we adopt a kernel approach to translate them into
more familiar vector spaces. We define a topological exponential kernel,
characterize it, and show that, despite not being positive semi-definite, it
can be successfully used in regression and classification tasks.
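
For intuition, a minimal sketch of one such kernel follows, assuming the common construction exp(-d(D_i, D_j)^2 / (2h)) with d the 2-Wasserstein distance between persistence diagrams; the distance choice, the bandwidth name h, and all function names are illustrative assumptions, not the paper's implementation.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def wasserstein2(D1, D2):
        # 2-Wasserstein distance between two diagrams given as (n, 2)
        # arrays of (birth, death) pairs, allowing matches to the diagonal.
        n, m = len(D1), len(D2)
        # Diagonal projections ((b + d) / 2, (b + d) / 2) of each point.
        proj1 = np.repeat(D1.mean(axis=1, keepdims=True), 2, axis=1)
        proj2 = np.repeat(D2.mean(axis=1, keepdims=True), 2, axis=1)
        # Augmented assignment: each point is matched to a point of the
        # other diagram or to the diagonal; diagonal-to-diagonal is free.
        P = np.vstack([D1, proj2])
        Q = np.vstack([D2, proj1])
        C = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2) ** 2
        C[n:, m:] = 0.0
        rows, cols = linear_sum_assignment(C)
        return np.sqrt(C[rows, cols].sum())

    def exponential_gram(diagrams, h=1.0):
        # Gram matrix of the exponential topological kernel; because the
        # Wasserstein metric is not Hilbertian, it may fail to be PSD.
        K = np.zeros((len(diagrams), len(diagrams)))
        for i in range(len(diagrams)):
            for j in range(i, len(diagrams)):
                d = wasserstein2(diagrams[i], diagrams[j])
                K[i, j] = K[j, i] = np.exp(-d ** 2 / (2 * h))
        return K

A negative eigenvalue of such a Gram matrix is what makes the kernel indefinite, and working with that indefiniteness in regression and classification is the subject of the paper.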
PersLay: A Neural Network Layer for Persistence Diagrams and New Graph Topological Signatures
Persistence diagrams, the most common descriptors of Topological Data
Analysis, encode topological properties of data and have already proved pivotal
in many different applications of data science. However, since the (metric)
space of persistence diagrams is not a Hilbert space, they end up being
difficult inputs for most Machine Learning techniques. To address this
concern, several vectorization methods have been put forward that embed
persistence diagrams into either a finite-dimensional Euclidean space or an
(implicit) infinite-dimensional Hilbert space with kernels. In this work, we focus on persistence
diagrams built on top of graphs. Relying on extended persistence theory and the
so-called heat kernel signature, we show how graphs can be encoded by
(extended) persistence diagrams in a provably stable way. We then propose a
general and versatile framework for learning vectorizations of persistence
diagrams, which encompasses most of the vectorization techniques used in the
literature. Finally, we showcase the experimental strength of our setup by
achieving competitive scores on classification tasks on real-life graph
datasets.
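
For intuition, here is a minimal PyTorch sketch of the pattern such a layer follows: a learnable pointwise map, a learnable per-point weight, and a permutation-invariant aggregation. It is not the authors' implementation, and all layer sizes and module names are illustrative assumptions.

    import torch
    import torch.nn as nn

    class PersLaySketch(nn.Module):
        def __init__(self, hidden=32, out_dim=16):
            super().__init__()
            # phi: learnable pointwise transformation of (birth, death) pairs.
            self.phi = nn.Sequential(
                nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, out_dim)
            )
            # w: learnable scalar weight per point, e.g. to downweight
            # low-persistence points that are likely noise.
            self.w = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())

        def forward(self, diagram):
            # diagram: (n_points, 2) tensor of (birth, death) pairs.
            feats = self.phi(diagram) * self.w(diagram)
            # Summing over points makes the output invariant to their order
            # and independent of their number.
            return feats.sum(dim=0)

    layer = PersLaySketch()
    dgm = torch.tensor([[0.1, 0.9], [0.2, 0.4], [0.5, 0.6]])
    vec = layer(dgm)  # fixed-size vector usable by any downstream classifier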
Geometric Inference on Kernel Density Estimates
We show that geometric inference of a point cloud can be calculated by
examining its kernel density estimate with a Gaussian kernel. This allows one
to consider kernel density estimates, which are robust to spatial noise,
subsampling, and approximate computation in comparison to raw point sets. This
is achieved by examining the sublevel sets of the kernel distance, which
isomorphically map to superlevel sets of the kernel density estimate. We prove
new properties about the kernel distance, demonstrating stability results and
allowing it to inherit reconstruction results from recent advances in
distance-based topological reconstruction. Moreover, we provide an algorithm to
estimate its topology using weighted Vietoris-Rips complexes.
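
As a concrete illustration, the following numpy sketch evaluates the Gaussian kernel density estimate whose superlevel sets the paper studies, and thresholds it on a noisy circle; the bandwidth sigma, the grid, and the threshold are illustrative assumptions, and the paper's actual algorithm additionally builds weighted Vietoris-Rips complexes on top of such estimates.

    import numpy as np

    def gaussian_kde(points, queries, sigma=0.5):
        # Average of Gaussian kernels centered at the data points,
        # evaluated at the query locations.
        sq = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-sq / (2 * sigma ** 2)).mean(axis=1)

    rng = np.random.default_rng(0)
    # Noisy sample from a circle; the KDE superlevel set recovers the loop
    # even though individual samples are perturbed.
    theta = rng.uniform(0, 2 * np.pi, 200)
    cloud = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(200, 2))
    xs = np.linspace(-2, 2, 50)
    grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)
    kde = gaussian_kde(cloud, grid)
    # Superlevel set of the KDE: a noise-robust stand-in for the raw cloud.
    superlevel = grid[kde >= 0.1 * kde.max()]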
The persistence landscape and some of its properties
Persistence landscapes map persistence diagrams into a function space, which
may often be taken to be a Banach space or even a Hilbert space. In the latter
case, it is a feature map and there is an associated kernel. The main advantage
of this summary is that it allows one to apply tools from statistics and
machine learning. Furthermore, the mapping from persistence diagrams to
persistence landscapes is stable and invertible. We introduce a weighted
version of the persistence landscape and define a one-parameter family of
Poisson-weighted persistence landscape kernels that may be useful for learning.
We also demonstrate some additional properties of the persistence landscape.
First, the persistence landscape may be viewed as a tropical rational function.
Second, in many cases it is possible to exactly reconstruct all of the
component persistence diagrams from an average persistence landscape. It
follows that the persistence landscape kernel is characteristic for certain
generic empirical measures. Finally, the persistence landscape distance may be
arbitrarily small compared to the interleaving distance.
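
To make the construction concrete, the sketch below samples the first few landscape functions on a grid, using the standard definition lambda_k(t) = k-th largest value of min(t - b, d - t)_+ over the diagram's points (b, d); the grid and the function names are illustrative assumptions.

    import numpy as np

    def landscape(diagram, grid, k_max=3):
        # Returns an array of shape (k_max, len(grid)) holding the first
        # k_max landscape functions sampled on the grid.
        b = diagram[:, 0][:, None]
        d = diagram[:, 1][:, None]
        # Tent function of each diagram point, evaluated at every t.
        tents = np.maximum(np.minimum(grid[None, :] - b, d - grid[None, :]), 0.0)
        tents = -np.sort(-tents, axis=0)  # per t, sort values descending
        out = np.zeros((k_max, len(grid)))
        k = min(k_max, len(diagram))
        out[:k] = tents[:k]
        return out

    dgm = np.array([[0.0, 1.0], [0.25, 0.75]])
    grid = np.linspace(0.0, 1.0, 101)
    lam = landscape(dgm, grid)  # lam[0] is the outermost landscape function

Averaging such arrays over a sample of diagrams approximates the average persistence landscape discussed above, and inner products of the (flattened) arrays approximate the landscape kernel.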
Interpretable statistics for complex modelling: quantile and topological learning
As the complexity of our data has increased exponentially over the last decades, so has
our need for interpretable features. This thesis revolves around two paradigms for
approaching this quest for insights.
In the first part we focus on parametric models, where the problem of interpretability
can be seen as one of “parametrization selection”. We introduce a quantile-centric
parametrization and show the advantages of our proposal in the context of regression,
where it allows us to bridge the gap between classical generalized linear (mixed)
models and increasingly popular quantile methods.
The second part of the thesis, concerned with topological learning, tackles the
problem from a non-parametric perspective. Since topology can be thought of as a way
of characterizing data in terms of their connectivity structure, it allows us to represent
complex and possibly high-dimensional data through a few features, such as the number of
connected components, loops, and voids. We illustrate how Topological Data Analysis,
the emerging branch of statistics devoted to recovering topological structures in
data, can be exploited for both exploratory and inferential purposes, with special
emphasis on kernels that preserve the topological information in the data.
Finally, we show with an application how these two approaches can borrow strength
from one another in the identification and description of brain activity, using fMRI
data from the ABIDE project.