Optimal rates of convergence for persistence diagrams in Topological Data Analysis
Computational topology has recently seen important developments toward
data analysis, giving rise to the field of topological data analysis.
Topological persistence, or persistent homology, appears as a fundamental tool
in this field. In this paper, we study topological persistence in general
metric spaces, with a statistical approach. We show that the use of persistent
homology can be naturally considered in general statistical frameworks and
persistence diagrams can be used as statistics with interesting convergence
properties. Numerical experiments are performed in various contexts to
illustrate our results.
Statistical Analysis and Parameter Selection for Mapper
In this article, we study the question of the statistical convergence of the
1-dimensional Mapper to its continuous analogue, the Reeb graph. We show that
the Mapper is an optimal estimator of the Reeb graph, which gives, as a
byproduct, a method to automatically tune its parameters and compute confidence
regions on its topological features, such as its loops and flares. This allows one
to circumvent the brute-force approach of testing a large grid of parameters and
keeping the most stable ones, which is widely used in visualization, clustering and
feature selection with the Mapper.
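To make the object whose parameters are being tuned concrete, here is a minimal sketch of a 1-dimensional Mapper on a point cloud. It assumes a real-valued filter function, a cover of the filter range by `resolution` overlapping intervals with overlap fraction `gain`, and single-linkage clustering with a fixed threshold `eps` inside each preimage; the functions `mapper_graph` and `_single_linkage` are hypothetical names, and this is not the authors' code or their parameter-selection rule.

```python
import math
from itertools import combinations


def _single_linkage(points, idx, eps):
    """Split the indices in idx into clusters; points closer than eps are linked."""
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(idx, 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)
    clusters = {}
    for i in idx:
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())


def mapper_graph(points, filt, resolution, gain, eps):
    """Hypothetical 1-D Mapper: returns (nodes, edges).

    resolution: number of intervals covering the range of the filter.
    gain: overlap fraction between consecutive intervals, 0 <= gain < 1.
    eps: single-linkage threshold used to cluster each preimage.
    """
    values = [filt(p) for p in points]
    lo, hi = min(values), max(values)
    # r intervals of length L overlapping by gain * L exactly span [lo, hi]
    length = (hi - lo) / (resolution - gain * (resolution - 1))
    step = length * (1 - gain)
    nodes = []
    for k in range(resolution):
        a = lo + k * step
        b = hi if k == resolution - 1 else a + length  # avoid rounding past hi
        preimage = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(_single_linkage(points, preimage, eps))
    # nerve of the cover by clusters: connect clusters that share a point
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges


# Usage: a sampled circle with the x-coordinate as filter should give one loop.
circle = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40)) for k in range(40)]
nodes, edges = mapper_graph(circle, lambda p: p[0], resolution=4, gain=0.3, eps=0.3)
print(len(nodes), len(edges))  # prints "6 6": a single cycle
```

Here `resolution`, `gain` and `eps` are exactly the kind of parameters that the brute-force grid search would sweep over.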
Subsampling Methods for Persistent Homology
Persistent homology is a multiscale method for analyzing the shape of sets
and functions from point cloud data arising from an unknown distribution
supported on those sets. When the size of the sample is large, direct
computation of the persistent homology is prohibitive due to the combinatorial
nature of the existing algorithms. We propose to compute the persistent
homology of several subsamples of the data and then combine the resulting
estimates. We study the risk of two estimators and we prove that the
subsampling approach carries stable topological information while achieving a
great reduction in computational complexity.
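A minimal sketch of the subsample-and-combine idea, restricted to 0-dimensional homology of the Rips filtration (finite death times are minimum-spanning-tree edge lengths, with edge length used as the filtration value). Combining by the pointwise average of persistence landscapes is an assumption chosen for illustration, not necessarily either of the two estimators studied in the paper; `h0_diagram`, `landscape` and `average_landscape` are hypothetical names.

```python
import math
import random


def h0_diagram(points):
    """Finite 0-dimensional (birth, death) pairs of the Rips filtration.

    Every point is born at 0; two components merge when an edge of length d
    appears, so finite death times are the minimum-spanning-tree edge lengths
    (Kruskal's algorithm with union-find). The one infinite class is omitted.
    """
    n = len(points)
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, d))
    return pairs


def landscape(pairs, k, grid):
    """k-th persistence landscape lambda_k (k >= 1) evaluated on a grid."""
    values = []
    for t in grid:
        tents = sorted((max(0.0, min(t - b, d - t)) for b, d in pairs), reverse=True)
        values.append(tents[k - 1] if k <= len(tents) else 0.0)
    return values


def average_landscape(points, m, n_subsamples, k, grid, seed=0):
    """Average lambda_k over n_subsamples subsamples of size m (without replacement)."""
    rng = random.Random(seed)
    total = [0.0] * len(grid)
    for _ in range(n_subsamples):
        lam = landscape(h0_diagram(rng.sample(points, m)), k, grid)
        total = [s + v for s, v in zip(total, lam)]
    return [s / n_subsamples for s in total]


# Usage: 20 subsamples of size 50 instead of one diagram on all 400 points.
circle = [(math.cos(2 * math.pi * k / 400), math.sin(2 * math.pi * k / 400)) for k in range(400)]
grid = [i / 20 for i in range(21)]
avg = average_landscape(circle, m=50, n_subsamples=20, k=1, grid=grid)
```

Each subsample costs O(m^2 log m) here instead of O(n^2 log n) for the full sample, which is the kind of saving the abstract refers to.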
Multiple testing with persistent homology
Multiple hypothesis testing requires a control procedure. Simply increasing
simulations or permutations to meet a Bonferroni-style threshold is
prohibitively expensive. In this paper we propose a null-model-based approach
to testing for acyclicity, coupled with a Family-Wise Error Rate (FWER) control
method that does not suffer from these computational costs. We adapt a False
Discovery Rate (FDR) control approach to the topological setting, and show it
to be compatible both with our null model approach and with previous approaches
to hypothesis testing in persistent homology. By extending a limit theorem for
persistent homology on samples from point processes, we provide theoretical
validation for our FWER and FDR control methods.
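For reference, the Bonferroni threshold mentioned above and the standard Benjamini-Hochberg step-up procedure for FDR control can be sketched as follows. The p-values are assumed to come from some per-feature test (for instance one p-value per persistent cycle under a null model); this is the generic procedure, not the authors' topological adaptation, and the function names are hypothetical.

```python
def bonferroni(pvalues, alpha):
    """Indices rejected by Bonferroni (controls the FWER at level alpha)."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]


def benjamini_hochberg(pvalues, alpha):
    """Indices rejected by the Benjamini-Hochberg step-up procedure.

    Controls the FDR at level alpha when the p-values are independent
    (or positively dependent).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # find the largest rank k (1-based) with p_(k) <= k * alpha / m
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    # reject the k_max smallest p-values
    return sorted(order[:k_max])


# Usage: hypothetical p-values for eight candidate features.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(bonferroni(p, 0.05))          # [0]
print(benjamini_hochberg(p, 0.05))  # [0, 1]
```

Bonferroni compares every p-value with alpha / m, whereas Benjamini-Hochberg relaxes the threshold with the rank, which is why it rejects more here.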
Interpretable statistics for complex modelling: quantile and topological learning
As the complexity of our data has increased exponentially over the last decades, so has our
need for interpretable features. This thesis revolves around two paradigms to approach
this quest for insights.
In the first part we focus on parametric models, where the problem of interpretability
can be seen as a “parametrization selection”. We introduce a quantile-centric
parametrization and we show the advantages of our proposal in the context of regression,
where it helps bridge the gap between classical generalized linear (mixed)
models and increasingly popular quantile methods.
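The quantile-centric parametrization itself is not spelled out here, so the sketch below only illustrates the ingredient that quantile methods share: the check (pinball) loss, whose minimiser over a constant is the empirical tau-quantile. The helper names are hypothetical.

```python
def pinball_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0}), with 0 < tau < 1."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_quantile(ys, tau):
    """Return a data point minimising sum_i rho_tau(y_i - q).

    The objective is convex and piecewise linear with kinks at the data,
    so a minimiser can always be found among the y_i.
    """
    return min(ys, key=lambda q: sum(pinball_loss(y - q, tau) for y in ys))


print(empirical_quantile([3, 1, 4, 1, 5, 9, 2, 6, 5], 0.5))  # 4, the median
print(empirical_quantile([1, 2, 3, 4, 5], 0.25))             # 2
```

Replacing the constant q with a linear predictor gives linear quantile regression; with tau = 0.5 the loss is half the absolute error.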
The second part of the thesis, concerned with topological learning, tackles the
problem from a non-parametric perspective. As topology can be thought of as a way
of characterizing data in terms of their connectivity structure, it allows one to represent
complex and possibly high-dimensional data through a few features, such as the number of
connected components, loops and voids. We illustrate how the emerging branch of
statistics devoted to recovering topological structures in the data, Topological Data
Analysis, can be exploited both for exploratory and inferential purposes with a special
emphasis on kernels that preserve the topological information in the data.
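One well-known example of a kernel built to retain the information in a persistence diagram is the persistence scale-space kernel, k_sigma(F, G) = 1/(8 pi sigma) * sum over p in F, q in G of [exp(-|p - q|^2 / (8 sigma)) - exp(-|p - q_bar|^2 / (8 sigma))], where q_bar is q mirrored across the diagonal. Points close to the diagonal (short-lived features) therefore contribute little. That the thesis uses this particular kernel is an assumption made for illustration, and `pss_kernel` is a hypothetical name.

```python
import math


def pss_kernel(diag_f, diag_g, sigma):
    """Persistence scale-space kernel between two lists of (birth, death) pairs."""
    total = 0.0
    for p in diag_f:
        for q in diag_g:
            q_mirror = (q[1], q[0])  # reflection of q across the diagonal
            total += (math.exp(-math.dist(p, q) ** 2 / (8 * sigma))
                      - math.exp(-math.dist(p, q_mirror) ** 2 / (8 * sigma)))
    return total / (8 * math.pi * sigma)


def kernel_distance(diag_f, diag_g, sigma):
    """Distance induced by the kernel in its feature space."""
    sq = (pss_kernel(diag_f, diag_f, sigma) + pss_kernel(diag_g, diag_g, sigma)
          - 2 * pss_kernel(diag_f, diag_g, sigma))
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


# Usage: two small hypothetical diagrams.
F = [(0.0, 1.0), (0.2, 0.5)]
G = [(0.0, 0.9)]
d = kernel_distance(F, G, sigma=0.1)
```

Because the kernel is positive definite, it can be plugged into standard kernel methods (SVMs, kernel two-sample tests) for both exploratory and inferential use.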
Finally, we show through an application how these two approaches can borrow strength
from one another in the identification and description of brain activity using fMRI
data from the ABIDE project.