3,636 research outputs found
The density of expected persistence diagrams and its kernel based estimation
Extended version of the SoCG proceedings, submitted to a journalInternational audiencePersistence diagrams play a fundamental role in Topological Data Analysis where they are used as topological descriptors of filtrations built on top of data. They consist in discrete multisets of points in the plane R 2 that can equivalently be seen as discrete measures in R 2. When the data come as a random point cloud, these discrete measures become random measures whose expectation is studied in this paper. First, we show that for a wide class of filtrations, including the ÄŚech and Rips-Vietoris filtrations, the expected persistence diagram, that is a deterministic measure on R 2 , has a density with respect to the Lebesgue measure. Second, building on the previous result we show that the persistence surface recently introduced in [Adams & al., Persistenceimages: a stable vector representation of persistent homology] can be seen as a kernel estimator of this density. We propose a cross-validation scheme for selecting an optimal bandwidth, which is proven to be a consistent procedure to estimate the density
The Density of Expected Persistence Diagrams and its Kernel Based Estimation
Persistence diagrams play a fundamental role in Topological Data Analysis where they are used as topological descriptors of filtrations built on top of data. They consist in discrete multisets of points in the plane R^2 that can equivalently be seen as discrete measures in R^2. When the data come as a random point cloud, these discrete measures become random measures whose expectation is studied in this paper. First, we show that for a wide class of filtrations, including the Cech and Rips-Vietoris filtrations, the expected persistence diagram, that is a deterministic measure on R^2, has a density with respect to the Lebesgue measure. Second, building on the previous result we show that the persistence surface recently introduced in [Adams et al., 2017] can be seen as a kernel estimator of this density. We propose a cross-validation scheme for selecting an optimal bandwidth, which is proven to be a consistent procedure to estimate the density
Interpretable statistics for complex modelling: quantile and topological learning
As the complexity of our data increased exponentially in the last decades, so has our
need for interpretable features. This thesis revolves around two paradigms to approach
this quest for insights.
In the first part we focus on parametric models, where the problem of interpretability
can be seen as a “parametrization selection”. We introduce a quantile-centric
parametrization and we show the advantages of our proposal in the context of regression,
where it allows to bridge the gap between classical generalized linear (mixed)
models and increasingly popular quantile methods.
The second part of the thesis, concerned with topological learning, tackles the
problem from a non-parametric perspective. As topology can be thought of as a way
of characterizing data in terms of their connectivity structure, it allows to represent
complex and possibly high dimensional through few features, such as the number of
connected components, loops and voids. We illustrate how the emerging branch of
statistics devoted to recovering topological structures in the data, Topological Data
Analysis, can be exploited both for exploratory and inferential purposes with a special
emphasis on kernels that preserve the topological information in the data.
Finally, we show with an application how these two approaches can borrow strength
from one another in the identification and description of brain activity through fMRI
data from the ABIDE project
Persistence Flamelets: multiscale Persistent Homology for kernel density exploration
In recent years there has been noticeable interest in the study of the "shape
of data". Among the many ways a "shape" could be defined, topology is the most
general one, as it describes an object in terms of its connectivity structure:
connected components (topological features of dimension 0), cycles (features of
dimension 1) and so on. There is a growing number of techniques, generally
denoted as Topological Data Analysis, aimed at estimating topological
invariants of a fixed object; when we allow this object to change, however,
little has been done to investigate the evolution in its topology. In this work
we define the Persistence Flamelets, a multiscale version of one of the most
popular tool in TDA, the Persistence Landscape. We examine its theoretical
properties and we show how it could be used to gain insights on KDEs bandwidth
parameter
Modeling and replicating statistical topology, and evidence for CMB non-homogeneity
Under the banner of `Big Data', the detection and classification of structure
in extremely large, high dimensional, data sets, is, one of the central
statistical challenges of our times. Among the most intriguing approaches to
this challenge is `TDA', or `Topological Data Analysis', one of the primary
aims of which is providing non-metric, but topologically informative,
pre-analyses of data sets which make later, more quantitative analyses
feasible. While TDA rests on strong mathematical foundations from Topology, in
applications it has faced challenges due to an inability to handle issues of
statistical reliability and robustness and, most importantly, in an inability
to make scientific claims with verifiable levels of statistical confidence. We
propose a methodology for the parametric representation, estimation, and
replication of persistence diagrams, the main diagnostic tool of TDA. The power
of the methodology lies in the fact that even if only one persistence diagram
is available for analysis -- the typical case for big data applications --
replications can be generated to allow for conventional statistical hypothesis
testing. The methodology is conceptually simple and computationally practical,
and provides a broadly effective statistical procedure for persistence diagram
TDA analysis. We demonstrate the basic ideas on a toy example, and the power of
the approach in a novel and revealing analysis of CMB non-homogeneity
Optimal rates of convergence for persistence diagrams in Topological Data Analysis
Computational topology has recently known an important development toward
data analysis, giving birth to the field of topological data analysis.
Topological persistence, or persistent homology, appears as a fundamental tool
in this field. In this paper, we study topological persistence in general
metric spaces, with a statistical approach. We show that the use of persistent
homology can be naturally considered in general statistical frameworks and
persistence diagrams can be used as statistics with interesting convergence
properties. Some numerical experiments are performed in various contexts to
illustrate our results
- …