On the Bootstrap for Persistence Diagrams and Landscapes
Persistent homology probes topological properties from point clouds and
functions. By looking at multiple scales simultaneously, one can record the
births and deaths of topological features as the scale varies. In this paper we
use a statistical technique, the empirical bootstrap, to separate topological
signal from topological noise. In particular, we derive confidence sets for
persistence diagrams and confidence bands for persistence landscapes.
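As a rough illustration of how a bootstrap confidence band for an average landscape might be computed, the sketch below resamples a set of curves with replacement and uses the sup-norm deviation of the resampled mean to set the band width. It assumes the landscapes have already been evaluated on a common grid and stored as rows of an array. The function name `bootstrap_band` and its parameters are hypothetical, and this is a generic empirical bootstrap, not necessarily the exact procedure derived in the paper.

```python
import numpy as np


def bootstrap_band(curves, alpha=0.05, n_boot=1000, seed=0):
    """Uniform bootstrap band for the mean of curves sampled on a shared grid.

    curves: array of shape (n, m), one curve per row (assumed input layout).
    Returns (lower, upper), each of shape (m,).
    """
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    rng = np.random.default_rng(seed)
    mean = curves.mean(axis=0)

    # Sup-norm deviation of each bootstrap mean from the sample mean,
    # scaled by sqrt(n).
    sup_dev = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        boot_mean = curves[idx].mean(axis=0)
        sup_dev[b] = np.sqrt(n) * np.max(np.abs(boot_mean - mean))

    # The (1 - alpha) quantile of the deviations sets the half-width.
    half_width = np.quantile(sup_dev, 1.0 - alpha) / np.sqrt(n)
    return mean - half_width, mean + half_width
```

Features of the average curve that stay above zero across the whole band would be read as signal under this sketch; the quantile level and the number of resamples are illustrative defaults.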
Stochastic Convergence of Persistence Landscapes and Silhouettes
Persistent homology is a widely used tool in Topological Data Analysis that
encodes multiscale topological information as a multi-set of points in the
plane called a persistence diagram. It is difficult to apply statistical theory
directly to a random sample of diagrams. Instead, we can summarize the
persistent homology with the persistence landscape, introduced by Bubenik,
which converts a diagram into a well-behaved real-valued function. We
investigate the statistical properties of landscapes, such as weak convergence
of the average landscapes and convergence of the bootstrap. In addition, we
introduce an alternate functional summary of persistent homology, which we call
the silhouette, and derive an analogous statistical theory.
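To make the conversion from a diagram to a function concrete, here is a minimal sketch of the standard landscape construction: each point (b, d) contributes a tent function max(0, min(t - b, d - t)), and the k-th landscape at t is the k-th largest tent value. The function name `persistence_landscape`, the grid-based evaluation, and the array layout are assumptions made for illustration, not part of any of the listed papers or packages.

```python
import numpy as np


def persistence_landscape(diagram, grid, k_max):
    """Evaluate the first k_max landscape functions on a grid.

    diagram: iterable of (birth, death) pairs with finite deaths.
    grid:    1-D array of evaluation points t.
    Returns an array of shape (k_max, len(grid)); row k-1 holds lambda_k.
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    grid = np.asarray(grid, dtype=float)
    births = diagram[:, 0][:, None]
    deaths = diagram[:, 1][:, None]

    # Tent function of every diagram point at every grid value.
    tents = np.maximum(0.0, np.minimum(grid[None, :] - births,
                                       deaths - grid[None, :]))

    # Sort each column in decreasing order: row k-1 is the k-th largest value.
    tents = -np.sort(-tents, axis=0)

    out = np.zeros((k_max, grid.size))
    n = min(k_max, tents.shape[0])
    out[:n] = tents[:n]  # landscapes beyond the number of points are zero
    return out
```

For example, with diagram points (0, 4) and (1, 3) and the grid 0, 1, 2, 3, 4, the first landscape is 0, 1, 2, 1, 0 and the second is 0, 0, 1, 0, 0. Averaging such arrays over a sample of diagrams gives the average landscape discussed in the abstract.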
Introduction to the R package TDA
We present a short tutorial and introduction to using the R package TDA,
which provides some tools for Topological Data Analysis. In particular, it
includes implementations of functions that, given some data, provide
topological information about the underlying space, such as the distance
function, the distance to a measure, the kNN density estimator, the kernel
density estimator, and the kernel distance. The salient topological features of
the sublevel sets (or superlevel sets) of these functions can be quantified
with persistent homology. We provide an R interface for the efficient
algorithms of the C++ libraries GUDHI, Dionysus and PHAT, including a function
for the persistent homology of the Rips filtration, and one for the persistent
homology of sublevel sets (or superlevel sets) of arbitrary functions evaluated
over a grid of points. The significance of the features in the resulting
persistence diagrams can be analyzed with functions that implement recently
developed statistical methods. The R package TDA also includes the
implementation of an algorithm for density clustering, which allows us to
identify the spatial organization of the probability mass associated to a
density function and visualize it by means of a dendrogram, the cluster tree.
The persistence landscape and some of its properties
Persistence landscapes map persistence diagrams into a function space, which
may often be taken to be a Banach space or even a Hilbert space. In the latter
case, it is a feature map and there is an associated kernel. The main advantage
of this summary is that it allows one to apply tools from statistics and
machine learning. Furthermore, the mapping from persistence diagrams to
persistence landscapes is stable and invertible. We introduce a weighted
version of the persistence landscape and define a one-parameter family of
Poisson-weighted persistence landscape kernels that may be useful for learning.
We also demonstrate some additional properties of the persistence landscape.
First, the persistence landscape may be viewed as a tropical rational function.
Second, in many cases it is possible to exactly reconstruct all of the
component persistence diagrams from an average persistence landscape. It
follows that the persistence landscape kernel is characteristic for certain
generic empirical measures. Finally, the persistence landscape distance may be
arbitrarily small compared to the interleaving distance.
Comment: 18 pages, to appear in the Proceedings of the 2018 Abel Symposium.
Multiple testing with persistent homology
Multiple hypothesis testing requires a control procedure. Simply increasing
simulations or permutations to meet a Bonferroni-style threshold is
prohibitively expensive. In this paper we propose a null model based approach
to testing for acyclicity, coupled with a Family-Wise Error Rate (FWER) control
method that does not suffer from these computational costs. We adapt a False
Discovery Rate (FDR) control approach to the topological setting, and show it
to be compatible both with our null model approach and with previous approaches
to hypothesis testing in persistent homology. By extending a limit theorem for
persistent homology on samples from point processes, we provide theoretical
validation for our FWER and FDR control methods.
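For readers unfamiliar with FDR control, the sketch below implements the classical Benjamini-Hochberg step-up procedure on a list of p-values. The abstract does not say which FDR method the authors adapt, so this is shown only as a familiar reference point; the function name `benjamini_hochberg` and the level `q` are illustrative choices.

```python
import numpy as np


def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices of p-values, ascending
    thresholds = q * np.arange(1, m + 1) / m   # step-up thresholds i * q / m
    passed = p[order] <= thresholds

    reject = np.zeros(m, dtype=bool)
    if passed.any():
        # Reject every hypothesis up to the largest rank that passes.
        k = np.max(np.nonzero(passed)[0])
        reject[order[:k + 1]] = True
    return reject
```

With p-values 0.039, 0.001, 0.5, 0.008 and q = 0.05, the thresholds are 0.0125, 0.025, 0.0375, 0.05, so only the hypotheses with p-values 0.001 and 0.008 are rejected. A Bonferroni-style FWER rule would instead compare every p-value with q / m.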
Statistical topological data analysis using persistence landscapes
We define a new topological summary for data that we call the persistence
landscape. Since this summary lies in a vector space, it is easy to combine
with tools from statistics and machine learning, in contrast to the standard
topological summaries. Viewed as a random variable with values in a Banach
space, this summary obeys a strong law of large numbers and a central limit
theorem. We show how a number of standard statistical tests can be used for
statistical inference using this summary. We also prove that this summary is
stable and that it can be used to provide lower bounds for the bottleneck and
Wasserstein distances.
Comment: 26 pages, final version, to appear in Journal of Machine Learning
Research, includes two additional examples not in the journal version: random
geometric complexes and Erdős-Rényi random clique complexes.