1,020 research outputs found

    On the Bootstrap for Persistence Diagrams and Landscapes

    Full text link
    Persistent homology probes topological properties from point clouds and functions. By looking at multiple scales simultaneously, one can record the births and deaths of topological features as the scale varies. In this paper we use a statistical technique, the empirical bootstrap, to separate topological signal from topological noise. In particular, we derive confidence sets for persistence diagrams and confidence bands for persistence landscapes

    Stochastic Convergence of Persistence Landscapes and Silhouettes

    Full text link
    Persistent homology is a widely used tool in Topological Data Analysis that encodes multiscale topological information as a multi-set of points in the plane called a persistence diagram. It is difficult to apply statistical theory directly to a random sample of diagrams. Instead, we can summarize the persistent homology with the persistence landscape, introduced by Bubenik, which converts a diagram into a well-behaved real-valued function. We investigate the statistical properties of landscapes, such as weak convergence of the average landscapes and convergence of the bootstrap. In addition, we introduce an alternate functional summary of persistent homology, which we call the silhouette, and derive an analogous statistical theory

    Introduction to the R package TDA

    Get PDF
    We present a short tutorial and introduction to using the R package TDA, which provides some tools for Topological Data Analysis. In particular, it includes implementations of functions that, given some data, provide topological information about the underlying space, such as the distance function, the distance to a measure, the kNN density estimator, the kernel density estimator, and the kernel distance. The salient topological features of the sublevel sets (or superlevel sets) of these functions can be quantified with persistent homology. We provide an R interface for the efficient algorithms of the C++ libraries GUDHI, Dionysus and PHAT, including a function for the persistent homology of the Rips filtration, and one for the persistent homology of sublevel sets (or superlevel sets) of arbitrary functions evaluated over a grid of points. The significance of the features in the resulting persistence diagrams can be analyzed with functions that implement recently developed statistical methods. The R package TDA also includes the implementation of an algorithm for density clustering, which allows us to identify the spatial organization of the probability mass associated to a density function and visualize it by means of a dendrogram, the cluster tree

    The persistence landscape and some of its properties

    Full text link
    Persistence landscapes map persistence diagrams into a function space, which may often be taken to be a Banach space or even a Hilbert space. In the latter case, it is a feature map and there is an associated kernel. The main advantage of this summary is that it allows one to apply tools from statistics and machine learning. Furthermore, the mapping from persistence diagrams to persistence landscapes is stable and invertible. We introduce a weighted version of the persistence landscape and define a one-parameter family of Poisson-weighted persistence landscape kernels that may be useful for learning. We also demonstrate some additional properties of the persistence landscape. First, the persistence landscape may be viewed as a tropical rational function. Second, in many cases it is possible to exactly reconstruct all of the component persistence diagrams from an average persistence landscape. It follows that the persistence landscape kernel is characteristic for certain generic empirical measures. Finally, the persistence landscape distance may be arbitrarily small compared to the interleaving distance.Comment: 18 pages, to appear in the Proceedings of the 2018 Abel Symposiu

    Multiple testing with persistent homology

    Full text link
    Multiple hypothesis testing requires a control procedure. Simply increasing simulations or permutations to meet a Bonferroni-style threshold is prohibitively expensive. In this paper we propose a null model based approach to testing for acyclicity, coupled with a Family-Wise Error Rate (FWER) control method that does not suffer from these computational costs. We adapt an False Discovery Rate (FDR) control approach to the topological setting, and show it to be compatible both with our null model approach and with previous approaches to hypothesis testing in persistent homology. By extending a limit theorem for persistent homology on samples from point processes, we provide theoretical validation for our FWER and FDR control methods

    Statistical topological data analysis using persistence landscapes

    Full text link
    We define a new topological summary for data that we call the persistence landscape. Since this summary lies in a vector space, it is easy to combine with tools from statistics and machine learning, in contrast to the standard topological summaries. Viewed as a random variable with values in a Banach space, this summary obeys a strong law of large numbers and a central limit theorem. We show how a number of standard statistical tests can be used for statistical inference using this summary. We also prove that this summary is stable and that it can be used to provide lower bounds for the bottleneck and Wasserstein distances.Comment: 26 pages, final version, to appear in Journal of Machine Learning Research, includes two additional examples not in the journal version: random geometric complexes and Erdos-Renyi random clique complexe
    • …
    corecore