39 research outputs found

    Stochastic Convergence of Persistence Landscapes and Silhouettes

    Full text link
    Persistent homology is a widely used tool in Topological Data Analysis that encodes multiscale topological information as a multi-set of points in the plane called a persistence diagram. It is difficult to apply statistical theory directly to a random sample of diagrams. Instead, we can summarize the persistent homology with the persistence landscape, introduced by Bubenik, which converts a diagram into a well-behaved real-valued function. We investigate the statistical properties of landscapes, such as weak convergence of the average landscapes and convergence of the bootstrap. In addition, we introduce an alternate functional summary of persistent homology, which we call the silhouette, and derive an analogous statistical theory

    Topological summaries for Time-Varying Data

    Get PDF
    Topology has proven to be a useful tool in the current quest for ”insights on the data”, since it characterises objects through their connectivity structure, in an easy and interpretable way. More specifically, the new, but growing, field of TDA (Topological Data Analysis) deals with Persistent Homology, a multiscale version of Homology Groups summarized by the Persistence Diagram and its functional representations (Persistence Landscapes, Silhouettes etc). All of these objects, how- ever, are designed and work only for static point clouds. We define a new topological summary, the Landscape Surface, that takes into account the changes in the topology of a dynamical point cloud such as a (possibly very high dimensional) time series. We prove its continuity and its stability and, finally, we sketch a simple example

    Subsampling Methods for Persistent Homology

    Full text link
    Persistent homology is a multiscale method for analyzing the shape of sets and functions from point cloud data arising from an unknown distribution supported on those sets. When the size of the sample is large, direct computation of the persistent homology is prohibitive due to the combinatorial nature of the existing algorithms. We propose to compute the persistent homology of several subsamples of the data and then combine the resulting estimates. We study the risk of two estimators and we prove that the subsampling approach carries stable topological information while achieving a great reduction in computational complexity

    The persistence landscape and some of its properties

    Full text link
    Persistence landscapes map persistence diagrams into a function space, which may often be taken to be a Banach space or even a Hilbert space. In the latter case, it is a feature map and there is an associated kernel. The main advantage of this summary is that it allows one to apply tools from statistics and machine learning. Furthermore, the mapping from persistence diagrams to persistence landscapes is stable and invertible. We introduce a weighted version of the persistence landscape and define a one-parameter family of Poisson-weighted persistence landscape kernels that may be useful for learning. We also demonstrate some additional properties of the persistence landscape. First, the persistence landscape may be viewed as a tropical rational function. Second, in many cases it is possible to exactly reconstruct all of the component persistence diagrams from an average persistence landscape. It follows that the persistence landscape kernel is characteristic for certain generic empirical measures. Finally, the persistence landscape distance may be arbitrarily small compared to the interleaving distance.Comment: 18 pages, to appear in the Proceedings of the 2018 Abel Symposiu

    PersLay: A Neural Network Layer for Persistence Diagrams and New Graph Topological Signatures

    Full text link
    Persistence diagrams, the most common descriptors of Topological Data Analysis, encode topological properties of data and have already proved pivotal in many different applications of data science. However, since the (metric) space of persistence diagrams is not Hilbert, they end up being difficult inputs for most Machine Learning techniques. To address this concern, several vectorization methods have been put forward that embed persistence diagrams into either finite-dimensional Euclidean space or (implicit) infinite dimensional Hilbert space with kernels. In this work, we focus on persistence diagrams built on top of graphs. Relying on extended persistence theory and the so-called heat kernel signature, we show how graphs can be encoded by (extended) persistence diagrams in a provably stable way. We then propose a general and versatile framework for learning vectorizations of persistence diagrams, which encompasses most of the vectorization techniques used in the literature. We finally showcase the experimental strength of our setup by achieving competitive scores on classification tasks on real-life graph datasets

    Statistical topological data analysis using persistence landscapes

    Full text link
    We define a new topological summary for data that we call the persistence landscape. Since this summary lies in a vector space, it is easy to combine with tools from statistics and machine learning, in contrast to the standard topological summaries. Viewed as a random variable with values in a Banach space, this summary obeys a strong law of large numbers and a central limit theorem. We show how a number of standard statistical tests can be used for statistical inference using this summary. We also prove that this summary is stable and that it can be used to provide lower bounds for the bottleneck and Wasserstein distances.Comment: 26 pages, final version, to appear in Journal of Machine Learning Research, includes two additional examples not in the journal version: random geometric complexes and Erdos-Renyi random clique complexe
    corecore