Sliced Wasserstein Kernel for Persistence Diagrams
Persistence diagrams (PDs) play a key role in topological data analysis
(TDA), in which they are routinely used to describe topological properties of
complicated shapes. PDs enjoy strong stability properties and have proven their
utility in various learning contexts. They do not, however, live in a space
naturally endowed with a Hilbert structure and are usually compared with
specific distances, such as the bottleneck distance. To incorporate PDs in a
learning pipeline, several kernels have been proposed for PDs with a strong
emphasis on the stability of the RKHS distance w.r.t. perturbations of the PDs.
In this article, we use the Sliced Wasserstein approximation SW of the
Wasserstein distance to define a new kernel for PDs, which is not only provably
stable but also provably discriminative (depending on the number of points in
the PDs) w.r.t. the Wasserstein distance between PDs. We also demonstrate
its practicality, by developing an approximation technique to reduce kernel
computation time, and show that our proposal compares favorably to existing
kernels for PDs on several benchmarks.
Comment: Minor modification
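To make the idea in the abstract above concrete, here is a minimal sketch of a sliced Wasserstein distance between two persistence diagrams and a Gaussian-style kernel built on it. It is not taken from the paper's code. It assumes the commonly described construction: each diagram is augmented with the diagonal projections of the other diagram's points, both point sets are projected onto a finite set of directions, and the sorted projections are compared in the L1 sense. The function names, the `n_directions` and `sigma` parameters, and the normalization (a plain average over sampled directions) are assumptions made for illustration.

```python
import math


def sliced_wasserstein(d1, d2, n_directions=50):
    """Approximate a sliced Wasserstein distance between two diagrams.

    d1, d2: lists of (birth, death) pairs.
    n_directions: number of sampled directions (hypothetical parameter).
    """
    # Orthogonal projection of each point onto the diagonal y = x.
    diag1 = [((birth + death) / 2.0,) * 2 for birth, death in d1]
    diag2 = [((birth + death) / 2.0,) * 2 for birth, death in d2]

    # Augment each diagram with the other's diagonal projections so that
    # both point sets have the same cardinality.
    aug1 = list(d1) + diag2
    aug2 = list(d2) + diag1

    total = 0.0
    for k in range(n_directions):
        # Directions evenly spaced over a half circle [-pi/2, pi/2).
        theta = -math.pi / 2 + k * math.pi / n_directions
        c, s = math.cos(theta), math.sin(theta)
        proj1 = sorted(x * c + y * s for x, y in aug1)
        proj2 = sorted(x * c + y * s for x, y in aug2)
        # 1-D optimal transport cost: L1 distance between sorted projections.
        total += sum(abs(u - v) for u, v in zip(proj1, proj2))
    return total / n_directions


def sw_kernel(d1, d2, sigma=1.0, n_directions=50):
    """Gaussian-style kernel on the approximate distance (assumed form)."""
    return math.exp(-sliced_wasserstein(d1, d2, n_directions) / (2.0 * sigma ** 2))
```

Calling `sw_kernel(d1, d2)` on two lists of `(birth, death)` pairs returns a similarity in (0, 1], with identical diagrams giving 1. Increasing `n_directions` trades computation time for a finer approximation of the integral over directions.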
Persistence Bag-of-Words for Topological Data Analysis
Persistent homology (PH) is a rigorous mathematical theory that provides a
robust descriptor of data in the form of persistence diagrams (PDs). PDs
exhibit, however, complex structure and are difficult to integrate in today's
machine learning workflows. This paper introduces persistence bag-of-words: a
novel and stable vectorized representation of PDs that enables the seamless
integration with machine learning. Comprehensive experiments show that the new
representation achieves state-of-the-art performance and beyond in much less
time than alternative approaches.
Comment: Accepted for the Twenty-Eighth International Joint Conference on
Artificial Intelligence (IJCAI-19). arXiv admin note: substantial text
overlap with arXiv:1802.0485
On the Metric Distortion of Embedding Persistence Diagrams into Separable Hilbert Spaces
Persistence diagrams are important descriptors in Topological Data Analysis. Due to the nonlinearity of the space of persistence diagrams equipped with their diagram distances, most of the recent attempts at using persistence diagrams in machine learning have been done through kernel methods, i.e., embeddings of persistence diagrams into Reproducing Kernel Hilbert Spaces, in which all computations can be performed easily. Since persistence diagrams enjoy theoretical stability guarantees for the diagram distances, the metric properties of the feature map, i.e., the relationship between the Hilbert distance and the diagram distances, are of central interest for understanding if the persistence diagram guarantees carry over to the embedding. In this article, we study the possibility of embedding persistence diagrams into separable Hilbert spaces with bi-Lipschitz maps. In particular, we show that for several stable embeddings into infinite-dimensional Hilbert spaces defined in the literature, any lower bound must depend on the cardinalities of the persistence diagrams, and that when the Hilbert space is finite dimensional, finding a bi-Lipschitz embedding is impossible, even when restricting the persistence diagrams to have bounded cardinalities.
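For reference, the bi-Lipschitz condition mentioned in the abstract above can be written as follows. The symbols are introduced here for illustration and do not come from the abstract: $\Phi$ is the feature map into a Hilbert space $\mathcal{H}$, $d$ is a diagram distance, and $A$, $B$ are constants.

```latex
\[
A \, d(D, D') \;\le\; \lVert \Phi(D) - \Phi(D') \rVert_{\mathcal{H}} \;\le\; B \, d(D, D')
\qquad \text{for all persistence diagrams } D, D',
\quad 0 < A \le B < \infty .
\]
```

In this notation, the right-hand inequality is the stability property, and the "lower bound" discussed in the abstract corresponds to the constant $A$ on the left.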
Persistent Homology Based Characterization of the Breast Cancer Immune Microenvironment: A Feasibility Study
Persistent homology is a common tool of topological data analysis, whose main descriptor, the persistence diagram, aims at computing and encoding the geometry and topology of given datasets. In this article, we present a novel application of persistent homology to characterize the spatial arrangement of immune and epithelial (tumor) cells within the breast cancer immune microenvironment. More specifically, quantitative and robust characterizations are built by computing persistence diagrams out of a staining technique (quantitative multiplex immunofluorescence) which allows us to obtain spatial coordinates and stain intensities on individual cells. The resulting persistence diagrams are evaluated as characteristic biomarkers of cancer subtype and prognostic biomarkers of overall survival. For a cohort of approximately 700 breast cancer patients with median 8.5-year clinical follow-up, we show that these persistence diagrams outperform and complement the usual descriptors which capture spatial relationships with nearest neighbor analysis. This provides new insights and possibilities on the general problem of building (topology-based) biomarkers that are characteristic and predictive of cancer subtype, overall survival and response to therapy.