    Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology Based Representations

    We investigate the pertinence of methods from algebraic topology for text data analysis. These methods enable the development of mathematically principled, isometry-invariant mappings from a set of vectors to a document embedding, which is stable with respect to the geometry of the document in the selected metric space. In this work, we evaluate the utility of these topology-based document representations in traditional NLP tasks, specifically document clustering and sentiment classification. We find that the embeddings do not benefit text analysis. In fact, performance is worse than simple techniques like tf-idf, indicating that the geometry of the document does not provide enough variability for classification on the basis of topic or sentiment in the chosen datasets. Comment: 5 pages, 3 figures. Rep4NLP workshop at ACL 201

    Optimal rates of convergence for persistence diagrams in Topological Data Analysis

    Computational topology has recently seen significant development toward data analysis, giving rise to the field of topological data analysis. Topological persistence, or persistent homology, is a fundamental tool in this field. In this paper, we study topological persistence in general metric spaces, with a statistical approach. We show that persistent homology fits naturally into general statistical frameworks and that persistence diagrams can be used as statistics with interesting convergence properties. Some numerical experiments are performed in various contexts to illustrate our results.

    Persistence stability for geometric complexes

    In this paper we study the properties of the homology of different geometric filtered complexes (such as Vietoris-Rips, Cech and witness complexes) built on top of precompact spaces. Using recent developments in the theory of topological persistence, we provide simple and natural proofs of the stability of the persistent homology of such complexes with respect to the Gromov–Hausdorff distance. We also exhibit a few noteworthy properties of the homology of the Rips and Cech complexes built on top of compact spaces. Comment: We include a discussion of ambient Cech complexes and a new class of examples called Dowker complexes

    Subsampling Methods for Persistent Homology

    Persistent homology is a multiscale method for analyzing the shape of sets and functions from point cloud data arising from an unknown distribution supported on those sets. When the sample is large, direct computation of the persistent homology is prohibitive due to the combinatorial nature of the existing algorithms. We propose to compute the persistent homology of several subsamples of the data and then combine the resulting estimates. We study the risk of two estimators and we prove that the subsampling approach carries stable topological information while achieving a great reduction in computational complexity.

    Curvature Sets Over Persistence Diagrams

    We study an invariant of compact metric spaces which combines the notion of curvature sets introduced by Gromov in the 1980s together with the notion of Vietoris-Rips persistent homology. For given integers $k \geq 0$ and $n \geq 1$, these invariants arise by considering the degree $k$ Vietoris-Rips persistence diagrams of all subsets of a given metric space with cardinality at most $n$. We call these invariants persistence sets and denote them as $D_{n,k}^{\mathrm{VR}}$. We argue that computing these invariants could be significantly easier than computing the usual Vietoris-Rips persistence diagrams. We establish stability results for these invariants and we also precisely characterize some of them in the case of spheres with geodesic and Euclidean distances. We identify a rich family of metric graphs for which $D_{4,1}^{\mathrm{VR}}$ fully recovers their homotopy type. Along the way we prove some useful properties of Vietoris-Rips persistence diagrams.