148 research outputs found

Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology Based Representations

Authors: Michel Paul; Ravichander Abhilasha; Rijhwani Shruti
Publication date: 01/01/2017

We investigate the pertinence of methods from algebraic topology for text data analysis. These methods enable the development of mathematically-principled isometric-invariant mappings from a set of vectors to a document embedding, which is stable with respect to the geometry of the document in the selected metric space. In this work, we evaluate the utility of these topology-based document representations in traditional NLP tasks, specifically document clustering and sentiment classification. We find that the embeddings do not benefit text analysis. In fact, performance is worse than simple techniques like tf-idf, indicating that the geometry of the document does not provide enough variability for classification on the basis of topic or sentiment in the chosen datasets.

Comment: 5 pages, 3 figures. Rep4NLP workshop at ACL 201

arXiv.org e-Print Archive

Crossref

Optimal rates of convergence for persistence diagrams in Topological Data Analysis

Authors: Chazal Frédéric; Glisse Marc; Labruère Catherine; Michel Bertrand
Publication date: 27/05/2013

Computational topology has recently seen important developments toward data analysis, giving rise to the field of topological data analysis. Topological persistence, or persistent homology, appears as a fundamental tool in this field. In this paper, we study topological persistence in general metric spaces, with a statistical approach. We show that the use of persistent homology can be naturally considered in general statistical frameworks and that persistence diagrams can be used as statistics with interesting convergence properties. Some numerical experiments are performed in various contexts to illustrate our results.

arXiv.org e-Print Archive

HAL-uB

HAL - Université de Franche-Comté

INRIA a CCSD electronic archive server

Persistence stability for geometric complexes

Authors: Chazal Frederic; de Silva Vin; Oudot Steve
Publication date: 01/01/2013

In this paper we study the properties of the homology of different geometric filtered complexes (such as Vietoris-Rips, Cech and witness complexes) built on top of precompact spaces. Using recent developments in the theory of topological persistence we provide simple and natural proofs of the stability of the persistent homology of such complexes with respect to the Gromov-Hausdorff distance. We also exhibit a few noteworthy properties of the homology of the Rips and Cech complexes built on top of compact spaces.

Comment: We include a discussion of ambient Cech complexes and a new class of examples called Dowker complexes.

arXiv.org e-Print Archive

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Subsampling Methods for Persistent Homology

Authors: Chazal Frédéric; Fasy Brittany Terese; Lecci Fabrizio; Michel Bertrand; Rinaldo Alessandro; Wasserman Larry
Publication date: 07/06/2014

Persistent homology is a multiscale method for analyzing the shape of sets and functions from point cloud data arising from an unknown distribution supported on those sets. When the size of the sample is large, direct computation of the persistent homology is prohibitive due to the combinatorial nature of the existing algorithms. We propose to compute the persistent homology of several subsamples of the data and then combine the resulting estimates. We study the risk of two estimators and we prove that the subsampling approach carries stable topological information while achieving a great reduction in computational complexity
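
The subsample-and-combine idea in this abstract can be sketched in a few lines. The sketch below is illustrative only and is not the estimators studied in the paper: it is restricted to degree 0, where the Vietoris-Rips death times of a point cloud coincide with the edge lengths of a minimum spanning tree (taking the filtration parameter to be the pairwise distance), and it combines subsamples by averaging their sorted death-time vectors, a stand-in summary chosen here for simplicity. The function names and default parameters are hypothetical.

```python
import numpy as np


def h0_death_times(points):
    """Sorted degree-0 Vietoris-Rips death times of a point cloud.

    In degree 0 these are the edge lengths of a Euclidean minimum
    spanning tree, computed here with Prim's algorithm in O(n^2).
    `points` is an array-like of shape (n, d).
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    if n < 2:
        return np.array([])
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()  # cheapest known link from each point to the tree
    deaths = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        deaths.append(candidates[j])  # two components merge at this scale
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    return np.sort(np.array(deaths))


def averaged_subsample_summary(points, n_subsamples=20, subsample_size=50, seed=0):
    """Average the sorted H0 death times over random subsamples.

    Averaging sorted death-time vectors is an assumption made for this
    sketch, not the combination rule analysed in the paper.
    """
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    m = min(subsample_size, len(pts))
    summaries = []
    for _ in range(n_subsamples):
        idx = rng.choice(len(pts), size=m, replace=False)
        summaries.append(h0_death_times(pts[idx]))
    return np.mean(summaries, axis=0)
```

Each subsample costs O(m^2) for subsample size m, independent of the full sample size, which is where the computational saving described in the abstract comes from.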

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Curvature Sets Over Persistence Diagrams

Authors: Gómez Mario; Mémoli Facundo
Publication date: 07/03/2021

We study an invariant of compact metric spaces which combines the notion of curvature sets introduced by Gromov in the 1980s together with the notion of Vietoris-Rips persistent homology. For given integers $k\geq 0$ and $n\geq 1$, these invariants arise by considering the degree $k$ Vietoris-Rips persistence diagrams of all subsets of a given metric space with cardinality at most $n$. We call these invariants persistence sets and denote them as $D_{n,k}^{\mathrm{VR}}$. We argue that computing these invariants could be significantly easier than computing the usual Vietoris-Rips persistence diagrams. We establish stability results for these invariants and we also precisely characterize some of them in the case of spheres with geodesic and Euclidean distances. We identify a rich family of metric graphs for which $D_{4,1}^{\mathrm{VR}}$ fully recovers their homotopy type. Along the way we prove some useful properties of Vietoris-Rips persistence diagrams.
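
As a rough illustration of why small persistence sets can be cheap to compute, the following sketch enumerates the degree-1 persistence set for n = 4 of a finite metric space. It relies on an observation that is not stated in the abstract: the Vietoris-Rips complex of four points has nontrivial degree-1 homology only while its edge graph is a 4-cycle with neither diagonal present, so each 4-point subset contributes at most one interval, born at the longest cycle edge and dying at the shorter diagonal. Subsets with fewer than four points contribute nothing in degree 1. The function name and the distance-matrix input format are hypothetical.

```python
from itertools import combinations


def persistence_set_4_1(dist):
    """Degree-1 Vietoris-Rips persistence points over all 4-point subsets.

    `dist` is a symmetric n x n distance matrix (list of lists or array).
    Returns a set of (birth, death) pairs with birth < death.
    """
    n = len(dist)
    points = set()
    for a, b, c, d in combinations(range(n), 4):
        # Three ways to split four points into two "diagonal" pairs;
        # the remaining four distances are the edges of the 4-cycle.
        splits = (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c)))
        for (p, q), (r, s) in splits:
            birth = max(dist[p][r], dist[p][s], dist[q][r], dist[q][s])
            death = min(dist[p][q], dist[r][s])
            if birth < death:  # the cycle exists before a diagonal fills it
                points.add((birth, death))
    return points
```

For example, the four corners of a unit square yield a single point with birth 1 (the side length) and death equal to the diagonal length, while four collinear points yield an empty set. The enumeration visits every 4-subset, so it costs O(n^4) distance lookups with no boundary-matrix reduction.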

arXiv.org e-Print Archive