148 research outputs found
Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology Based Representations
We investigate the pertinence of methods from algebraic topology for text
data analysis. These methods enable the development of
mathematically-principled isometric-invariant mappings from a set of vectors to
a document embedding, which is stable with respect to the geometry of the
document in the selected metric space. In this work, we evaluate the utility of
these topology-based document representations in traditional NLP tasks,
specifically document clustering and sentiment classification. We find that the
embeddings do not benefit text analysis. In fact, performance is worse than
simple techniques, indicating that the geometry of the
document does not provide enough variability for classification on the basis of
topic or sentiment in the chosen datasets.
Comment: 5 pages, 3 figures. Rep4NLP workshop at ACL 201
Optimal rates of convergence for persistence diagrams in Topological Data Analysis
Computational topology has recently seen significant development toward
data analysis, giving rise to the field of topological data analysis.
Topological persistence, or persistent homology, appears as a fundamental tool
in this field. In this paper, we study topological persistence in general
metric spaces, with a statistical approach. We show that the use of persistent
homology can be naturally considered in general statistical frameworks and
persistence diagrams can be used as statistics with interesting convergence
properties. Some numerical experiments are performed in various contexts to
illustrate our results.
Persistence stability for geometric complexes
In this paper we study the properties of the homology of different geometric
filtered complexes (such as Vietoris-Rips, Cech and witness complexes) built on
top of precompact spaces. Using recent developments in the theory of
topological persistence we provide simple and natural proofs of the stability
of the persistent homology of such complexes with respect to the
Gromov--Hausdorff distance. We also exhibit a few noteworthy properties of the
homology of the Rips and Cech complexes built on top of compact spaces.
Comment: We include a discussion of ambient Cech complexes and a new class of
examples called Dowker complexes.
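For orientation, the kind of stability statement this abstract refers to can be written as below. This is a sketch of the commonly cited form for Vietoris-Rips filtrations, not a quotation from the paper: the symbols $X$, $Y$, $d_b$ (bottleneck distance), $d_{GH}$ (Gromov--Hausdorff distance) and $\mathrm{dgm}$ are notation assumed here, and the constant depends on how the Rips scale parameter is normalised.

```latex
% Assumed notation: X, Y precompact metric spaces,
% dgm(Rips(X)) the persistence diagram of the Vietoris-Rips filtration of X.
\[
  d_b\bigl(\mathrm{dgm}(\mathrm{Rips}(X)),\ \mathrm{dgm}(\mathrm{Rips}(Y))\bigr)
  \;\le\; 2\, d_{GH}(X, Y)
\]
```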
Subsampling Methods for Persistent Homology
Persistent homology is a multiscale method for analyzing the shape of sets
and functions from point cloud data arising from an unknown distribution
supported on those sets. When the size of the sample is large, direct
computation of the persistent homology is prohibitive due to the combinatorial
nature of the existing algorithms. We propose to compute the persistent
homology of several subsamples of the data and then combine the resulting
estimates. We study the risk of two estimators and we prove that the
subsampling approach carries stable topological information while achieving a
great reduction in computational complexity.
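The subsample-and-combine idea can be illustrated with a minimal Python sketch. It is not the estimator studied in the paper: as a simplifying assumption it handles only degree-0 persistence (connected components, computed with a union-find pass over sorted edges) and combines subsamples by averaging their sorted death times. The function names `h0_deaths` and `subsampled_h0`, and the convention that a component dies at the length of the merging edge, are choices made for this illustration.

```python
import math
import random
from itertools import combinations


def h0_deaths(points):
    """Ascending death times of degree-0 Vietoris-Rips classes.

    Assumed convention: two components merge (one class dies) at a scale
    equal to the length of the edge that joins them.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first (Kruskal-style sweep).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths  # n - 1 finite death times


def subsampled_h0(points, m, n_subsamples, seed=0):
    """Average the sorted H0 death times over random subsamples of size m."""
    rng = random.Random(seed)
    total = [0.0] * (m - 1)
    for _ in range(n_subsamples):
        sample = rng.sample(points, m)
        for k, death in enumerate(h0_deaths(sample)):
            total[k] += death
    return [t / n_subsamples for t in total]
```

Each subsample costs on the order of m² log m for the edge sort instead of N² log N on the full cloud of N points, which is where the saving comes from when m is much smaller than N.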
Curvature Sets Over Persistence Diagrams
We study an invariant of compact metric spaces which combines the notion of
curvature sets introduced by Gromov in the 1980s together with the notion of
Vietoris-Rips persistent homology. For given integers $k \geq 0$ and $n \geq 1$,
these invariants arise by considering the degree $k$ Vietoris-Rips persistence
diagrams of all subsets of a given metric space with cardinality at most $n$.
We call these invariants persistence sets and denote them as
$D_{n,k}^{\mathrm{VR}}$. We argue that computing these invariants could be
significantly easier than computing the usual Vietoris-Rips persistence
diagrams. We establish stability results for these invariants and we also
precisely characterize some of them in the case of spheres with geodesic and
Euclidean distances. We identify a rich family of metric graphs for which
the persistence set $D_{4,1}^{\mathrm{VR}}$ fully recovers their homotopy type. Along the way we
prove some useful properties of Vietoris-Rips persistence diagrams.