Hotspot identification for Mapper graphs
The Mapper algorithm can be used to build graph-based representations of
high-dimensional data capturing structurally interesting features such as
loops, flares or clusters. The graph can be further annotated with additional
colouring of vertices allowing location of regions of special interest. For
instance, in many applications, such as precision medicine, the Mapper graph has
been used to identify unknown compactly localized subareas within the dataset
demonstrating unique or unusual behaviours. This task, so far performed manually by a
researcher, can be automated using hotspot analysis. In this work we propose
a new algorithm for detecting hotspots in Mapper graphs, which automates the
hotspot detection process. We demonstrate the performance of the
algorithm on a number of artificial and real world datasets. We further
demonstrate how our algorithm can be used for the automatic selection of the
Mapper lens functions.
Comment: Topological Data Analysis and Beyond Workshop at the 34th Conference
on Neural Information Processing Systems (NeurIPS 2020)
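For readers unfamiliar with the Mapper construction that the abstract above builds on, the following is a minimal sketch of the generic pipeline: cover the range of a lens function with overlapping intervals, cluster the points falling in each interval, and connect clusters that share points. It is not the hotspot algorithm proposed in the paper. The function names (`mapper_graph`, `_components`), the parameters (`n_intervals`, `overlap`, `eps`) and the choice of single-linkage clustering are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def mapper_graph(points, lens, n_intervals=4, overlap=0.25, eps=1.0):
    """Toy Mapper: cover the lens range, cluster each preimage, link overlaps."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals or 1.0  # fall back to 1.0 if the lens is constant
    pad = overlap * length  # each interval is widened by `pad` on both sides

    nodes = []  # each node is a frozenset of point indices
    for i in range(n_intervals):
        a = lo + i * length - pad
        b = lo + (i + 1) * length + pad
        members = [j for j, v in enumerate(values) if a <= v <= b]
        nodes.extend(_components(points, members, eps))

    # Two nodes are joined by an edge when their clusters share a point.
    edges = {
        (u, v)
        for u, v in combinations(range(len(nodes)), 2)
        if nodes[u] & nodes[v]
    }
    return nodes, edges


def _components(points, members, eps):
    """Single-linkage clusters: connected components of the eps-neighbour graph."""
    unseen, clusters = set(members), []
    while unseen:
        stack = [unseen.pop()]
        cluster = set(stack)
        while stack:
            j = stack.pop()
            near = {k for k in unseen if dist(points[j], points[k]) <= eps}
            unseen -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters
```

Vertex colourings such as those mentioned in the abstract could be obtained, under the same assumptions, by averaging a function of interest over the point indices stored in each node.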
Sheaf-Theoretic Stratification Learning from Geometric and Topological Perspectives
In this paper, we investigate a sheaf-theoretic interpretation of
stratification learning from geometric and topological perspectives. Our main
result is the construction of stratification learning algorithms framed in
terms of a sheaf on a partially ordered set with the Alexandroff topology. We
prove that the resulting decomposition is the unique minimal stratification for
which the strata are homogeneous and the given sheaf is constructible. In
particular, when we choose to work with the local homology sheaf, our algorithm
gives an alternative to the local homology transfer algorithm given in Bendich
et al. (2012), and the cohomology stratification algorithm given in Nanda
(2017). Additionally, we give examples of stratifications based on the
geometric techniques of Breiding et al. (2018), illustrating how the
sheaf-theoretic approach can be used to study stratifications from both
topological and geometric perspectives. This approach also points toward future
applications of sheaf theory in the study of topological data analysis by
illustrating the utility of the language of sheaf theory in generalizing
existing algorithms.
A fast approximate skeleton with guarantees for any cloud of points in a Euclidean space
The tree reconstruction problem is to find an embedded straight-line tree that approximates a given cloud of unorganized points in a Euclidean space up to a certain error. A practical solution to this problem will accelerate the discovery of new colloidal products with desired physical properties such as viscosity. We define the Approximate Skeleton of any finite point cloud in a Euclidean space with theoretical guarantees. The Approximate Skeleton ASk always belongs to a given offset of the cloud, i.e. the maximum distance from the cloud to ASk is bounded by a given maximum error. The number of vertices in the Approximate Skeleton is within a factor of 2 of the minimum number in an optimal tree. The Approximate Skeleton of any unorganized point cloud is computed in near-linear time in the number of points in the cloud. Finally, the Approximate Skeleton outperforms past skeletonization algorithms on the size and accuracy of reconstruction for a large dataset of real micelles and random clouds.
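The offset guarantee in the abstract above can be read as a bound on the largest distance from any cloud point to the straight-line tree. The sketch below only checks that quantity for a given tree; it does not compute the Approximate Skeleton itself. The helper names (`point_segment_distance`, `max_distance_to_tree`) and the representation of a tree as a list of segment endpoints are assumptions made for illustration.

```python
from math import dist


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the straight segment from a to b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    # Parameter of the orthogonal projection onto the line, clamped to the segment.
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return dist(p, closest)


def max_distance_to_tree(cloud, edges):
    """Largest distance from a cloud point to its nearest tree edge."""
    return max(
        min(point_segment_distance(p, a, b) for a, b in edges)
        for p in cloud
    )


# Hypothetical data: a one-edge tree along the x-axis and three cloud points.
cloud = [(0.0, 1.0), (2.0, -0.5), (5.0, 0.0)]
tree_edges = [((0.0, 0.0), (4.0, 0.0))]
within_offset = max_distance_to_tree(cloud, tree_edges) <= 1.0
```

Under this reading, a tree lies within the offset for a maximum error `e` exactly when `max_distance_to_tree(cloud, edges) <= e`.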
Mapper on Graphs for Network Visualization
Networks are an exceedingly popular type of data for representing
relationships between individuals, businesses, proteins, brain regions,
telecommunication endpoints, etc. Network or graph visualization provides an
intuitive way to explore the node-link structures of network data for instant
sense-making. However, naive node-link diagrams can fail to convey insights
regarding network structures, even for moderately sized data of a few hundred
nodes. We propose to apply the mapper construction--a popular tool in
topological data analysis--to graph visualization, which provides a strong
theoretical basis for summarizing network data while preserving their core
structures. We develop a variation of the mapper construction targeting
weighted, undirected graphs, called mapper on graphs, which generates
property-preserving summaries of graphs. We provide a software tool that
enables interactive explorations of such summaries and demonstrates the
effectiveness of our method for synthetic and real-world data. The mapper on
graphs approach we propose represents a new class of techniques that leverages
tools from topological data analysis in addressing challenges in graph
visualization.
Statistical analysis of Mapper for stochastic and multivariate filters
Reeb spaces, as well as their discretized versions called Mappers, are common
descriptors used in Topological Data Analysis, with plenty of applications in
various fields of science, such as computational biology and data
visualization, among others. The stability of the Mapper and the rate of its
convergence to the Reeb space have been studied extensively in recent
works [BBMW19, CO17, CMO18, MW16], focusing on the case where a scalar-valued
filter is used for the computation of Mapper. On the other hand, much less is
known in the multivariate case, when the codomain of the filter is a
higher-dimensional Euclidean space, and in the general case, when it is a
general metric space instead of the real line. The few results that are available in this
setting [DMW17, MW16] can only handle continuous topological spaces and cannot
be used as is for finite metric spaces representing data, such as point clouds
and distance matrices. In this article, we introduce a slight modification of
the usual Mapper construction and we give risk bounds for estimating the Reeb
space using this estimator. Our approach applies in particular to the setting
where the filter function used to compute Mapper is also estimated from data,
such as the eigenfunctions of PCA. Our results are given with respect to the
Gromov-Hausdorff distance, computed with specific filter-based pseudometrics
for Mappers and Reeb spaces defined in [DMW17]. We finally provide applications
of this setting in statistics and machine learning for different kinds of
target filters, as well as numerical experiments that demonstrate the relevance
of our approach.