Probabilistic Convergence and Stability of Random Mapper Graphs
We study the probabilistic convergence between the mapper graph and the Reeb
graph of a topological space $\mathbb{X}$ equipped with a continuous function
$f: \mathbb{X} \to \mathbb{R}$. We first give a categorification of the
mapper graph and the Reeb graph by interpreting them in terms of cosheaves and
stratified covers of the real line $\mathbb{R}$. We then introduce a variant of
the classic mapper graph of Singh et al. (2007), referred to as the enhanced
mapper graph, and demonstrate that this construction approximates the Reeb
graph of $(\mathbb{X}, f)$ when it is applied to points randomly sampled from a
probability density function concentrated on $(\mathbb{X}, f)$.
Our techniques are based on the interleaving distance of constructible
cosheaves and topological estimation via kernel density estimates. Following
Munch and Wang (2018), we first show that the mapper graph of $(\mathbb{X}, f)$,
a constructible $\mathbb{R}$-space (with a fixed open cover), approximates
the Reeb graph of the same space. We then construct an isomorphism between the
mapper of $(\mathbb{X}, f)$ and the mapper of a super-level set of a probability
density function concentrated on $(\mathbb{X}, f)$. Finally, building on the
approach of Bobrowski et al. (2017), we show that, with high probability, we
can recover the mapper of the super-level set given a sufficiently large
sample. Our work is the first to consider the mapper construction using the
theory of cosheaves in a probabilistic setting. It is part of an ongoing effort
to combine sheaf theory, probability, and statistics to support topological
data analysis with random data.
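To make the construction concrete, here is a minimal sketch of the classic one-dimensional mapper graph (not the enhanced mapper graph or the cosheaf machinery described above). It assumes a point cloud in Euclidean space, a real-valued filter, a uniform cover of the filter's range by overlapping intervals, and single-linkage clustering at a fixed scale `eps` inside each preimage. The function names and default parameters are illustrative choices, not taken from the paper.

```python
import math
from itertools import combinations


def interval_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with n_intervals closed intervals; neighbours overlap by
    the fraction `overlap` of one interval's length."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    cover = [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]
    cover[-1] = (cover[-1][0], hi)  # guard against rounding at the right end
    return cover


def single_linkage_clusters(indices, points, eps):
    """Connected components of the eps-neighbourhood graph on the given points."""
    remaining, comps = set(indices), []
    while remaining:
        seed = remaining.pop()
        comp, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            comp |= near
            frontier.extend(near)
        comps.append(frozenset(comp))
    return comps


def mapper_graph(points, filt, n_intervals=4, overlap=0.25, eps=0.5):
    """Classic 1D mapper (hypothetical helper): cluster each filter preimage,
    then link clusters that share at least one point (1-skeleton of the nerve)."""
    values = [filt(p) for p in points]
    nodes = []
    for a, b in interval_cover(min(values), max(values), n_intervals, overlap):
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage_clusters(members, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges


# Example: 40 points on the unit circle, filtered by height.
circle = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40)) for k in range(40)]
nodes, edges = mapper_graph(circle, filt=lambda p: p[1])
```

With these parameters the mapper graph is a single cycle (six nodes, six edges), which matches the Reeb graph of the height function on a circle: a minimum and a maximum joined by two arcs.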
Statistical analysis of Mapper for stochastic and multivariate filters
Reeb spaces, as well as their discretized versions called Mappers, are common
descriptors used in Topological Data Analysis, with many applications in
various fields of science, such as computational biology and data
visualization. The stability of the Mapper and the rate at which it converges
to the Reeb space have been studied extensively in recent works [BBMW19, CO17,
CMO18, MW16], focusing on the case where a scalar-valued filter is used to
compute the Mapper. On the other hand, much less is known in the multivariate
case, when the codomain of the filter is $\mathbb{R}^p$, and in the general
case, when it is a general metric space rather than $\mathbb{R}$. The few
results available in this setting [DMW17, MW16] can only handle continuous
topological spaces and cannot be used as is for finite metric spaces
representing data, such as point clouds and distance matrices. In this
article, we introduce a slight modification of the usual Mapper construction
and give risk bounds for estimating the Reeb space using this estimator. Our
approach applies in particular to the setting where the filter function used
to compute Mapper is also estimated from data, such as the eigenfunctions of
PCA. Our results are given with respect to the Gromov-Hausdorff distance,
computed with specific filter-based pseudometrics for Mappers and Reeb spaces
defined in [DMW17]. Finally, we provide applications of this setting in
statistics and machine learning for different kinds of target filters, as well
as numerical experiments that demonstrate the relevance of our approach.
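The abstract notes that the filter itself may be estimated from data, for example from PCA. As a hedged illustration only, the sketch below estimates the leading principal direction by power iteration on the sample covariance and uses the projection onto it as a data-driven filter. The names `first_principal_component` and `pca_filter` are hypothetical, power iteration is simply one convenient way to get the top eigenvector, and nothing here reproduces the authors' estimator or risk bounds.

```python
import math
import random


def first_principal_component(points, iters=200, seed=0):
    """Mean and leading eigenvector of the sample covariance (power iteration)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    centered = [[p[k] - mean[k] for k in range(d)] for p in points]
    # Sample covariance C = X^T X / (n - 1) for the centered data matrix X.
    cov = [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def pca_filter(points):
    """Plug-in filter f(p) = <p - mean, v1>, estimated from the same sample."""
    mean, v = first_principal_component(points)
    return lambda p: sum((p[k] - mean[k]) * v[k] for k in range(len(v)))


# Example: noisy points near the line y = 2x.
data = [(i / 10, 2 * i / 10 + 0.01 * ((7 * i) % 5 - 2)) for i in range(20)]
f = pca_filter(data)
```

The eigenvector is only defined up to sign, so the resulting filter may be reversed relative to another run; a Mapper built from it is unaffected by that flip.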
Density Evolution for Asymmetric Memoryless Channels
Density evolution is one of the most powerful analytical tools for
low-density parity-check (LDPC) codes and graph codes with message-passing
decoding algorithms. With channel symmetry as one of its fundamental
assumptions, density evolution (DE) has been widely and successfully applied to
different channels, including binary erasure channels, binary symmetric
channels, binary additive white Gaussian noise channels, etc. This paper
generalizes density evolution to non-symmetric memoryless channels, which in
turn broadens its applications to general memoryless channels, e.g., Z-channels,
composite white Gaussian noise channels, etc. The central theorem underpinning
this generalization is the convergence to perfect projection for any
fixed-size supporting tree. A new iterative formula of the same complexity is
then presented, and the theorems needed to establish performance concentration
are developed. Several properties of the new density evolution method are
explored, including stability results for general asymmetric memoryless
channels. Simulations, code optimizations, and possible new applications
suggested by this new density evolution method are also provided. This result
is also used to prove the typicality of linear LDPC codes among the coset code
ensemble when the minimum check node degree is sufficiently large. It is shown
that the convergence to perfect projection is essential to the belief
propagation algorithm even when only symmetric channels are considered. Hence,
the proof of the convergence to perfect projection also serves as a completion
of the theory of classical density evolution for symmetric memoryless channels.
Comment: To appear in the IEEE Transactions on Information Theory.
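For reference, the classical density evolution recursion that this paper generalizes takes a particularly simple form on a binary erasure channel, where each message density collapses to a single erasure probability. The sketch below assumes a (dv, dc)-regular LDPC ensemble on a BEC with erasure probability `eps`. It implements only this symmetric special case, not the asymmetric generalization, and the function names are hypothetical.

```python
def bec_density_evolution(eps, dv, dc, iters=2000):
    """Erasure probability of variable-to-check messages after `iters` rounds
    on BEC(eps) for a (dv, dc)-regular ensemble:
        x_{l+1} = eps * (1 - (1 - x_l)^(dc - 1))^(dv - 1)."""
    x = eps
    for _ in range(iters):
        x = eps * (1.0 - (1.0 - x) ** (dc - 1)) ** (dv - 1)
    return x


def bec_threshold(dv, dc, tol=1e-6):
    """Bisection for the largest eps whose DE fixed point is (numerically) zero."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bec_density_evolution(mid, dv, dc) < 1e-9:
            lo = mid  # decoding succeeds: threshold is at least mid
        else:
            hi = mid  # erasures persist: threshold is below mid
    return lo


# For the (3, 6) ensemble this lands close to the known BEC threshold of about 0.4294.
threshold_36 = bec_threshold(3, 6)
```

Bisection is valid here because the limiting erasure probability is monotone in `eps`; near the threshold the recursion converges slowly, so the fixed iteration cap limits the precision of the estimate.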
Mapper on Graphs for Network Visualization
Networks are an exceedingly popular type of data for representing
relationships between individuals, businesses, proteins, brain regions,
telecommunication endpoints, etc. Network or graph visualization provides an
intuitive way to explore the node-link structures of network data for instant
sense-making. However, naive node-link diagrams can fail to convey insights
regarding network structures, even for moderately sized data of a few hundred
nodes. We propose to apply the mapper construction, a popular tool in
topological data analysis, to graph visualization, which provides a strong
theoretical basis for summarizing network data while preserving their core
structures. We develop a variation of the mapper construction targeting
weighted, undirected graphs, called mapper on graphs, which generates
property-preserving summaries of graphs. We provide a software tool that
enables interactive exploration of such summaries, and we demonstrate the
effectiveness of our method on synthetic and real-world data. The mapper on
graphs approach we propose represents a new class of techniques that leverage
tools from topological data analysis to address challenges in graph
visualization.
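A rough sketch of a mapper-style summary computed directly on a weighted, undirected graph follows. As illustrative assumptions (not the paper's specific method), the filter is the shortest-path distance from a chosen root node, the cover is a user-supplied list of overlapping intervals, and the clustering step takes the connected components of each induced subgraph. All function names are hypothetical.

```python
import heapq
from itertools import combinations


def dijkstra(adj, source):
    """Shortest-path distances from `source`; adj maps node -> {neighbour: weight}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def induced_components(adj, nodes):
    """Connected components of the subgraph induced by `nodes`."""
    nodes, comps = set(nodes), []
    while nodes:
        stack = [nodes.pop()]
        comp = set(stack)
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in nodes:
                    nodes.remove(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(frozenset(comp))
    return comps


def graph_mapper(adj, filt, intervals):
    """One summary node per component of each filter preimage; summary nodes
    are linked when they share an original graph node."""
    summary = []
    for a, b in intervals:
        summary += induced_components(adj, [u for u in adj if a <= filt[u] <= b])
    links = {(i, j) for i, j in combinations(range(len(summary)), 2)
             if summary[i] & summary[j]}
    return summary, links


# Example: an 8-node cycle with unit weights, filtered by distance from node 0.
ring = {i: {(i + 1) % 8: 1.0, (i - 1) % 8: 1.0} for i in range(8)}
summary, links = graph_mapper(ring, dijkstra(ring, 0), [(0, 1.5), (1, 2.5), (2, 3.5), (3, 4)])
```

In this toy case the 8-node ring is summarized by a 6-node ring, so the loop in the original network survives in the summary.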
Topological data analysis of organoids
Organoids are multi-cellular structures which are cultured in vitro from stem cells to resemble specific organs (e.g., colon, liver) in their three-dimensional composition. The gene expression and the tissue composition of organoids constantly affect each other. Dynamic changes in the shape, cellular composition and transcriptomic profile of these model systems can be used to understand the effect of mutations and treatments in health and disease. In this thesis, I propose new techniques in the field of topological data analysis (TDA) to analyse the gene expression and the morphology of organoids. I use TDA methods, which are inspired by topology, to analyse and quantify the continuous structure of single-cell RNA sequencing data, which is embedded in high-dimensional space, and the shape of an organoid.
For single-cell RNA sequencing data, I developed the multiscale Laplacian score (MLS) and the UMAP diffusion cover, which both extend and improve existing topological analysis methods. I demonstrate the utility of these techniques by applying them to a published benchmark single-cell data set and a data set of mouse colon organoids. The methods validate previously identified genes and detect additional genes with known involvement in cancers.
To study the morphology of organoids, I propose DETECT, a rotationally invariant signature of dynamically changing shapes. I demonstrate the efficacy of this method on a data set of segmented videos of mouse small intestine organoid experiments and show that it outperforms classical shape descriptors. I verify the method on a synthetic organoid data set and illustrate how it generalises to 3D. I conclude that DETECT offers rigorous quantification of organoids and opens up computationally scalable methods for distinguishing different growth regimes and assessing treatment effects. Finally, I make a theoretical contribution to the statistical inference of the method underlying DETECT.
Visualization of AE's Training on Credit Card Transactions with Persistent Homology
Auto-encoders are among the most popular neural network architectures for
dimension reduction. They are composed of two parts: the encoder, which maps the
model distribution to a latent manifold, and the decoder, which maps the latent
manifold to a reconstructed distribution. However, auto-encoders are known to
produce chaotically scattered data distributions in the latent manifold,
resulting in an incomplete reconstructed distribution. Current distance
measures fail to detect this problem because they do not account for the shape
of the data manifolds, i.e., their topological features, or for the scale at
which the manifolds should be analyzed. We propose Persistent Homology for
Wasserstein Auto-Encoders, called PHom-WAE, a new methodology to assess and
measure the data distribution of a generative model. PHom-WAE minimizes the
Wasserstein distance between the true distribution and the reconstructed
distribution and uses persistent homology, the study of the topological
features of a space at different spatial resolutions, to compare the nature of
the latent manifold and the reconstructed distribution. Our experiments
underline the potential of persistent homology for Wasserstein Auto-Encoders in
comparison to Variational Auto-Encoders, another type of generative model. The
experiments are conducted on a real-world data set that is particularly
challenging for traditional distance measures and auto-encoders. PHom-WAE is the
first methodology to propose a topological distance measure, the bottleneck
distance, for Wasserstein Auto-Encoders, used to compare high-quality decoded
samples in the context of credit card transactions.
Comment: arXiv admin note: substantial text overlap with arXiv:1905.0989
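Persistent homology in its simplest form tracks connected components across scales. The sketch below computes the 0-dimensional persistence diagram of the Vietoris-Rips filtration of a point cloud using a union-find structure over edges sorted by length. It is a hedged illustration of the concept only: it is not the PHom-WAE pipeline, it ignores higher-dimensional features, and it does not compute the bottleneck distance. The function name `h0_persistence` is hypothetical.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """(birth, death) pairs for connected components of the Rips filtration.
    Every component is born at scale 0; one of two merging components dies
    at the length of the edge that merges them."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, length))
    pairs.append((0.0, math.inf))  # the component that never dies
    return pairs


# Example: two tight pairs of points far apart from each other.
diagram = h0_persistence([(0, 0), (0, 1), (10, 0), (10, 1)])
```

Short-lived pairs reflect within-cluster noise, while the long bar (death at 10) records the gap between the two clusters; a bottleneck distance would then compare two such diagrams.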