726 research outputs found

    Probabilistic Convergence and Stability of Random Mapper Graphs

    We study the probabilistic convergence between the mapper graph and the Reeb graph of a topological space $\mathbb{X}$ equipped with a continuous function $f: \mathbb{X} \rightarrow \mathbb{R}$. We first give a categorification of the mapper graph and the Reeb graph by interpreting them in terms of cosheaves and stratified covers of the real line $\mathbb{R}$. We then introduce a variant of the classic mapper graph of Singh et al. (2007), referred to as the enhanced mapper graph, and demonstrate that such a construction approximates the Reeb graph of $(\mathbb{X}, f)$ when it is applied to points randomly sampled from a probability density function concentrated on $(\mathbb{X}, f)$. Our techniques are based on the interleaving distance of constructible cosheaves and topological estimation via kernel density estimates. Following Munch and Wang (2018), we first show that the mapper graph of $(\mathbb{X}, f)$, a constructible $\mathbb{R}$-space (with a fixed open cover), approximates the Reeb graph of the same space. We then construct an isomorphism from the mapper of $(\mathbb{X}, f)$ to the mapper of a super-level set of a probability density function concentrated on $(\mathbb{X}, f)$. Finally, building on the approach of Bobrowski et al. (2017), we show that, with high probability, we can recover the mapper of the super-level set given a sufficiently large sample. Our work is the first to consider the mapper construction using the theory of cosheaves in a probabilistic setting. It is part of an ongoing effort to combine sheaf theory, probability, and statistics to support topological data analysis with random data.
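
    As a rough illustration of the classic mapper construction that this abstract refers to (not the enhanced mapper graph or the cosheaf machinery of the paper), the following minimal sketch covers the range of a filter function with overlapping intervals, clusters each preimage, and connects clusters that share points. The helper names, the single-linkage clustering rule, and all parameter values are assumptions chosen for the example.

```python
from itertools import combinations
from math import cos, dist, pi, sin


def interval_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with equal-length intervals; neighbors share a fraction `overlap` of their length."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def components(indices, points, eps):
    """Single-linkage clusters: connected components of the eps-neighborhood graph (an assumed clustering rule)."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            p = frontier.pop()
            near = {q for q in remaining if dist(points[p], points[q]) <= eps}
            remaining -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, filter_values, n_intervals=4, overlap=0.3, eps=0.5):
    """Return (nodes, edges): nodes are clusters of point indices, edges join clusters that share a point."""
    cover = interval_cover(min(filter_values), max(filter_values), n_intervals, overlap)
    tol = 1e-9  # guards against floating-point rounding at the interval endpoints
    nodes = []
    for a, b in cover:
        preimage = [i for i, v in enumerate(filter_values) if a - tol <= v <= b + tol]
        nodes.extend(components(preimage, points, eps))
    edges = {(i, j) for i, j in combinations(range(len(nodes)), 2) if nodes[i] & nodes[j]}
    return nodes, edges


# Hypothetical test data: 60 evenly spaced points on the unit circle, filtered by x-coordinate.
circle = [(cos(2 * pi * k / 60), sin(2 * pi * k / 60)) for k in range(60)]
nodes, edges = mapper_graph(circle, [x for x, _ in circle], eps=0.2)
print(len(nodes), len(edges))
```

    For this sample the resulting graph is connected with as many edges as nodes, so it contains a single loop, matching the Reeb graph of a circle under a height function.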

    Statistical analysis of Mapper for stochastic and multivariate filters

    Reeb spaces, as well as their discretized versions called Mappers, are common descriptors used in Topological Data Analysis, with many applications in fields such as computational biology and data visualization. The stability of the Mapper and the rate at which it converges to the Reeb space have been studied extensively in recent works [BBMW19, CO17, CMO18, MW16], focusing on the case where a scalar-valued filter is used for the computation of Mapper. On the other hand, much less is known in the multivariate case, when the codomain of the filter is $\mathbb{R}^p$, and in the general case, when it is a general metric space $(Z, d_Z)$ instead of $\mathbb{R}$. The few results that are available in this setting [DMW17, MW16] can only handle continuous topological spaces and cannot be used as is for finite metric spaces representing data, such as point clouds and distance matrices. In this article, we introduce a slight modification of the usual Mapper construction and we give risk bounds for estimating the Reeb space using this estimator. Our approach applies in particular to the setting where the filter function used to compute Mapper is also estimated from data, such as the eigenfunctions of PCA. Our results are given with respect to the Gromov-Hausdorff distance, computed with specific filter-based pseudometrics for Mappers and Reeb spaces defined in [DMW17]. We finally provide applications of this setting in statistics and machine learning for different kinds of target filters, as well as numerical experiments that demonstrate the relevance of our approach.

    Density Evolution for Asymmetric Memoryless Channels

    Density evolution is one of the most powerful analytical tools for low-density parity-check (LDPC) codes and graph codes with message passing decoding algorithms. With channel symmetry as one of its fundamental assumptions, density evolution (DE) has been widely and successfully applied to different channels, including binary erasure channels, binary symmetric channels, binary additive white Gaussian noise channels, etc. This paper generalizes density evolution to non-symmetric memoryless channels, which in turn broadens its applications to general memoryless channels, e.g. z-channels, composite white Gaussian noise channels, etc. The central theorem underpinning this generalization is the convergence to perfect projection for any fixed-size supporting tree. A new iterative formula of the same complexity is then presented, and the theorems needed to establish performance concentration are developed. Several properties of the new density evolution method are explored, including stability results for general asymmetric memoryless channels. Simulations, code optimizations, and possible new applications suggested by this new density evolution method are also provided. This result is also used to prove the typicality of linear LDPC codes among the coset code ensemble when the minimum check node degree is sufficiently large. It is shown that the convergence to perfect projection is essential to the belief propagation algorithm even when only symmetric channels are considered. Hence the proof of the convergence to perfect projection also serves as a completion of the theory of classical density evolution for symmetric memoryless channels. Comment: To appear in the IEEE Transactions on Information Theory.
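
    For orientation, the classical symmetric-channel case that this abstract generalizes can be sketched on the binary erasure channel, where density evolution reduces to a scalar recursion on the erasure probability. The sketch below is the standard recursion for a regular (dv, dc) LDPC ensemble, not the paper's formula for asymmetric channels; the function names, iteration limits, and tolerance are assumptions made for the example.

```python
def bec_density_evolution(eps, dv=3, dc=6, max_iters=10_000, tol=1e-9):
    """Iterate x <- eps * (1 - (1 - x)**(dc - 1))**(dv - 1), starting from x = eps.

    x is the erasure probability of a variable-to-check message.
    Returns True if x drops below `tol` within `max_iters` iterations.
    """
    x = eps
    for _ in range(max_iters):
        x = eps * (1.0 - (1.0 - x) ** (dc - 1)) ** (dv - 1)
        if x < tol:
            return True
    return False


def bec_threshold(dv=3, dc=6, steps=20):
    """Bisect on the channel erasure probability to estimate the decoding threshold."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if bec_density_evolution(mid, dv, dc):
            lo = mid  # decoding succeeds, so the threshold is at least mid
        else:
            hi = mid  # decoding stalls at a nonzero fixed point
    return lo


threshold = bec_threshold()
print(round(threshold, 3))
```

    For the regular (3, 6) ensemble the estimate should land close to the known threshold of roughly 0.4294; the finite iteration cap can bias the bisection slightly downward.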

    Mapper on Graphs for Network Visualization

    Networks are an exceedingly popular type of data for representing relationships between individuals, businesses, proteins, brain regions, telecommunication endpoints, etc. Network or graph visualization provides an intuitive way to explore the node-link structures of network data for instant sense-making. However, naive node-link diagrams can fail to convey insights regarding network structures, even for moderately sized data of a few hundred nodes. We propose to apply the mapper construction--a popular tool in topological data analysis--to graph visualization, which provides a strong theoretical basis for summarizing network data while preserving their core structures. We develop a variation of the mapper construction targeting weighted, undirected graphs, called mapper on graphs, which generates property-preserving summaries of graphs. We provide a software tool that enables interactive explorations of such summaries and demonstrate the effectiveness of our method for synthetic and real-world data. The mapper on graphs approach we propose represents a new class of techniques that leverages tools from topological data analysis in addressing challenges in graph visualization.

    Topological data analysis of organoids

    Organoids are multi-cellular structures which are cultured in vitro from stem cells to resemble specific organs (e.g., colon, liver) in their three-dimensional composition. The gene expression and the tissue composition of organoids constantly affect each other. Dynamic changes in the shape, cellular composition and transcriptomic profile of these model systems can be used to understand the effect of mutations and treatments in health and disease. In this thesis, I propose new techniques in the field of topological data analysis (TDA) to analyse the gene expression and the morphology of organoids. I use TDA methods, which are inspired by topology, to analyse and quantify the continuous structure of single-cell RNA sequencing data, which is embedded in high-dimensional space, and the shape of an organoid. For single-cell RNA sequencing data, I developed the multiscale Laplacian score (MLS) and the UMAP diffusion cover, which both extend and improve existing topological analysis methods. I demonstrate the utility of these techniques by applying them to a published benchmark single-cell data set and a data set of mouse colon organoids. The methods validate previously identified genes and detect additional genes with known involvement in cancers. To study the morphology of organoids I propose DETECT, a rotationally invariant signature of dynamically changing shapes. I demonstrate the efficacy of this method on a data set of segmented videos of mouse small intestine organoid experiments and show that it outperforms classical shape descriptors. I verify the method on a synthetic organoid data set and illustrate how it generalises to 3D to conclude that DETECT offers rigorous quantification of organoids and opens up computationally scalable methods for distinguishing different growth regimes and assessing treatment effects. Finally, I make a theoretical contribution to the statistical inference of the method underlying DETECT.

    Visualization of AE's Training on Credit Card Transactions with Persistent Homology

    Auto-encoders are among the most popular neural network architectures for dimension reduction. They are composed of two parts: the encoder, which maps the model distribution to a latent manifold, and the decoder, which maps the latent manifold to a reconstructed distribution. However, auto-encoders are known to produce chaotically scattered data distributions in the latent manifold, resulting in an incomplete reconstructed distribution. Current distance measures fail to detect this problem because they cannot account for the shape of the data manifolds, i.e. their topological features, or for the scale at which the manifolds should be analyzed. We propose Persistent Homology for Wasserstein Auto-Encoders, called PHom-WAE, a new methodology to assess and measure the data distribution of a generative model. PHom-WAE minimizes the Wasserstein distance between the true distribution and the reconstructed distribution and uses persistent homology, the study of the topological features of a space at different spatial resolutions, to compare the nature of the latent manifold and the reconstructed distribution. Our experiments underline the potential of persistent homology for Wasserstein Auto-Encoders in comparison to Variational Auto-Encoders, another type of generative model. The experiments are conducted on a real-world data set particularly challenging for traditional distance measures and auto-encoders. PHom-WAE is the first methodology to propose a topological distance measure, the bottleneck distance, for Wasserstein Auto-Encoders used to compare decoded samples of high quality in the context of credit card transactions. Comment: arXiv admin note: substantial text overlap with arXiv:1905.0989