538 research outputs found

    Graded persistence diagrams and persistence landscapes

    Full text link
    We introduce a refinement of the persistence diagram, the graded persistence diagram. It is the Mobius inversion of the graded rank function, which is obtained from the rank function using the unary numeral system. Both persistence diagrams and graded persistence diagrams are integer-valued functions on the Cartesian plane. Whereas the persistence diagram takes non-negative values, the graded persistence diagram takes values of 0, 1, or -1. The sum of the graded persistence diagrams is the persistence diagram. We show that the positive and negative points in the k-th graded persistence diagram correspond to the local maxima and minima, respectively, of the k-th persistence landscape. We prove a stability theorem for graded persistence diagrams: the 1-Wasserstein distance between k-th graded persistence diagrams is bounded by twice the 1-Wasserstein distance between the corresponding persistence diagrams, and this bound is attained. In the other direction, the 1-Wasserstein distance is a lower bound for the sum of the 1-Wasserstein distances between the k-th graded persistence diagrams. In fact, the 1-Wasserstein distance for graded persistence diagrams is more discriminative than the 1-Wasserstein distance for the corresponding persistence diagrams.Comment: accepted for publication in Discrete and Computational Geometr

    Multiscale topology classifies and quantifies cell types in subcellular spatial transcriptomics

    Full text link
    Spatial transcriptomics has the potential to transform our understanding of RNA expression in tissues. Classical array-based technologies produce multiple-cell-scale measurements requiring deconvolution to recover single cell information. However, rapid advances in subcellular measurement of RNA expression at whole-transcriptome depth necessitate a fundamentally different approach. To integrate single-cell RNA-seq data with nanoscale spatial transcriptomics, we present a topological method for automatic cell type identification (TopACT). Unlike popular decomposition approaches to multicellular resolution data, TopACT is able to pinpoint the spatial locations of individual sparsely dispersed cells without prior knowledge of cell boundaries. Pairing TopACT with multiparameter persistent homology landscapes predicts immune cells forming a peripheral ring structure within kidney glomeruli in a murine model of lupus nephritis, which we experimentally validate with immunofluorescent imaging. The proposed topological data analysis unifies multiple biological scales, from subcellular gene expression to multicellular tissue organization.Comment: Main text: 8 pages, 4 figures. Supplement: 12 pages, 5 figure

    Stable Vectorization of Multiparameter Persistent Homology using Signed Barcodes as Measures

    Full text link
    Persistent homology (PH) provides topological descriptors for geometric data, such as weighted graphs, which are interpretable, stable to perturbations, and invariant under, e.g., relabeling. Most applications of PH focus on the one-parameter case -- where the descriptors summarize the changes in topology of data as it is filtered by a single quantity of interest -- and there is now a wide array of methods enabling the use of one-parameter PH descriptors in data science, which rely on the stable vectorization of these descriptors as elements of a Hilbert space. Although the multiparameter PH (MPH) of data that is filtered by several quantities of interest encodes much richer information than its one-parameter counterpart, the scarceness of stability results for MPH descriptors has so far limited the available options for the stable vectorization of MPH. In this paper, we aim to bring together the best of both worlds by showing how the interpretation of signed barcodes -- a recent family of MPH descriptors -- as signed measures leads to natural extensions of vectorization strategies from one parameter to multiple parameters. The resulting feature vectors are easy to define and to compute, and provably stable. While, as a proof of concept, we focus on simple choices of signed barcodes and vectorizations, we already see notable performance improvements when comparing our feature vectors to state-of-the-art topology-based methods on various types of data.Comment: 23 pages, 3 figures, 8 table

    ToDD: Topological Compound Fingerprinting in Computer-Aided Drug Discovery

    Full text link
    In computer-aided drug discovery (CADD), virtual screening (VS) is used for identifying the drug candidates that are most likely to bind to a molecular target in a large library of compounds. Most VS methods to date have focused on using canonical compound representations (e.g., SMILES strings, Morgan fingerprints) or generating alternative fingerprints of the compounds by training progressively more complex variational autoencoders (VAEs) and graph neural networks (GNNs). Although VAEs and GNNs led to significant improvements in VS performance, these methods suffer from reduced performance when scaling to large virtual compound datasets. The performance of these methods has shown only incremental improvements in the past few years. To address this problem, we developed a novel method using multiparameter persistence (MP) homology that produces topological fingerprints of the compounds as multidimensional vectors. Our primary contribution is framing the VS process as a new topology-based graph ranking problem by partitioning a compound into chemical substructures informed by the periodic properties of its atoms and extracting their persistent homology features at multiple resolution levels. We show that the margin loss fine-tuning of pretrained Triplet networks attains highly competitive results in differentiating between compounds in the embedding space and ranking their likelihood of becoming effective drug candidates. We further establish theoretical guarantees for the stability properties of our proposed MP signatures, and demonstrate that our models, enhanced by the MP signatures, outperform state-of-the-art methods on benchmark datasets by a wide and highly statistically significant margin (e.g., 93% gain for Cleves-Jain and 54% gain for DUD-E Diverse dataset).Comment: NeurIPS, 2022 (36th Conference on Neural Information Processing Systems

    Euler Characteristic Curves and Profiles: a stable shape invariant for big data problems

    Full text link
    Tools of Topological Data Analysis provide stable summaries encapsulating the shape of the considered data. Persistent homology, the most standard and well studied data summary, suffers a number of limitations; its computations are hard to distribute, it is hard to generalize to multifiltrations and is computationally prohibitive for big data-sets. In this paper we study the concept of Euler Characteristics Curves, for one parameter filtrations and Euler Characteristic Profiles, for multi-parameter filtrations. While being a weaker invariant in one dimension, we show that Euler Characteristic based approaches do not possess some handicaps of persistent homology; we show efficient algorithms to compute them in a distributed way, their generalization to multifiltrations and practical applicability for big data problems. In addition we show that the Euler Curves and Profiles enjoys certain type of stability which makes them robust tool in data analysis. Lastly, to show their practical applicability, multiple use-cases are considered.Comment: 32 pages, 19 figures. Added remark on multicritical filtrations in section 4, typos correcte

    Multiparameter Persistence Images for Topological Machine Learning

    Get PDF
    International audienceIn the last decade, there has been increasing interest in topological data analysis, a new methodology for using geometric structures in data for inference and learning. A central theme in the area is the idea of persistence, which in its most basic form studies how measures of shape change as a scale parameter varies. There are now a number of frameworks that support statistics and machine learning in this context. However, in many applications there are several different parameters one might wish to vary: for example, scale and density. In contrast to the one-parameter setting, techniques for applying statistics and machine learning in the setting of multiparameter persistence are not well understood due to the lack of a concise representation of the results. We introduce a new descriptor for multiparameter persistence, which we call the Multiparameter Persistence Image, that is suitable for machine learning and statistical frameworks, is robust to perturbations in the data, has finer resolution than existing descriptors based on slicing, and can be efficiently computed on data sets of realistic size. Moreover, we demonstrate its efficacy by comparing its performance to other multiparameter descriptors on several classification tasks

    Computational and Theoretical Issues of Multiparameter Persistent Homology for Data Analysis

    Get PDF
    The basic goal of topological data analysis is to apply topology-based descriptors to understand and describe the shape of data. In this context, homology is one of the most relevant topological descriptors, well-appreciated for its discrete nature, computability and dimension independence. A further development is provided by persistent homology, which allows to track homological features along a oneparameter increasing sequence of spaces. Multiparameter persistent homology, also called multipersistent homology, is an extension of the theory of persistent homology motivated by the need of analyzing data naturally described by several parameters, such as vector-valued functions. Multipersistent homology presents several issues in terms of feasibility of computations over real-sized data and theoretical challenges in the evaluation of possible descriptors. The focus of this thesis is in the interplay between persistent homology theory and discrete Morse Theory. Discrete Morse theory provides methods for reducing the computational cost of homology and persistent homology by considering the discrete Morse complex generated by the discrete Morse gradient in place of the original complex. The work of this thesis addresses the problem of computing multipersistent homology, to make such tool usable in real application domains. This requires both computational optimizations towards the applications to real-world data, and theoretical insights for finding and interpreting suitable descriptors. Our computational contribution consists in proposing a new Morse-inspired and fully discrete preprocessing algorithm. We show the feasibility of our preprocessing over real datasets, and evaluate the impact of the proposed algorithm as a preprocessing for computing multipersistent homology. A theoretical contribution of this thesis consists in proposing a new notion of optimality for such a preprocessing in the multiparameter context. We show that the proposed notion generalizes an already known optimality notion from the one-parameter case. Under this definition, we show that the algorithm we propose as a preprocessing is optimal in low dimensional domains. In the last part of the thesis, we consider preliminary applications of the proposed algorithm in the context of topology-based multivariate visualization by tracking critical features generated by a discrete gradient field compatible with the multiple scalar fields under study. We discuss (dis)similarities of such critical features with the state-of-the-art techniques in topology-based multivariate data visualization

    Delaunay Bifiltrations of Functions on Point Clouds

    Full text link
    The Delaunay filtration D(X)\mathcal{D}_{\bullet}(X) of a point cloud XRdX\subset \mathbb{R}^d is a central tool of computational topology. Its use is justified by the topological equivalence of D(X)\mathcal{D}_{\bullet}(X) and the offset (i.e., union-of-balls) filtration of XX. Given a function γ:XR\gamma: X \to \mathbb{R}, we introduce a Delaunay bifiltration DC(γ)\mathcal{DC}_{\bullet}(\gamma) that satisfies an analogous topological equivalence, ensuring that DC(γ)\mathcal{DC}_{\bullet}(\gamma) topologically encodes the offset filtrations of all sublevel sets of γ\gamma, as well as the topological relations between them. DC(γ)\mathcal{DC}_{\bullet}(\gamma) is of size O(Xd+12)O(|X|^{\lceil\frac{d+1}{2}\rceil}), which for dd odd matches the worst-case size of D(X)\mathcal{D}_{\bullet}(X). Adapting the Bowyer-Watson algorithm for computing Delaunay triangulations, we give a simple, practical algorithm to compute DC(γ)\mathcal{DC}_{\bullet}(\gamma) in time O(Xd2+1)O(|X|^{\lceil \frac{d}{2}\rceil +1}). Our implementation, based on CGAL, computes DC(γ)\mathcal{DC}_{\bullet}(\gamma) with modest overhead compared to computing D(X)\mathcal{D}_{\bullet}(X), and handles tens of thousands of points in R3\mathbb{R}^3 within seconds.Comment: 28 pages, 7 figures, 8 tables. To appear in the proceedings of SODA2

    Euler Characteristic Tools For Topological Data Analysis

    Full text link
    In this article, we study Euler characteristic techniques in topological data analysis. Pointwise computing the Euler characteristic of a family of simplicial complexes built from data gives rise to the so-called Euler characteristic profile. We show that this simple descriptor achieve state-of-the-art performance in supervised tasks at a very low computational cost. Inspired by signal analysis, we compute hybrid transforms of Euler characteristic profiles. These integral transforms mix Euler characteristic techniques with Lebesgue integration to provide highly efficient compressors of topological signals. As a consequence, they show remarkable performances in unsupervised settings. On the qualitative side, we provide numerous heuristics on the topological and geometric information captured by Euler profiles and their hybrid transforms. Finally, we prove stability results for these descriptors as well as asymptotic guarantees in random settings.Comment: 39 page
    corecore