Graded persistence diagrams and persistence landscapes
We introduce a refinement of the persistence diagram, the graded persistence
diagram. It is the Möbius inversion of the graded rank function, which is
obtained from the rank function using the unary numeral system. Both
persistence diagrams and graded persistence diagrams are integer-valued
functions on the Cartesian plane. Whereas the persistence diagram takes
non-negative values, the graded persistence diagram takes values of 0, 1, or
-1. The sum of the graded persistence diagrams is the persistence diagram. We
show that the positive and negative points in the k-th graded persistence
diagram correspond to the local maxima and minima, respectively, of the k-th
persistence landscape. We prove a stability theorem for graded persistence
diagrams: the 1-Wasserstein distance between k-th graded persistence diagrams
is bounded by twice the 1-Wasserstein distance between the corresponding
persistence diagrams, and this bound is attained. In the other direction, the
1-Wasserstein distance is a lower bound for the sum of the 1-Wasserstein
distances between the k-th graded persistence diagrams. In fact, the
1-Wasserstein distance for graded persistence diagrams is more discriminative
than the 1-Wasserstein distance for the corresponding persistence diagrams.
Comment: accepted for publication in Discrete and Computational Geometry
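To make the unary encoding and the Möbius inversion concrete, here is a minimal Python sketch on an integer grid. It assumes a finite barcode of half-open bars [b, d) with integer endpoints and uses the standard inclusion-exclusion formula for the discrete Möbius inversion of a rank function; the function names are hypothetical and this is not the authors' implementation.

```python
def rank(bars, i, j):
    """Rank function on an integer grid: the number of half-open bars [b, d)
    that contain every index in the range [i, j] (callers keep i <= j)."""
    return sum(1 for b, d in bars if b <= i and j < d)


def mobius_inversion(f, n):
    """Inclusion-exclusion on the grid 0 <= b < d <= n: recovers the signed
    multiplicity of each bar [b, d) from a rank-type function f."""
    diagram = {}
    for b in range(n + 1):
        for d in range(b + 1, n + 1):
            m = f(b, d - 1) - f(b - 1, d - 1) - f(b, d) + f(b - 1, d)
            if m != 0:
                diagram[(b, d)] = m
    return diagram


def graded_persistence_diagrams(bars):
    """Return {k: k-th graded persistence diagram} for k = 1, ..., max rank."""
    n = max(d for _, d in bars)
    max_rank = max(rank(bars, i, i) for i in range(n))
    graded = {}
    for k in range(1, max_rank + 1):
        # Unary encoding: the k-th graded rank function is 1 where rank >= k,
        # so summing these functions over k gives back the rank function.
        def rank_k(i, j, k=k):
            return 1 if rank(bars, i, j) >= k else 0

        graded[k] = mobius_inversion(rank_k, n)
    return graded


def add_diagrams(diagrams):
    """Pointwise sum of signed diagrams, dropping zero multiplicities."""
    total = {}
    for diagram in diagrams:
        for point, m in diagram.items():
            total[point] = total.get(point, 0) + m
    return {p: m for p, m in total.items() if m != 0}


bars = [(0, 2), (1, 3)]  # two overlapping bars
graded = graded_persistence_diagrams(bars)
print(graded)  # {1: {(0, 2): 1, (1, 2): -1, (1, 3): 1}, 2: {(1, 2): 1}}
print(add_diagrams(graded.values()))  # {(0, 2): 1, (1, 3): 1}
```

For these two overlapping bars, the first graded diagram has a negative point at (1, 2). It sits where the two tents of the first persistence landscape cross, i.e. at the local minimum of that landscape at t = 1.5, while the positive points (0, 2) and (1, 3) sit at its two peaks. Because Möbius inversion is linear, the graded diagrams add up to the ordinary persistence diagram.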
Multiscale topology classifies and quantifies cell types in subcellular spatial transcriptomics
Spatial transcriptomics has the potential to transform our understanding of
RNA expression in tissues. Classical array-based technologies produce
multiple-cell-scale measurements requiring deconvolution to recover single-cell
information. However, rapid advances in subcellular measurement of RNA
expression at whole-transcriptome depth necessitate a fundamentally different
approach. To integrate single-cell RNA-seq data with nanoscale spatial
transcriptomics, we present a topological method for automatic cell type
identification (TopACT). Unlike popular decomposition approaches to
multicellular resolution data, TopACT is able to pinpoint the spatial locations
of individual sparsely dispersed cells without prior knowledge of cell
boundaries. Pairing TopACT with multiparameter persistent homology landscapes
predicts immune cells forming a peripheral ring structure within kidney
glomeruli in a murine model of lupus nephritis, which we experimentally
validate with immunofluorescent imaging. The proposed topological data analysis
unifies multiple biological scales, from subcellular gene expression to
multicellular tissue organization.
Comment: Main text: 8 pages, 4 figures. Supplement: 12 pages, 5 figures
Stable Vectorization of Multiparameter Persistent Homology using Signed Barcodes as Measures
Persistent homology (PH) provides topological descriptors for geometric data,
such as weighted graphs, which are interpretable, stable to perturbations, and
invariant under, e.g., relabeling. Most applications of PH focus on the
one-parameter case -- where the descriptors summarize the changes in topology
of data as it is filtered by a single quantity of interest -- and there is now
a wide array of methods enabling the use of one-parameter PH descriptors in
data science, which rely on the stable vectorization of these descriptors as
elements of a Hilbert space. Although the multiparameter PH (MPH) of data that
is filtered by several quantities of interest encodes much richer information
than its one-parameter counterpart, the scarcity of stability results for MPH
descriptors has so far limited the available options for the stable
vectorization of MPH. In this paper, we aim to bring together the best of both
worlds by showing how the interpretation of signed barcodes -- a recent family
of MPH descriptors -- as signed measures leads to natural extensions of
vectorization strategies from one parameter to multiple parameters. The
resulting feature vectors are easy to define and to compute, and provably
stable. While, as a proof of concept, we focus on simple choices of signed
barcodes and vectorizations, we already see notable performance improvements
when comparing our feature vectors to state-of-the-art topology-based methods
on various types of data.
Comment: 23 pages, 3 figures, 8 tables
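One natural way to vectorize a signed measure is to convolve it with a kernel and sample the result on a grid. The Python sketch below does this with an unnormalized Gaussian kernel; the kernel, the bandwidth `sigma`, the grid, and the example measure are assumptions chosen for illustration, and the code reproduces neither the paper's pipeline nor its stability guarantees.

```python
import math


def gaussian_vectorization(signed_measure, grid, sigma=0.5):
    """Convolve a signed point measure sum_i w_i * delta_{p_i} with an
    unnormalized Gaussian kernel and sample the result at the grid points.
    Points may live in any dimension, as long as grid points match it."""
    features = []
    for g in grid:
        value = 0.0
        for point, weight in signed_measure:
            sq_dist = sum((gi - pi) ** 2 for gi, pi in zip(g, point))
            value += weight * math.exp(-sq_dist / (2.0 * sigma ** 2))
        features.append(value)
    return features


# Hypothetical signed measure in the parameter plane: (point, weight) pairs
# with weight +1 or -1, standing in for the output of some signed barcode.
measure = [((0.0, 1.0), 1), ((1.0, 0.0), 1), ((1.0, 1.0), -1)]
grid = [(0.5 * x, 0.5 * y) for x in range(5) for y in range(5)]
vector = gaussian_vectorization(measure, grid)
print(len(vector))  # 25
```

Convolution is linear in the measure, so a positive and a negative point mass at the same location cancel exactly; this linearity is what lets one-parameter kernel tricks carry over to signed measures in several parameters.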
ToDD: Topological Compound Fingerprinting in Computer-Aided Drug Discovery
In computer-aided drug discovery (CADD), virtual screening (VS) is used for
identifying the drug candidates that are most likely to bind to a molecular
target in a large library of compounds. Most VS methods to date have focused on
using canonical compound representations (e.g., SMILES strings, Morgan
fingerprints) or generating alternative fingerprints of the compounds by
training progressively more complex variational autoencoders (VAEs) and graph
neural networks (GNNs). Although VAEs and GNNs led to significant improvements
in VS performance, these methods suffer from reduced performance when scaling
to large virtual compound datasets. The performance of these methods has shown
only incremental improvements in the past few years. To address this problem,
we developed a novel method using multiparameter persistence (MP) homology that
produces topological fingerprints of the compounds as multidimensional vectors.
Our primary contribution is framing the VS process as a new topology-based
graph ranking problem by partitioning a compound into chemical substructures
informed by the periodic properties of its atoms and extracting their
persistent homology features at multiple resolution levels. We show that the
margin loss fine-tuning of pretrained Triplet networks attains highly
competitive results in differentiating between compounds in the embedding space
and ranking their likelihood of becoming effective drug candidates. We further
establish theoretical guarantees for the stability properties of our proposed
MP signatures, and demonstrate that our models, enhanced by the MP signatures,
outperform state-of-the-art methods on benchmark datasets by a wide and highly
statistically significant margin (e.g., 93% gain for Cleves-Jain and 54% gain
for the DUD-E Diverse dataset).
Comment: NeurIPS, 2022 (36th Conference on Neural Information Processing Systems)
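As a rough illustration of a multiparameter topological fingerprint, the Python sketch below filters a molecular graph by two parameters, an atom-property threshold and a bond-weight threshold, and records the number of connected components (Betti-0) at every grid point, flattened into one vector. The toy molecule, its property values and bond weights, and the function names are hypothetical; only Betti-0 is computed here, and this is not ToDD's actual pipeline.

```python
def count_components(nodes, edges):
    """Betti-0 of a graph, computed with union-find."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    components = len(nodes)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components


def betti0_fingerprint(atom_values, bonds, atom_thresholds, bond_thresholds):
    """Flattened grid of Betti-0 numbers of the subgraph made of atoms with
    value <= a and bonds with weight <= w, for every threshold pair (a, w)."""
    fingerprint = []
    for a in atom_thresholds:
        nodes = [v for v, val in atom_values.items() if val <= a]
        kept = set(nodes)
        for w in bond_thresholds:
            edges = [(u, v) for u, v, weight in bonds
                     if weight <= w and u in kept and v in kept]
            fingerprint.append(count_components(nodes, edges))
    return fingerprint


# Hypothetical toy molecule: atom -> property value, bonds with weights.
atom_values = {"C1": 6, "C2": 6, "O1": 8, "N1": 7}
bonds = [("C1", "C2", 1.0), ("C2", "O1", 2.0), ("C2", "N1", 1.0)]
fingerprint = betti0_fingerprint(atom_values, bonds, [6, 7, 8], [0.5, 1.0, 2.0])
print(fingerprint)  # [2, 1, 1, 3, 1, 1, 4, 2, 1]
```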
Euler Characteristic Curves and Profiles: a stable shape invariant for big data problems
Tools of Topological Data Analysis provide stable summaries encapsulating the
shape of the considered data. Persistent homology, the most standard and
well-studied data summary, suffers from a number of limitations: its
computations are hard to distribute, it is hard to generalize to
multifiltrations, and it is computationally prohibitive for big datasets. In
this paper we study the concept of Euler Characteristic Curves, for
one-parameter filtrations, and Euler Characteristic Profiles, for
multi-parameter filtrations. While they are a weaker invariant in one
dimension, we show that Euler Characteristic-based approaches do not share some
of the handicaps of persistent homology: we present efficient algorithms to
compute them in a distributed way, show their generalization to
multifiltrations, and demonstrate their practical applicability to big data
problems. In addition, we show that Euler Curves and Profiles enjoy a certain
type of stability, which makes them a robust tool in data analysis. Lastly, to
show their practical applicability, multiple use-cases are considered.
Comment: 32 pages, 19 figures. Added remark on multicritical filtrations in
section 4, typos corrected
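Euler characteristic curves distribute well because of additivity: each simplex contributes (-1)^dim at its filtration value, independently of every other simplex. The Python sketch below assumes the simplices and their filtration values are already given, splits them into chunks as separate workers might, and merges the partial results; the chunking scheme and the function names are illustrative assumptions, not the paper's algorithm.

```python
from collections import Counter
from itertools import accumulate


def local_contributions(simplices):
    """Map step: for a chunk of (simplex, filtration value) pairs, record the
    jump (-1)^dim contributed at each filtration value."""
    jumps = Counter()
    for simplex, value in simplices:
        jumps[value] += (-1) ** (len(simplex) - 1)
    return jumps


def merge(jump_tables):
    """Reduce step: the Euler characteristic is additive, so jumps just add."""
    total = Counter()
    for table in jump_tables:
        total.update(table)  # Counter.update adds counts, negatives included
    return total


def euler_characteristic_curve(jumps):
    """Return [(t, chi(t))] at each critical value t, via prefix sums."""
    values = sorted(jumps)
    return list(zip(values, accumulate(jumps[t] for t in values)))


# A filled triangle entering one simplex at a time.
triangle = [
    (("a",), 0), (("b",), 0), (("c",), 0),
    (("a", "b"), 1), (("b", "c"), 1), (("a", "c"), 2),
    (("a", "b", "c"), 3),
]
chunks = [triangle[:4], triangle[4:]]  # e.g. handled by two workers
curve = euler_characteristic_curve(merge(local_contributions(c) for c in chunks))
print(curve)  # [(0, 3), (1, 1), (2, 0), (3, 1)]
```

The value 0 at t = 2 reflects the hollow triangle (one component, one loop), and the value 1 at t = 3 the filled one.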
Multiparameter Persistence Images for Topological Machine Learning
In the last decade, there has been increasing interest in topological data analysis, a new methodology for using geometric structures in data for inference and learning. A central theme in the area is the idea of persistence, which in its most basic form studies how measures of shape change as a scale parameter varies. There are now a number of frameworks that support statistics and machine learning in this context. However, in many applications there are several different parameters one might wish to vary: for example, scale and density. In contrast to the one-parameter setting, techniques for applying statistics and machine learning in the setting of multiparameter persistence are not well understood due to the lack of a concise representation of the results. We introduce a new descriptor for multiparameter persistence, which we call the Multiparameter Persistence Image, that is suitable for machine learning and statistical frameworks, is robust to perturbations in the data, has finer resolution than existing descriptors based on slicing, and can be efficiently computed on data sets of realistic size. Moreover, we demonstrate its efficacy by comparing its performance to other multiparameter descriptors on several classification tasks.
Computational and Theoretical Issues of Multiparameter Persistent Homology for Data Analysis
The basic goal of topological data analysis is to apply topology-based descriptors
to understand and describe the shape of data. In this context, homology is one of
the most relevant topological descriptors, well-appreciated for its discrete nature,
computability and dimension independence. A further development is provided
by persistent homology, which allows one to track homological features along a one-parameter
increasing sequence of spaces. Multiparameter persistent homology, also
called multipersistent homology, is an extension of the theory of persistent homology
motivated by the need to analyze data naturally described by several parameters,
such as vector-valued functions. Multipersistent homology presents several issues in
terms of feasibility of computations over real-sized data and theoretical challenges
in the evaluation of possible descriptors. The focus of this thesis is on the interplay
between persistent homology theory and discrete Morse theory. Discrete Morse
theory provides methods for reducing the computational cost of homology and persistent
homology by considering the discrete Morse complex generated by the discrete
Morse gradient in place of the original complex. The work of this thesis addresses
the problem of computing multipersistent homology, to make such a tool usable in real
application domains. This requires both computational optimizations towards the
applications to real-world data, and theoretical insights for finding and interpreting
suitable descriptors. Our computational contribution consists in proposing a new
Morse-inspired and fully discrete preprocessing algorithm. We show the feasibility
of our preprocessing over real datasets, and evaluate the impact of the proposed
algorithm as a preprocessing for computing multipersistent homology. A theoretical
contribution of this thesis consists in proposing a new notion of optimality for such
a preprocessing in the multiparameter context. We show that the proposed notion
generalizes an already known optimality notion from the one-parameter case. Under
this definition, we show that the algorithm we propose as a preprocessing is optimal
in low dimensional domains. In the last part of the thesis, we consider preliminary
applications of the proposed algorithm in the context of topology-based multivariate
visualization by tracking critical features generated by a discrete gradient field compatible
with the multiple scalar fields under study. We discuss (dis)similarities of such
critical features with the state-of-the-art techniques in topology-based multivariate
data visualization.
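The following Python sketch shows, on a graph only, the basic idea of a discrete gradient compatible with a vector-valued function: a vertex is paired with an incident edge only when both enter the multifiltration at the same grade, and the paired edges are kept acyclic with union-find. It assumes a lower-star convention (an edge enters at the componentwise maximum of its endpoint values) and is a simplified, greedy illustration with hypothetical names, not the preprocessing algorithm proposed in the thesis.

```python
def discrete_gradient(vertex_values, edges):
    """Greedy acyclic vertex-edge pairing on a simple graph whose vertices
    carry tuple-valued grades. Returns (pairs, critical vertices, critical
    edges), where pairs maps each paired vertex to its paired edge."""
    parent = {v: v for v in vertex_values}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    pairs = {}
    for u, v in edges:
        # Lower-star assumption: the edge enters at the componentwise max.
        edge_value = tuple(map(max, vertex_values[u], vertex_values[v]))
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # paired edges must stay a forest, so no closed V-paths
        for w in (u, v):
            # Only pair cells that appear at the same multifiltration grade.
            if w not in pairs and vertex_values[w] == edge_value:
                pairs[w] = (u, v)
                parent[ru] = rv
                break
    critical_vertices = [v for v in vertex_values if v not in pairs]
    paired_edges = set(pairs.values())
    critical_edges = [e for e in edges if e not in paired_edges]
    return pairs, critical_vertices, critical_edges


# Hypothetical square whose vertices carry 2-parameter grades.
vertex_values = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
pairs, critical_vertices, critical_edges = discrete_gradient(vertex_values, edges)
print(critical_vertices, critical_edges)  # ['a'] [('c', 'd')]
```

Each pair removes one vertex and one edge, so the critical cells preserve the Euler characteristic; here the 8 cells of the square reduce to one critical vertex and one critical edge, matching its single component and single loop.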
Delaunay Bifiltrations of Functions on Point Clouds
The Delaunay filtration $\mathcal{D}_\bullet(X)$ of a point cloud $X \subset \mathbb{R}^d$ is a central tool of computational topology. Its use is justified
by the topological equivalence of $\mathcal{D}_\bullet(X)$ and the offset
(i.e., union-of-balls) filtration of $X$. Given a function $\gamma\colon X \to \mathbb{R}$, we introduce a Delaunay bifiltration
$\mathcal{DC}_\bullet(\gamma)$ that satisfies an analogous topological
equivalence, ensuring that $\mathcal{DC}_\bullet(\gamma)$ topologically
encodes the offset filtrations of all sublevel sets of $\gamma$, as well as the
topological relations between them. $\mathcal{DC}_\bullet(\gamma)$ is of size
$O(|X|^{\lceil \frac{d+1}{2} \rceil})$, which for $d$ odd matches the worst-case
size of $\mathcal{D}_\bullet(X)$. Adapting the Bowyer-Watson algorithm for
computing Delaunay triangulations, we give a simple, practical algorithm to
compute $\mathcal{DC}_\bullet(\gamma)$ in time $O(|X|^{\lceil \frac{d}{2} \rceil + 1})$. Our implementation, based on CGAL, computes
$\mathcal{DC}_\bullet(\gamma)$ with modest overhead compared to computing
$\mathcal{D}_\bullet(X)$, and handles tens of thousands of points in $\mathbb{R}^3$
within seconds.
Comment: 28 pages, 7 figures, 8 tables. To appear in the proceedings of SODA2
Euler Characteristic Tools For Topological Data Analysis
In this article, we study Euler characteristic techniques in topological data
analysis. Computing the Euler characteristic pointwise over a family of
simplicial complexes built from data gives rise to the so-called Euler
characteristic profile. We show that this simple descriptor achieves
state-of-the-art performance in supervised tasks at a very low computational
cost. Inspired by signal analysis, we compute hybrid transforms of Euler
characteristic profiles. These integral transforms mix Euler characteristic
techniques with Lebesgue integration to provide highly efficient compressors of
topological signals. As a consequence, they show remarkable performance in
unsupervised settings. On the qualitative side, we provide numerous heuristics
on the topological and geometric information captured by Euler profiles and
their hybrid transforms. Finally, we prove stability results for these
descriptors as well as asymptotic guarantees in random settings.
Comment: 39 pages
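A direct way to compute an Euler characteristic profile for a bifiltered complex is to sum (-1)^dim over the simplices present at each grid point. The Python sketch below does exactly that for a small, hypothetical bifiltered triangle and flattens the result into a feature vector; it is a naive evaluation costing (grid size) times (number of simplices), chosen for clarity, and implements neither the paper's optimized computation nor its hybrid transforms.

```python
def euler_characteristic_profile(simplices, grid_x, grid_y):
    """chi(x, y) = sum of (-1)^dim over the simplices whose bifiltration
    value (a, b) satisfies a <= x and b <= y, evaluated on a grid."""
    profile = []
    for x in grid_x:
        row = []
        for y in grid_y:
            chi = sum((-1) ** (len(s) - 1)
                      for s, (a, b) in simplices if a <= x and b <= y)
            row.append(chi)
        profile.append(row)
    return profile


# Hypothetical bifiltration: every face enters no later than its cofaces.
simplices = [
    (("a",), (0, 0)), (("b",), (0, 1)), (("c",), (1, 0)),
    (("a", "b"), (0, 1)), (("a", "c"), (1, 0)), (("b", "c"), (1, 1)),
    (("a", "b", "c"), (2, 1)),
]
profile = euler_characteristic_profile(simplices, [0, 1, 2], [0, 1])
features = [chi for row in profile for chi in row]  # flattened feature vector
print(profile)  # [[1, 1], [1, 0], [1, 1]]
```

The 0 at grid point (1, 1) marks the hollow triangle, which the 2-simplex fills at (2, 1).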
- …