GPU-Accelerated Computation of Vietoris-Rips Persistence Barcodes
The computation of Vietoris-Rips persistence barcodes is both compute- and
memory-intensive. In this paper, we study the
computational structure of Vietoris-Rips persistence barcodes, and identify
several unique mathematical properties and algorithmic opportunities with
connections to the GPU. Mathematically and empirically, we examine the
properties of apparent pairs: independently identifiable persistence pairs
that comprise up to 99% of all persistence pairs. We give theoretical upper and
lower bounds of the apparent pair rate and model the average case. We also
design massively parallel algorithms to take advantage of the very large number
of simplices that can be processed independently of each other. Having
identified these opportunities, we develop GPU-accelerated software for
computing Vietoris-Rips persistence barcodes, called Ripser++. The software
achieves up to 30x speedup over the total execution time of the original Ripser
and also reduces CPU-memory usage by up to 2.0x. We believe our
GPU-acceleration based efforts open a new chapter for the advancement of
topological data analysis in the post-Moore's Law era.
Comment: 36 pages, 15 figures. To be published in Symposium on Computational Geometry 202
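The paper's GPU algorithms target high-dimensional persistence pairs, but the flavor of barcode computation is easiest to see in dimension 0, where the Vietoris-Rips barcode reduces to Kruskal-style edge processing with a union-find. The following is a minimal illustrative sketch in Python (not Ripser++'s method; the distance matrix is a made-up toy):

```python
from itertools import combinations

def h0_barcode(dist):
    """0-dimensional Vietoris-Rips barcode from a distance matrix.

    Every point is born at filtration value 0; a connected component dies
    when the edge merging it into an older component appears (elder rule).
    Returns (birth, death) pairs; the last surviving component never dies.
    """
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2))
    bars = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            bars.append((0.0, w))       # a component dies at edge weight w
    bars.append((0.0, float("inf")))    # the surviving component
    return bars

# Toy example: two clusters of two points each.
D = [[0, 1, 5, 5],
     [1, 0, 5, 5],
     [5, 5, 0, 1],
     [5, 5, 1, 0]]
print(h0_barcode(D))  # [(0.0, 1), (0.0, 1), (0.0, 5), (0.0, inf)]
```

The two short bars record the within-cluster merges at distance 1, and the bar dying at 5 records the merge of the two clusters; higher-dimensional barcodes require the matrix-reduction machinery that Ripser and Ripser++ implement.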
Manifold Topology Divergence: a Framework for Comparing Data Manifolds
We develop a framework for comparing data manifolds, aimed, in particular,
towards the evaluation of deep generative models. We describe a novel tool,
Cross-Barcode(P,Q), that, given a pair of distributions in a high-dimensional
space, tracks multiscale topological discrepancies between the manifolds on
which the distributions are concentrated. Based on the Cross-Barcode, we
introduce the Manifold Topology Divergence score (MTop-Divergence) and apply it
to assess the performance of deep generative models in various domains: images,
3D-shapes, time-series, and on different datasets: MNIST, Fashion MNIST, SVHN,
CIFAR10, FFHQ, chest X-ray images, market stock data, ShapeNet. We demonstrate
that the MTop-Divergence accurately detects various degrees of mode-dropping,
intra-mode collapse, mode invention, and image disturbance. Our algorithm
scales well (essentially linearly) with the increase of the dimension of the
ambient high-dimensional space. It is one of the first TDA-based practical
methodologies that can be applied universally to datasets of different sizes
and dimensions, including the ones on which the most recent GANs in the visual
domain are trained. The proposed method is domain-agnostic and does not rely on
pre-trained networks.
Fast Computation of Zigzag Persistence
Zigzag persistence is a powerful extension of standard persistence which allows deletions of simplices in addition to insertions. However, computing zigzag persistence usually takes considerably more time than standard persistence. We propose an algorithm called FastZigzag which narrows this efficiency gap. Our main result is that an input simplex-wise zigzag filtration can be converted to a cell-wise non-zigzag filtration of a Δ-complex with the same length, where the cells are copies of the input simplices. This conversion step in FastZigzag incurs very little cost. Furthermore, the barcode of the original filtration can be easily read from the barcode of the new cell-wise filtration because the conversion embodies a series of diamond switches known in topological data analysis. This seemingly simple observation opens up vast possibilities for improving the computation of zigzag persistence, because any efficient algorithm or software for standard persistence can now be applied to computing zigzag persistence. Our experiments show that this indeed achieves substantial performance gains over the existing state-of-the-art software.
Learning Topology-Preserving Data Representations
We propose a method for learning topology-preserving data representations
(dimensionality reduction). The method aims to provide topological similarity
between the data manifold and its latent representation via enforcing the
similarity in topological features (clusters, loops, 2D voids, etc.) and their
localization. The core of the method is the minimization of the Representation
Topology Divergence (RTD) between original high-dimensional data and
low-dimensional representation in latent space. RTD minimization provides
closeness in topological features with strong theoretical guarantees. We
develop a scheme for RTD differentiation and apply it as a loss term for the
autoencoder. The proposed method "RTD-AE" better preserves the global structure
and topology of the data manifold than state-of-the-art competitors as measured
by linear correlation, triplet distance ranking accuracy, and Wasserstein
distance between persistence barcodes.
Efficient two-parameter persistence computation via cohomology
Clearing is a simple but effective optimization for the standard algorithm of
persistent homology (PH), which dramatically improves the speed and scalability
of PH computations for Vietoris--Rips filtrations. Due to the rapid growth of
the boundary matrices of a Vietoris--Rips filtration with increasing dimension,
clearing is only effective when used in conjunction with a dual (cohomological)
variant of the standard algorithm. This approach has not previously been
applied successfully to the computation of two-parameter PH.
We introduce a cohomological algorithm for computing minimal free resolutions
of two-parameter PH that allows for clearing. To derive our algorithm, we
extend the duality principles which underlie the one-parameter approach to the
two-parameter setting. We provide an implementation and report experimental run
times for function-Rips filtrations. Our method is faster than the current
state-of-the-art by a factor of up to 20.
Comment: This is an extended version of a conference paper that appeared at SoCG 2023, see https://drops.dagstuhl.de/opus/volltexte/2023/1786
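In the one-parameter setting, the clearing optimization referenced above is easy to state: when reduction is run from the highest dimension downward, any column whose index has already appeared as a pivot row can be zeroed out without doing any reduction work. A minimal Z/2 sketch over a toy filtration follows (the column layout and names are our own illustration of classic one-parameter clearing, not the paper's two-parameter cohomological algorithm):

```python
def reduce_with_clearing(boundary, dims):
    """Z/2 column reduction with the clearing ("twist") optimization.

    boundary: list of sets; boundary[j] holds the row indices of column j.
    dims:     dims[j] is the dimension of cell j (columns in filtration order).
    Columns of dimension d are reduced before those of dimension d-1, so a
    column whose index already appears as a pivot row is cleared unreduced.
    Returns {pivot row i: column j}, i.e. the persistence pairs (i, j).
    """
    cols = [set(c) for c in boundary]
    pivot_of = {}                          # pivot row -> owning column
    for d in range(max(dims), 0, -1):      # highest dimension first
        for j in range(len(cols)):
            if dims[j] != d:
                continue
            if j in pivot_of:              # clearing: j is already paired
                cols[j] = set()
                continue
            while cols[j]:
                low = max(cols[j])
                if low not in pivot_of:
                    pivot_of[low] = j      # new pivot found
                    break
                cols[j] ^= cols[pivot_of[low]]  # add owning column (mod 2)
    return pivot_of

# Toy filtration of a filled triangle: cells 0-2 vertices, 3-5 edges, 6 the face.
boundary = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
dims = [0, 0, 0, 1, 1, 1, 2]
print(reduce_with_clearing(boundary, dims))  # {5: 6, 1: 3, 2: 4}
```

Here edge 5 is cleared outright because it is the pivot of the triangle's column; the paper's contribution is extending exactly this kind of saving to minimal free resolutions in the two-parameter setting.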
ToDD: Topological Compound Fingerprinting in Computer-Aided Drug Discovery
In computer-aided drug discovery (CADD), virtual screening (VS) is used for
identifying the drug candidates that are most likely to bind to a molecular
target in a large library of compounds. Most VS methods to date have focused on
using canonical compound representations (e.g., SMILES strings, Morgan
fingerprints) or generating alternative fingerprints of the compounds by
training progressively more complex variational autoencoders (VAEs) and graph
neural networks (GNNs). Although VAEs and GNNs led to significant improvements
in VS performance, these methods scale poorly to large virtual compound
datasets, and their performance has shown only incremental improvements in
recent years. To address this problem,
we developed a novel method using multiparameter persistence (MP) homology that
produces topological fingerprints of the compounds as multidimensional vectors.
Our primary contribution is framing the VS process as a new topology-based
graph ranking problem by partitioning a compound into chemical substructures
informed by the periodic properties of its atoms and extracting their
persistent homology features at multiple resolution levels. We show that the
margin loss fine-tuning of pretrained Triplet networks attains highly
competitive results in differentiating between compounds in the embedding space
and ranking their likelihood of becoming effective drug candidates. We further
establish theoretical guarantees for the stability properties of our proposed
MP signatures, and demonstrate that our models, enhanced by the MP signatures,
outperform state-of-the-art methods on benchmark datasets by a wide and highly
statistically significant margin (e.g., 93% gain for Cleves-Jain and 54% gain
for the DUD-E Diverse dataset).
Comment: NeurIPS, 2022 (36th Conference on Neural Information Processing Systems).
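The margin loss used for fine-tuning the Triplet networks above is the standard triplet objective: pull an anchor toward a positive example and push it from a negative one. A minimal sketch in plain Python (the toy fingerprint vectors and margin are invented, not the paper's setup):

```python
import math

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin),
    with Euclidean distance d over plain Python vectors."""
    d = lambda u, v: math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
    return max(0.0, d(anchor, positive) - d(anchor, negative) + margin)

# Toy fingerprints: the positive is much closer than the negative, so the
# loss vanishes; swap them and the loss becomes positive.
a, p, n = [0.0, 0.0], [0.1, 0.0], [3.0, 4.0]
print(triplet_margin_loss(a, p, n))  # 0.0: d(a,p)=0.1, d(a,n)=5.0, margin=1.0
```

In the paper's pipeline the vectors would be the multiparameter-persistence fingerprints of compounds, with positives drawn from actives for the same target.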
Improving neural networks using topological data analysis
Generalisation measures are metrics that indicate how well a neural network will perform in the presence of unseen data. Generalisation measures that are differentiable with respect to the parameters of a neural network and use only the training set are candidates for loss regularisation terms that improve neural network training. Recently, persistent homology has been used to build robust generalisation measures of this kind by means of persistence diagrams. However, some of these measures involve non-standard distances, and thus the usual stability and differentiability results are not valid. In this thesis, we prove more general stability and differentiability results that fit the conditions required by these topological measures. We also define a new measure, called topological redundancy, which we use together with one of the previous topological terms to improve network accuracy relative to ordinary training without topological regularisation terms.
Memory Clustering Using Persistent Homology for Multimodality- and Discontinuity-Sensitive Learning of Optimal Control Warm-Starts
Shooting methods are an efficient approach to solving nonlinear optimal
control problems. As they use local optimization, they exhibit favorable
convergence when initialized with a good warm-start but may not converge at all
if provided with a poor initial guess. Recent work has focused on providing an
initial guess from a learned model trained on samples generated during an
offline exploration of the problem space. However, in practice the solutions
contain discontinuities introduced by system dynamics or the environment.
Additionally, in many cases multiple equally suitable, i.e., multi-modal,
solutions exist for the same problem. Classic learning approaches smooth across
the boundary of these discontinuities and thus generalize poorly. In this work,
we apply tools from algebraic topology to extract information on the underlying
structure of the solution space. In particular, we introduce a method based on
persistent homology to automatically cluster the dataset of precomputed
solutions to obtain different candidate initial guesses. We then train a
Mixture-of-Experts within each cluster to predict state and control
trajectories to warm-start the optimal control solver and provide a comparison
with modality-agnostic learning. We demonstrate our method on a cart-pole toy
problem and a quadrotor avoiding obstacles, and show that clustering samples
based on inherent structure improves the warm-start quality.
Comment: 12 pages, 10 figures, accepted as a regular paper in IEEE Transactions on Robotics (T-RO). Supplementary video: https://youtu.be/lUULTWCFxY8 Code: https://github.com/wxmerkt/topological_memory_clustering The first two authors contributed equally.
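The clustering step described above can be imitated with 0-dimensional persistence: single-linkage merge distances are exactly the finite 0-dimensional bars, and cutting at the largest gap between consecutive merge distances separates the candidate clusters. A self-contained sketch (the gap heuristic and toy data are ours, not the paper's implementation):

```python
from itertools import combinations

def persistence_clusters(points):
    """Cluster by cutting single-linkage merges at the largest persistence gap."""
    n = len(points)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))

    def merge_below(cutoff):
        """Union-find pass over edges with weight <= cutoff."""
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        merges = []
        for w, i, j in edges:
            if w > cutoff:
                break
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri
                merges.append(w)       # record the merge (death) distance
        return merges, [find(i) for i in range(n)]

    # Merge distances of the full tree = finite 0-dimensional bars.
    merges, _ = merge_below(float("inf"))
    gaps = [(merges[k + 1] - merges[k], k) for k in range(len(merges) - 1)]
    if not gaps:
        return [0] * n                 # a single cluster (or a single point)
    _, k = max(gaps)                   # cut at the largest gap
    _, roots = merge_below(merges[k])
    labels = {r: c for c, r in enumerate(dict.fromkeys(roots))}
    return [labels[r] for r in roots]

# Toy "solution dataset": two well-separated groups of trajectories in 2D.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(persistence_clusters(pts))  # [0, 0, 1, 1]
```

In the paper's setting the points would be precomputed solution trajectories under a suitable metric, with one Mixture-of-Experts trained per resulting cluster.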
Acceptability Judgements via Examining the Topology of Attention Maps
The role of the attention mechanism in encoding linguistic knowledge has
received special interest in NLP. However, the ability of the attention heads
to judge the grammatical acceptability of a sentence has been underexplored.
This paper approaches the paradigm of acceptability judgments with topological
data analysis (TDA), showing that the geometric properties of the attention
graph can be efficiently exploited for two standard practices in linguistics:
binary judgments and linguistic minimal pairs. Topological features enhance the
BERT-based acceptability classifier scores by %-% on CoLA in three
languages (English, Italian, and Swedish). By revealing the topological
discrepancy between attention maps of minimal pairs, we achieve human-level
performance on the BLiMP benchmark, outperforming nine statistical and
Transformer LM baselines. At the same time, TDA provides the foundation for
analyzing the linguistic functions of attention heads and interpreting the
correspondence between the graph features and grammatical phenomena.
Comment: Accepted to EMNLP 2022 Findings
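A common way to turn an attention map into TDA-ready features is to threshold it into an undirected graph and read off simple topological statistics, such as the number of edges and the number of connected components (the 0-dimensional Betti number). A toy sketch with an invented attention matrix (the threshold and feature choices are illustrative, not the paper's exact pipeline):

```python
def attention_graph_features(attn, threshold=0.1):
    """Threshold a symmetrized attention matrix into a graph and
    return (number of edges, number of connected components)."""
    n = len(attn)
    # Symmetrize and threshold: keep edge (i, j) if either direction is strong.
    adj = [[max(attn[i][j], attn[j][i]) >= threshold and i != j
            for j in range(n)] for i in range(n)]
    edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    seen, components = set(), 0
    for s in range(n):
        if s in seen:
            continue
        components += 1
        stack = [s]                        # DFS over one component
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            stack.extend(u for u in range(n) if adj[v][u] and u not in seen)
    return edges, components

# Invented 4-token attention map: tokens 0-1 and 2-3 attend to each other.
A = [[0.9, 0.3, 0.0, 0.0],
     [0.3, 0.9, 0.0, 0.0],
     [0.0, 0.0, 0.9, 0.4],
     [0.0, 0.0, 0.4, 0.9]]
print(attention_graph_features(A))  # (2, 2): two edges, two components
```

Features like these, computed per attention head and per threshold, are the kind of graph statistics that can be fed to an acceptability classifier alongside the BERT-based scores.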