Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensembles
of trees, and deep convolutional neural networks, to demonstrate their descriptive
and predictive power for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and nearly 100,000 ligands and decoys in the DUD database are performed
to test, respectively, the scoring power and the virtual screening power of the
proposed topological approaches. The experiments show that the present approaches
outperform modern machine learning based methods in protein-ligand binding
affinity prediction and ligand-decoy discrimination.
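As a rough illustration of how topological signatures can be compared and fed to a nearest-neighbor model, the sketch below computes the p-Wasserstein distance between two small persistence diagrams, letting any point be matched to the diagonal, and uses it in a k-nearest-neighbor regressor. It is a minimal sketch under stated assumptions, not the authors' pipeline: diagrams are lists of finite (birth, death) pairs, the ground metric is L-infinity, and the names `wasserstein` and `knn_predict` are hypothetical. The matching is brute force over all permutations, so it only suits toy inputs; practical codes use an assignment solver instead.

```python
import itertools


def wasserstein(diag_a, diag_b, p=2):
    """p-Wasserstein distance between two persistence diagrams.

    Each diagram is a list of finite (birth, death) pairs. Every point may be
    matched either to a point of the other diagram or to the diagonal.
    Brute force over all matchings, so only usable for very small diagrams.
    """
    n, m = len(diag_a), len(diag_b)

    def to_diagonal(pt):
        # L-infinity distance from (birth, death) to the nearest diagonal point
        return (pt[1] - pt[0]) / 2

    def cost(i, j):
        # Rows 0..n-1 are points of A, rows n.. are diagonal slots for B;
        # columns 0..m-1 are points of B, columns m.. are diagonal slots for A.
        if i < n and j < m:
            a, b = diag_a[i], diag_b[j]
            return max(abs(a[0] - b[0]), abs(a[1] - b[1])) ** p
        if i < n:
            return to_diagonal(diag_a[i]) ** p
        if j < m:
            return to_diagonal(diag_b[j]) ** p
        return 0.0  # diagonal matched to diagonal is free

    size = n + m
    best = min(
        sum(cost(i, perm[i]) for i in range(size))
        for perm in itertools.permutations(range(size))
    )
    return best ** (1 / p)


def knn_predict(query, training, k=1, p=2):
    """Average the labels of the k training diagrams nearest to `query`.

    `training` is a list of (diagram, label) pairs.
    """
    ranked = sorted(training, key=lambda item: wasserstein(query, item[0], p))
    return sum(label for _, label in ranked[:k]) / k
```

For example, `wasserstein([(0.0, 1.0)], [(0.0, 3.0)])` evaluates to the square root of 2.5 (about 1.581): sending both points to the diagonal costs 0.5² + 1.5² = 2.5, which is cheaper than matching them to each other at cost 2² = 4.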
Distributed computation of persistent homology
Persistent homology is a popular and powerful tool for capturing topological
features of data. Advances in algorithms for computing persistent homology have
reduced the computation time drastically -- as long as the algorithm does not
exhaust the available memory. Following up on a recently presented parallel
method for persistence computation on shared memory systems, we demonstrate
that a simple adaptation of the standard reduction algorithm leads to a variant
for distributed systems. Our algorithmic design ensures that the data is
distributed over the nodes without redundancy; this permits the computation of
much larger instances than on a single machine. Moreover, we observe that the
parallelism at least compensates for the overhead caused by communication
between nodes, and often speeds up the computation compared to sequential
and even parallel shared-memory algorithms. In our experiments, we were able to
compute the persistent homology of filtrations with more than a billion (10^9)
elements within seconds on a cluster with 32 nodes, using less than 10 GB of
memory per node.
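For reference, the sequential standard reduction algorithm that this work adapts can be sketched as follows: the columns of the boundary matrix over Z/2 are processed left to right, and an earlier column is added to the current one whenever their lowest nonzero entries (pivots) coincide. This is a minimal sequential sketch, assuming the filtration is given as a list of boundary columns in filtration order, each column a collection of row indices; the function name `reduce_boundary` is hypothetical, and none of the distributed, memory-partitioned logic described above is shown.

```python
def reduce_boundary(boundary):
    """Standard column reduction of a boundary matrix over Z/2.

    boundary[j] lists the indices of the codimension-1 faces of simplex j,
    with simplices indexed in filtration order. Returns (birth, death) index
    pairs and the indices of essential (never-killed) classes.
    """
    columns = [set(col) for col in boundary]
    pivot_owner = {}  # pivot row -> column whose reduced form has that pivot
    pairs = []
    for j, col in enumerate(columns):
        # Over Z/2, adding two columns is the symmetric difference of their supports.
        while col and max(col) in pivot_owner:
            col ^= columns[pivot_owner[max(col)]]
        if col:
            pivot = max(col)
            pivot_owner[pivot] = j
            pairs.append((pivot, j))  # simplex `pivot` creates, simplex j destroys
    paired = {i for pair in pairs for i in pair}
    essential = [i for i in range(len(columns)) if i not in paired]
    return pairs, essential


# Triangle filtration: vertices 0-2, edges 3=(0,1), 4=(1,2), 5=(0,2), face 6.
triangle = [[], [], [], [0, 1], [1, 2], [0, 2], [3, 4, 5]]
print(reduce_boundary(triangle))  # ([(1, 3), (2, 4), (5, 6)], [0])
```

In the example, edge 5 reduces to zero and therefore creates a loop, which the triangle 6 later fills in; vertex 0 is the single essential class, the one connected component.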
Topological data analysis of contagion maps for examining spreading processes on networks
Social and biological contagions are influenced by the spatial embeddedness
of networks. Historically, many epidemics spread as a wave across part of the
Earth's surface; however, in modern contagions long-range edges -- for example,
due to airline transportation or communication media -- allow clusters of a
contagion to appear in distant locations. Here we study the spread of
contagions on networks through a methodology grounded in topological data
analysis and nonlinear dimension reduction. We construct "contagion maps" that
use multiple contagions on a network to map the nodes as a point cloud. By
analyzing the topology, geometry, and dimensionality of manifold structure in
such point clouds, we reveal insights to aid in the modeling, forecasting, and
control of spreading processes. Our approach also highlights contagion maps as
a viable tool for inferring low-dimensional structure in networks.
Comment: Main Text and Supplementary Information
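A contagion map can be sketched roughly as follows: run one threshold contagion per seed cluster and use each node's activation times across all runs as that node's coordinates. This minimal sketch assumes a Watts-style threshold model with synchronous updates, seeds each run at a node together with its neighbors, and gives unreached nodes an infinite time; the threshold value, the ring-lattice example, and all function names are illustrative assumptions rather than the paper's exact settings. The resulting point cloud is what one would then analyze with persistent homology or dimension reduction.

```python
import math


def threshold_contagion(adj, seeds, threshold, max_steps=1000):
    """Activation time of every node under a synchronous threshold model.

    A node activates once the fraction of its active neighbors reaches
    `threshold`. Nodes that are never reached keep time math.inf.
    """
    n = len(adj)
    times = [math.inf] * n
    active = set(seeds)
    for s in seeds:
        times[s] = 0
    for step in range(1, max_steps + 1):
        newly_active = [
            v for v in range(n)
            if v not in active and adj[v]
            and sum(u in active for u in adj[v]) / len(adj[v]) >= threshold
        ]
        if not newly_active:
            break
        for v in newly_active:
            times[v] = step
        active.update(newly_active)
    return times


def contagion_map(adj, threshold):
    """Map node j to the vector of its activation times across all contagions.

    Contagion i is seeded at node i together with its neighbors.
    """
    n = len(adj)
    runs = [threshold_contagion(adj, {i, *adj[i]}, threshold) for i in range(n)]
    return [[runs[i][j] for i in range(n)] for j in range(n)]


def ring_lattice(n, k):
    """Each node links to its k nearest neighbors on either side of a ring."""
    return [[(v + d) % n for d in range(-k, k + 1) if d != 0] for v in range(n)]


points = contagion_map(ring_lattice(10, 2), threshold=0.3)
```

On this ring lattice every contagion spreads as a wave and reaches all nodes within two steps, and the ten activation-time vectors are cyclic shifts of one another, so the point cloud inherits the ring's geometry.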
Persistent Homology Guided Force-Directed Graph Layouts
Graphs are commonly used to encode relationships among entities, yet their
abstractness makes them difficult to analyze. Node-link diagrams are popular
for drawing graphs, and force-directed layouts provide a flexible method for
node arrangements that use local relationships in an attempt to reveal the
global shape of the graph. However, clutter and overlap of unrelated structures
can lead to confusing graph visualizations. This paper leverages the persistent
homology features of an undirected graph as derived information for interactive
manipulation of force-directed layouts. We first discuss how to efficiently
extract 0-dimensional persistent homology features from both weighted and
unweighted undirected graphs. We then introduce the interactive persistence
barcode used to manipulate the force-directed graph layout. In particular, the
user adds and removes contracting and repulsing forces generated by the
persistent homology features, eventually selecting the set of persistent
homology features that most improve the layout. Finally, we demonstrate the
utility of our approach across a variety of synthetic and real datasets.
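One common way to obtain 0-dimensional persistence from a weighted undirected graph is to sweep the edges in increasing weight order with a union-find structure: every vertex is born at 0, and each edge that merges two components kills one bar, so the finite bars correspond to minimum-spanning-tree edges. The sketch below assumes exactly that filtration (vertices at 0, edges at their weights); for an unweighted graph it would simply give every edge weight 1, and the paper's own treatment of unweighted graphs may differ. The function name is hypothetical. Returning the merging edge with each bar gives a natural handle for attaching a contracting or repulsing force to that feature.

```python
def h0_barcode(num_vertices, edges):
    """0-dimensional persistence of a weighted graph via union-find.

    `edges` holds (u, v, weight) triples. Returns one (death, u, v) entry per
    finite bar, where (u, v) is the edge whose insertion merged two components,
    plus the number of infinite bars (one per connected component).
    """
    parent = list(range(num_vertices))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    finite_bars = []
    for u, v, weight in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge: one component dies at this weight
            finite_bars.append((weight, u, v))
        # otherwise the edge closes a cycle and does not change H0
    infinite_bars = num_vertices - len(finite_bars)
    return finite_bars, infinite_bars


bars, components = h0_barcode(4, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0), (2, 3, 0.5)])
```

Here the edge (0, 2) with weight 3.0 produces no bar because its endpoints are already connected, leaving three finite bars and one infinite bar for the single component.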
Exact Computation of a Manifold Metric, via Lipschitz Embeddings and Shortest Paths on a Graph
Data-sensitive metrics adapt distances locally based on the density of data
points with the goal of aligning distances and some notion of similarity. In
this paper, we give the first exact algorithm for computing a data-sensitive
metric called the nearest neighbor metric. In fact, we prove the surprising
result that a previously published approximation algorithm computes the metric exactly.
The nearest neighbor metric can be viewed as a special case of a
density-based distance used in machine learning, or it can be seen as an
example of a manifold metric. Previous computational research on such metrics
despaired of computing exact distances on account of the apparent difficulty of
minimizing over all continuous paths between a pair of points. We leverage the
exact computation of the nearest neighbor metric to compute sparse spanners and
persistent homology. We also explore the behavior of the metric built from
point sets drawn from an underlying distribution and consider the more general
case of inputs that are finite collections of path-connected compact sets.
The main results connect several classical theories such as the conformal
change of Riemannian metrics, the theory of positive definite functions of
Schoenberg, and the screw function theory of Schoenberg and von Neumann. We develop
novel proof techniques based on the combination of screw functions and
Lipschitz extensions that may be of independent interest.
Comment: 15 pages
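The title's "shortest paths on a graph" can be illustrated with the following graph computation: all-pairs shortest paths in the complete graph on the input points, where each edge costs its squared Euclidean length. That this graph distance coincides with the nearest neighbor metric on the input points, and under which normalization of the metric, is an assumption to check against the paper; the sketch only performs the graph computation, using Floyd-Warshall, and its function name is hypothetical.

```python
def squared_edge_shortest_paths(points):
    """All-pairs shortest paths in the complete graph on `points`,
    where the edge between p and q has weight ||p - q||^2 (Floyd-Warshall).
    """
    n = len(points)
    dist = [
        [sum((a - b) ** 2 for a, b in zip(points[i], points[j])) for j in range(n)]
        for i in range(n)
    ]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
d = squared_edge_shortest_paths(pts)
```

Because squaring penalizes long jumps, the path from (0, 0) to (2, 0) through (1, 0) costs 2 rather than the direct 4, so regions dense with data points shorten distances, which is the data-sensitive behavior the abstract describes.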