21,851 research outputs found

    Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

    Full text link
    This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and near 100,000 ligands and decoys in the DUD database are performed to test respectively the scoring power and the virtual screening power of the proposed topological approaches. It is demonstrated that the present approaches outperform the modern machine learning based methods in protein-ligand binding affinity predictions and ligand-decoy discrimination

    Distributed computation of persistent homology

    Full text link
    Persistent homology is a popular and powerful tool for capturing topological features of data. Advances in algorithms for computing persistent homology have reduced the computation time drastically -- as long as the algorithm does not exhaust the available memory. Following up on a recently presented parallel method for persistence computation on shared memory systems, we demonstrate that a simple adaption of the standard reduction algorithm leads to a variant for distributed systems. Our algorithmic design ensures that the data is distributed over the nodes without redundancy; this permits the computation of much larger instances than on a single machine. Moreover, we observe that the parallelism at least compensates for the overhead caused by communication between nodes, and often even speeds up the computation compared to sequential and even parallel shared memory algorithms. In our experiments, we were able to compute the persistent homology of filtrations with more than a billion (10^9) elements within seconds on a cluster with 32 nodes using less than 10GB of memory per node

    Topological data analysis of contagion maps for examining spreading processes on networks

    Get PDF
    Social and biological contagions are influenced by the spatial embeddedness of networks. Historically, many epidemics spread as a wave across part of the Earth's surface; however, in modern contagions long-range edges -- for example, due to airline transportation or communication media -- allow clusters of a contagion to appear in distant locations. Here we study the spread of contagions on networks through a methodology grounded in topological data analysis and nonlinear dimension reduction. We construct "contagion maps" that use multiple contagions on a network to map the nodes as a point cloud. By analyzing the topology, geometry, and dimensionality of manifold structure in such point clouds, we reveal insights to aid in the modeling, forecast, and control of spreading processes. Our approach highlights contagion maps also as a viable tool for inferring low-dimensional structure in networks.Comment: Main Text and Supplementary Informatio

    Persistent Homology Guided Force-Directed Graph Layouts

    Full text link
    Graphs are commonly used to encode relationships among entities, yet their abstractness makes them difficult to analyze. Node-link diagrams are popular for drawing graphs, and force-directed layouts provide a flexible method for node arrangements that use local relationships in an attempt to reveal the global shape of the graph. However, clutter and overlap of unrelated structures can lead to confusing graph visualizations. This paper leverages the persistent homology features of an undirected graph as derived information for interactive manipulation of force-directed layouts. We first discuss how to efficiently extract 0-dimensional persistent homology features from both weighted and unweighted undirected graphs. We then introduce the interactive persistence barcode used to manipulate the force-directed graph layout. In particular, the user adds and removes contracting and repulsing forces generated by the persistent homology features, eventually selecting the set of persistent homology features that most improve the layout. Finally, we demonstrate the utility of our approach across a variety of synthetic and real datasets

    Exact Computation of a Manifold Metric, via Lipschitz Embeddings and Shortest Paths on a Graph

    Full text link
    Data-sensitive metrics adapt distances locally based the density of data points with the goal of aligning distances and some notion of similarity. In this paper, we give the first exact algorithm for computing a data-sensitive metric called the nearest neighbor metric. In fact, we prove the surprising result that a previously published 33-approximation is an exact algorithm. The nearest neighbor metric can be viewed as a special case of a density-based distance used in machine learning, or it can be seen as an example of a manifold metric. Previous computational research on such metrics despaired of computing exact distances on account of the apparent difficulty of minimizing over all continuous paths between a pair of points. We leverage the exact computation of the nearest neighbor metric to compute sparse spanners and persistent homology. We also explore the behavior of the metric built from point sets drawn from an underlying distribution and consider the more general case of inputs that are finite collections of path-connected compact sets. The main results connect several classical theories such as the conformal change of Riemannian metrics, the theory of positive definite functions of Schoenberg, and screw function theory of Schoenberg and Von Neumann. We develop novel proof techniques based on the combination of screw functions and Lipschitz extensions that may be of independent interest.Comment: 15 page
    corecore