54,846 research outputs found

    Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

    Full text link
    This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and near 100,000 ligands and decoys in the DUD database are performed to test respectively the scoring power and the virtual screening power of the proposed topological approaches. It is demonstrated that the present approaches outperform the modern machine learning based methods in protein-ligand binding affinity predictions and ligand-decoy discrimination

    Quantitative toxicity prediction using topology based multi-task deep neural networks

    Full text link
    The understanding of toxicity is of paramount importance to human health and environmental protection. Quantitative toxicity analysis has become a new standard in the field. This work introduces element specific persistent homology (ESPH), an algebraic topology approach, for quantitative toxicity prediction. ESPH retains crucial chemical information during the topological abstraction of geometric complexity and provides a representation of small molecules that cannot be obtained by any other method. To investigate the representability and predictive power of ESPH for small molecules, ancillary descriptors have also been developed based on physical models. Topological and physical descriptors are paired with advanced machine learning algorithms, such as deep neural network (DNN), random forest (RF) and gradient boosting decision tree (GBDT), to facilitate their applications to quantitative toxicity predictions. A topology based multi-task strategy is proposed to take the advantage of the availability of large data sets while dealing with small data sets. Four benchmark toxicity data sets that involve quantitative measurements are used to validate the proposed approaches. Extensive numerical studies indicate that the proposed topological learning methods are able to outperform the state-of-the-art methods in the literature for quantitative toxicity analysis. Our online server for computing element-specific topological descriptors (ESTDs) is available at http://weilab.math.msu.edu/TopTox/Comment: arXiv admin note: substantial text overlap with arXiv:1703.1095

    Analyzing Collective Motion with Machine Learning and Topology

    Get PDF
    We use topological data analysis and machine learning to study a seminal model of collective motion in biology [D'Orsogna et al., Phys. Rev. Lett. 96 (2006)]. This model describes agents interacting nonlinearly via attractive-repulsive social forces and gives rise to collective behaviors such as flocking and milling. To classify the emergent collective motion in a large library of numerical simulations and to recover model parameters from the simulation data, we apply machine learning techniques to two different types of input. First, we input time series of order parameters traditionally used in studies of collective motion. Second, we input measures based in topology that summarize the time-varying persistent homology of simulation data over multiple scales. This topological approach does not require prior knowledge of the expected patterns. For both unsupervised and supervised machine learning methods, the topological approach outperforms the one that is based on traditional order parameters.Comment: Published in Chaos 29, 123125 (2019), DOI: 10.1063/1.512549

    Detection of Topological Materials with Machine Learning

    Full text link
    Databases compiled using ab-initio and symmetry-based calculations now contain tens of thousands of topological insulators and topological semimetals. This makes the application of modern machine learning methods to topological materials possible. Using gradient boosted trees, we show how to construct a machine learning model which can predict the topology of a given existent material with an accuracy of 90%. Such predictions are orders of magnitude faster than actual ab-initio calculations. Through extensive testing of different models we determine which properties help detect topological materials. We identify the sources of our model's errors and we discuss approaches to overcome them.Comment: 34 pages, 7 figures. The ML model is available online at https://www.topologicalquantumchemistry.com/mltqc. Version 2 includes corrections after peer revie

    Chatter Diagnosis in Milling Using Supervised Learning and Topological Features Vector

    Full text link
    Chatter detection has become a prominent subject of interest due to its effect on cutting tool life, surface finish and spindle of machine tool. Most of the existing methods in chatter detection literature are based on signal processing and signal decomposition. In this study, we use topological features of data simulating cutting tool vibrations, combined with four supervised machine learning algorithms to diagnose chatter in the milling process. Persistence diagrams, a method of representing topological features, are not easily used in the context of machine learning, so they must be transformed into a form that is more amenable. Specifically, we will focus on two different methods for featurizing persistence diagrams, Carlsson coordinates and template functions. In this paper, we provide classification results for simulated data from various cutting configurations, including upmilling and downmilling, in addition to the same data with some added noise. Our results show that Carlsson Coordinates and Template Functions yield accuracies as high as 96% and 95%, respectively. We also provide evidence that these topological methods are noise robust descriptors for chatter detection

    Neural Topological Ordering for Computation Graphs

    Full text link
    Recent works on machine learning for combinatorial optimization have shown that learning based approaches can outperform heuristic methods in terms of speed and performance. In this paper, we consider the problem of finding an optimal topological order on a directed acyclic graph with focus on the memory minimization problem which arises in compilers. We propose an end-to-end machine learning based approach for topological ordering using an encoder-decoder framework. Our encoder is a novel attention based graph neural network architecture called \emph{Topoformer} which uses different topological transforms of a DAG for message passing. The node embeddings produced by the encoder are converted into node priorities which are used by the decoder to generate a probability distribution over topological orders. We train our model on a dataset of synthetically generated graphs called layered graphs. We show that our model outperforms, or is on-par, with several topological ordering baselines while being significantly faster on synthetic graphs with up to 2k nodes. We also train and test our model on a set of real-world computation graphs, showing performance improvements.Comment: To appear in NeurIPS 202

    DeepEP: A Deep Learning Framework for Identifying Essential Proteins

    Get PDF
    Background: Essential proteins are crucial for cellular life and thus, identification of essential proteins is an important topic and a challenging problem for researchers. Recently lots of computational approaches have been proposed to handle this problem. However, traditional centrality methods cannot fully represent the topological features of biological networks. In addition, identifying essential proteins is an imbalanced learning problem; but few current shallow machine learning-based methods are designed to handle the imbalanced characteristics. Results: We develop DeepEP based on a deep learning framework that uses the node2vec technique, multi-scale convolutional neural networks and a sampling technique to identify essential proteins. In DeepEP, the node2vec technique is applied to automatically learn topological and semantic features for each protein in protein-protein interaction (PPI) network. Gene expression profiles are treated as images and multi-scale convolutional neural networks are applied to extract their patterns. In addition, DeepEP uses a sampling method to alleviate the imbalanced characteristics. The sampling method samples the same number of the majority and minority samples in a training epoch, which is not biased to any class in training process. The experimental results show that DeepEP outperforms traditional centrality methods. Moreover, DeepEP is better than shallow machine learning-based methods. Detailed analyses show that the dense vectors which are generated by node2vec technique contribute a lot to the improved performance. It is clear that the node2vec technique effectively captures the topological and semantic properties of PPI network. The sampling method also improves the performance of identifying essential proteins. Conclusion: We demonstrate that DeepEP improves the prediction performance by integrating multiple deep learning techniques and a sampling method. DeepEP is more effective than existing methods
    corecore