Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to manifest their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and nearly 100,000 ligands and decoys in the DUD database are performed
to test, respectively, the scoring power and the virtual screening power of the
proposed topological approaches. It is demonstrated that the present approaches
outperform modern machine learning based methods in protein-ligand binding
affinity predictions and ligand-decoy discrimination.
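As a rough illustration of the kind of topological descriptor this abstract refers to, the sketch below computes only the 0-dimensional persistence barcode of a point cloud, using a union-find pass over sorted pairwise distances. The function names `h0_persistence` and `element_barcode`, the atom tuple layout, and the idea of restricting the cloud to chosen element types are assumptions made for illustration; this is not the authors' multicomponent or electrostatic construction.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Finite (birth, death) bars of 0-dimensional persistent homology.

    Every point is born at scale 0; a connected component dies at the
    length of the edge that merges it into another component.
    The single bar that never dies is omitted.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # two components merge, so one bar ends here
            bars.append((0.0, length))
    return bars


def element_barcode(atoms, elements):
    """Hypothetical helper: barcode of the atoms whose element is in `elements`.

    `atoms` is a list of (element_symbol, (x, y, z)) tuples.
    """
    selected = [xyz for symbol, xyz in atoms if symbol in elements]
    return h0_persistence(selected)


atoms = [("C", (0.0, 0.0, 0.0)), ("O", (1.0, 0.0, 0.0)), ("C", (3.0, 0.0, 0.0))]
carbon_bars = element_barcode(atoms, {"C"})
```

Computing one barcode per element subset is one simple way to keep some chemical information in an otherwise purely geometric summary.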
Quantitative toxicity prediction using topology based multi-task deep neural networks
The understanding of toxicity is of paramount importance to human health and
environmental protection. Quantitative toxicity analysis has become a new
standard in the field. This work introduces element specific persistent
homology (ESPH), an algebraic topology approach, for quantitative toxicity
prediction. ESPH retains crucial chemical information during the topological
abstraction of geometric complexity and provides a representation of small
molecules that cannot be obtained by any other method. To investigate the
representability and predictive power of ESPH for small molecules, ancillary
descriptors have also been developed based on physical models. Topological and
physical descriptors are paired with advanced machine learning algorithms, such
as deep neural network (DNN), random forest (RF) and gradient boosting decision
tree (GBDT), to facilitate their applications to quantitative toxicity
predictions. A topology based multi-task strategy is proposed to take
advantage of the availability of large data sets while dealing with small data
sets. Four benchmark toxicity data sets that involve quantitative measurements
are used to validate the proposed approaches. Extensive numerical studies
indicate that the proposed topological learning methods are able to outperform
the state-of-the-art methods in the literature for quantitative toxicity
analysis. Our online server for computing element-specific topological
descriptors (ESTDs) is available at http://weilab.math.msu.edu/TopTox/
Comment: arXiv admin note: substantial text overlap with arXiv:1703.1095
Analyzing Collective Motion with Machine Learning and Topology
We use topological data analysis and machine learning to study a seminal
model of collective motion in biology [D'Orsogna et al., Phys. Rev. Lett. 96
(2006)]. This model describes agents interacting nonlinearly via
attractive-repulsive social forces and gives rise to collective behaviors such
as flocking and milling. To classify the emergent collective motion in a large
library of numerical simulations and to recover model parameters from the
simulation data, we apply machine learning techniques to two different types of
input. First, we input time series of order parameters traditionally used in
studies of collective motion. Second, we input measures based on topology that
summarize the time-varying persistent homology of simulation data over multiple
scales. This topological approach does not require prior knowledge of the
expected patterns. For both unsupervised and supervised machine learning
methods, the topological approach outperforms the one that is based on
traditional order parameters.
Comment: Published in Chaos 29, 123125 (2019), DOI: 10.1063/1.512549
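For readers unfamiliar with order parameters, the sketch below computes two that are commonly used to tell flocking from milling in two dimensions: polarization and normalized angular momentum. The abstract does not say which order parameters the authors used, so the choice here and the function names are assumptions. The code also assumes every agent has nonzero speed and that the agents are not all at the centroid.

```python
import math


def polarization(velocities):
    """Norm of the mean unit velocity: near 1 for a flock, near 0 for a mill."""
    n = len(velocities)
    sx = sum(vx / math.hypot(vx, vy) for vx, vy in velocities)
    sy = sum(vy / math.hypot(vx, vy) for vx, vy in velocities)
    return math.hypot(sx, sy) / n


def angular_momentum(positions, velocities):
    """Normalized angular momentum about the centroid: near 1 for a mill."""
    n = len(positions)
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n
    num = 0.0
    den = 0.0
    for (x, y), (vx, vy) in zip(positions, velocities):
        rx, ry = x - cx, y - cy
        num += rx * vy - ry * vx  # z-component of r x v
        den += math.hypot(rx, ry) * math.hypot(vx, vy)
    return abs(num) / den


# Four agents circling the origin counter-clockwise (a small "mill").
mill_pos = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
mill_vel = [(0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (1.0, 0.0)]

# Four agents moving in the same direction (a small "flock").
flock_vel = [(1.0, 0.0)] * 4
```

Evaluating these functions at every time step of a simulation gives the kind of time series that serves as the first input type described in the abstract.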
Detection of Topological Materials with Machine Learning
Databases compiled using ab-initio and symmetry-based calculations now
contain tens of thousands of topological insulators and topological semimetals.
This makes the application of modern machine learning methods to topological
materials possible. Using gradient boosted trees, we show how to construct a
machine learning model which can predict the topology of a given existing
material with an accuracy of 90%. Such predictions are orders of magnitude
faster than actual ab-initio calculations. Through extensive testing of
different models we determine which properties help detect topological
materials. We identify the sources of our model's errors and we discuss
approaches to overcome them.
Comment: 34 pages, 7 figures. The ML model is available online at
https://www.topologicalquantumchemistry.com/mltqc. Version 2 includes
corrections after peer review
Chatter Diagnosis in Milling Using Supervised Learning and Topological Features Vector
Chatter detection has become a prominent subject of interest due to its
effect on cutting tool life, surface finish, and the machine tool spindle. Most
of the existing methods in chatter detection literature are based on signal
processing and signal decomposition. In this study, we use topological features
of data simulating cutting tool vibrations, combined with four supervised
machine learning algorithms to diagnose chatter in the milling process.
Persistence diagrams, a method of representing topological features, are not
easily used in the context of machine learning, so they must be transformed
into a form that is more amenable. Specifically, we will focus on two different
methods for featurizing persistence diagrams, Carlsson coordinates and template
functions. In this paper, we provide classification results for simulated data
from various cutting configurations, including upmilling and downmilling, in
addition to the same data with some added noise. Our results show that Carlsson
coordinates and template functions yield accuracies as high as 96% and 95%,
respectively. We also provide evidence that these topological methods are
noise-robust descriptors for chatter detection.
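The abstract names Carlsson coordinates as one way to turn a persistence diagram into a fixed-length feature vector but gives no formulas. The sketch below assumes the commonly cited five-coordinate form, written in terms of birth times b, death times d, and the largest death time d_max in the diagram; check it against the paper before relying on it. The function name `carlsson_coordinates` is hypothetical.

```python
def carlsson_coordinates(diagram):
    """Map a persistence diagram to five scalar features.

    `diagram` is a list of (birth, death) pairs with finite deaths.
    Assumed definitions (not taken from the abstract):
        f1 = sum b * (d - b)
        f2 = sum (d_max - d) * (d - b)
        f3 = sum b^2 * (d - b)^4
        f4 = sum (d_max - d)^2 * (d - b)^4
        f5 = max (d - b)
    """
    if not diagram:
        return (0.0, 0.0, 0.0, 0.0, 0.0)
    d_max = max(d for _, d in diagram)
    f1 = sum(b * (d - b) for b, d in diagram)
    f2 = sum((d_max - d) * (d - b) for b, d in diagram)
    f3 = sum(b ** 2 * (d - b) ** 4 for b, d in diagram)
    f4 = sum((d_max - d) ** 2 * (d - b) ** 4 for b, d in diagram)
    f5 = max(d - b for b, d in diagram)
    return (f1, f2, f3, f4, f5)


features = carlsson_coordinates([(0.0, 1.0), (1.0, 3.0)])
```

The output has the same length for every diagram, so it can be passed directly to a standard supervised classifier.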
Neural Topological Ordering for Computation Graphs
Recent works on machine learning for combinatorial optimization have shown
that learning based approaches can outperform heuristic methods in terms of
speed and performance. In this paper, we consider the problem of finding an
optimal topological order on a directed acyclic graph with focus on the memory
minimization problem which arises in compilers. We propose an end-to-end
machine learning based approach for topological ordering using an
encoder-decoder framework. Our encoder is a novel attention based graph neural
network architecture called Topoformer which uses different topological
transforms of a DAG for message passing. The node embeddings produced by the
encoder are converted into node priorities which are used by the decoder to
generate a probability distribution over topological orders. We train our model
on a dataset of synthetically generated graphs called layered graphs. We show
that our model outperforms, or is on par with, several topological ordering
baselines while being significantly faster on synthetic graphs with up to 2k
nodes. We also train and test our model on a set of real-world computation
graphs, showing performance improvements.
Comment: To appear in NeurIPS 202
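To make the link between node priorities, topological orders, and memory concrete, the sketch below greedily decodes a topological order from given priorities and then scores the order by peak memory. Both parts are assumptions for illustration. The paper's decoder produces a probability distribution over orders rather than a single greedy pick. The memory model used here, where each node's output stays live until its last consumer has run and outputs without consumers are never freed, is a simplification. The names `order_from_priorities` and `peak_memory` are hypothetical.

```python
import heapq


def order_from_priorities(succ, priority):
    """Kahn's algorithm, always scheduling the ready node with highest priority.

    `succ` maps each node to the list of its successors.
    """
    indeg = {u: 0 for u in succ}
    for u in succ:
        for v in succ[u]:
            indeg[v] += 1
    # Max-heap behaviour via negated priorities.
    ready = [(-priority[u], u) for u in succ if indeg[u] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, u = heapq.heappop(ready)
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                heapq.heappush(ready, (-priority[v], v))
    if len(order) != len(succ):
        raise ValueError("graph contains a cycle")
    return order


def peak_memory(succ, size, order):
    """Peak total size of live outputs when nodes run in `order`."""
    pending = {u: len(succ[u]) for u in succ}  # consumers that have not run yet
    pred = {u: [] for u in succ}
    for u in succ:
        for v in succ[u]:
            pred[v].append(u)
    live = 0
    peak = 0
    for u in order:
        live += size[u]  # allocate the output of u
        peak = max(peak, live)
        for p in pred[u]:
            pending[p] -= 1
            if pending[p] == 0:
                live -= size[p]  # last consumer ran, so the input is freed
    return peak


# Two chains a->b and c->d that join at e; a and c have large outputs.
succ = {"a": ["b"], "b": ["e"], "c": ["d"], "d": ["e"], "e": []}
size = {"a": 10, "b": 1, "c": 10, "d": 1, "e": 1}

depth_first = order_from_priorities(succ, {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1})
breadth_first = order_from_priorities(succ, {"a": 5, "c": 4, "b": 3, "d": 2, "e": 1})
```

In this toy graph, finishing one chain before starting the other keeps only one large tensor live at a time, which is why the two valid orders have different peaks.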
DeepEP: A Deep Learning Framework for Identifying Essential Proteins
Background: Essential proteins are crucial for cellular life, and thus the identification of essential proteins is an important topic and a challenging problem for researchers. Recently, many computational approaches have been proposed to handle this problem. However, traditional centrality methods cannot fully represent the topological features of biological networks. In addition, identifying essential proteins is an imbalanced learning problem, but few current shallow machine learning-based methods are designed to handle the imbalanced characteristics.

Results: We develop DeepEP based on a deep learning framework that uses the node2vec technique, multi-scale convolutional neural networks and a sampling technique to identify essential proteins. In DeepEP, the node2vec technique is applied to automatically learn topological and semantic features for each protein in the protein-protein interaction (PPI) network. Gene expression profiles are treated as images, and multi-scale convolutional neural networks are applied to extract their patterns. In addition, DeepEP uses a sampling method to alleviate the imbalanced characteristics. The sampling method draws the same number of majority and minority samples in each training epoch, so training is not biased toward either class. The experimental results show that DeepEP outperforms traditional centrality methods. Moreover, DeepEP is better than shallow machine learning-based methods. Detailed analyses show that the dense vectors generated by the node2vec technique contribute substantially to the improved performance. The node2vec technique effectively captures the topological and semantic properties of the PPI network. The sampling method also improves the performance of identifying essential proteins.

Conclusion: We demonstrate that DeepEP improves the prediction performance by integrating multiple deep learning techniques and a sampling method. DeepEP is more effective than existing methods.
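The sampling idea described in this abstract, drawing equally many majority and minority samples in each epoch, can be sketched as follows. The exact scheme in DeepEP is not specified in the abstract. Sampling without replacement, the size of the smaller class as the per-class count, and the name `balanced_epoch` are all assumptions.

```python
import random


def balanced_epoch(majority, minority, rng):
    """Return a shuffled list with equally many majority and minority samples."""
    k = min(len(majority), len(minority))
    batch = rng.sample(majority, k) + rng.sample(minority, k)
    rng.shuffle(batch)
    return batch


# Toy data: 100 non-essential (label 0) and 10 essential (label 1) proteins.
majority = [(f"protein_{i}", 0) for i in range(100)]
minority = [(f"protein_{i}", 1) for i in range(100, 110)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
epoch = balanced_epoch(majority, minority, rng)
```

Calling `balanced_epoch` again for each new epoch draws a fresh majority subset, so over many epochs the model still sees most of the majority class without any single epoch being dominated by it.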