Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to demonstrate their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and nearly 100,000 ligands and decoys in the DUD database are performed
to test, respectively, the scoring power and the virtual screening power of the
proposed topological approaches. It is demonstrated that the present approaches
outperform the modern machine learning based methods in protein-ligand binding
affinity predictions and ligand-decoy discrimination.
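
The abstract above pairs topological descriptors with the Wasserstein distance to compare molecules. As a rough illustration only, the sketch below computes a p-Wasserstein distance between two small persistence diagrams by brute-force matching. The function names, the L-infinity ground metric, and the exhaustive search over permutations are assumptions made for this example, not the authors' implementation; a practical version would use an optimal-assignment solver.

```python
from itertools import permutations


def diagonal_projection(point):
    """Closest point on the diagonal birth == death (under L-infinity)."""
    birth, death = point
    mid = (birth + death) / 2.0
    return (mid, mid)


def linf(p, q):
    """L-infinity distance between two (birth, death) pairs (assumed ground metric)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def wasserstein_distance(diagram_a, diagram_b, p=1):
    """Brute-force p-Wasserstein distance between two small persistence diagrams.

    Each diagram is a list of finite (birth, death) pairs. Any point may be
    matched to the diagonal, which models a feature with no counterpart.
    The cost grows factorially, so this is only usable for tiny diagrams.
    """
    n, m = len(diagram_a), len(diagram_b)
    size = n + m  # n real points + m diagonal slots on one side, and vice versa

    def cost(i, j):
        if i < n and j < m:  # real point matched to real point
            return linf(diagram_a[i], diagram_b[j]) ** p
        if i < n:  # point of A matched to the diagonal
            return linf(diagram_a[i], diagonal_projection(diagram_a[i])) ** p
        if j < m:  # point of B matched to the diagonal
            return linf(diagram_b[j], diagonal_projection(diagram_b[j])) ** p
        return 0.0  # diagonal matched to diagonal

    best = min(
        sum(cost(i, perm[i]) for i in range(size))
        for perm in permutations(range(size))
    )
    return best ** (1.0 / p)


if __name__ == "__main__":
    # Hypothetical barcodes for two molecules.
    mol_a = [(0.0, 2.0), (0.0, 0.5)]
    mol_b = [(0.0, 3.0)]
    print(wasserstein_distance(mol_a, mol_b))
```

Distances computed this way could then serve as the metric for a k-nearest-neighbors model, which is one of the learners the abstract mentions.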
Machine Learning, Quantum Mechanics, and Chemical Compound Space
We review recent studies dealing with the generation of machine learning
models of molecular and solid properties. The models are trained and validated
using standard quantum chemistry results obtained for organic molecules and
materials selected from chemical space at random.
Quantitative toxicity prediction using topology based multi-task deep neural networks
The understanding of toxicity is of paramount importance to human health and
environmental protection. Quantitative toxicity analysis has become a new
standard in the field. This work introduces element specific persistent
homology (ESPH), an algebraic topology approach, for quantitative toxicity
prediction. ESPH retains crucial chemical information during the topological
abstraction of geometric complexity and provides a representation of small
molecules that cannot be obtained by any other method. To investigate the
representability and predictive power of ESPH for small molecules, ancillary
descriptors have also been developed based on physical models. Topological and
physical descriptors are paired with advanced machine learning algorithms, such
as deep neural network (DNN), random forest (RF) and gradient boosting decision
tree (GBDT), to facilitate their applications to quantitative toxicity
predictions. A topology based multi-task strategy is proposed to take
advantage of the availability of large data sets while dealing with small data
sets. Four benchmark toxicity data sets that involve quantitative measurements
are used to validate the proposed approaches. Extensive numerical studies
indicate that the proposed topological learning methods are able to outperform
the state-of-the-art methods in the literature for quantitative toxicity
analysis. Our online server for computing element-specific topological
descriptors (ESTDs) is available at http://weilab.math.msu.edu/TopTox/
Comment: arXiv admin note: substantial text overlap with arXiv:1703.1095
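
The idea of element specific persistent homology can be sketched, under assumptions, as computing persistence separately for subsets of atoms selected by element type. The example below restricts itself to 0-dimensional persistence (connected components) of a Vietoris-Rips filtration, computed with a union-find structure. The function names, the input layout, and the choice of element subsets are hypothetical and are not taken from the paper or its server.

```python
from itertools import combinations
from math import dist, inf


def h0_barcode(points):
    """0-dimensional persistence bars (birth, death) of a Vietoris-Rips filtration.

    Every point is born at scale 0; a component dies at the length of the
    edge that merges it into another component. One bar never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge merges two components
            parent[root_i] = root_j
            bars.append((0.0, length))
    if n:
        bars.append((0.0, inf))  # the component that survives at every scale
    return bars


def element_specific_barcodes(atoms, element_sets):
    """Compute one H0 barcode per element subset.

    atoms: list of (element_symbol, (x, y, z)) tuples.
    element_sets: iterable of sets of element symbols, e.g. [{"C"}, {"C", "O"}].
    Returns a dict keyed by the sorted, concatenated element symbols.
    """
    return {
        "".join(sorted(subset)): h0_barcode(
            [xyz for element, xyz in atoms if element in subset]
        )
        for subset in element_sets
    }


if __name__ == "__main__":
    # Hypothetical coordinates (in angstroms), for illustration only.
    atoms = [
        ("C", (0.0, 0.0, 0.0)),
        ("C", (1.5, 0.0, 0.0)),
        ("O", (0.0, 1.2, 0.0)),
        ("C", (4.5, 0.0, 0.0)),
    ]
    for key, bars in element_specific_barcodes(atoms, [{"C"}, {"C", "O"}]).items():
        print(key, bars)
```

Statistics of such per-element barcodes (for example bar counts or lengths) are one plausible way to build fixed-length feature vectors for the learners named in the abstract; higher-dimensional homology would need a dedicated library.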