Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to manifest their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and nearly 100,000 ligands and decoys in the DUD database are performed
to test respectively the scoring power and the virtual screening power of the
proposed topological approaches. It is demonstrated that the present approaches
outperform the modern machine learning based methods in protein-ligand binding
affinity predictions and ligand-decoy discrimination.
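
As a rough illustration of the multicomponent idea described above, the sketch below computes 0-dimensional persistence barcodes separately for each element type in a small point cloud. It is not the authors' implementation: the function names (`h0_deaths`, `element_barcodes`), the input format, the toy coordinates, and the convention that an edge enters the filtration at its Euclidean length are all assumptions made for this example. It relies on the standard fact that, in a Vietoris-Rips filtration, every connected component is born at 0 and the finite death times are the edge lengths of a minimum spanning tree.

```python
import math
from itertools import combinations


def h0_deaths(points):
    """Finite death times of 0-dimensional persistence for a Rips filtration.

    Every component is born at filtration value 0; a component dies when an
    edge first merges it into another one (Kruskal's algorithm with union-find).
    Assumption: the filtration value of an edge is its Euclidean length.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so one of them dies
            parent[ri] = rj
            deaths.append(length)
    return deaths  # n points give n - 1 finite bars plus one infinite bar


def element_barcodes(atoms):
    """Group atoms by element symbol and compute H0 death times per group.

    `atoms` is a list of (element, (x, y, z)) tuples (hypothetical format).
    """
    groups = {}
    for element, xyz in atoms:
        groups.setdefault(element, []).append(xyz)
    return {element: h0_deaths(pts) for element, pts in groups.items()}


# Toy example with made-up coordinates (not taken from any dataset).
toy_atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("C", (1.0, 0.0, 0.0)),
    ("C", (3.0, 0.0, 0.0)),
    ("O", (0.0, 2.0, 0.0)),
    ("O", (0.0, 5.0, 0.0)),
]
barcodes = element_barcodes(toy_atoms)
```

Splitting the atoms by element before computing persistence is one simple way to keep chemical identity that a single all-atom barcode would discard. Higher-dimensional features (loops, cavities) and the Wasserstein comparison of diagrams mentioned in the abstract would need a dedicated topology library and are outside this sketch.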
Virtual screening of DrugBank database for hERG blockers using topological Laplacian-assisted AI models
The human ether-a-go-go (hERG) potassium channel (Kv11.1)
plays a critical role in mediating cardiac action potential. The blockade of
this ion channel can potentially lead to fatal disorders and/or long QT syndrome.
Many drugs have been withdrawn because of their serious hERG-cardiotoxicity. It
is crucial to assess the hERG blockade activity in the early stage of drug
discovery. We are particularly interested in the hERG-cardiotoxicity of
compounds collected in the DrugBank database considering that many DrugBank
compounds have been approved for therapeutic treatments or have high potential
to become drugs. Machine learning-based in silico tools offer a rapid and
economical platform to virtually screen DrugBank compounds. We design accurate
and robust classifiers for blockers/non-blockers and then build regressors to
quantitatively analyze the binding potency of the DrugBank compounds on the
hERG channel. Molecular sequences are embedded with two natural language
processing (NLP) methods, namely, autoencoder and transformer. Complementary
three-dimensional (3D) molecular structures are embedded with two advanced
mathematical approaches, i.e., topological Laplacians and algebraic graphs.
With our state-of-the-art tools, we reveal that 227 out of the 8641 DrugBank
compounds are potential hERG blockers, suggesting serious drug safety problems.
Our predictions provide guidance for the further experimental interrogation of
DrugBank compounds' hERG-cardiotoxicity.
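
To make the algebraic-graph part of the 3D embedding more concrete, the sketch below builds a distance-weighted graph Laplacian from atomic coordinates. It is only an assumption-laden illustration, not the descriptors used in the paper: the function name `weighted_laplacian`, the exponential kernel exp(-(d/eta)^kappa), the parameter values, and the toy coordinates are all chosen for this example. Topological (persistent) Laplacians are a separate construction and are not covered here.

```python
import math


def weighted_laplacian(coords, eta=2.0, kappa=2.0):
    """Graph Laplacian L = D - W for a fully connected, distance-weighted graph.

    Edge weight between atoms i and j: exp(-(d_ij / eta) ** kappa), where d_ij
    is the Euclidean distance. Kernel form and parameters are assumptions.
    """
    n = len(coords)
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            w = math.exp(-((d / eta) ** kappa))
            lap[i][j] = lap[j][i] = -w  # off-diagonal entries: minus the weight
            lap[i][i] += w              # diagonal entries: weighted degree
            lap[j][j] += w
    return lap


# Toy coordinates (made up for illustration).
toy_coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
laplacian = weighted_laplacian(toy_coords)

# The trace equals the sum of the Laplacian eigenvalues, so it gives one
# simple scalar summary without calling an eigenvalue solver.
trace = sum(laplacian[i][i] for i in range(len(toy_coords)))
```

In practice, statistics of the eigenvalue spectrum (for example from a numerical linear algebra library) would usually be computed from such a matrix and passed to the classifier or regressor as features.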