291 research outputs found

    Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

    This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology, and electrostatic persistence, for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with the Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensembles of trees, and deep convolutional neural networks, to demonstrate their descriptive and predictive power for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and nearly 100,000 ligands and decoys in the DUD database are performed to test the scoring power and the virtual screening power, respectively, of the proposed topological approaches. It is demonstrated that the present approaches outperform modern machine-learning-based methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
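To make the idea of a persistent homology descriptor concrete, the following is a minimal sketch of the simplest case only: the 0-dimensional barcode of a point cloud, computed with a union-find structure. It is not the authors' implementation. The function name `h0_barcode` and the toy coordinates are hypothetical, and the sketch assumes a Vietoris-Rips-style filtration in which an edge appears when the filtration value reaches the distance between its endpoints. Every point is born at 0; each finite bar dies at the edge length that merges two components.

```python
from itertools import combinations
from math import dist


def h0_barcode(points):
    """Return the finite death values of the 0-dimensional bars.

    Assumption: all points are born at filtration value 0 and an edge
    enters the filtration at the Euclidean distance between its endpoints.
    A cloud of n points yields n - 1 finite bars plus one infinite bar
    (the infinite bar is not returned).
    """
    parent = list(range(len(points)))

    def find(i):
        # Find the component representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise edges in order of increasing length.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge here, so one bar dies at this length.
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


# Hypothetical toy coordinates (not taken from any dataset).
toy_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
toy_deaths = h0_barcode(toy_points)
```

In a setting like the one the abstract describes, one could imagine running such a computation on a subset of atoms (for example, a single element type) and comparing the resulting barcodes between molecules; higher-dimensional bars and the Wasserstein comparison require dedicated libraries and are outside this sketch.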

    Virtual screening of DrugBank database for hERG blockers using topological Laplacian-assisted AI models

    The human ether-a-go-go (hERG) potassium channel (Kv11.1) plays a critical role in mediating the cardiac action potential. Blockade of this ion channel can potentially lead to fatal disorders and/or long QT syndrome. Many drugs have been withdrawn because of their serious hERG-cardiotoxicity. It is crucial to assess hERG blockade activity in the early stages of drug discovery. We are particularly interested in the hERG-cardiotoxicity of compounds collected in the DrugBank database, considering that many DrugBank compounds have been approved for therapeutic treatments or have high potential to become drugs. Machine learning-based in silico tools offer a rapid and economical platform to virtually screen DrugBank compounds. We design accurate and robust classifiers for blockers/non-blockers and then build regressors to quantitatively analyze the binding potency of the DrugBank compounds on the hERG channel. Molecular sequences are embedded with two natural language processing (NLP) methods, namely, autoencoder and transformer. Complementary three-dimensional (3D) molecular structures are embedded with two advanced mathematical approaches, i.e., topological Laplacians and algebraic graphs. With our state-of-the-art tools, we reveal that 227 out of the 8641 DrugBank compounds are potential hERG blockers, suggesting serious drug safety problems. Our predictions provide guidance for the further experimental interrogation of DrugBank compounds' hERG-cardiotoxicity.
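As a rough illustration of the kind of object behind a Laplacian-based structural embedding, the sketch below builds the ordinary graph Laplacian (degree matrix minus adjacency matrix) of a distance-cutoff graph on 3D coordinates. This is an assumption-laden simplification, not the authors' topological Laplacian: it covers only the lowest-order case at a single cutoff, and the function name `graph_laplacian`, the cutoff value, and the coordinates are all hypothetical.

```python
from math import dist


def graph_laplacian(coords, cutoff):
    """Build L = D - A for the graph joining points within `cutoff`.

    Assumption: two points are connected when their Euclidean distance
    is at most `cutoff`. The result is a list of integer rows.
    """
    n = len(coords)
    lap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if dist(coords[i], coords[j]) <= cutoff:
                # Off-diagonal entries are -1 for each edge.
                lap[i][j] = lap[j][i] = -1
                # Diagonal entries count the degree of each vertex.
                lap[i][i] += 1
                lap[j][j] += 1
    return lap


# Hypothetical toy coordinates (not taken from DrugBank).
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
toy_laplacian = graph_laplacian(toy_coords, cutoff=1.5)
```

Every row of such a matrix sums to zero, and its spectrum changes as the cutoff varies; tracking spectral quantities across a range of cutoffs is one plausible way to turn a structure into a fixed-length feature vector, although the abstract does not specify the details.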