Generative network complex for the automated generation of druglike molecules

Authors: Gao Kaifu, Nguyen Duc D, Tu Meihua, Wei Guo-Wei
Publication date: 28/05/2020

Current drug discovery is expensive and time-consuming. It remains a challenging task to create a wide variety of novel compounds that have desirable pharmacological properties and are cheaply available to low-income people. In this work, we develop a generative network complex (GNC) to generate new drug-like molecules based on multi-property optimization via gradient descent in the latent space of an autoencoder. In our GNC, both multiple chemical properties and similarity scores are optimized to generate and predict drug-like molecules with desired chemical properties. To further validate the reliability of the predictions, these molecules are reevaluated and screened by independent 2D fingerprint-based predictors to arrive at a few hundred new drug candidates. As a demonstration, we apply our GNC to generate a large number of new BACE1 inhibitors, as well as thousands of novel alternative drug candidates for eight existing market drugs, including Ceritinib, Ribociclib, Acalabrutinib, Idelalisib, Dabrafenib, Macimorelin, Enzalutamide, and Panobinostat.

Comment: 27 pages, 2 tables and 19 figures

arXiv.org e-Print Archive
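
As a rough illustration of the latent-space optimization described in the abstract above, the toy sketch below runs gradient descent on a latent vector so that predicted properties approach target values while the vector stays close to a reference point. Everything in it is an assumption made for illustration: the linear predictors, the squared-distance similarity term, and the names `predict`, `loss`, `gradient`, and `optimize` are hypothetical stand-ins, not the authors' networks or objective.

```python
# Toy sketch: gradient descent on a latent vector z toward target properties.
# The linear "predictors" are hypothetical stand-ins for trained networks,
# and squared distance to z_ref is an assumed proxy for a similarity score.

def predict(weights, z):
    """Linear property predictor: dot product of weights and z."""
    return sum(w * x for w, x in zip(weights, z))

def loss(z, predictors, targets, z_ref, sim_weight):
    """Sum of squared property errors plus a weighted distance to z_ref."""
    prop_term = sum((predict(w, z) - t) ** 2 for w, t in zip(predictors, targets))
    sim_term = sum((x - r) ** 2 for x, r in zip(z, z_ref))
    return prop_term + sim_weight * sim_term

def gradient(z, predictors, targets, z_ref, sim_weight):
    """Analytic gradient of loss() with respect to z."""
    grad = [2.0 * sim_weight * (x - r) for x, r in zip(z, z_ref)]
    for w, t in zip(predictors, targets):
        err = predict(w, z) - t
        for i, wi in enumerate(w):
            grad[i] += 2.0 * err * wi
    return grad

def optimize(z0, predictors, targets, z_ref, sim_weight=0.1, lr=0.05, steps=500):
    """Plain gradient descent with a fixed learning rate."""
    z = list(z0)
    for _ in range(steps):
        g = gradient(z, predictors, targets, z_ref, sim_weight)
        z = [x - lr * gi for x, gi in zip(z, g)]
    return z

# Two made-up properties on a three-dimensional latent space.
predictors = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
targets = [2.0, -1.0]
z_ref = [0.0, 0.0, 0.0]
z_start = [0.0, 0.0, 0.0]

z_opt = optimize(z_start, predictors, targets, z_ref)
print("loss before:", loss(z_start, predictors, targets, z_ref, 0.1))
print("loss after: ", loss(z_opt, predictors, targets, z_ref, 0.1))
```

In a real pipeline the optimized latent vector would be passed to a decoder to obtain a molecule; that step is omitted here because it depends on a trained model.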

Prediction and mitigation of mutation threats to COVID-19 vaccines and antibody therapies

Authors: Chen Jiahui, Gao Kaifu, Wang Rui, Wei Guowei
Publication date: 09/03/2021

Antibody therapeutics and vaccines are among our last resort to end the raging COVID-19 pandemic. They, however, are prone to over 5,000 mutations on the spike (S) protein uncovered by a Mutation Tracker based on over 200,000 genome isolates. It is imperative to understand how mutations would impact vaccines and antibodies in development. In this work, we study the mechanism, frequency, and ratio of mutations on the S protein. Additionally, we use 56 antibody structures and analyze their 2D and 3D characteristics. Moreover, we predict the mutation-induced binding free energy (BFE) changes for the complexes of S protein and antibodies or ACE2. By integrating genetics, biophysics, deep learning, and algebraic topology, we reveal that most of the 462 mutations on the receptor-binding domain (RBD) will weaken the binding of S protein and antibodies and disrupt the efficacy and reliability of antibody therapies and vaccines. A list of 31 vaccine escape mutants is identified, while many other disruptive mutations are detailed as well. We also unveil that about 65% of existing RBD mutations, including those variants recently found in the United Kingdom (UK) and South Africa, are binding-strengthening mutations, resulting in more infectious COVID-19 variants. We discover the disparity between the extreme values of RBD mutation-induced BFE strengthening and weakening of the bindings with antibodies and ACE2, suggesting that SARS-CoV-2 is at an advanced stage of evolution for human infection, while the human immune system is able to produce optimized antibodies. This discovery implies the vulnerability of current vaccines and antibody drugs to new mutations. Our predictions were validated by comparison with more than 1,400 deep mutations on the S protein RBD. Our results show the urgent need to develop new mutation-resistant vaccines and antibodies and to prepare for seasonal vaccinations.

Comment: 28 pages, 17 figures

arXiv.org e-Print Archive

Vaccine-escape and fast-growing mutations in the United Kingdom, the United States, Singapore, Spain, South Africa, and other COVID-19-devastated countries

Authors: Chen Jiahui, Gao Kaifu, Wang Rui, Wei Guo-Wei
Publication date: 21/03/2021

Recently, the SARS-CoV-2 variants from the United Kingdom (UK), South Africa, and Brazil have received much attention for their increased infectivity, potentially high virulence, and possible threats to existing vaccines and antibody therapies. The question remains whether there are other more infectious variants transmitted around the world. We carry out a large-scale study of 252,874 SARS-CoV-2 genome isolates from patients to identify many other rapidly growing mutations on the spike (S) protein receptor-binding domain (RBD). We reveal that 88 out of 95 significant mutations that were observed more than 10 times strengthen the binding between the RBD and the host angiotensin-converting enzyme 2 (ACE2), indicating the virus evolves toward more infectious variants. In particular, we discover new fast-growing RBD mutations N439K, L452R, S477N, S477R, and N501T that also enhance the RBD and ACE2 binding. We further unveil that mutation N501Y involved in the United Kingdom (UK), South Africa, and Brazil variants may moderately weaken the binding between the RBD and many known antibodies, while mutations E484K and K417N found in the South African and Brazilian variants can potentially disrupt the binding between the RBD and many known antibodies. Among three newly identified fast-growing RBD mutations, L452R, which is now known as part of the California variant B.1.427, and N501T are able to effectively weaken the binding of many known antibodies with the RBD. Finally, we hypothesize that RBD mutations that can simultaneously make SARS-CoV-2 more infectious and disrupt the existing antibodies, called vaccine escape mutations, will pose an imminent threat to the current crop of vaccines. A list of the most likely vaccine escape mutations is given, including N501Y, L452R, E484K, N501T, S494P, and K417N.

Comment: 20 pages, 13 figures

arXiv.org e-Print Archive

Generative network complex (GNC) for drug discovery

Authors: Gao Kaifu, Grow Christopher, Nguyen Duc Duy, Wei Guo-Wei
Publication date: 31/10/2019

It remains a challenging task to generate a vast variety of novel compounds with desirable pharmacological properties. In this work, a generative network complex (GNC) is proposed as a new platform for designing novel compounds, predicting their physical and chemical properties, and selecting potential drug candidates that fulfill various druggable criteria such as binding affinity, solubility, partition coefficient, etc. We combine a SMILES string generator, which consists of an encoder, a drug-property controlled or regulated latent space, and a decoder, with verification deep neural networks, a target-specific three-dimensional (3D) pose generator, and mathematical deep learning networks to generate new compounds, predict their drug properties, construct 3D poses associated with target proteins, and reevaluate druggability, respectively. New compounds were generated in the latent space by either randomized output, controlled output, or optimized output. In our demonstration, 2.08 million and 2.8 million novel compounds are generated for Cathepsin S and BACE targets, respectively. These new compounds are very different from the seeds and cover a larger chemical space. For potentially active compounds, their 3D poses are generated using a state-of-the-art method. The resulting 3D complexes are further evaluated for druggability by a championing deep learning algorithm based on algebraic topology, differential geometry, and algebraic graph theories. Performed on supercomputers, the whole process took less than one week. Therefore, our GNC is an efficient new paradigm for discovering new drug candidates.

Comment: 22 pages, 12 figures

arXiv.org e-Print Archive

MathDL: Mathematical deep learning for D3R Grand Challenge 4

Authors: Gao Kaifu, Nguyen Duc Duy, Wang Menglun, Wei Guo-Wei
Publication date: 17/09/2019

We present the performance of our mathematical deep learning (MathDL) models for D3R Grand Challenge 4 (GC4). This challenge involves pose prediction, affinity ranking, and free energy estimation for beta secretase 1 (BACE) as well as affinity ranking and free energy estimation for Cathepsin S (CatS). We have developed advanced mathematics, namely differential geometry, algebraic graph, and/or algebraic topology, to accurately and efficiently encode high dimensional physical/chemical interactions into scalable low-dimensional rotational and translational invariant representations. These representations are integrated with deep learning models, such as generative adversarial networks (GAN) and convolutional neural networks (CNN) for pose prediction and energy evaluation, respectively. Overall, our MathDL models achieved the top place in pose prediction for BACE ligands in Stage 1a. Moreover, our submissions obtained the highest Spearman correlation coefficient on the affinity ranking of 460 CatS compounds, and the smallest centered root mean square error on the free energy set of 39 CatS molecules. It is worth mentioning that our method for docking pose predictions has significantly improved from our previous ones.

Comment: 24 pages, 9 figures, and one table

arXiv.org e-Print Archive
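
For readers unfamiliar with the two evaluation metrics named in the abstract above, the sketch below computes a Spearman correlation coefficient (Pearson correlation of average ranks) and a centered root mean square error (RMSE after subtracting each series' mean). It is a minimal standard-library illustration with hypothetical function names and made-up numbers; it is not the challenge organizers' scoring code, and the centering convention used is an assumption.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither series is constant."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def centered_rmse(predicted, experimental):
    """RMSE after removing the mean of each series (assumed convention)."""
    mp = sum(predicted) / len(predicted)
    me = sum(experimental) / len(experimental)
    sq = [((p - mp) - (e - me)) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(sq) / len(sq))

# Made-up numbers, for illustration only.
predicted = [1.0, 2.0, 3.0, 4.0]
experimental = [10.0, 20.0, 30.0, 45.0]
print(spearman(predicted, experimental))
print(centered_rmse(predicted, experimental))
```

Centering removes any constant offset between predicted and experimental values, so the metric rewards correct relative free energies rather than correct absolute ones.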

Unveiling the molecular mechanism of SARS-CoV-2 main protease inhibition from 92 crystal structures

Authors: Chen Jiahui, Gao Kaifu, Nguyen Duc D, Wang Rui, Wei Guo-Wei
Publication date: 27/05/2020

Currently, there are no effective antiviral drugs or vaccines for coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Due to its high conservation and low similarity with human genes, SARS-CoV-2 main protease (M$^{\text{pro}}$) is one of the most favorable drug targets. However, the current understanding of the molecular mechanism of M$^{\text{pro}}$ inhibition is limited by the lack of reliable binding affinity ranking and prediction of existing structures of M$^{\text{pro}}$-inhibitor complexes. This work integrates mathematics and deep learning (MathDL) to provide a reliable ranking of the binding affinities of 92 SARS-CoV-2 M$^{\text{pro}}$ inhibitor structures. We reveal that the Gly143 residue in M$^{\text{pro}}$ is the most attractive site to form hydrogen bonds, followed by Cys145, Glu166, and His163. We also identify 45 targeted covalent bonding inhibitors. Validation on the PDBbind v2016 core set benchmark shows that MathDL has achieved the top performance, with a Pearson's correlation coefficient ($R_p$) of 0.858. Most importantly, MathDL is validated on a carefully curated SARS-CoV-2 inhibitor dataset with an averaged $R_p$ as high as 0.751, which supports the reliability of the present binding affinity prediction. The present binding affinity ranking, interaction analysis, and fragment decomposition offer a foundation for future drug discovery efforts.

Comment: 17 pages, 8 figures, 3 tables

arXiv.org e-Print Archive

Review of COVID-19 Antibody Therapies

Authors: Chen Jiahui, Gao Kaifu, Nguyen Duc Duy, Wang Rui, Wei Guo-Wei
Publication date: 18/06/2020

Under the global health emergency caused by coronavirus disease 2019 (COVID-19), efficient and specific therapies are urgently needed. Compared with traditional small-molecule drugs, antibody therapies are relatively easy to develop and as specific as vaccines in targeting severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and have thus attracted much attention in the past few months. This work reviews seven existing antibodies for SARS-CoV-2 spike (S) protein with three-dimensional (3D) structures deposited in the Protein Data Bank. Five antibody structures associated with SARS-CoV are evaluated for their potential in neutralizing SARS-CoV-2. The interactions of these antibodies with the S protein receptor-binding domain (RBD) are compared with those of angiotensin-converting enzyme 2 (ACE2) and RBD complexes. Because experimental binding affinities differ by orders of magnitude, we introduce topological data analysis (TDA), a variety of network models, and deep learning to analyze the binding strength and therapeutic potential of the aforementioned fourteen antibody-antigen complexes. The current COVID-19 antibody clinical trials, which are not limited to the S protein target, are also reviewed.

Comment: 30 pages, 10 figures, 5 tables

arXiv.org e-Print Archive

Repositioning of 8565 existing drugs for COVID-19

Authors: Chen Jiahui, Gao Kaifu, Nguyen Duc Duy, Wang Rui, Wei Guo-Wei
Publication date: 20/05/2020

The coronavirus disease 2019 (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has infected nearly 5 million people and led to over 0.3 million deaths. Currently, there is no specific anti-SARS-CoV-2 medication. New drug discovery typically takes more than ten years. Drug repositioning has therefore become one of the most feasible approaches for combating COVID-19. This work curates the largest available experimental dataset for SARS-CoV-2 or SARS-CoV main protease inhibitors. Based on this dataset, we develop validated machine learning models with relatively low root mean square error to screen 1553 FDA-approved drugs as well as 7012 other investigational or off-market drugs in DrugBank. We found that many existing drugs might be potent against SARS-CoV-2. The druggability of many potent SARS-CoV-2 main protease inhibitors is analyzed. This work offers a foundation for further experimental studies of COVID-19 drug repositioning.

Comment: 20 pages, 6 figures and 6 tables

arXiv.org e-Print Archive

Characterizing SARS-CoV-2 mutations in the United States

Authors: Chen Jiahui, Gao Kaifu, Hozumi Yuta, Wang Rui, Wei Guo-Wei, Yin Changchuan
Publication date: 24/07/2020

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been mutating since it was first sequenced in early January 2020. The genetic variants have developed into a few distinct clusters with different properties. Since the United States (US) has the highest number of virus-infected patients globally, it is essential to understand the US SARS-CoV-2. Using genotyping, sequence-alignment, time-evolution, $k$-means clustering, protein-folding stability, algebraic topology, and network theory, we reveal that the US SARS-CoV-2 has four substrains, and that five top US SARS-CoV-2 mutations were first detected in China (2 cases), Singapore (2 cases), and the United Kingdom (1 case). The next three top US SARS-CoV-2 mutations were first detected in the US. These eight top mutations belong to two disconnected groups. The first group, consisting of five concurrent mutations, is prevailing, while the other group, with three concurrent mutations, gradually fades out. Our analysis suggests that female immune systems are more active than those of males in responding to SARS-CoV-2 infections. We identify that one of the top mutations, 27964C>T-(S24L) on ORF8, has an unusually strong gender dependence. Based on the analysis of all mutations on the spike protein, we further uncover that three of four US SARS-CoV-2 substrains become more infectious. Our study calls for effective viral control and containment strategies in the US.

Comment: 31 pages, 20 figures, and 4 tables

arXiv.org e-Print Archive
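
The mutation label 27964C>T-(S24L) quoted in the abstract above packs a nucleotide change (genome position 27964, C replaced by T) together with the resulting amino acid change (S to L at residue 24). The sketch below shows one way such a label could be split into fields. The `Mutation` type, the `parse_mutation` function, and the regular expression are hypothetical and only cover labels written exactly in this form.

```python
import re
from typing import NamedTuple

class Mutation(NamedTuple):
    """Fields of a label such as 27964C>T-(S24L); names are illustrative."""
    position: int       # nucleotide position in the genome
    ref_base: str       # reference nucleotide
    alt_base: str       # mutated nucleotide
    ref_residue: str    # reference amino acid (one-letter code)
    residue_index: int  # residue number within the protein
    alt_residue: str    # mutated amino acid (one-letter code)

# Assumed label shape: <position><ref>><alt>-(<aa><index><aa>)
LABEL_PATTERN = re.compile(r"^(\d+)([ACGT])>([ACGT])-\(([A-Z])(\d+)([A-Z])\)$")

def parse_mutation(label: str) -> Mutation:
    """Split a combined nucleotide/protein mutation label into its fields."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    pos, ref, alt, aa_ref, aa_idx, aa_alt = match.groups()
    return Mutation(int(pos), ref, alt, aa_ref, int(aa_idx), aa_alt)

mutation = parse_mutation("27964C>T-(S24L)")
print(mutation)
```

Labels that carry only the protein-level change, such as spike mutations written as a residue, an index, and a residue, would need a separate pattern.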

Are 2D fingerprints still valuable for drug discovery?

Authors: Gao Kaifu, Mathiowetz Alan M., Nguyen Duc Duy, Sresht Vishnu, Tu Meihua, Wei Guo-Wei
Publication venue: Royal Society of Chemistry (RSC)
Publication date: 03/11/2019

Recently, molecular fingerprints extracted from three-dimensional (3D) structures using advanced mathematics, such as algebraic topology, differential geometry, and graph theory, have been paired with efficient machine learning, especially deep learning algorithms, to outperform other methods in drug discovery applications and competitions. This raises the question of whether classical 2D fingerprints are still valuable in computer-aided drug discovery. This work considers 23 datasets associated with four typical problems, namely protein-ligand binding, toxicity, solubility and partition coefficient, to assess the performance of eight 2D fingerprints. Advanced machine learning algorithms including random forest, gradient boosted decision tree, single-task deep neural network and multitask deep neural network are employed to construct efficient 2D-fingerprint-based models. Additionally, appropriate consensus models are built to further enhance the performance of 2D-fingerprint-based methods. It is demonstrated that 2D-fingerprint-based models perform as well as the state-of-the-art 3D structure-based models for the predictions of toxicity, solubility, partition coefficient and protein-ligand binding affinity based on only ligand information. However, 3D structure-based models outperform 2D-fingerprint-based methods in complex-based protein-ligand binding affinity predictions.

arXiv.org e-Print Archive