76 research outputs found
Generative network complex for the automated generation of druglike molecules
Current drug discovery is expensive and time-consuming. It remains a
challenging task to create a wide variety of novel compounds with desirable
pharmacological properties that are also cheaply available to low-income people. In this
work, we develop a generative network complex (GNC) to generate new drug-like
molecules based on the multi-property optimization via the gradient descent in
the latent space of an autoencoder. In our GNC, both multiple chemical
properties and similarity scores are optimized to generate and predict
drug-like molecules with desired chemical properties. To further validate the
reliability of the predictions, these molecules are reevaluated and screened by
independent 2D fingerprint-based predictors to come up with a few hundred
new drug candidates. As a demonstration, we apply our GNC to generate a large
number of new BACE1 inhibitors, as well as thousands of novel alternative drug
candidates for eight existing market drugs, including Ceritinib, Ribociclib,
Acalabrutinib, Idelalisib, Dabrafenib, Macimorelin, Enzalutamide, and
Panobinostat.
Comment: 27 pages, 2 tables and 19 figures
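The abstract above describes multi-property optimization by gradient descent in the latent space of an autoencoder. The following is a minimal sketch of that idea only, not the authors' implementation: the linear predictor (`W`, `b`), the target values, the property weights, and the three-dimensional latent vector are all hypothetical stand-ins for trained neural networks and real property targets, and the similarity-score term is omitted.

```python
# Toy latent-space optimization. Everything here is an illustrative assumption:
# a real system would use trained property predictors and a trained decoder.

def predict(W, b, z):
    """Hypothetical linear property predictor: one output per row of W."""
    return [sum(w * zj for w, zj in zip(row, z)) + bi for row, bi in zip(W, b)]

def loss(W, b, z, targets, weights):
    """Weighted sum of squared gaps between predicted and target properties."""
    preds = predict(W, b, z)
    return sum(k * (p - t) ** 2 for k, p, t in zip(weights, preds, targets))

def gradient(W, b, z, targets, weights):
    """Analytic gradient of the loss with respect to the latent vector z."""
    preds = predict(W, b, z)
    grad = [0.0] * len(z)
    for k, p, t, row in zip(weights, preds, targets, W):
        for j, w in enumerate(row):
            grad[j] += 2.0 * k * (p - t) * w
    return grad

def optimize_latent(W, b, z0, targets, weights, lr=0.05, steps=500):
    """Plain gradient descent starting from the seed latent vector z0."""
    z = list(z0)
    for _ in range(steps):
        g = gradient(W, b, z, targets, weights)
        z = [zj - lr * gj for zj, gj in zip(z, g)]
    return z

# Two hypothetical properties predicted from a 3-dimensional latent vector.
W = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
b = [0.1, -0.2]
targets = [2.0, 1.0]
weights = [1.0, 1.0]
z0 = [0.0, 0.0, 0.0]

z_opt = optimize_latent(W, b, z0, targets, weights)
print("loss at seed:     ", loss(W, b, z0, targets, weights))
print("loss at optimum:  ", loss(W, b, z_opt, targets, weights))
```

In a full pipeline the optimized latent vector would then be passed to a decoder to obtain a candidate molecule; that step is outside the scope of this sketch.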
Prediction and mitigation of mutation threats to COVID-19 vaccines and antibody therapies
Antibody therapeutics and vaccines are among our last resort to end the
raging COVID-19 pandemic. They, however, are prone to over 5,000 mutations on
the spike (S) protein uncovered by a Mutation Tracker based on over 200,000
genome isolates. It is imperative to understand how mutations would impact
vaccines and antibodies in development. In this work, we study the
mechanism, frequency, and ratio of mutations on the S protein. Additionally, we
use 56 antibody structures and analyze their 2D and 3D characteristics.
Moreover, we predict the mutation-induced binding free energy (BFE) changes for
the complexes of S protein and antibodies or ACE2. By integrating genetics,
biophysics, deep learning, and algebraic topology, we reveal that most of the 462
mutations on the receptor-binding domain (RBD) will weaken the binding of S
protein and antibodies and disrupt the efficacy and reliability of antibody
therapies and vaccines. A list of 31 vaccine escape mutants is identified,
while many other disruptive mutations are detailed as well. We also unveil that
about 65% of existing RBD mutations, including those variants recently found in
the United Kingdom (UK) and South Africa, are binding-strengthening mutations,
resulting in more infectious COVID-19 variants. We discover the disparity
between the extreme values of RBD mutation-induced BFE strengthening and
weakening of the bindings with antibodies and ACE2, suggesting that SARS-CoV-2
is at an advanced stage of evolution for human infection, while the human
immune system is able to produce optimized antibodies. This discovery implies
the vulnerability of current vaccines and antibody drugs to new mutations. Our
predictions were validated by comparison with more than 1,400 deep mutations on
the S protein RBD. Our results show the urgent need to develop new
mutation-resistant vaccines and antibodies and to prepare for seasonal
vaccinations.
Comment: 28 pages, 17 figures
Vaccine-escape and fast-growing mutations in the United Kingdom, the United States, Singapore, Spain, South Africa, and other COVID-19-devastated countries
Recently, the SARS-CoV-2 variants from the United Kingdom (UK), South Africa,
and Brazil have received much attention for their increased infectivity,
potentially high virulence, and possible threats to existing vaccines and
antibody therapies. The question remains if there are other more infectious
variants transmitted around the world. We carry out a large-scale study of
252,874 SARS-CoV-2 genome isolates from patients to identify many other rapidly
growing mutations on the spike (S) protein receptor-binding domain (RBD). We
reveal that 88 out of 95 significant mutations that were observed more than 10
times strengthen the binding between the RBD and the host
angiotensin-converting enzyme 2 (ACE2), indicating the virus evolves toward
more infectious variants. In particular, we discover new fast-growing RBD
mutations N439K, L452R, S477N, S477R, and N501T that also enhance the RBD and
ACE2 binding. We further unveil that mutation N501Y involved in the United Kingdom
(UK), South Africa, and Brazil variants may moderately weaken the binding
between the RBD and many known antibodies, while mutations E484K and K417N
found in the South Africa and Brazil variants can potentially disrupt the
binding between the RBD and many known antibodies. Among three newly identified
fast-growing RBD mutations, L452R, which is now known as part of the California
variant B.1.427, and N501T are able to effectively weaken the binding of many
known antibodies with the RBD. Finally, we hypothesize that RBD mutations that
can simultaneously make SARS-CoV-2 more infectious and disrupt the existing
antibodies, called vaccine escape mutations, will pose an imminent threat to
the current crop of vaccines. A list of the most likely vaccine escape mutations is
given, including N501Y, L452R, E484K, N501T, S494P, and K417N.
Comment: 20 pages, 13 figures
Generative network complex (GNC) for drug discovery
It remains a challenging task to generate a vast variety of novel compounds
with desirable pharmacological properties. In this work, a generative network
complex (GNC) is proposed as a new platform for designing novel compounds,
predicting their physical and chemical properties, and selecting potential drug
candidates that fulfill various druggable criteria such as binding affinity,
solubility, partition coefficient, etc. We combine a SMILES string generator,
which consists of an encoder, a drug-property controlled or regulated latent
space, and a decoder, with verification deep neural networks, a target-specific
three-dimensional (3D) pose generator, and mathematical deep learning networks
to generate new compounds, predict their drug properties, construct 3D poses
associated with target proteins, and reevaluate druggability, respectively. New
compounds were generated in the latent space by either randomized output,
controlled output, or optimized output. In our demonstration, 2.08 million and
2.8 million novel compounds are generated respectively for Cathepsin S and BACE
targets. These new compounds are very different from the seeds and cover a
larger chemical space. For potentially active compounds, their 3D poses are
generated using a state-of-the-art method. The resulting 3D complexes are
further evaluated for druggability by a championing deep learning algorithm
based on algebraic topology, differential geometry, and algebraic graph
theories. Performed on supercomputers, the whole process took less than one
week. Therefore, our GNC is an efficient new paradigm for discovering new drug
candidates.
Comment: 22 pages, 12 figures
MathDL: Mathematical deep learning for D3R Grand Challenge 4
We present the performance of our mathematical deep learning (MathDL) models
for D3R Grand Challenge 4 (GC4). This challenge involves pose prediction,
affinity ranking, and free energy estimation for beta secretase 1 (BACE) as
well as affinity ranking and free energy estimation for Cathepsin S (CatS). We
have developed advanced mathematics, namely differential geometry, algebraic
graph, and/or algebraic topology, to accurately and efficiently encode high
dimensional physical/chemical interactions into scalable low-dimensional
rotational and translational invariant representations. These representations
are integrated with deep learning models, such as generative adversarial
networks (GAN) and convolutional neural networks (CNN) for pose prediction and
energy evaluation, respectively. Overall, our MathDL models achieved the top
place in pose prediction for BACE ligands in Stage 1a. Moreover, our
submissions obtained the highest Spearman correlation coefficient on the
affinity ranking of 460 CatS compounds, and the smallest centered root mean
square error on the free energy set of 39 CatS molecules. It is worth
mentioning that our method for docking pose prediction has improved significantly
over our previous ones.
Comment: 24 pages, 9 figures, and one table
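The two metrics named in the abstract above, the Spearman correlation coefficient for affinity ranking and the centered root mean square error for free energies, can be sketched as follows. This is an illustrative sketch, not the challenge's scoring code: the centered RMSE is assumed here to mean the RMSE computed after subtracting each set's own mean, and the five energy values are made up for the example.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def centered_rmse(pred, true):
    """RMSE after removing each set's mean (assumed definition, see lead-in)."""
    mp, mt = sum(pred) / len(pred), sum(true) / len(true)
    sq = sum(((p - mp) - (t - mt)) ** 2 for p, t in zip(pred, true))
    return math.sqrt(sq / len(pred))

# Made-up binding free energies (kcal/mol) for five hypothetical compounds.
experimental = [-9.1, -8.4, -7.9, -7.2, -6.5]
predicted = [-8.0, -7.0, -7.8, -6.9, -5.2]

rho = spearman(predicted, experimental)
rmse_c = centered_rmse(predicted, experimental)
print("Spearman:", rho)
print("Centered RMSE:", rmse_c)
```

Because the centered RMSE removes a constant offset, a model whose predictions are uniformly shifted from experiment is not penalized for the shift.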
Unveiling the molecular mechanism of SARS-CoV-2 main protease inhibition from 92 crystal structures
Currently, there are no effective antiviral drugs or vaccines for coronavirus
disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2
(SARS-CoV-2). Due to its high conservativeness and low similarity with human
genes, SARS-CoV-2 main protease (Mpro) is one of the most favorable
drug targets. However, the current understanding of the molecular mechanism of
Mpro inhibition is limited by the lack of reliable binding affinity
ranking and prediction of existing structures of Mpro-inhibitor
complexes. This work integrates mathematics and deep learning (MathDL) to
provide a reliable ranking of the binding affinities of 92 SARS-CoV-2
Mpro inhibitor structures. We reveal that the Gly143 residue in
Mpro is the most attractive site to form hydrogen bonds, followed
by Cys145, Glu166, and His163. We also identify 45 targeted covalent bonding
inhibitors. Validation on the PDBbind v2016 core set benchmark shows that MathDL
has achieved the top performance with Pearson's correlation coefficient (R_p)
being 0.858. Most importantly, MathDL is validated on a carefully curated
SARS-CoV-2 inhibitor dataset with the averaged R_p as high as 0.751, which
supports the reliability of the present binding affinity prediction. The present
binding affinity ranking, interaction analysis, and fragment decomposition
offer a foundation for future drug discovery efforts.
Comment: 17 pages, 8 figures, 3 tables
Review of COVID-19 Antibody Therapies
Under the global health emergency caused by coronavirus disease 2019
(COVID-19), efficient and specific therapies are urgently needed. Compared with
traditional small-molecule drugs, antibody therapies are relatively easy to
develop and as specific as vaccines in targeting severe acute respiratory
syndrome coronavirus 2 (SARS-CoV-2), and have thus attracted much attention in the
past few months. This work reviews seven existing antibodies for SARS-CoV-2
spike (S) protein with three-dimensional (3D) structures deposited in the
Protein Data Bank. Five antibody structures associated with SARS-CoV are
evaluated for their potential in neutralizing SARS-CoV-2. The interactions of
these antibodies with the S protein receptor-binding domain (RBD) are compared
with those of angiotensin-converting enzyme 2 (ACE2) and RBD complexes. Because
reported experimental binding affinities differ by orders of magnitude,
we introduce topological data analysis (TDA), a variety of network
models, and deep learning to analyze the binding strength and therapeutic
potential of the aforementioned fourteen antibody-antigen complexes. The
current COVID-19 antibody clinical trials, which are not limited to the S
protein target, are also reviewed.
Comment: 30 pages, 10 figures, 5 tables
Repositioning of 8565 existing drugs for COVID-19
The coronavirus disease 2019 (COVID-19) pandemic caused by severe acute
respiratory syndrome coronavirus 2 (SARS-CoV-2) has infected nearly 5 million
people and led to over 0.3 million deaths. Currently, there is no specific
anti-SARS-CoV-2 medication. New drug discovery typically takes more than ten
years. Drug repositioning becomes one of the most feasible approaches for
combating COVID-19. This work curates the largest available experimental
dataset for SARS-CoV-2 or SARS-CoV main protease inhibitors. Based on this
dataset, we develop validated machine learning models with relatively low root
mean square error to screen 1553 FDA-approved drugs as well as 7012 other
investigational or off-market drugs in DrugBank. We found that many existing
drugs might be potent against SARS-CoV-2. The druggability of many
potent SARS-CoV-2 main protease inhibitors is analyzed. This work offers a
foundation for further experimental studies of COVID-19 drug repositioning.
Comment: 20 pages, 6 figures and 6 tables
Characterizing SARS-CoV-2 mutations in the United States
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been
mutating since it was first sequenced in early January 2020. The genetic
variants have developed into a few distinct clusters with different properties.
Since the United States (US) has the highest number of virus-infected patients
globally, it is essential to understand the US SARS-CoV-2. Using genotyping,
sequence-alignment, time-evolution, K-means clustering, protein-folding
stability, algebraic topology, and network theory, we reveal that the US
SARS-CoV-2 has four substrains and that the five top US SARS-CoV-2 mutations were first
detected in China (2 cases), Singapore (2 cases), and the United Kingdom (1
case). The next three top US SARS-CoV-2 mutations were first detected in the
US. These eight top mutations belong to two disconnected groups. The first
group consisting of 5 concurrent mutations is prevailing, while the other group
with three concurrent mutations gradually fades out. Our analysis suggests that
female immune systems are more active than those of males in responding to
SARS-CoV-2 infections. We identify that one of the top mutations,
27964C>T-(S24L) on ORF8, has an unusually strong gender dependence. Based on
the analysis of all mutations on the spike protein, we further uncover that
three of four US SARS-CoV-2 substrains become more infectious. Our study calls
for effective viral control and containment strategies in the US.
Comment: 31 pages, 20 figures, and 4 tables
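K-means clustering is one of the techniques listed in the abstract above. The sketch below shows the standard Lloyd iteration on toy two-dimensional points. It is not the authors' pipeline: the points, the choice of two clusters, and the initial centroids are hypothetical, and real genotype data would first need to be encoded as numeric feature vectors.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]

def update(points, labels, centroids):
    """Move each centroid to the mean of its members (unchanged if empty)."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids

def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update until labels stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Toy data: two well-separated groups of hypothetical feature vectors.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The result of K-means depends on the initial centroids, so practical use typically repeats the run from several starting points and keeps the best clustering.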
Are 2D fingerprints still valuable for drug discovery?
Recently, molecular fingerprints extracted from three-dimensional (3D)
structures using advanced mathematics, such as algebraic topology, differential
geometry, and graph theory, have been paired with efficient machine learning,
especially deep learning algorithms, to outperform other methods in drug
discovery applications and competitions. This raises the question of whether
classical 2D fingerprints are still valuable in computer-aided drug discovery.
This work considers 23 datasets associated with four typical problems, namely
protein-ligand binding, toxicity, solubility and partition coefficient to
assess the performance of eight 2D fingerprints. Advanced machine learning
algorithms including random forest, gradient boosted decision tree, single-task
deep neural network and multitask deep neural network are employed to construct
efficient 2D-fingerprint-based models. Additionally, appropriate consensus
models are built to further enhance the performance of 2D-fingerprint-based
methods. It is demonstrated that 2D-fingerprint-based models perform as well as
the state-of-the-art 3D structure-based models for the predictions of toxicity,
solubility, partition coefficient and protein-ligand binding affinity based on
only ligand information. However, 3D structure-based models outperform 2D
fingerprint-based methods in complex-based protein-ligand binding affinity
predictions.
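The abstract above mentions consensus models built from several 2D-fingerprint-based predictors. One simple way to form a consensus, assumed here purely for illustration, is to average the predictions of the individual models. The abstract does not say how its consensus models are constructed, and the prediction values below are made up.

```python
import math

def consensus(predictions_by_model):
    """Average the predictions of several models, sample by sample."""
    n_models = len(predictions_by_model)
    return [sum(column) / n_models for column in zip(*predictions_by_model)]

def rmse(pred, true):
    """Root mean square error between predictions and reference values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

# Made-up reference values and predictions from two hypothetical models.
true_values = [1.0, 2.0, 3.0]
model_a = [1.4, 2.2, 3.3]   # tends to overestimate
model_b = [0.8, 1.9, 2.6]   # tends to underestimate

combined = consensus([model_a, model_b])
print("RMSE model A:  ", rmse(model_a, true_values))
print("RMSE model B:  ", rmse(model_b, true_values))
print("RMSE consensus:", rmse(combined, true_values))
```

In this toy case the two models err in opposite directions, so averaging reduces the error. That is not guaranteed in general, which is presumably why the abstract speaks of "appropriate" consensus models.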