An Allosteric Modulator of HIV-1 Protease Shows Equipotent Inhibition of Wild-Type and Drug-Resistant Proteases
NMR and MD simulations have demonstrated that the flaps of HIV-1 protease
(HIV-1p) adopt a range of conformations that are coupled with its
enzymatic activity. Previously, a model was created for an allosteric
site located between the flap and the core of HIV-1p, called the Eye
site (Biopolymers 2008, 89, 643–652). Here, results from our first study were
combined with a ligand-based, lead-hopping method to identify a novel
compound (NIT). NIT inhibits HIV-1p, independent of the presence of
an active-site inhibitor such as pepstatin A. Assays showed that NIT
acts on an allosteric site other than the dimerization interface.
MD simulations of the ligand–protein complex show that NIT
stably binds in the Eye site and restricts the flaps. That bound state
of NIT is consistent with a crystal structure of similar fragments
bound in the Eye site (Chem. Biol. Drug Des. 2010, 75, 257–268). Most importantly,
NIT is equally potent against wild-type and a multidrug-resistant
mutant of HIV-1p, which highlights the promise of allosteric inhibitors
circumventing existing clinical resistance.
CSAR Benchmark Exercise 2011–2012: Evaluation of Results from Docking and Relative Ranking of Blinded Congeneric Series
The Community Structure–Activity
Resource (CSAR) recently
held its first blinded exercise based on data provided by Abbott,
Vertex, and colleagues at the University of Michigan, Ann Arbor. A
total of 20 research groups submitted results for the benchmark exercise
where the goal was to compare different improvements for pose prediction,
enrichment, and relative ranking of congeneric series of compounds.
The exercise was built around blinded high-quality experimental data
from four protein targets: LpxC, Urokinase, Chk1, and Erk2. Pose prediction
proved to be the most straightforward task, and most methods were
able to successfully reproduce binding poses when the crystal structure
employed was co-crystallized with a ligand from the same chemical
series. Multiple evaluation metrics were examined, and we found that
RMSD and native contact metrics together provide a robust evaluation
of the predicted poses. It was notable that most scoring functions
underpredicted contacts between the hetero atoms (i.e., N, O, S, etc.)
of the protein and ligand. Relative ranking was found to be the most
difficult area for the methods, but many of the scoring functions
were able to properly identify Urokinase actives from the inactives
in the series. Lastly, we found that minimizing the protein and correcting
histidine tautomeric states positively trended with low RMSD for pose
prediction but minimizing the ligand negatively trended. Pregenerated
ligand conformations performed better than those that were generated
on the fly. Optimizing docking parameters and pretraining with the
native ligand had a positive effect on the docking performance as
did using restraints, substructure fitting, and shape fitting. Lastly,
for both sampling and ranking scoring functions, the use of the empirical
scoring function appeared to trend positively with the RMSD. Here,
by combining the results of many methods, we hope to provide a statistically
relevant evaluation and elucidate specific shortcomings of docking
methodology for the community.
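The two pose-evaluation metrics named in this abstract, RMSD and native contacts, can be illustrated with a minimal Python sketch. It is not the CSAR evaluation code: the 4.0 Å contact cutoff, the function names, and the toy coordinates are assumptions made for illustration, and the RMSD is computed on matched atoms without any superposition or symmetry correction.

```python
import math

def rmsd(pred, ref):
    """RMSD between matched atom coordinates (same order, no superposition)."""
    if len(pred) != len(ref):
        raise ValueError("poses must contain the same number of atoms")
    sq = sum((p[k] - r[k]) ** 2 for p, r in zip(pred, ref) for k in range(3))
    return math.sqrt(sq / len(ref))

def contacts(ligand, protein, cutoff=4.0):
    """Set of (ligand_index, protein_index) pairs within `cutoff` angstroms.

    The 4.0 A cutoff is an assumed value, not one taken from the abstract.
    """
    return {
        (i, j)
        for i, a in enumerate(ligand)
        for j, b in enumerate(protein)
        if math.dist(a, b) <= cutoff
    }

def native_contact_fraction(pred_lig, ref_lig, protein, cutoff=4.0):
    """Fraction of crystal-pose contacts that the predicted pose reproduces."""
    native = contacts(ref_lig, protein, cutoff)
    if not native:
        return 0.0
    return len(native & contacts(pred_lig, protein, cutoff)) / len(native)

# Toy coordinates (hypothetical): a two-atom ligand and three protein atoms.
crystal_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
pred_pose = [(0.0, 2.0, 0.0), (1.5, 2.0, 0.0)]   # shifted 2 A along y
protein_atoms = [(0.0, 3.0, 0.0), (1.5, -3.0, 0.0), (8.0, 0.0, 0.0)]

print(rmsd(pred_pose, crystal_pose))                                    # 2.0
print(native_contact_fraction(pred_pose, crystal_pose, protein_atoms))  # 0.5
```

In this toy case the pose is displaced by 2 Å yet keeps only half of the crystal contacts, which shows how the two metrics capture different aspects of pose quality.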
CSAR Benchmark Exercise of 2010: Combined Evaluation Across All Submitted Scoring Functions
As part of the Community Structure-Activity Resource (CSAR) center, a set of 343 high-quality, protein–ligand crystal structures was assembled with experimentally determined <i>K</i><sub>d</sub> or <i>K</i><sub>i</sub> information from the literature. We encouraged the community to score the crystallographic poses of the complexes by any method of their choice. The goal of the exercise was to (1) evaluate the current ability of the field to predict activity from structure and (2) investigate the properties of the complexes and methods that appear to hinder scoring. A total of 19 different methods were submitted with numerous parameter variations for a total of 64 sets of scores from 16 participating groups. Linear regression and nonparametric tests were used to correlate scores to the experimental values. Correlation to experiment for the various methods ranged <i>R</i><sup>2</sup> = 0.58–0.12, Spearman ρ = 0.74–0.37, Kendall τ = 0.55–0.25, and median unsigned error = 1.00–1.68 p<i>K</i><sub>d</sub> units. All types of scoring functions (force field based, knowledge based, and empirical) had examples with high and low correlation, showing no bias/advantage for any particular approach. The data across all the participants were combined to identify 63 complexes that were poorly scored across the majority of the scoring methods and 123 complexes that were scored well across the majority. The two sets were compared using a Wilcoxon rank-sum test to assess any significant difference in the distributions of >400 physicochemical properties of the ligands and the proteins. Poorly scored complexes were found to have ligands that were the same size as those in well-scored complexes, but hydrogen bonding and torsional strain were significantly different. These comparisons point to a need for CSAR to develop data sets of congeneric series with a range of hydrogen-bonding and hydrophobic characteristics and a range of rotatable bonds.
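The correlation statistics quoted in this abstract (<i>R</i><sup>2</sup>, Spearman ρ, Kendall τ, median unsigned error) can be sketched in plain Python as follows. This is an illustrative sketch, not the CSAR analysis pipeline: the function names and the five data points are hypothetical, the Kendall statistic is the simple tau-a form that ignores tie corrections, and the error function assumes the scores have already been mapped onto the p<i>K</i><sub>d</sub> scale (for example, by a linear fit).

```python
import math
from statistics import median

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson_r(x, y):
    """Pearson correlation; its square is R^2 for a simple linear fit."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman rho: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / total pairs (no tie correction)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

def median_unsigned_error(pred, exp):
    """Median absolute difference; assumes pred is already in pKd units."""
    return median(abs(p - e) for p, e in zip(pred, exp))

# Hypothetical data: experimental pKd values and scores already on the pKd scale.
experimental = [4.0, 5.0, 6.0, 7.0, 8.0]
scores = [4.5, 6.5, 4.8, 6.9, 9.0]

print(round(pearson_r(scores, experimental) ** 2, 3))          # R^2
print(round(spearman_rho(scores, experimental), 3))            # 0.9
print(round(kendall_tau(scores, experimental), 3))             # 0.8
print(round(median_unsigned_error(scores, experimental), 3))   # 1.0
```

The rank-based statistics depend only on the ordering of the scores, so they remain meaningful for scoring functions whose output is not in affinity units, whereas the median unsigned error requires a calibrated scale.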
CSAR Benchmark Exercise of 2010: Selection of the Protein–Ligand Complexes
A major goal in drug design is the improvement of computational methods for docking and scoring. The Community Structure Activity Resource (CSAR) aims to collect available data from industry and academia which may be used for this purpose (www.csardock.org). Also, CSAR is charged with organizing community-wide exercises based on the collected data. The first of these exercises aimed to gauge the overall state of docking and scoring, using a large and diverse data set of protein–ligand complexes. Participants were asked to calculate the affinity of the complexes as provided and then recalculate with changes which may improve their specific method. This first data set was selected from existing PDB entries which had binding data (<i>K</i><sub>d</sub> or <i>K</i><sub>i</sub>) in Binding MOAD, augmented with entries from PDBbind. The final data set contains 343 diverse protein–ligand complexes and spans 14 p<i>K</i><sub>d</sub>. Sixteen proteins have three or more complexes in the data set, from which a user could start an inspection of congeneric series. Inherent experimental error limits the possible correlation between scores and measured affinity; <i>R</i><sup>2</sup> is limited to ~0.9 when fitting to the data set without overparametrizing. <i>R</i><sup>2</sup> is limited to ~0.8 when scoring the data set with a method trained on outside data. The details of how the data set was initially selected, and the process by which it matured to better fit the needs of the community, are presented. Many groups generously participated in improving the data set, and this underscores the value of a supportive, collaborative effort in moving our field forward.
CSAR Data Set Release 2012: Ligands, Affinities, Complexes, and Docking Decoys
A major goal in drug design is the
improvement of computational
methods for docking and scoring. The Community Structure Activity
Resource (CSAR) has collected several data sets from industry and
added in-house data sets that may be used for this purpose (www.csardock.org). CSAR has currently obtained data from Abbott, GlaxoSmithKline,
and Vertex and is working on obtaining data from several others. Combined
with our in-house projects, we are providing a data set consisting
of 6 protein targets, 647 compounds with biological affinities, and
82 crystal structures. Multiple congeneric series are available for
several targets with a few representative crystal structures of each
of the series. These series generally contain a few inactive compounds,
usually not available in the literature, to provide an upper bound
to the affinity range. The affinity ranges are typically 3–4
orders of magnitude per series. For our in-house projects, we have
had compounds synthesized for biological testing. Affinities were
measured by Thermofluor, Octet RED, and isothermal titration calorimetry
for the most soluble. This allows the direct comparison of the biological
affinities for those compounds, providing a measure of the variance
in the experimental affinity. It appears that there can be considerable
variance in the absolute value of the affinity, making the prediction
of the absolute value ill-defined. However, the relative rankings
within the methods are much better, and this fits with the observation
that predicting relative ranking is a more tractable problem computationally.
For those in-house compounds, we also have measured the following
physical properties: logD, logP, thermodynamic solubility, and p<i>K</i><sub>a</sub>. This data set also provides a substantial
decoy set for each target consisting of diverse conformations covering
the entire active site for all of the 58 CSAR-quality crystal structures.
The CSAR data sets (CSAR-NRC HiQ and the 2012 release) provide substantial,
publicly available, curated data sets for use in parametrizing and
validating docking and scoring methods.
CSAR Benchmark Exercise 2013: Evaluation of Results from a Combined Computational Protein Design, Docking, and Scoring/Ranking Challenge
The Community Structure–Activity Resource (CSAR) conducted a benchmark exercise
to evaluate the current computational methods for protein design,
ligand docking, and scoring/ranking. The exercise consisted of three
phases. The first phase required the participants to identify and
rank order which designed sequences were able to bind the small molecule
digoxigenin. The second phase challenged the community to select a
near-native pose of digoxigenin from a set of decoy poses for two
of the designed proteins. The third phase investigated the ability
of current methods to rank/score the binding affinity of 10 related
steroids to one of the designed proteins (p<i>K</i><sub>d</sub> = 4.1 to 6.7). We found that 11 of 13 groups
were able to correctly select the sequence that bound digoxigenin,
with most groups providing the correct three-dimensional structure
for the backbone of the protein as well as all atoms of the active-site
residues. Eleven of the 14 groups were able to select the appropriate
pose from a set of plausible decoy poses. The ability to predict absolute
binding affinities is still a difficult task, as 8 of 14 groups were
able to correlate scores to affinity (Pearson-<i>r</i> >
0.7) of the designed protein for congeneric steroids and only 5 of
14 groups were able to correlate the ranks of the 10 related ligands
(Spearman-ρ > 0.7).
CSAR 2014: A Benchmark Exercise Using Unpublished Data from Pharma
The 2014 CSAR Benchmark
Exercise was the last community-wide exercise
that was conducted by the group at the University of Michigan, Ann
Arbor. For this event, GlaxoSmithKline (GSK) donated unpublished crystal
structures and affinity data from in-house projects. Three targets
were used: tRNA (m1G37) methyltransferase (TrmD), Spleen Tyrosine
Kinase (SYK), and Factor Xa (FXa). A particularly strong feature of
the GSK data is its large size, which lends greater statistical significance
to comparisons between different methods. In Phase 1 of the CSAR 2014
Exercise, participants were given several protein–ligand complexes
and asked to identify the one near-native pose from among 200 decoys
provided by CSAR. Though decoys were requested by the community, we
found that they complicated our analysis. We could not discern whether
poor predictions were failures of the chosen method or an incompatibility
between the participant's method and the setup protocol we
used. This problem is inherent to decoys, and we strongly advise against
their use. In Phase 2, participants had to dock and rank/score a set
of small molecules given only the SMILES strings of the ligands and
a protein structure with a different ligand bound. Overall, docking
was a success for most participants, much better in Phase 2 than in
Phase 1. However, scoring was a greater challenge. No particular approach
to docking and scoring had an edge, and successful methods included
empirical, knowledge-based, machine-learning, shape-fitting, and even
those with solvation and entropy terms. Several groups were successful
in ranking TrmD and/or SYK, but ranking FXa ligands was intractable
for all participants. Methods that were able to dock well across all
submitted systems include MDock, Glide-XP, PLANTS, Wilma, Gold, SMINA, Glide-XP/PELE, FlexX, and MedusaDock. In fact, the submission based on Glide-XP/PELE cross-docked
all ligands to many crystal structures, and it was particularly impressive
to see success across an ensemble of protein structures for multiple
targets. For scoring/ranking, submissions that showed statistically
significant achievement include MDock using
ITScore with a flexible-ligand term, SMINA using Autodock-Vina, FlexX using HYDE, and Glide-XP using XP DockScore with and without ROCS shape similarity. Of course, these
results are for only three protein targets, and many more systems
need to be investigated to truly identify which approaches are more
successful than others. Furthermore, our exercise is not a competition.