Selecting an Optimal Number of Binding Site Waters To Improve Virtual Screening Enrichments Against the Adenosine A<sub>2A</sub> Receptor
A major challenge in structure-based virtual screening (VS) is the treatment of explicit water molecules during docking in order to improve the enrichment of active compounds over decoys. Here we have investigated this in the context of the adenosine A<sub>2A</sub> receptor, where water molecules have previously been shown to be important for achieving high enrichment rates with docking, and where the positions of some binding site waters are known from a high-resolution crystal structure. The effect of these waters (both their presence and their orientations) on VS enrichment was assessed using a carefully curated set of 299 high-affinity A<sub>2A</sub> antagonists and 17,337 decoys. We show that including certain crystal waters greatly improves VS enrichment and that optimization of water hydrogen positions is needed to achieve the best results. We also show that waters derived from a molecular dynamics simulation, without any knowledge of crystallographic waters, can improve enrichments to a similar degree as the crystallographic waters, which makes this strategy applicable to structures without experimental knowledge of water positions. Finally, we used decision trees to select an ensemble of structures with different water molecule positions and orientations that outperforms any single structure with water molecules. The approach presented here is validated against independent test sets of A<sub>2A</sub> receptor antagonists and decoys from the literature. In general, this water optimization strategy could be applied to any target with water-mediated protein–ligand interactions.
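Enrichment here describes how strongly actives are concentrated near the top of a docking-ranked list. A common way to quantify it is the enrichment factor (EF) at a fixed fraction of the ranked database. The abstract does not say which enrichment metric was used, so the following is only a minimal sketch, assuming EF at a chosen fraction and docking scores where lower is better. The function name `enrichment_factor` and its parameters are illustrative and are not taken from the study.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """Enrichment factor (EF) in the top `fraction` of a ranked screen.

    Sketch only: assumes lower docking scores are better. Flip the sort
    order for scoring functions where higher is better.
    """
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    # Rank compounds from best (lowest) to worst score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    # Size of the top slice, at least one compound.
    n_top = max(1, round(fraction * n_total))
    hits = sum(1 for _, active in ranked[:n_top] if active)
    # EF = hit rate in the top slice divided by hit rate in the whole database.
    return (hits / n_top) / (n_actives / n_total)
```

For the dataset sizes quoted above (299 actives among 299 + 17,337 = 17,636 compounds), the best possible EF at 1% is 17,636 / 299, or about 59. That value gives a ceiling against which reported enrichments can be read.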
Interacting with GPCRs: Using Interaction Fingerprints for Virtual Screening
The expanding number of crystal structures of G protein-coupled receptors (GPCRs) has increased knowledge of receptor function and of the receptors' ability to recognize ligands. Although structure-based virtual screening has been quite successful on GPCRs, scores obtained by docking are typically not indicative of ligand affinity. Methods that capture protein–ligand interactions more explicitly, such as interaction fingerprints (IFPs), have been applied as an addition or an alternative to docking. Originally, IFPs captured the interactions of amino acid residues with ligands, with specific definitions for the various interaction types. More complex IFPs now capture atom–atom interactions, as in SYBYL, or fragment–fragment co-occurrences, as in SPLIF. Overall, most IFPs have been studied in comparison with docking in retrospective studies. For GPCRs it remains unclear which IFP should be used, if any, and in what manner. Thus, the performance of five different IFPs was compared on five representative GPCRs, including several extensions of the original implementations. The results show that the more detailed IFPs, SYBYL and SPLIF, perform better than the other IFPs (Deng, Credo, and Elements). SPLIF was further tuned with respect to the number of poses, the fingerprint similarity coefficient, and the use of an ensemble of structures. The resulting enrichments were significantly higher than both the initial enrichments and those obtained by 2D similarity. With the increase in available crystal structures for GPCRs, and given that IFPs such as SPLIF enhance enrichment in virtual screens, it is anticipated that IFPs will be used in conjunction with docking, especially for GPCRs with a large binding pocket.
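The residue-level IFPs described above reduce a docking pose to a bit string, and virtual screening then ranks compounds by fingerprint similarity to a reference ligand. The sketch below is a simplified, hypothetical illustration of that idea. It is not an implementation of SYBYL, SPLIF, Deng, Credo, or Elements. It assumes contacts have already been detected as (residue, interaction type) tuples, uses the Tanimoto coefficient as the similarity measure, and scores a compound by its best pose. The residue labels and interaction types are placeholders.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical interaction types (assumption). Real IFP schemes define their
# own types and geometric detection rules.
INTERACTION_TYPES = ("hydrophobic", "hbond_donor", "hbond_acceptor", "ionic", "aromatic")


def build_ifp(contacts: Iterable[Tuple[str, str]], residues: Sequence[str]) -> List[int]:
    """Encode precomputed (residue, interaction_type) contacts as a residue-level bit vector."""
    index = {res: i for i, res in enumerate(residues)}
    n_types = len(INTERACTION_TYPES)
    bits = [0] * (len(residues) * n_types)
    for res, itype in contacts:
        if res in index and itype in INTERACTION_TYPES:
            # One block of bits per residue, one bit per interaction type.
            bits[index[res] * n_types + INTERACTION_TYPES.index(itype)] = 1
    return bits


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto coefficient |A and B| / |A or B| for two binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def best_pose_similarity(pose_ifps: Iterable[Sequence[int]], reference: Sequence[int]) -> float:
    """Score a compound by its most reference-like pose (assumed aggregation rule)."""
    return max(tanimoto(p, reference) for p in pose_ifps)
```

Taking the maximum over poses is one simple way to use several poses per compound. Averaging over poses would be an equally plausible alternative.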
Novel resistance-conferring mutations derived from the dataset (NRTI).
<p>The value in each drug column indicates the average Log FC in the presence of the mutation; when no value is available in the dataset, it is denoted ‘n/a’. Mutations marked with an asterisk were not tested on all drugs in the dataset. As with the NNRTI resistance mutations, each mutation displays a different resistance profile across the drugs. AZT is the most affected by these mutations (average Log FC 0.69) and TDF the least (average Log FC 0.27).</p>
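A per-mutation profile like the one in this table can be tabulated from raw measurements by averaging Log FC per drug over the isolates that carry the mutation. The sketch below assumes that simple averaging. The study may have derived its values differently, for example from model predictions. `mutation_profile`, the record layout, and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Sequence, Set, Tuple, Union

# Hypothetical record layout: (mutations in isolate, drug, measured Log FC).
Record = Tuple[Set[str], str, float]


def mutation_profile(records: Iterable[Record], mutation: str,
                     drugs: Sequence[str]) -> Dict[str, Union[float, str]]:
    """Average Log FC per drug over isolates that carry `mutation`.

    Sketch only (assumes plain averaging of measured values). Drugs with no
    measurement for such isolates are reported as 'n/a', mirroring the table.
    """
    values: Dict[str, List[float]] = defaultdict(list)
    for mutations, drug, log_fc in records:
        if mutation in mutations:
            values[drug].append(log_fc)
    return {d: (round(mean(values[d]), 2) if values[d] else "n/a") for d in drugs}
```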
Performance in validation on isolates not present in the original dataset.
<p>Validation parameters were calculated using different forms of grouping to give an unbiased error estimate. Class-wide values are shown in italic and the global average performance in bold italic. For larger groups (RefID, SeqID, Isolatename and per drug), the average value and standard deviation are given. For three drugs (RTV, DLV, DDC) no Virco cut-off was available, so the Stanford cut-off was used for both; for SQV no Stanford cut-off was available, so the Virco cut-off was used for both. The table shows that our PCM models predict the Log FC robustly, as indicated by the regression validation parameters RMSE and R<sub>0</sub><sup>2</sup>. More importantly, the overall correctly classified percentage (CCP) is 84%.</p>
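The three validation parameters named in these captions can be computed as shown below. This is a sketch under stated assumptions. RMSE is the usual root-mean-square error. R<sub>0</sub><sup>2</sup> is taken to be the Golbraikh–Tropsha coefficient of determination for a regression of observed on predicted values forced through the origin. CCP is taken to be the percentage of pairs where measured and predicted Log FC fall on the same side of a drug's resistance cut-off (Virco or Stanford). Whether the study used exactly these definitions is an assumption.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error, in the same units as Log FC."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r0_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """R0^2 for observed vs. predicted with the fit forced through the origin.

    Assumed definition (Golbraikh-Tropsha style): k = sum(y*yhat) / sum(yhat^2),
    R0^2 = 1 - sum((y - k*yhat)^2) / sum((y - mean(y))^2).
    """
    k = sum(o * p for o, p in zip(observed, predicted)) / sum(p * p for p in predicted)
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - k * p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def ccp(observed: Sequence[float], predicted: Sequence[float], cutoff: float) -> float:
    """Correctly classified percentage (assumed definition): share of pairs whose
    measured and predicted Log FC fall on the same side of the drug's cut-off."""
    agree = sum((o > cutoff) == (p > cutoff) for o, p in zip(observed, predicted))
    return 100.0 * agree / len(observed)
```

Because cut-offs differ per drug, `ccp` would be called once per drug, or with each pair's own cut-off, before the results are aggregated.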
Description of the dataset used in the current study (Obtained from Virco).
<p>*For Reverse Transcriptase, only the first 400 amino acids were sequenced. The size of this dataset is unlike that of any other dataset used in PCM. The last column gives the average number of mutations per sequence relative to HXB2, together with the standard deviation.</p>
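The mutation counts in the last column can be obtained by comparing each sequence to the HXB2 reference position by position. The sketch assumes that sequences are already aligned to HXB2 without insertions. It skips gaps and unknown residues and truncates both sequences to the first 400 positions to mirror the RT sequencing limit. The helper names are illustrative.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def count_mutations(sequence: str, reference: str) -> int:
    """Count positions where an aligned sequence differs from the reference.

    Assumption: both strings are pre-aligned to the same length. Gaps ('-')
    and unknown residues ('X') are not counted as mutations.
    """
    if len(sequence) != len(reference):
        raise ValueError("sequences must be aligned to the reference length")
    return sum(1 for s, r in zip(sequence, reference) if s != r and s not in "-X")


def mutation_summary(sequences: Sequence[str], reference: str,
                     max_len: int = 400) -> Tuple[float, float]:
    """Mean and sample standard deviation of mutations per sequence.

    `max_len` truncates sequence and reference alike (e.g. the first 400 RT residues).
    """
    ref = reference[:max_len]
    counts = [count_mutations(seq[:max_len], ref) for seq in sequences]
    return mean(counts), stdev(counts)
```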
The model performance in the LOSO experiments.
<p>(A) The x-axis shows the measured Log FC for each mutant–drug pair; the y-axis shows the Log FC predicted for that pair by a model trained without it. Again the PIs perform best (RMSE 0.40 log units, R<sub>0</sub><sup>2</sup> 0.76, and CCP 90%), followed by the NNRTIs (RMSE 0.67 log units, R<sub>0</sub><sup>2</sup> 0.53, and CCP 84%) and then the NRTIs (RMSE 0.45 log units, R<sub>0</sub><sup>2</sup> 0.50, and CCP 74%). (B) The density of the training set around a pair, used as a measure of the applicability domain, provides a useful estimate of model reliability. The x-axis shows the fraction of the training set that has a similarity of 0.97 or higher to a given mutant–drug pair. The larger this fraction, the smaller the prediction error (y-axis) for that pair, as the model is better able to extrapolate from the training set. Since this fraction can be calculated before any prediction is made, a maximum allowed prediction error can be set in advance.</p>
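The applicability-domain measure in panel (B) needs only the descriptors of a query pair and of the training set, which is why it is available before any prediction is made. The captions do not state which similarity measure underlies the 0.97 threshold. This sketch therefore assumes each mutant–drug pair is a numeric descriptor vector and uses cosine similarity as a stand-in. Any other pairwise similarity can be passed in through the hypothetical `similarity` parameter.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity; an assumed stand-in for the study's pair similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def training_density(query: Vector, training_set: Sequence[Vector],
                     threshold: float = 0.97,
                     similarity: Callable[[Vector, Vector], float] = cosine_similarity) -> float:
    """Fraction of training pairs at least `threshold`-similar to the query pair.

    Uses descriptors only, not model output, so it can be computed up front.
    """
    if not training_set:
        raise ValueError("training set is empty")
    close = sum(1 for t in training_set if similarity(query, t) >= threshold)
    return close / len(training_set)
```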
Model performance predicting the Stanford University dataset.
<p>(A) Although the predicted isolates were not included in the training set, performance remains robust. Based on CCP, the NNRTIs perform best (RMSE 0.62 log units and CCP 90%), followed by the PIs (RMSE 0.43 log units and CCP 85%) and then the NRTIs (RMSE 0.61 log units and CCP 79%). (B) The density of the training set around a pair, used as a measure of the applicability domain, provides a useful estimate of model reliability. The x-axis shows the fraction of the training set that has a similarity of 0.97 or higher to a given mutant–drug pair. The larger this fraction, the smaller the prediction error (y-axis) for that pair, as the model is better able to extrapolate from the training set.</p>
Novel resistance-conferring mutations derived from the dataset (NNRTI).
<p>The value in each drug column indicates the average Log FC in the presence of the mutation. Although these mutations were selected because they confer some resistance to all NNRTIs, each drug still has a distinct profile. Efavirenz is the most sensitive (average Log FC 0.79) and Etravirine the least (average Log FC 0.44), with Nevirapine (average Log FC 0.55) and Delavirdine (average Log FC 0.58) in between.</p>
Model interpretation: mutations leading to PI-specific resistance.
<p>Shown are the 30 mutations with the most diverse effects across the different members of the PI drug class. The figure contains a number of known mutations (e.g. M46L, <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002899#pcbi.1002899-Johnson1" target="_blank">[6]</a> A71T, <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002899#pcbi.1002899-Johnson1" target="_blank">[6]</a> V82A, <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002899#pcbi.1002899-Johnson1" target="_blank">[6]</a> V82S <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002899#pcbi.1002899-Johnson1" target="_blank">[6]</a>) as well as several novel mutations (e.g. G48W, N88G). Values in the cells represent Log FC.</p>
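The caption does not define how "most diverse effect" was measured. One plausible reading, assumed here, is the spread of each mutation's Log FC across the PI drugs, ranked by standard deviation. `most_divergent` and any mutation labels passed to it are hypothetical.

```python
from statistics import pstdev
from typing import Dict, List


def most_divergent(profiles: Dict[str, Dict[str, float]], k: int = 30) -> List[str]:
    """Return the k mutations whose Log FC varies most across drugs.

    Assumed diversity measure: population standard deviation of the per-drug
    Log FC values. Mutations measured on fewer than two drugs are skipped.
    """
    spread = {
        mutation: pstdev(list(by_drug.values()))
        for mutation, by_drug in profiles.items()
        if len(by_drug) >= 2
    }
    return sorted(spread, key=spread.get, reverse=True)[:k]
```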
Model validation.
<p>(A,B,C) Our models perform robustly in both internal validation (unseen combinations of known drugs and known mutants) and (D,E,F) external validation (combinations in which either the drug or the mutant is unknown). The PIs perform best (RMSE 0.27 log units and CCP 93% internal; 0.43 log units and CCP 90% external), followed by the NNRTIs (RMSE 0.45 log units and CCP 93% internal; 0.49 log units and CCP 91% external) and then the NRTIs (RMSE 0.31 log units and CCP 80% internal; 0.52 log units and CCP 68% external). The range of Log FC values in the dataset is largest for the NNRTIs, followed by the PIs and then the NRTIs.</p>
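The internal/external distinction above is a property of how mutant–drug pairs are split between training and test sets. The following sketch assumes that external test pairs are formed by holding out whole mutants (holding out drugs would work the same way). It also assumes that internal test pairs are unseen combinations whose mutant and drug both still occur in training. `split_pairs` and its parameters are hypothetical and are not the study's procedure.

```python
import random
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (mutant_id, drug); hypothetical layout


def split_pairs(pairs: Iterable[Pair], holdout_mutants: Set[str],
                internal_fraction: float = 0.1,
                seed: int = 0) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Split mutant-drug pairs into train, internal test and external test.

    Sketch under assumptions: external pairs involve a mutant never seen in
    training; internal pairs are unseen combinations of a known mutant and a
    known drug.
    """
    pairs = list(pairs)
    external = [p for p in pairs if p[0] in holdout_mutants]
    remaining = [p for p in pairs if p[0] not in holdout_mutants]
    random.Random(seed).shuffle(remaining)
    n_internal = int(len(remaining) * internal_fraction)
    candidates, train = remaining[:n_internal], remaining[n_internal:]
    known_mutants = {m for m, _ in train}
    known_drugs = {d for _, d in train}
    internal: List[Pair] = []
    for mutant, drug in candidates:
        if mutant in known_mutants and drug in known_drugs:
            internal.append((mutant, drug))
        else:
            # Not a valid internal test pair, so return it to training instead.
            train.append((mutant, drug))
            known_mutants.add(mutant)
            known_drugs.add(drug)
    return train, internal, external
```

The training set only grows during the loop, so every pair already accepted as internal stays valid.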