14 research outputs found

Selecting an Optimal Number of Binding Site Waters To Improve Virtual Screening Enrichments Against the Adenosine A2A Receptor

Author: Adriaan P. IJzerman (195167)
Eelke B. Lenselink (1788733)
Herman W. T. van Vlijmen (195170)
Thijs Beuming (1649725)
Woody Sherman (495156)

A major challenge in structure-based virtual screening (VS) is the treatment of explicit water molecules during docking in order to improve the enrichment of active compounds over decoys. Here we have investigated this in the context of the adenosine A2A receptor, where water molecules have previously been shown to be important for achieving high enrichment rates with docking, and where the positions of some binding site waters are known from a high-resolution crystal structure. The effect of these waters (both their presence and their orientations) on VS enrichment was assessed using a carefully curated set of 299 high-affinity A2A antagonists and 17,337 decoys. We show that including certain crystal waters greatly improves VS enrichment and that optimization of water hydrogen positions is needed to achieve the best results. We also show that waters derived from a molecular dynamics simulation, without any knowledge of crystallographic waters, can improve enrichments to a similar degree as the crystallographic waters, which makes this strategy applicable to structures without experimental knowledge of water positions. Finally, we used decision trees to select an ensemble of structures with different water molecule positions and orientations that outperforms any single structure with water molecules. The approach presented here is validated against independent test sets of A2A receptor antagonists and decoys from the literature. In general, this water optimization strategy could be applied to any target with water-mediated protein–ligand interactions.
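
The abstract does not say how enrichment was quantified. As an illustration only, the sketch below computes one common measure, the enrichment factor in the top-ranked fraction of a screened library. The function name, the choice of metric, and the convention that lower docking scores rank better are all assumptions made for this example, not details taken from the paper.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor in the top `fraction` of a ranked library.

    Assumption: lower scores are better (a common docking convention).
    EF = (active rate in the top fraction) / (active rate in the whole library).
    """
    if len(scores) != len(is_active) or not scores:
        raise ValueError("scores and is_active must be non-empty and equal in length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("at least one active compound is required")

    # Rank compounds from best (lowest) to worst (highest) score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    n_top = max(1, int(round(fraction * len(ranked))))
    hits_in_top = sum(1 for _, active in ranked[:n_top] if active)

    return (hits_in_top / n_top) / (total_actives / len(ranked))


# Toy library: 2 actives among 10 compounds, both ranked first.
toy_scores = [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1]
toy_labels = [True, True, False, False, False, False, False, False, False, False]
toy_ef = enrichment_factor(toy_scores, toy_labels, fraction=0.2)
```

With 299 actives and 17,337 decoys, a random ranking would place actives in roughly 1.7% of any top slice, so an enrichment factor above 1 means the ranking is doing better than chance.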

The Francis Crick Institute

Interacting with GPCRs: Using Interaction Fingerprints for Virtual Screening

Author: Adriaan P. IJzerman (195167)
Eelke B. Lenselink (1788733)
Gerard J. P. van Westen (195154)
Herman W. T. van Vlijmen (195170)
Willem Jespers (3138927)

The expanding number of crystal structures of G protein-coupled receptors (GPCRs) has increased the knowledge of receptor function and of their ability to recognize ligands. Although structure-based virtual screening has been quite successful on GPCRs, scores obtained by docking are typically not indicative of ligand affinity. Methods capturing interactions between protein and ligand in a more explicit manner, such as interaction fingerprints (IFPs), have been applied as an addition or alternative to docking. Originally, IFPs captured the interactions of amino acid residues with ligands, with specific definitions for the various interaction types. More complex IFPs now capture atom–atom interactions, such as in SYBYL, or fragment–fragment co-occurrences, such as in SPLIF. Overall, most of the IFPs have been studied in comparison with docking in retrospective studies. For GPCRs it remains unclear which IFP should be used, if at all, and in what manner. Thus, the performance of five different IFPs was compared on five different representative GPCRs, including several extensions of the original implementations. Results show that the more detailed IFPs, SYBYL and SPLIF, perform better than the other IFPs (Deng, Credo, and Elements). SPLIF was further tuned based on the number of poses, the fingerprint similarity coefficient, and the use of an ensemble of structures. Enrichments were obtained that were significantly higher than the initial enrichments and those obtained by 2D-similarity. With the increase in available crystal structures for GPCRs, and given that IFPs such as SPLIF enhance enrichment in virtual screens, it is anticipated that IFPs will be used in conjunction with docking, especially for GPCRs with a large binding pocket.

The Francis Crick Institute

Novel resistance conferring mutations derived from the dataset (NRTI).

Author: Adriaan P. IJzerman (195167)
Alwin Hendriks (381073)
Andreas Bender (192334)
Gerard J. P. van Westen (195154)
Herman W. T. van Vlijmen (195170)
Jörg K. Wegner (195157)

The value in the different drug columns indicates the average Log FC in the presence of this mutation; when not available in the dataset, the value is denoted ‘n/a’. Mutations indicated with an asterisk were not tested on all drugs in the dataset. Like the NNRTI resistance mutations, each mutation displays a different resistance profile over all drugs. AZT is the most susceptible (average Log FC 0.69) and TDF the least susceptible (average Log FC 0.27).

The Francis Crick Institute

Performance in validation on isolates not present in the original dataset.

Author: Adriaan P. IJzerman (195167)
Alwin Hendriks (381073)
Andreas Bender (192334)
Gerard J. P. van Westen (195154)
Herman W. T. van Vlijmen (195170)
Jörg K. Wegner (195157)

Validation parameters were calculated using different forms of grouping to give an unbiased error estimate. Class-wide values are indicated in italic and the global average performance is indicated in bold and italic. For larger groups (RefID, SeqID, Isolatename and per drug) the average value and standard deviation are given. For three drugs (RTV, DLV, DDC) no Virco cut-off was available, so the Stanford cut-off was used for both; for SQV no Stanford cut-off was available, so the Virco cut-off was used for both. The table shows that our PCM models perform robustly in predicting the Log FC, as indicated by the regression validation parameters RMSE and R₀². More importantly, the correctly classified percentage is 84% overall.
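
To make the two headline metrics concrete, here is a minimal sketch of RMSE on Log FC values and of a correctly classified percentage (CCP). The caption does not spell out how CCP is computed. The sketch assumes a pair counts as correctly classified when its measured and predicted Log FC fall on the same side of a single per-drug cut-off. The function names and the toy numbers are hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted Log FC values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and equal in length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def correctly_classified_percentage(measured, predicted, cutoff):
    """Share of pairs whose measured and predicted values agree on the class.

    Assumption: a value at or above `cutoff` is classed as resistant and a
    value below it as susceptible; the source only states that Virco and
    Stanford cut-offs were used.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and equal in length")
    agree = sum(1 for m, p in zip(measured, predicted)
                if (m >= cutoff) == (p >= cutoff))
    return 100.0 * agree / len(measured)


# Hypothetical Log FC values for four mutant-drug pairs.
toy_measured = [0.0, 1.0, 2.0, 0.5]
toy_predicted = [0.0, 1.0, 1.0, 1.5]
toy_rmse = rmse(toy_measured, toy_predicted)
toy_ccp = correctly_classified_percentage(toy_measured, toy_predicted, cutoff=1.0)
```

The two metrics can disagree. A prediction that is off by a full log unit still counts as correct for CCP if it stays on the right side of the cut-off.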

The Francis Crick Institute

Description of the dataset used in the current study (Obtained from Virco).

Author: Adriaan P. IJzerman (195167)
Alwin Hendriks (381073)
Andreas Bender (192334)
Gerard J. P. van Westen (195154)
Herman W. T. van Vlijmen (195170)
Jörg K. Wegner (195157)

*For Reverse Transcriptase only the first 400 amino acids were sequenced. The total size of the dataset is unlike any other dataset used in PCM. The number of mutations shown in the last column is the average per sequence and standard deviation when compared to HXB2.

The Francis Crick Institute

The model performance in the LOSO experiments.

Author: Adriaan P. IJzerman (195167)
Alwin Hendriks (381073)
Andreas Bender (192334)
Gerard J. P. van Westen (195154)
Herman W. T. van Vlijmen (195170)
Jörg K. Wegner (195157)

(A) The figure visualizes the measured Log FC for a mutant – drug pair on the x-axis. The y-axis shows the Log FC predicted for that mutant – drug pair by a model that was trained without that particular pair. Again the PIs perform the best (RMSE 0.40 log units, R₀² 0.76, and CCP 90%), followed by the NNRTIs (RMSE 0.67 log units, R₀² 0.53 and CCP 84%) and then the NRTIs (RMSE 0.45 log units, R₀² 0.50 and CCP 74%). (B) The density to the training set, as a measure of applicability domain, provides a useful estimate of model reliability. The x-axis shows the fraction of the training set that has a similarity of 0.97 or higher to a specific mutant – drug pair. The larger this fraction, the smaller the prediction error (y-axis) for that pair, as the model is better able to extrapolate from the training set. Since this fraction can be calculated before any prediction is made, a maximally allowed prediction error can be set in advance.
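
The density measure in panel (B) can be sketched as follows. The caption gives the 0.97 threshold but not the similarity measure or the descriptors. The sketch assumes Tanimoto similarity over sets of feature identifiers, and the function names are hypothetical.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity between two feature sets.

    Assumption: the source does not name its similarity measure; Tanimoto
    on sets of feature identifiers is used here purely for illustration.
    """
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 1.0  # two empty descriptions are treated as identical
    return len(a & b) / len(union)


def training_density(query, training_set, threshold=0.97, similarity=tanimoto):
    """Fraction of training examples at least `threshold`-similar to `query`."""
    if not training_set:
        raise ValueError("training_set must not be empty")
    near = sum(1 for example in training_set
               if similarity(query, example) >= threshold)
    return near / len(training_set)


# Toy feature sets standing in for mutant-drug pair descriptors.
toy_query = {1, 2, 3}
toy_training = [{1, 2, 3}, {1, 2, 3}, {1, 2}, {4}]
toy_density = training_density(toy_query, toy_training)
```

Because the density depends only on the query and the training set, it can be computed before the model is applied, which is what allows an error ceiling to be chosen up front.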

The Francis Crick Institute

Model performance predicting the Stanford University dataset.

Author: Adriaan P. IJzerman (195167)
Alwin Hendriks (381073)
Andreas Bender (192334)
Gerard J. P. van Westen (195154)
Herman W. T. van Vlijmen (195170)
Jörg K. Wegner (195157)

(A) The isolates predicted were not included in the training set, yet performance is still robust. Based on CCP, the NNRTIs perform the best (RMSE 0.62 log units and CCP 90%), followed by the PIs (RMSE 0.43 log units and CCP 85%) and then the NRTIs (RMSE 0.61 log units and CCP 79%). (B) The density to the training set, as a measure of applicability domain, provides a useful estimate of model reliability. The x-axis shows the fraction of the training set that has a similarity of 0.97 or higher to a specific mutant – drug pair. The larger this fraction, the smaller the prediction error (y-axis) for that pair, as the model is better able to extrapolate from the training set.

The Francis Crick Institute

Novel resistance conferring mutations derived from the dataset (NNRTI).

Author: Adriaan P. IJzerman (195167)
Alwin Hendriks (381073)
Andreas Bender (192334)
Gerard J. P. van Westen (195154)
Herman W. T. van Vlijmen (195170)
Jörg K. Wegner (195157)

The value in the different drug columns indicates the average Log FC in the presence of this mutation. While these mutations have been selected to confer some resistance to all NNRTIs, each drug still has a distinct profile. Efavirenz is the most sensitive (average Log FC 0.79) and Etravirine the least (average Log FC 0.44), with Nevirapine (average Log FC 0.55) and Delavirdine (average Log FC 0.58) in between.

The Francis Crick Institute

Model interpretation, mutations leading to PI specific resistance.

Author: Adriaan P. IJzerman (195167)
Alwin Hendriks (381073)
Andreas Bender (192334)
Gerard J. P. van Westen (195154)
Herman W. T. van Vlijmen (195170)
Jörg K. Wegner (195157)

Shown are the 30 mutations that have the most diverse effect over the different members of the PI drug class. The figure contains a number of known mutations (e.g. M46L [6], A71T [6], V82A [6], V82S [6]) but also several novel mutations (e.g. G48W, N88G). Values in the cells represent Log FC.

The Francis Crick Institute

Model validation.

Author: Adriaan P. IJzerman (195167)
Alwin Hendriks (381073)
Andreas Bender (192334)
Gerard J. P. van Westen (195154)
Herman W. T. van Vlijmen (195170)
Jörg K. Wegner (195157)

(A,B,C) Our models perform robustly in both internal validation (unknown combinations of known drugs and known mutants) and (D,E,F) external validation (unknown combinations of drugs and mutants, one of which is unknown). The PIs perform the best (RMSE 0.27 log units, CCP 93% internal and 0.43 log units, CCP 90% external), followed by the NNRTIs (RMSE 0.45 log units, CCP 93% internal and 0.49 log units, CCP 91% external) and then the NRTIs (RMSE 0.31 log units, CCP 80% internal and 0.52 log units, CCP 68% external). The range of Log FC values present in the dataset is the largest for the NNRTIs, followed by the PIs and then the NRTIs.

The Francis Crick Institute