Conformator: A Novel Method for the Generation of Conformer Ensembles
Computer-aided drug design methods such as docking, pharmacophore searching, 3D database searching, and the creation of 3D-QSAR models need conformational ensembles to handle the flexibility of small molecules. Here, we present Conformator, an accurate and effective knowledge-based algorithm for generating conformer ensembles. With 99.9% of all test molecules processed, Conformator stands out by its robustness with respect to input formats, molecular geometries, and the handling of macrocycles. With an extended set of rules for sampling torsion angles, a novel algorithm for macrocycle conformer generation, and a new clustering algorithm for the assembly of conformer ensembles, Conformator reaches a median minimum root-mean-square deviation (measured between protein-bound ligand conformations and ensembles of a maximum of 250 conformers) of 0.47 Å with no significant difference to the highest-ranked commercial algorithm OMEGA and significantly higher accuracy than seven free algorithms, including the RDKit DG algorithm. Conformator is freely available for noncommercial use and academic research.
ProteinsPlus: a web portal for structure analysis of macromolecules
With currently more than 126 000 publicly available structures and an
increasing growth rate, the Protein Data Bank constitutes a rich data source
for structure-driven research in fields like drug discovery, crop science and
biotechnology in general. Typical workflows in these areas involve manifold
computational tools for the analysis and prediction of molecular functions.
Here, we present the ProteinsPlus web server that offers a unified easy-to-use
interface to a broad range of tools for the early phase of structure-based
molecular modeling. This includes solutions for commonly required pre-
processing tasks like structure quality assessment (EDIA), hydrogen placement
(Protoss) and the search for alternative conformations (SIENA). Beyond that,
it also addresses frequent problems such as the generation of 2D-interaction
diagrams (PoseView), protein–protein interface classification (HyPPI) as well
as automatic pocket detection and druggability assessment (DoGSiteScorer). The
unified ProteinsPlus interface covering all featured approaches provides
various facilities for intuitive input and result visualization, case-specific
parameterization and download options for further processing. Moreover, its
generalized workflow allows the user a quick familiarization with the
different tools. ProteinsPlus also stores the calculated results temporarily
for future requests and thus facilitates convenient result communication and
re-access. The server is freely available at http://proteins.plus
Redocking the PDB
<p>This repository contains supplementary data to the journal article 'Redocking the PDB' by Flachsenberg et al. (<a href="https://doi.org/10.1021/acs.jcim.3c01573">https://doi.org/10.1021/acs.jcim.3c01573</a>)[1]. In this paper, we described two datasets: The PDBScan22 dataset with a large set of 322,051 macromolecule–ligand binding sites generally suitable for redocking and the PDBScan22-HQ dataset with 21,355 binding sites passing different structure quality filters. These datasets were further characterized by calculating properties of the ligand (e.g., molecular weight), properties of the binding site (e.g., volume), and structure quality descriptors (e.g., crystal structure resolution). Additionally, we performed redocking experiments with our novel JAMDA structure preparation and docking workflow[1] and with AutoDock Vina[2,3]. Details for all these experiments and the dataset composition can be found in the journal article[1].</p><p>Here, we provide all the datasets, i.e., the PDBScan22 and PDBScan22-HQ datasets as well as the docking results and the additionally calculated properties (for the ligand, the binding sites, and structure quality descriptors). Furthermore, we give a detailed description of their content (i.e., the data types and a description of the column values). All datasets consist of CSV files with the actual data and associated metadata JSON files describing their content. The CSV/JSON files are compliant with the CSV on the web standard (<a href="https://csvw.org/">https://csvw.org/</a>).</p><h3>General hints</h3><ul><li>All docking experiment results consist of two CSV files, one with general information about the docking run (e.g., was it successful?) 
and one with individual pose results (i.e., score and RMSD to the crystal structure).</li><li>All files (except for the docking pose tables) can be indexed uniquely by the column tuple '(pdb, name)' containing the PDB code of the complex (e.g., 1gm8) and the name of the ligand (in the format '<HET>_<chainID>_<resID><iCode>', e.g., 'SOX_B_1559').</li><li>All files (except for the docking pose tables) have exactly the same number of rows as the dataset they were calculated on (e.g., PDBScan22 or PDBScan22-HQ). However, some CSV files may have missing values (see also the JSON metadata files) in some or even all columns (except for 'pdb' and 'name').</li><li>The docking pose tables also contain the 'pdb' and 'name' columns. However, these two columns alone do not identify a row uniquely; they do so only together with the 'rank' column (i.e., a docking run may have multiple poses or none).</li></ul><h3>Example usage</h3><p>Using the pandas library (<a href="https://pandas.pydata.org/">https://pandas.pydata.org/</a>) in Python, we can calculate the number of protein-ligand complexes in the PDBScan22-HQ dataset with a top-ranked pose RMSD to the crystal structure ≤ 2.0 Å in the JAMDA redocking experiment and a molecular weight between 100 Da and 200 Da:</p><blockquote><p>import pandas as pd</p><p>df = pd.read_csv('PDBScan22-HQ.csv')</p><p>df_poses = pd.read_csv('PDBScan22-HQ_JAMDA_NL_NR_poses.csv')</p><p>df_properties = pd.read_csv('PDBScan22_ligand_properties.csv')</p><p>merged = df.merge(df_properties, how='left', on=['pdb', 'name'])</p><p>merged = merged[(merged['MW'] >= 100) & (merged['MW'] <= 200)].merge(df_poses[df_poses['rank'] == 1], how='left', on=['pdb', 'name'])</p><p>nof_successful_top_ranked = (merged['rmsd_ai'] <= 2.0).sum()</p><p>nof_no_top_ranked = merged['rmsd_ai'].isna().sum()</p></blockquote><h3>Datasets</h3><ul><li>PDBScan22.csv: This is the PDBScan22 dataset[1]. This dataset was derived from the PDB[4] (PDB version March 11th, 2022). 
It contains macromolecule–ligand binding sites (defined by PDB code and ligand identifier) that can be read by the NAOMI library[5,6] and pass basic consistency filters.</li><li>PDBScan22-HQ.csv: This is the PDBScan22-HQ dataset[1]. It contains macromolecule–ligand binding sites from the PDBScan22 dataset that pass certain structure quality filters described in our publication[1].</li><li>PDBScan22-HQ-ADV-Success.csv: This is a subset of the PDBScan22-HQ dataset without 336 binding sites where AutoDock Vina[2,3] fails.</li><li>PDBScan22-HQ-Macrocycles.csv: This is a subset of the PDBScan22-HQ dataset that excludes the 336 binding sites where AutoDock Vina[2,3] fails and contains only molecules with macrocycles of at least ten atoms.</li></ul><h3>Properties for PDBScan22</h3><ul><li>PDBScan22_ligand_properties.csv: Conformation-independent properties of all ligand molecules in the PDBScan22 dataset. Properties were calculated using an in-house tool developed with the NAOMI library[5,6].</li><li>PDBScan22_StructureProfiler_quality_descriptors.csv: Structure quality descriptors for the binding sites in the PDBScan22 dataset calculated using the StructureProfiler tool[7].</li><li>PDBScan22_basic_complex_properties.csv: Simple properties of the binding sites in the PDBScan22 dataset. Properties were calculated using an in-house tool developed with the NAOMI library[5,6].</li></ul><h3>Properties for PDBScan22-HQ</h3><ul><li>PDBScan22-HQ_DoGSite3_pocket_descriptors.csv: Binding site descriptors calculated for the binding sites in the PDBScan22-HQ dataset using the DoGSite3 tool[8].</li><li>PDBScan22-HQ_molecule_types.csv: Assignment of ligands in the PDBScan22-HQ dataset (without 336 binding sites where AutoDock Vina fails) to different molecular classes (i.e., drug-like, fragment-like, oligosaccharide, oligopeptide, cofactor, macrocyclic). 
A detailed description of the assignment can be found in our publication[1].</li></ul><h3>Docking results on PDBScan22</h3><ul><li>PDBScan22_JAMDA_NL_NR.csv: Docking results of JAMDA[1] on the PDBScan22 dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22_JAMDA_NL_NR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.</li><li>PDBScan22_JAMDA_NL_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22 dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.</li></ul><h3>Docking results on PDBScan22-HQ</h3><ul><li>PDBScan22-HQ_JAMDA_NL_NR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NL_NR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.</li><li>PDBScan22-HQ_JAMDA_NL_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.</li><li>PDBScan22-HQ_JAMDA_NL_WR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NL_WR_poses.csv'. 
For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.</li><li>PDBScan22-HQ_JAMDA_NL_WR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.</li><li>PDBScan22-HQ_JAMDA_NW_NR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NW_NR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.</li><li>PDBScan22-HQ_JAMDA_NW_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.</li><li>PDBScan22-HQ_JAMDA_NW_WR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NW_WR_poses.csv'. 
For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.</li><li>PDBScan22-HQ_JAMDA_NW_WR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.</li><li>PDBScan22-HQ_JAMDA_WL_NR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_WL_NR_poses.csv'. For this experiment, the ligand was considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.</li><li>PDBScan22-HQ_JAMDA_WL_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.</li><li>PDBScan22-HQ_JAMDA_WL_WR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_WL_WR_poses.csv'. 
For this experiment, the ligand was considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.</li><li>PDBScan22-HQ_JAMDA_WL_WR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.</li><li>PDBScan22-HQ_AutoDockVina.csv: Docking results of AutoDock Vina[2,3] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_AutoDockVina_poses.csv'. The preprocessing of structures was performed using the JAMDA preprocessing pipeline[1]. For this experiment, the ligand was not considered during preprocessing of the binding site, and all water molecules were removed from the binding site during preprocessing.</li><li>PDBScan22-HQ_AutoDockVina_poses.csv: Pose scores and RMSDs for the docking results of AutoDock Vina[2,3] on the PDBScan22-HQ dataset. The preprocessing of structures was performed using the JAMDA preprocessing pipeline[1]. For this experiment, the ligand was not considered during preprocessing of the binding site, and all water molecules were removed from the binding site during preprocessing.</li><li>PDBScan22-HQ-Macrocycles_AutoDockVinaMC.csv: Docking results of AutoDock Vina with macrocycle sampling[2,3] on the PDBScan22-HQ subset with macrocyclic molecules (see 'PDBScan22-HQ-Macrocycles.csv'). This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ-Macrocycles_AutoDockVinaMC_poses.csv'. The preprocessing of structures was performed using the JAMDA preprocessing pipeline[1]. 
For this experiment, the ligand was not considered during preprocessing of the binding site, and all water molecules were removed from the binding site during preprocessing.</li><li>PDBScan22-HQ-Macrocycles_AutoDockVinaMC_poses.csv: Pose scores and RMSDs for the docking results of AutoDock Vina[2,3] with enabled macrocycle sampling on the PDBScan22-HQ subset with macrocyclic molecules (see 'PDBScan22-HQ-Macrocycles.csv'). The preprocessing of structures was performed using the JAMDA preprocessing pipeline[1]. For this experiment, the ligand was not considered during preprocessing of the binding site, and all water molecules were removed from the binding site during preprocessing.</li></ul><h3>Docking with consensus scoring results on PDBScan22-HQ</h3><ul><li>PDBScan22-HQ_JAMDA_NW_NR_Consensus.csv: Docking and consensus scoring results of JAMDA[1] on the PDBScan22-HQ dataset (without the 336 binding sites where AutoDock Vina docking fails). Here, the docking was performed with JAMDA and a rescoring (with and without optimization) was performed with AutoDock Vina[2,3]. From the JAMDA pose scores and the AutoDock Vina scores, a consensus score was calculated with the rank-by-rank scheme[9]. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NW_NR_Consensus_poses.csv' (AutoDock Vina scoring without optimization) and 'PDBScan22-HQ_JAMDA_NW_NR_ConsensusOpt_poses.csv' (AutoDock Vina with short numerical optimization). 
For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.</li><li>PDBScan22-HQ_JAMDA_NW_NR_Consensus_poses.csv: Pose and consensus scores and RMSDs for the docking results of JAMDA[1] and the consensus scoring on the PDBScan22-HQ dataset (without the 336 binding sites where AutoDock Vina docking fails). Here, the docking was performed with JAMDA, and a rescoring without optimization was performed with AutoDock Vina[2,3]. From the JAMDA pose score and the AutoDock Vina score, a consensus score was calculated with the rank-by-rank scheme[9]. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.</li><li>PDBScan22-HQ_JAMDA_NW_NR_ConsensusOpt_poses.csv: Pose and consensus scores and RMSDs for the docking results of JAMDA[1] and the consensus scoring on the PDBScan22-HQ dataset (without the 336 binding sites where AutoDock Vina docking fails). Here, the docking was performed with JAMDA, and a rescoring with short numerical optimization was performed with AutoDock Vina[2,3]. From the JAMDA pose score and the optimized AutoDock Vina score, a consensus score was calculated with the rank-by-rank scheme[9]. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.</li></ul><h2>References</h2><ol><li>Flachsenberg, F.; Ehrt, C.; Gutermuth, T.; Rarey, M. <strong>Redocking the PDB</strong>. <i>J. Chem. 
Inf. Model.,</i> 2023, <a href="https://doi.org/10.1021/acs.jcim.3c01573">https://doi.org/10.1021/acs.jcim.3c01573</a></li><li>Eberhardt, J.; Santos-Martins, D.; Tillack, A. F.; Forli, S. <strong>AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings</strong>. <i>J. Chem. Inf. Model.,</i> 2021, 61, pp 3891–3898, <a href="https://doi.org/10.1021/acs.jcim.1c00203">https://doi.org/10.1021/acs.jcim.1c00203</a></li><li>Trott, O.; Olson, A. J. <strong>AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading</strong>. <i>J. Comput. Chem.</i>, 2010, 31, pp 455–461, <a href="https://doi.org/10.1002/jcc.21334">https://doi.org/10.1002/jcc.21334</a></li><li>Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. <strong>The Protein Data Bank</strong>. <i>Nucleic Acids Res.</i>, 2000, 28, pp 235–242, <a href="https://doi.org/10.1093/nar/28.1.235">https://doi.org/10.1093/nar/28.1.235</a></li><li>Urbaczek, S.; Kolodzik, A.; Fischer, J. R.; Lippert, T.; Heuser, S.; Groth, I.; Schulz-Gasch, T.; Rarey, M. <strong>NAOMI: On the Almost Trivial Task of Reading Molecules from Different File formats</strong>. <i>J. Chem. Inf. Model.,</i> 2011, 51, pp 3199–3207, <a href="https://doi.org/10.1021/ci200324e">https://doi.org/10.1021/ci200324e</a></li><li>Urbaczek, S.; Kolodzik, A.; Groth, I.; Heuser, S.; Rarey, M. <strong>Reading PDB: Perception of Molecules from 3D Atomic Coordinates</strong>. <i>J. Chem. Inf. Model.,</i> 2013, 53, pp 76–87, <a href="https://doi.org/10.1021/ci300358c">https://doi.org/10.1021/ci300358c</a></li><li>Meyder, A.; Kampen, S.; Sieg, J.; Fährrolfes, R.; Friedrich, N.; Flachsenberg, F.; Rarey, M. <strong>StructureProfiler: an all-in-one tool for 3D protein structure profiling</strong>. 
<i>Bioinformatics</i>, 2019, 35, pp 874–876, <a href="https://doi.org/10.1093/bioinformatics/bty692">https://doi.org/10.1093/bioinformatics/bty692</a></li><li>Graef, J.; Ehrt, C.; Rarey, M. <strong>Binding Site Detection Remastered: Enabling Fast, Robust, and Reliable Binding Site Detection and Descriptor Calculation with DoGSite3</strong>. <i>J. Chem. Inf. Model.,</i> 2023, 63, pp 3128–3137, <a href="https://doi.org/10.1021/acs.jcim.3c00336">https://doi.org/10.1021/acs.jcim.3c00336</a></li><li>Wang, R.; Wang, S. <strong>How Does Consensus Scoring Work for Virtual Library Screening? An Idealized Computer Experiment</strong>. <i>J. Chem. Inf. Comput. Sci.,</i> 2001, 41, pp 1422–1426, <a href="https://doi.org/10.1021/ci010025x">https://doi.org/10.1021/ci010025x</a></li></ol><p>C.E. is funded by Data Science in Hamburg - Helmholtz Graduate School for the Structure of Matter (Grant-ID: HIDSS-0002).</p><p>Current Address F.F.: BioSolveIT GmbH, An der Ziegelei 79, 53757 Sankt Augustin, Germany</p>
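The indexing rules from the general hints of the 'Redocking the PDB' dataset can be sketched with pandas as follows. This is only an illustration under stated assumptions: the two small tables are invented stand-ins for a dataset table and a docking pose table, and only the column names 'pdb', 'name', 'rank', and 'rmsd_ai' are taken from the description above. The sketch checks which column tuples are unique and counts complexes with at least one pose (of any rank) within 2.0 Å of the crystal structure.

```python
import pandas as pd

# Toy stand-ins for a dataset table and its docking pose table.
# All rows and values are invented for illustration only.
dataset = pd.DataFrame({
    "pdb": ["1gm8", "1abc", "2xyz"],
    "name": ["SOX_B_1559", "LIG_A_301", "LIG_A_401"],
})
poses = pd.DataFrame({
    "pdb": ["1gm8", "1gm8", "1abc"],
    "name": ["SOX_B_1559", "SOX_B_1559", "LIG_A_301"],
    "rank": [1, 2, 1],
    "rmsd_ai": [2.7, 1.1, 0.8],
})

key = ["pdb", "name"]

# (pdb, name) is unique in the dataset table ...
dataset_key_unique = not dataset.duplicated(subset=key).any()
# ... but not in the pose table, where 'rank' is needed as well.
poses_key_unique = not poses.duplicated(subset=key).any()
poses_key_rank_unique = not poses.duplicated(subset=key + ["rank"]).any()

# Lowest RMSD over all poses of each docking run.
best = poses.groupby(key, as_index=False)["rmsd_ai"].min()

# A left merge keeps complexes without any pose (their RMSD becomes NaN).
merged = dataset.merge(best, how="left", on=key, validate="one_to_one")

n_success_any_rank = int((merged["rmsd_ai"] <= 2.0).sum())
n_without_poses = int(merged["rmsd_ai"].isna().sum())
```

Passing validate="one_to_one" makes the merge fail loudly if either side unexpectedly contains duplicate keys, which guards against silently multiplied rows.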
RingDecomposerLib: An Open-Source Implementation of Unique Ring Families and Other Cycle Bases
Many cheminformatics applications like aromaticity detection, SMARTS matching,
or the calculation of atomic coordinates require a chemically meaningful
perception of the molecular ring topology. The unique ring families
(URFs) were recently introduced as a unique, polynomial, and chemically
meaningful description of the ring topology. Here we present the first
open-source implementation of the URF concept for ring perception.
The C library RingDecomposerLib is easy to use, portable, well-documented,
and thoroughly tested. Aside from the URFs, other related ring topology
descriptions like the relevant cycles (RCs), relevant cycle prototypes
(RCPs), and a smallest set of smallest rings (SSSR) can be calculated.
We demonstrate the runtime efficiency of the RingDecomposerLib with
computing time benchmarks for the complete PubChem Compound Database
and thereby show the applicability in large-scale and interactive
applications.
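As a small illustration of why an SSSR is not a unique ring description, the sketch below computes only the size of an SSSR, which equals the cyclomatic number of the molecular graph (bonds minus atoms plus connected components). It is plain Python written for this example and does not use or mirror the RingDecomposerLib API; the function name and the atom numbering are assumptions.

```python
def sssr_size(num_atoms, bonds):
    """Number of rings in any SSSR: bonds - atoms + connected components."""
    parent = list(range(num_atoms))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = num_atoms
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(bonds) - num_atoms + components


# Naphthalene skeleton: two six-membered rings fused at the bond 4-5.
naphthalene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
               (4, 6), (6, 7), (7, 8), (8, 9), (9, 5)]

# Cubane skeleton: bottom face, top face, and four vertical edges.
cubane = [(0, 1), (1, 2), (2, 3), (3, 0),
          (4, 5), (5, 6), (6, 7), (7, 4),
          (0, 4), (1, 5), (2, 6), (3, 7)]

naphthalene_rings = sssr_size(10, naphthalene)
cubane_rings = sssr_size(8, cubane)
```

Cubane has six equivalent four-membered faces, but an SSSR holds only five rings, so one face has to be left out arbitrarily.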
Structural and biophysical characterization of the type VII collagen vWFA2 subdomain leads to identification of two binding sites
Type VII collagen is an extracellular matrix protein, which is important for skin stability; however, detailed information at the molecular level is scarce. The second vWFA (von Willebrand factor type A) domain of type VII collagen mediates important interactions, and immunization of mice induces skin blistering in certain strains. To understand vWFA2 function and the pathophysiological mechanisms leading to skin blistering, we structurally characterized this domain by X-ray crystallography and NMR spectroscopy. Cell adhesion assays identified two new interactions: one with beta 1 integrin via its RGD motif and one with laminin-332. The latter interaction was confirmed by surface plasmon resonance with a KD of about 1 mM. These data show that vWFA2 has additional functions in the extracellular matrix besides interacting with type I collagen.
Placement of Water Molecules in Protein Structures: From Large-Scale Evaluations to Single-Case Examples
Water molecules are of great importance for the correct representation
of ligand binding interactions. Throughout the last years, water molecules
and their integration into drug design strategies have received increasing
attention. Nowadays a variety of tools are available to place and
score water molecules. However, the most frequently applied software
solutions require substantial computational resources. In addition,
none of the existing methods has been rigorously evaluated on the
basis of a large number of diverse protein complexes. Therefore, we
present a novel method for placing water molecules, called WarPP,
based on interaction geometries previously derived from protein crystal
structures. Using a large, previously compiled, high-quality validation
set of almost 1500 protein–ligand complexes containing almost
20 000 crystallographically observed water molecules in their
active sites, we validated our placement strategy. We correctly placed
80% of the water molecules within 1.0 Å of a crystallographically
observed one.
Benchmarking Commercial Conformer Ensemble Generators
We assess and compare the performance of eight commercial conformer
ensemble generators (ConfGen, ConfGenX, cxcalc, iCon, MOE LowModeMD, MOE Stochastic, MOE Conformation Import,
and OMEGA) and one leading free algorithm, the distance geometry algorithm
implemented in RDKit. The comparative study is based on a new version
of the Platinum Diverse Dataset, a high-quality benchmarking dataset
of 2859 protein-bound ligand conformations extracted from the PDB.
Differences in the performance of commercial algorithms are much smaller
than those observed for free algorithms in our previous study (J. Chem. Inf. Model. 2017, 57, 529–539). For commercial algorithms, the median minimum
root-mean-square deviations measured between protein-bound ligand
conformations and ensembles of a maximum of 250 conformers are between
0.46 and 0.61 Å. Commercial conformer ensemble generators are
characterized by their high robustness, with at least 99% of all input
molecules successfully processed and few or even no substantial geometrical
errors detectable in their output conformations. The RDKit distance
geometry algorithm (with minimization enabled) appears to be a good
free alternative since its performance is comparable to that of the
midranked commercial algorithms. Based on a statistical analysis,
we elaborate on which algorithms to use and how to parametrize them
for best performance in different application scenarios.
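The accuracy measure quoted above, the median minimum RMSD, can be sketched as follows. This is a simplified illustration, not the benchmark's implementation: it assumes that all conformers are already superposed onto the reference and list their atoms in the same order, and it ignores molecular symmetry. The function names and the toy coordinates are hypothetical.

```python
import math
from statistics import median


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) coordinates."""
    squared = [
        sum((p - q) ** 2 for p, q in zip(atom_a, atom_b))
        for atom_a, atom_b in zip(coords_a, coords_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


def min_rmsd(reference, ensemble, max_conformers=250):
    """Smallest RMSD between the reference and the first max_conformers conformers."""
    return min(rmsd(reference, conf) for conf in ensemble[:max_conformers])


def median_min_rmsd(benchmark, max_conformers=250):
    """Median over all molecules of the per-molecule minimum RMSD."""
    return median(min_rmsd(ref, ens, max_conformers) for ref, ens in benchmark)


def shifted(coords, dz):
    """Toy 'conformer': the reference translated along z (illustration only)."""
    return [(x, y, z + dz) for x, y, z in coords]


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
benchmark = [
    (reference, [shifted(reference, 1.0), shifted(reference, 0.5)]),
    (reference, [shifted(reference, 0.3)]),
    (reference, [shifted(reference, 0.9)]),
]

result = median_min_rmsd(benchmark)
```

Taking the minimum per molecule asks whether the ensemble contains at least one conformer close to the protein-bound conformation; the median then summarizes this over the whole dataset.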
High-Quality Dataset of Protein-Bound Ligand Conformations and Its Application to Benchmarking Conformer Ensemble Generators
We developed a cheminformatics pipeline for the fully automated
selection and extraction of high-quality protein-bound ligand conformations
from X-ray structural data. The pipeline evaluates the validity and
accuracy of the 3D structures of small molecules according to multiple
criteria, including their fit to the electron density and their physicochemical
and structural properties. Using this approach, we compiled two high-quality
datasets from the Protein Data Bank (PDB): a comprehensive dataset
and a diversified subset of 4626 and 2912 structures, respectively.
The datasets were applied to benchmarking seven freely available conformer
ensemble generators: Balloon (two different algorithms), the RDKit
standard conformer ensemble generator, the Experimental-Torsion basic
Knowledge Distance Geometry (ETKDG) algorithm, Confab, Frog2 and Multiconf-DOCK.
Substantial differences in the performance of the individual algorithms
were observed, with RDKit and ETKDG generally achieving a favorable
balance of accuracy, ensemble size and runtime. The Platinum datasets
are available for download from http://www.zbh.uni-hamburg.de/platinum_dataset