3 research outputs found

    CheckMyBlob evaluation data set (CL)

    A data set of ligands used to evaluate the CheckMyBlob method, described in the Kowiel et al. paper "Automatic recognition of ligands in electron density by machine learning methods". This data set repeats the setup used in the study of Carolan & Lamzin titled "Automated identification of crystallographic ligands using sparse-density representations". It consists of ligands from X-ray diffraction experiments with 1.0–2.5 Å resolution. Adjacent PDB ligands were not connected. Ligands were labeled according to the PDB naming convention. The data set was limited to the 82 ligand types listed by Carolan & Lamzin. The resulting data set consists of 121,360 examples with ligand counts ranging from 42,622 examples for SO4 to 16 for SPO (spheroidene). For machine learning (classification) purposes, the target attribute is: res_name

    CheckMyBlob evaluation data set (TAMC)

    A data set of ligands used to evaluate the CheckMyBlob method, described in the Kowiel et al. paper "Automatic recognition of ligands in electron density by machine learning methods". This data set attempts to repeat the experimental setup of Terwilliger et al. described in "Ligand identification using electron-density map correlations". It consists of ligands from X-ray diffraction experiments with 6–150 non-H atoms. Connected PDB ligands were labeled as single alphabetically ordered strings of hetero-compound codes, whereas unknown species, water molecules, standard amino acids, and nucleotides were excluded. Finally, the data set was limited to the 200 most popular ligands. The resulting data set consisted of 161,758 examples with individual ligand counts ranging from 36,535 examples for GOL (glycerol) to 114 for LMG (1,2-distearoyl-monogalactosyl-diglyceride). For machine learning (classification) purposes, the target attribute is: res_name
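
The labeling and truncation steps described for this data set can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `connected_label` and `keep_most_popular` are hypothetical, and the hyphen separator is taken from the NAG-NAG-NAG-NAG example given for the CMB data set.

```python
from collections import Counter


def connected_label(codes):
    """Join the hetero-compound codes of one connected ligand into a single
    alphabetically ordered label (hypothetical helper, hyphen-separated)."""
    return "-".join(sorted(codes))


def keep_most_popular(labels, n=200):
    """Keep only examples whose label is among the n most frequent labels.
    Ties at the cut-off are broken by first occurrence (Counter.most_common)."""
    counts = Counter(labels)
    popular = {label for label, _ in counts.most_common(n)}
    return [label for label in labels if label in popular]
```

For example, `connected_label(["NAG", "BMA", "NAG"])` yields `"BMA-NAG-NAG"`, which would then serve as the `res_name` value for that connected ligand.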

    CheckMyBlob ligand data set (CMB)

    Ligand data set prepared for the CheckMyBlob study, described in "Automatic recognition of ligands in electron density by machine learning methods" by Kowiel, M. et al. It contains only structures from X-ray diffraction experiments determined to at least 4.0 Å resolution. Entries with an R factor above 0.3 or ligands with occupancy below 0.3 (according to wwPDB validation reports) were rejected. Only ligands with at least 2 non-H atoms were considered, and structures with low ligand map correlation coefficients (RSCC < 0.6, RSZO <= 1, RSZD > 6.0) were removed. Apart from taking quality factors into account, we removed from the experimental data set all moieties that are not considered proper ligands. These included unknown species, water molecules, standard amino acids, and selected nucleotides. Moreover, connected ligands (as per the naming convention in the PDB) were labeled as alphabetically ordered strings of hetero-compound codes (e.g., NAG-NAG-NAG-NAG). Finally, the data set was limited to the 200 most popular ligands. The resulting data set consisted of 219,986 examples with individual ligand counts ranging from 48,490 examples for SO4 (sulfate ion) to 106 for A2G (n-acetyl-2-deoxy-2-amino-galactose). More details concerning data selection can be found in the paper of Kowiel et al. For machine learning (classification) purposes, the target attribute is: res_name.
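
The quality criteria listed for the CMB data set can be read as a simple per-ligand predicate. The sketch below is illustrative only and not taken from the CheckMyBlob code: the `LigandRecord` class and its field names are hypothetical, and it assumes that failing any one of the three map-quality thresholds (RSCC, RSZO, RSZD) is enough to remove an example, which the description does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class LigandRecord:
    """Hypothetical container for the per-ligand values named in the description."""
    res_name: str
    resolution: float   # in Å
    r_factor: float
    occupancy: float
    non_h_atoms: int
    rscc: float
    rszo: float
    rszd: float


def passes_quality_filters(rec: LigandRecord) -> bool:
    """Return True if the record survives the thresholds quoted in the text."""
    if rec.resolution > 4.0:      # keep structures determined to at least 4.0 Å
        return False
    if rec.r_factor > 0.3:        # reject entries with R factor above 0.3
        return False
    if rec.occupancy < 0.3:       # reject ligands below 0.3 occupancy
        return False
    if rec.non_h_atoms < 2:       # require at least 2 non-H atoms
        return False
    # Assumption: any single failing map-quality criterion removes the example.
    if rec.rscc < 0.6 or rec.rszo <= 1 or rec.rszd > 6.0:
        return False
    return True
```

A record that passes would still be subject to the later steps described above (removal of non-ligand moieties and restriction to the 200 most popular labels).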