High-throughput synthesis provides data for predicting molecular properties and reaction success

Abstract

Data and code to accompany the publication. Data S1 through S3 are described in the supplementary materials.

The virtual library is contained in virtual_library.tar, a tar archive of bzip2-compressed CSV files, each holding a chunk of 10,000 records, for a total of 17,482,092 records. Each record has a unique identifier, "mol_number". Two files are provided for each chunk. The first, VL_chunk_xxxx_smiles.csv, contains only the identifier and the corresponding SMILES string. The second, VL_chunk_xxxx.csv, additionally contains the predictions made for the library members.

In addition to the identifier and SMILES string, the columns of VL_chunk_xxxx.csv are:

- MoKa calculations: [number_of_ionizable_centers, center1_acidorbase, center1_pKa, center1_atom_number, center1_prediction_quality, center2_acidorbase, center2_pKa, center2_atom_number, center2_prediction_quality, center3_acidorbase, center3_pKa, center3_atom_number, center3_prediction_quality, center4_acidorbase, center4_pKa, center4_atom_number, center4_prediction_quality, center5_acidorbase, center5_pKa, center5_atom_number, center5_prediction_quality, center6_acidorbase, center6_pKa, center6_atom_number, center6_prediction_quality, center7_acidorbase, center7_pKa, center7_atom_number, center7_prediction_quality, center8_acidorbase, center8_pKa, center8_atom_number]
- Property predictions using Novartis' model: [predicted_logD_pH7.4, predicted_logSolubility_pH6.8_(mM), predicted_ionization_constant]
- Property predictions using Schrödinger: [QPlogPo/w, QPlogS]. These are calculated for the all-cis diastereomer.
- Reaction outcome predictions for up to two possible reactions leading to the product: [rxn1_smiles, rxn1_predictions, rxn1_confidence, rxn2_smiles, rxn2_predictions, rxn2_confidence
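
Because the library is distributed as a tar archive of bzip2-compressed CSV chunks, the records can be streamed without unpacking the archive to disk. The following is a minimal sketch using only the Python standard library, not code from the publication. It rests on several assumptions that should be checked against the actual archive: that the compressed members carry a ".bz2" suffix appended to the file names given above, that each CSV starts with a header row, and that the text is UTF-8. The names iter_records and is_smiles_chunk are hypothetical helpers introduced for illustration.

```python
import bz2
import csv
import tarfile


def is_smiles_chunk(member_name):
    """Select the identifier-plus-SMILES files.

    Assumption: compressed members are named like
    VL_chunk_xxxx_smiles.csv.bz2 inside the archive.
    """
    return member_name.endswith("_smiles.csv.bz2")


def open_archive(source):
    """Open a tar archive given either a path or a seekable binary file object."""
    if hasattr(source, "read"):
        return tarfile.open(fileobj=source, mode="r:*")
    return tarfile.open(source, mode="r:*")


def iter_records(source, wanted=is_smiles_chunk):
    """Yield one dict per CSV record from the selected archive members.

    Each member is decompressed on the fly, so only one row is held
    in memory at a time. Keys come from the CSV header row (assumed present).
    """
    with open_archive(source) as archive:
        for member in archive:
            if not member.isfile() or not wanted(member.name):
                continue
            raw = archive.extractfile(member)
            with bz2.open(raw, mode="rt", encoding="utf-8", newline="") as text:
                yield from csv.DictReader(text)
```

Under these assumptions, iterating over iter_records("virtual_library.tar") would yield one dictionary per library member, keyed by column name, so that row["mol_number"] gives the record identifier. Passing a different predicate as wanted, for example one that matches the full prediction files instead of the SMILES-only files, selects the other file of each chunk.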

    Last time updated on 22/09/2023