High-throughput synthesis provides data for predicting molecular properties and reaction success

André, Jérôme; Bode, Jeffrey W.; Brocklehurst, Cara E.; Gosling, Daniel J.; Götz, Julian; Jackl, Moritz K.; Jindakun, Chalupat; Luneau, Alexandre; Marziale, Alexander N.; Palmieri, Marco; Reck, Marcel; Springer, Clayton

Publication date: 21 September 2023

Abstract

Data and code to accompany the publication. Data S1 through S3 are described in the supplementary materials.

The virtual library is contained in virtual_library.tar, a tar archive containing bzip2-compressed CSV files, each holding a chunk of 10,000 records, for a total of 17,482,092 records. Each record has a unique identifier, "mol_number". For each chunk, two files are provided. The first file, VL_chunk_xxxx_smiles.csv, contains only the identifier and the respective SMILES string. The second file, VL_chunk_xxxx.csv, additionally contains the predictions made for the library members. In addition to the identifier and SMILES string, the columns of VL_chunk_xxxx.csv are:

- MoKa calculations: [number_of_ionizable_centers, center1_acidorbase, center1_pKa, center1_atom_number, center1_prediction_quality, center2_acidorbase, center2_pKa, center2_atom_number, center2_prediction_quality, center3_acidorbase, center3_pKa, center3_atom_number, center3_prediction_quality, center4_acidorbase, center4_pKa, center4_atom_number, center4_prediction_quality, center5_acidorbase, center5_pKa, center5_atom_number, center5_prediction_quality, center6_acidorbase, center6_pKa, center6_atom_number, center6_prediction_quality, center7_acidorbase, center7_pKa, center7_atom_number, center7_prediction_quality, center8_acidorbase, center8_pKa, center8_atom_number]
- Property predictions using Novartis' model: [predicted_logD_pH7.4, predicted_logSolubility_pH6.8_(mM), predicted_ionization_constant]
- Property predictions using Schrödinger: [QPlogPo/w, QPlogS]. These are calculated for the all-cis diastereomer.
- Reaction outcome predictions for up to two possible reactions leading to the product: [rxn1_smiles, rxn1_predictions, rxn1_confidence, rxn2_smiles, rxn2_predictions, rxn2_confidence
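As an illustration of the archive layout described above, the following minimal Python sketch streams records from bzip2-compressed CSV members of a tar archive without unpacking the archive to disk. It is not part of the deposited code. The function name iter_records and the name_filter parameter are hypothetical, and the sketch assumes that each member is a bzip2 stream of UTF-8 CSV text with a header row. The exact member file names inside virtual_library.tar are not stated in the description, so the filter is left to the caller.

```python
import bz2
import csv
import io
import tarfile


def iter_records(tar_path, name_filter=lambda name: True):
    """Yield one dict per CSV row from bzip2-compressed members of a tar archive.

    Assumption: every selected member is a bzip2-compressed, UTF-8 encoded
    CSV file whose first row holds the column names.
    """
    with tarfile.open(tar_path, mode="r") as archive:
        for member in archive:
            # Skip directories and members the caller is not interested in.
            if not member.isfile() or not name_filter(member.name):
                continue
            raw = archive.extractfile(member)
            # Decompress on the fly and expose the member as a text stream.
            with bz2.open(raw, mode="rt", encoding="utf-8", newline="") as text:
                yield from csv.DictReader(text)
```

A call such as iter_records("virtual_library.tar", lambda name: "_smiles" in name) would then restrict reading to the SMILES-only files, assuming the member names follow the VL_chunk_xxxx_smiles.csv pattern given above. Each yielded row is a dict keyed by column name, so the identifier is available as row["mol_number"]. All values arrive as strings and need explicit conversion before numeric use.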

Available Versions

ZENODO

oai:zenodo.org:8366088

Last time updated on 22/09/2023