Solute descriptors have been widely
used to model chemical transfer
processes through poly-parameter linear free energy relationships
(pp-LFERs); however, there are still substantial difficulties in obtaining
these descriptors accurately and quickly for new organic chemicals.
In this research, we built models (PaDEL-DNN) that require only the SMILES of a chemical
and estimate pp-LFER descriptors satisfactorily using deep
neural networks (DNNs) and the PaDEL chemical representation. The PaDEL-DNN-estimated
pp-LFER descriptors performed well in modeling the storage-lipid/water
partition coefficient (log Kstorage‑lipid/water), bioconcentration factor (BCF), aqueous solubility (ESOL data set), and
hydration free energy (FreeSolv data set). Then, assuming that the accuracy
of the estimated values of widely available properties, e.g., logP
(octanol–water partition coefficient), can indicate the accuracy of estimates
for less available but related properties, we proposed logP as a surrogate
metric for evaluating the overall accuracy of the estimated pp-LFER
descriptors. When the estimated pp-LFER descriptors were used to model log Kstorage‑lipid/water, BCF, ESOL,
and FreeSolv, errors were around 0.1 log unit lower
for chemicals whose estimated pp-LFER descriptors were deemed “accurate”
by the surrogate metric. Interpretation of the PaDEL-DNN models
revealed that, for a given test chemical, having several (around 5)
“similar” chemicals in the training data set was crucial
for accurate estimation, while the remaining less similar training
chemicals provided reasonable baseline estimates. Lastly, pp-LFER
descriptors for over 2800 persistent, bioaccumulative, and toxic chemicals
were reasonably estimated by combining PaDEL-DNN with the surrogate
metric. Overall, the PaDEL-DNN/surrogate metric and newly estimated
descriptors will greatly benefit chemical transfer modeling.
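
The surrogate-metric idea described above can be illustrated with a minimal sketch. It assumes one common pp-LFER form, log K = c + eE + sS + aA + bB + vV, which the text does not specify. The names `Descriptors`, `LOGP_SYSTEM`, `pp_lfer`, and `deemed_accurate`, the system coefficients, and the 0.5 log unit tolerance are all hypothetical choices made for illustration, not values taken from this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Solute descriptors for the assumed pp-LFER form (E, S, A, B, V)."""
    E: float
    S: float
    A: float
    B: float
    V: float


# Illustrative system coefficients (c, e, s, a, b, v) for an octanol-water
# pp-LFER. ASSUMPTION: placeholders for this sketch; substitute the
# coefficients that match the pp-LFER form you actually use.
LOGP_SYSTEM = (0.088, 0.562, -1.054, 0.034, -3.460, 3.814)


def pp_lfer(d: Descriptors, system: tuple) -> float:
    """Evaluate log K = c + eE + sS + aA + bB + vV for one chemical."""
    c, e, s, a, b, v = system
    return c + e * d.E + s * d.S + a * d.A + b * d.B + v * d.V


def deemed_accurate(d: Descriptors, reference_logp: float,
                    tolerance: float = 0.5,
                    system: tuple = LOGP_SYSTEM) -> bool:
    """Flag estimated descriptors as "accurate" when the logP they predict
    lies within `tolerance` log units of a reference logP.
    ASSUMPTION: the tolerance is a hypothetical threshold."""
    return abs(pp_lfer(d, system) - reference_logp) <= tolerance


# Usage: descriptors estimated for one chemical (illustrative values),
# checked against a reference logP for the same chemical.
estimated = Descriptors(E=0.61, S=0.52, A=0.0, B=0.14, V=0.7164)
flag = deemed_accurate(estimated, reference_logp=2.13)
```

Descriptors that pass such a check would then be used with more confidence for properties where reference data are scarce; the appropriate tolerance depends on the uncertainty of the reference logP values.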