Six global and local QSPR models of aqueous solubility at pH = 7.4 based on structural similarity and physicochemical descriptors
<p>Aqueous solubility at pH = 7.4 is a very important property for medicinal chemists because this is the pH value of physiological media. The present work describes the application of three global methods (support vector machine (SVM), random forest (RF) and multiple linear regression (MLR)) and three local quantitative structure–property relationship (QSPR) models (regression corrected by nearest neighbours (RCNN), arithmetic mean property (AMP) and local regression property (LoReP)) to construct stable QSPRs with clear mechanistic interpretation. Our data set contained experimental values of aqueous solubility at pH = 7.4 for 387 chemicals (349 in the training set and 38 in the test set, including 16 of our own measurements). The initial descriptor pool contained 210 physicochemical descriptors calculated with the HYBOT, DRAGON, SYBYL and VolSurf+ programs. Six QSPRs with good statistics, based on the fundamentals of aqueous solubility and optimization of descriptor space, were obtained. These models have an RMSE close to experimental error (0.70) and are amenable to physical interpretation. The QSPR models developed in this study may be useful for medicinal chemists. The global MLR, RF and SVM models may be valuable for considering common factors that influence solubility. The RCNN, AMP and LoReP local models may be helpful for optimizing aqueous solubility in small sets of related chemicals.</p>
Physicochemical property profile for brain permeability: comparative study by different approaches
<p>A comparative study of classification models of brain penetration by different approaches was carried out on a training set of 1000 chemicals and drugs, and an external test set of 100 drugs. Ten approaches were applied in this work: seven medicinal chemistry approaches (including “rule of 5” and multiparameter optimization) and also three SAR techniques: logistic regression (LR), random forest (RF) and support vector machine (SVM). Forty-one different medicinal chemistry descriptors representing diverse physicochemical properties were used in this work. Medicinal chemistry approaches based on the intuitive estimation of preference zones of CNS or non-CNS chemicals, with different rules and scoring functions, yield unbalanced models with poor classification accuracy. RF and SVM methods yielded 82% and 84% classification accuracy respectively for the external test set. LR was also successful in CNS/non-CNS (denoted in this study as CNS+/CNS−) classification and yielded an overall accuracy equivalent to that of SVM and RF. At the same time, LR is especially valuable for medicinal chemists because of its simplicity and the possibility of clear mechanistic interpretation.</p>