Estimation of aqueous solubility of organic compounds.

Abstract

The relationship between aqueous activity coefficients (log γ(w)) and different physico-chemical properties was studied for a number of solutes, both through empirical correlations and by applying existing theoretical models. The solute properties selected were classified into three categories: geometric, polar, and electrostatic. The solutes chosen were divided into two major groups: (a) a Training Set of structurally simple compounds, each containing only one functional group, and (b) a Test Set comprising a series of drugs and pollutants covering a wide variety of functional groups. The Training Set in turn comprises four sub-sets of structurally related solutes, each representative of typical data sets used in the literature for solubility studies. Linear relationships were found for polar and geometric parameters, in agreement with those reported in the literature. However, although the overall correlations are good, the quality of the regressions among the sub-sets is not uniform. The generality of the relationships obtained with the Training Set was tested by applying the resulting expressions to estimate log γ(w) for the solutes of the Test Set. It was found that the parameters of the theoretical models are the only ones whose relationship with log γ(w) is maintained for both the Training and the Test sets. The theoretical models used are: the octanol-water partition coefficient estimated by both Rekker's method (parameter LOGP) and Leo's method (parameter PCLOGP); the solubility group contributions method of Wakita et al. (1986) (parameter WAKITA); the Linear Solvation Energy Relationships model (parameter KAMLET); and the UNIFAC model. The theoretical approaches were evaluated on two criteria: accuracy of predictions and range of applicability. The accuracy of predictions was quantitated by a prediction coefficient, P², which, although analogous to the regression coefficient (R²), is far less flexible: the prediction coefficient is sensitive not only to the scatter of the predictions but also to the systematic errors of the model being tested. The range of applicability was quantitated by the fraction (f) of solutes within the data set for which estimates by the given methodology are possible. The Accuracy-Generality Product (AGP), defined as the product of P² and f, was used as the overall criterion for evaluation. The results indicated that the quality of predictions of the theoretical models, as determined by the AGP, follows the order PCLOGP > LOGP > WAKITA > UNIFAC > KAMLET for both the Training and Test sets.
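
The evaluation criterion described above can be illustrated with a minimal sketch. The abstract does not give the formula for P², so the form used here (1 - SS_res/SS_tot, with residuals taken directly between predicted and observed values, without refitting a line) is an assumption chosen because it penalizes systematic error as well as scatter. The function names and the numerical values are hypothetical and serve only as illustration; they are not data from the study.

```python
from typing import List, Optional, Tuple


def prediction_coefficient(observed: List[float], predicted: List[float]) -> float:
    """Assumed form of P^2: 1 - SS_res / SS_tot.

    Residuals are observed minus predicted, with no regression line
    refitted, so a constant bias in the predictions lowers the score.
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def accuracy_generality_product(
    observed: List[float], predicted: List[Optional[float]]
) -> Tuple[float, float, float]:
    """Return (P^2, f, AGP), where AGP = P^2 * f as stated in the abstract.

    A predicted value of None marks a solute for which the method
    cannot produce an estimate; such solutes reduce f.
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if p is not None]
    f = len(pairs) / len(observed)
    p2 = prediction_coefficient([o for o, _ in pairs], [p for _, p in pairs])
    return p2, f, p2 * f


# Hypothetical log gamma(w) values: every prediction is offset by +0.5,
# and the method gives no estimate for the fourth solute.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, None]

p2, f, agp = accuracy_generality_product(observed, predicted)
```

In this made-up example the three available predictions are perfectly correlated with the observations, so a refitted regression would give R² = 1, yet the assumed P² drops to 0.625 because of the constant offset. With f = 0.75, the AGP is about 0.47.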
