213 research outputs found

    Mechanistically transparent models for predicting aqueous solubility of rigid, slightly flexible, and very flexible drugs (MW<2000): Accuracy near that of random forest regression. Alex Avdeef

    Yalkowsky’s General Solubility Equation (GSE), with its three fixed constants, is popular and easy to apply, but is not very accurate for polar, zwitterionic, or flexible molecules. This review examines the findings of a series of studies, where we have sought to come up with a better prediction model, by comparing the performances of the GSE to Abraham’s Solvation Equation (ABSOLV), and Random Forest regression (RFR) machine-learning (ML) method. Large, well-curated aqueous intrinsic solubility databases are available. However, drugs may be sparsely distributed in chemical space, concentrated in clusters. Even a large database might overlook some regions. Test compounds from under-represented portions of space may be poorly predicted, as might be the case with the ‘loose’ set of 32 drugs in the Second Solubility Challenge (2020). There appears to be still a need for better coverage of drug space. Increasingly, current trends in predictions of solubility use calculated input descriptors, which may be an advantage for exploring properties of molecules yet to be synthesized. The risk may be that overall prediction approaches might be based on accumulated uncertainty. The increasing use of ML/AI methods can lead to accurate predictions, but such predictions may not readily suggest the strategies to pursue in selecting yet-to-be-synthesized compounds. Based on our latest findings, we recommend predictions based on both ‘grouped’ ABSOLV(GRP) and ‘Flexible Acceptor’ GSE(Ω,B) models with the provided best-fit parameters, where Ω is the Kier molecular flexibility index and B is the Abraham H-bond acceptor strength. For molecules with Ω < 11, the prudent choice is to pick the Consensus Model, the average of ABSOLV(GRP) and GSE(Ω,B). For more flexible molecules, GSE(Ω,B) is recommended
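
The selection rule in the last two sentences of this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function name, argument names, and the treatment of the boundary case Ω = 11 are assumptions, and the two model predictions are taken as inputs because the best-fit parameters of ABSOLV(GRP) and GSE(Ω,B) are not given in the abstract.

```python
def recommended_log_s0(omega, log_s0_absolv_grp, log_s0_gse_flex, omega_cutoff=11.0):
    """Return the recommended log S0 prediction for one molecule.

    omega              -- Kier molecular flexibility index of the molecule
    log_s0_absolv_grp  -- log S0 predicted by ABSOLV(GRP), supplied by the caller
    log_s0_gse_flex    -- log S0 predicted by GSE(Omega,B), supplied by the caller
    omega_cutoff       -- flexibility threshold (11 in the abstract)
    """
    if omega < omega_cutoff:
        # Consensus Model: plain average of the two predictions.
        return 0.5 * (log_s0_absolv_grp + log_s0_gse_flex)
    # More flexible molecules: use GSE(Omega,B) alone.
    # Assumption: omega exactly equal to the cutoff is treated as "more flexible".
    return log_s0_gse_flex


# Hypothetical predictions, for illustration only.
print(recommended_log_s0(5.0, -4.0, -3.0))   # rigid molecule: consensus average
print(recommended_log_s0(15.0, -4.0, -3.0))  # flexible molecule: GSE(Omega,B) only
```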

    Can small drugs predict the intrinsic aqueous solubility of ‘beyond Rule of 5’ big drugs?

    The aim of the study was to explore to what extent small molecules (mostly from the Rule of 5 chemical space) can be used to predict the intrinsic aqueous solubility, S0, of big molecules from beyond the Rule of 5 (bRo5) space. It was demonstrated that the General Solubility Equation (GSE) and the Abraham Solvation Equation (ABSOLV) underpredict solubility in systematic but slightly different ways. The Random Forest regression (RFR) method predicts solubility more accurately, albeit in the manner of a ‘black box.’ It was discovered that the GSE improves considerably in the case of big molecules when the coefficient of the log P term (octanol-water partition coefficient) in the equation is set to -0.4 instead of the traditional -1 value. The traditional GSE underpredicts solubility for molecules with experimental S0 < 50 µM. In contrast, the ABSOLV equation (trained with small molecules) underpredicts the solubility of big molecules in all cases tested. It was found that the errors in the ABSOLV-predicted solubilities of big molecules correlate linearly with the number of rotatable bonds, which suggests that flexibility may be an important factor in differentiating the solubility of small molecules from that of big molecules. Notably, most of the 31 big molecules considered have a negative enthalpy of solution: these big molecules become less soluble with increasing temperature, which is compatible with ‘molecular chameleon’ behavior associated with intramolecular hydrogen bonding. The X-ray structures of many of these molecules reveal void spaces in their crystal lattices large enough to accommodate many water molecules when such solids are in contact with aqueous media. The water sorbed into crystals suspended in aqueous solution may enhance solubility by way of intra-lattice solute-water interactions involving the numerous H-bond acceptors in the big molecules studied. A ‘Solubility Enhancement–Big Molecules’ index was defined, which embodies many of the above findings.
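
The effect of changing the log P coefficient can be illustrated with the classic form of the GSE, log S0 = 0.5 - 0.01 (mp - 25) - log P, with the melting point mp in °C. The sketch below is an assumption-laden illustration: the function name is hypothetical, the melting-point term is clamped to zero for compounds that melt below 25 °C (the usual convention for liquids), and the other two constants are left at their classic values, which the abstract does not confirm for the modified equation.

```python
def gse_log_s0(melting_point_c, log_p, log_p_coeff=-1.0):
    """Estimate log S0 (log mol/L) from melting point (deg C) and log P.

    log_p_coeff = -1.0 gives the traditional GSE; -0.4 is the value the
    abstract reports as better suited to big (bRo5) molecules.
    Assumption: the intercept (0.5) and the melting-point slope (0.01)
    are kept at their classic values in both cases.
    """
    # The melting-point term is set to zero for liquids (mp below 25 deg C).
    mp_term = 0.01 * max(melting_point_c - 25.0, 0.0)
    return 0.5 - mp_term + log_p_coeff * log_p


# Hypothetical compound: mp = 125 deg C, log P = 2.
classic = gse_log_s0(125.0, 2.0)                     # 0.5 - 1.0 - 2.0
modified = gse_log_s0(125.0, 2.0, log_p_coeff=-0.4)  # 0.5 - 1.0 - 0.8
print(classic, modified)
```

With the smaller coefficient the predicted solubility rises for lipophilic compounds, which is the direction needed to offset the underprediction described above.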

    Prediction of aqueous intrinsic solubility of druglike molecules using Random Forest regression trained with Wiki-pS0 database

    The accurate prediction of solubility of drugs is still problematic. It was thought for a long time that shortfalls had been due to the lack of high-quality solubility data from the chemical space of drugs. This study considers the quality of solubility data, particularly of ionizable drugs. A database is described, comprising 6355 entries of intrinsic solubility for 3014 different molecules, drawing on 1325 citations. In an earlier publication, many factors affecting the quality of the measurement had been discussed, and suggestions were offered to improve ways of extracting more reliable information from legacy data. Many of the suggestions have been implemented in this study. By correcting solubility for ionization (i.e., deriving intrinsic solubility, S0) and by normalizing temperature (by transforming measurements performed in the range 10-50 °C to 25 °C), it can now be estimated that the average interlaboratory reproducibility is 0.17 log unit. Empirical methods to predict solubility at best have hovered around the root mean square error (RMSE) of 0.6 log unit. Three prediction methods are compared here: (a) Yalkowsky’s general solubility equation (GSE), (b) Abraham solvation equation (ABSOLV), and (c) Random Forest regression (RFR) statistical machine learning. The latter two methods were trained using the new database. The RFR method outperforms the other two models, as anticipated. However, the ability to predict the solubility of drugs to the level of the quality of data is still out of reach. The data quality is not the limiting factor in prediction. The statistical machine learning methodologies are probably up to the task. Possibly what is missing are solubility data from a few sparsely covered regions of the chemical space of drugs (particularly of research compounds). Also, new descriptors which can better differentiate the factors affecting solubility between molecules could be critical for narrowing the gap between the accuracy of the prediction models and that of the experimental data.
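
The ionization correction mentioned in this abstract can be illustrated with the textbook Henderson-Hasselbalch relation for a monoprotic compound, S = S0 (1 + 10^(pH - pKa)) for an acid and S = S0 (1 + 10^(pKa - pH)) for a base. This is only the simplest case, under the assumptions of a single ionizable group and no salt precipitation or aggregation; the abstract does not describe the procedure actually used to build the database, and the function name below is hypothetical.

```python
import math


def intrinsic_log_s(log_s_at_ph, ph, pka, kind):
    """Convert log S measured at a given pH to intrinsic log S0.

    Assumes one ionizable group ('acid' or 'base') and ideal
    Henderson-Hasselbalch behavior; real data may need a fuller model.
    """
    if kind == "acid":
        ionized_ratio = 10.0 ** (ph - pka)   # [A-]/[HA]
    elif kind == "base":
        ionized_ratio = 10.0 ** (pka - ph)   # [BH+]/[B]
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    # S = S0 * (1 + ratio)  =>  log S0 = log S - log10(1 + ratio)
    return log_s_at_ph - math.log10(1.0 + ionized_ratio)


# Hypothetical acid, pKa 4, measured at pH 6 with log S = -2.
print(intrinsic_log_s(-2.0, ph=6.0, pka=4.0, kind="acid"))
```

An acid measured two pH units above its pKa is about 99% ionized, so the intrinsic solubility comes out roughly two log units below the measured value.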

    Data mining methods for the prediction of intestinal absorption using QSAR

    Oral administration is the most common route for administration of drugs. With the growing cost of drug discovery, the development of Quantitative Structure-Activity Relationships (QSAR) as computational methods to predict oral absorption is highly desirable for reasons of cost-effectiveness. The aim of this research was to develop QSAR models that are highly accurate and interpretable for the prediction of oral absorption. In this investigation the problems addressed were datasets with unbalanced class distributions, feature selection, and the effects of solubility and permeability on oral absorption prediction. Firstly, oral absorption models were obtained by overcoming the problem of unbalanced class distributions in datasets using two techniques: under-sampling of compounds belonging to the majority class, and the use of different misclassification costs for different types of misclassifications. Using these methods, models with higher accuracy were produced using regression and linear/non-linear classification techniques. Secondly, the use of several pre-processing feature selection methods in tandem with decision tree classification analysis (including misclassification costs) was found to produce models with better interpretability and higher predictive accuracy. These methods successfully selected the most important molecular descriptors and overcame the problem of unbalanced classes. Thirdly, the roles of solubility and permeability in oral absorption were also investigated. This involved expansion of oral absorption datasets and collection of in vitro and aqueous solubility data. This work found that the inclusion of predicted and experimental solubility in permeability models can improve model accuracy. However, the impact of solubility on oral absorption prediction was not as influential as expected. Finally, predictive models of permeability and solubility were built to predict a provisional Biopharmaceutic Classification System (BCS) class using two multi-label classification techniques, binary relevance and classifier chain. The classifier chain method was shown to have higher predictive accuracy by using predicted solubility as a molecular descriptor for permeability models, and hence better final provisional BCS prediction. Overall, this research has resulted in predictive and interpretable models that could be useful in a drug discovery context.
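
Under-sampling of the majority class, the first of the two imbalance remedies named in this abstract, can be sketched as follows. The abstract does not say how compounds were selected or what class ratio was targeted, so random selection down to a 1:1 ratio is an assumption here, as are the function name and the class labels.

```python
import random


def undersample_majority(samples, labels, seed=0):
    """Randomly drop samples from larger classes until all classes
    have as many members as the smallest class (assumed 1:1 target)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        # Keep the smallest class whole; sample without replacement otherwise.
        chosen = xs if len(xs) == n_min else rng.sample(xs, n_min)
        balanced.extend((x, y) for x in chosen)
    rng.shuffle(balanced)
    return balanced


# Hypothetical data: 8 well-absorbed compounds, 2 poorly absorbed ones.
compounds = list(range(10))
classes = ["high"] * 8 + ["low"] * 2
print(undersample_majority(compounds, classes))
```

The alternative remedy, misclassification costs, keeps all compounds and instead penalizes errors on the minority class more heavily during training.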

    Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction

    The task of learning an expressive molecular representation is central to developing quantitative structure–activity and property relationships. Traditional approaches rely on group additivity rules, empirical measurements or parameters, or generation of thousands of descriptors. In this paper, we employ a convolutional neural network for this embedding task by treating molecules as undirected graphs with attributed nodes and edges. Simple atom and bond attributes are used to construct atom-specific feature vectors that take into account the local chemical environment using different neighborhood radii. By working directly with the full molecular graph, there is a greater opportunity for models to identify important features relevant to a prediction task. Unlike other graph-based approaches, our atom featurization preserves molecule-level spatial information that significantly enhances model performance. Our models learn to identify important features of atom clusters for the prediction of aqueous solubility, octanol solubility, melting point, and toxicity. Extensions and limitations of this strategy are discussed

    Do you know your r2?

    The prediction of solubility of drugs usually calls on the use of several open-source/commercially-available computer programs in the various calculation steps. Popular statistics to indicate the strength of the prediction model include the coefficient of determination (r2), Pearson’s linear correlation coefficient (rPearson), and the root-mean-square error (RMSE), among many others. When a program calculates these statistics, slightly different definitions may be used. This commentary briefly reviews the definitions of three types of r2 and RMSE statistics (model validation, bias compensation, and Pearson) and how systematic errors due to shortcomings in solubility prediction models can be differently indicated by the choice of statistical indices. The indices we have employed in recently published papers on the prediction of solubility of druglike molecules were unclear, especially in cases of drugs from ‘beyond the Rule of 5’ chemical space, as simple prediction models showed distinctive ‘bias-tilt’ systematic type scatter
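
The point that different definitions can tell different stories is easy to reproduce. The sketch below compares the validation-type coefficient of determination, r2 = 1 - SS_res/SS_tot, with the squared Pearson correlation on a prediction set that carries a constant bias. The function names are hypothetical, RMSE is normalized by n (one of several conventions), and the bias-compensated variant is omitted because the abstract does not define it.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error, normalized by the number of points."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2_validation(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def r2_pearson(obs, pred):
    """Square of Pearson's linear correlation coefficient."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov * cov / (var_obs * var_pred)


# Hypothetical log S0 values; every prediction is too high by exactly 1.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(rmse(observed, predicted))           # 1.0
print(r2_validation(observed, predicted))  # about 0.2
print(r2_pearson(observed, predicted))     # 1.0
```

The Pearson statistic is blind to the constant offset and reports a perfect correlation, while the validation-type r2 and the RMSE both expose the systematic error.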

    Estimation of drug solubility in water, PEG 400 and their binary mixtures using the molecular structures of solutes

    With the aim of estimating solubility in water, polyethylene glycol 400 (PEG), and their binary mixtures, quantitative structure-property relationships (QSPRs) were investigated to relate the solubility of a large number of compounds to descriptors of their molecular structures. The relationships were quantified using linear regression analysis (with descriptors selected by stepwise regression) and formal inference-based recursive modeling (FIRM). The models were compared in terms of the solubility prediction accuracy for the validation set. The resulting regression and FIRM models employed a diverse set of molecular descriptors explaining crystal lattice energy, molecular size, and solute-solvent interactions. The significance of molecular shape for a compound's solubility was evident from several shape descriptors being selected by FIRM and stepwise regression analysis. Some of these influential structural features, e.g. connectivity indexes and the Balaban topological index, were found to be related to the crystal lattice energy. The results showed that regression models outperformed most FIRM models and produced higher prediction accuracy. However, the most accurate estimation was achieved by the use of a combination of FIRM and regression models. The results also showed that the use of melting point in regression models improves the estimation accuracy, especially for solubility in higher concentrations of PEG. Aqueous or PEG/water solubilities can be estimated by these models with a root mean square error below 0.70. © 2010 Elsevier B.V.

    QSPR Studies on Aqueous Solubilities of Drug-Like Compounds

    A rapidly growing area of modern pharmaceutical research is the prediction of aqueous solubility of drug-sized compounds from their molecular structures. There exist many different reasons for considering this physico-chemical property as a key parameter: the design of novel entities with adequate aqueous solubility brings many advantages to preclinical and clinical research and development, allowing improvement of the Absorption, Distribution, Metabolization, and Elimination/Toxicity profile and “screenability” of drug candidates in High Throughput Screening techniques. This work compiles recent QSPR linear models established by our research group devoted to the quantification of aqueous solubilities and their comparison to previous research on the topic

    Octanol–water partition coefficients and aqueous solubility data of monoterpenoids: experimental, modeling, and environmental distribution

    Terpenes and terpenoids encompass one of the most extensive and valuable classes of secondary metabolites. Their ten-carbon-containing oxygenated representatives, monoterpenoids, are the main components of plant essential oils, being widely exploited in the cosmetic, pharmaceutical, and food industrial areas. Due to their widespread use, it is crucial to investigate their environmental distribution. Thus, new water solubility data were obtained for six monoterpenoids ((1R)-(+)-camphor, (S)-(+)-carvone, eucalyptol, (1R)-(−)-fenchone, L-(−)-menthol, and (−)-menthone) at 298.2 and 313.2 K. Furthermore, octanol–water partition coefficients of 12 monoterpenoids (the six mentioned above plus carvacrol, (±)-β-citronellol, eugenol, geraniol, linalool, and thymol) were measured at 298.2 K. The COSMO-RS thermodynamic model and other more empirical approaches were evaluated for the description of the solubilities and partition coefficients, showing reliable predictions. Lastly, the distribution of the monoterpenoids in the different environmental compartments was assessed through an intuitive two-dimensional chemical space diagram based on the physicochemical equilibrium information reported. This work was developed within the scope of the project CIMO-Mountain Research Center, UIDB/00690/2020, and CICECO-Aveiro Institute of Materials, UIDB/50011/2020 and UIDP/50011/2020, financed by national funds through the Portuguese Foundation for Science and Technology (FCT)/MCTES. S.M.V.-B. thanks FCT and the European Social Fund (ESF) for his Ph.D. grant (SFRH/BD/138149/2018). M.C.d.C. would also like to thank CNPq (306666/2020-0) and FAPESP (2014/21252-0).

    In Silico Prediction of Physicochemical Properties

    This report provides a critical review of computational models, and in particular (quantitative) structure-property relationship (QSPR) models, that are available for the prediction of physicochemical properties. The emphasis of the review is on the usefulness of the models for the regulatory assessment of chemicals, particularly for the purposes of the new European legislation for the Registration, Evaluation, Authorisation and Restriction of CHemicals (REACH), which entered into force in the European Union (EU) on 1 June 2007. It is estimated that some 30,000 chemicals will need to be further assessed under REACH. Clearly, the cost of determining the toxicological and ecotoxicological effects, and the distribution and fate, of 30,000 chemicals would be enormous. However, the legislation makes it clear that testing need not be carried out if adequate data can be obtained through information exchange between manufacturers, from in vitro testing, and from in silico predictions. The effects of a chemical on a living organism and its distribution in the environment are controlled by the physicochemical properties of the chemical. Important physicochemical properties in this respect are, for example, partition coefficient, aqueous solubility, vapour pressure and dissociation constant. Whilst all of these properties can be measured, it is much quicker and cheaper, and in many cases just as accurate, to calculate them by using dedicated software packages or by using QSPRs. These in silico approaches are critically reviewed in this report.