36 research outputs found

    How accurately can we predict the melting points of drug-like compounds?

    © 2014 American Chemical Society. This article contributes a highly accurate model for predicting the melting points (MPs) of medicinal chemistry compounds. The model was developed using the largest published data set, comprising more than 47k compounds. The distributions of MPs in drug-like and drug lead sets showed that >90% of molecules melt within [50,250] °C. The final model achieved an RMSE of less than 33 °C for molecules from this temperature interval, which is the most important for medicinal chemistry users. This performance was achieved using a consensus model that was significantly more accurate than the individual models. We found that compounds with reactive and unstable groups were overrepresented among outlying compounds. These compounds could decompose during storage or measurement, thus introducing experimental errors. While filtering the data by removing outliers generally increased the accuracy of individual models, it did not significantly affect the results of the consensus models. The three distance-to-model measures analyzed did not allow us to flag molecules whose MP values fell outside the applicability domain of the model. We believe that this negative result and the public availability of data from this article will encourage future studies to develop better approaches to define the applicability domain of models. The final model, MP data, and identified reactive groups are available online at http://ochem.eu/article/55638
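The abstract above reports an RMSE restricted to the [50,250] °C interval for a consensus model. As a minimal sketch of those two ideas, assuming the consensus is a plain average of individual model predictions (the abstract does not say how the models were combined), the calculation could look like this. The function names and the melting points are made up for illustration.

```python
import math

def consensus(predictions_per_model):
    """Average the predictions of several models, compound by compound.

    Assumption: a simple unweighted mean; the article may combine models differently.
    """
    n_models = len(predictions_per_model)
    return [sum(values) / n_models for values in zip(*predictions_per_model)]

def rmse_in_interval(observed, predicted, low=50.0, high=250.0):
    """RMSE over compounds whose observed value lies in [low, high]."""
    pairs = [(o, p) for o, p in zip(observed, predicted) if low <= o <= high]
    if not pairs:
        raise ValueError("no compounds in the requested interval")
    return math.sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))

# Hypothetical melting points in °C, not data from the article.
observed = [100.0, 200.0, 300.0]
model_a = [110.0, 190.0, 250.0]
model_b = [90.0, 220.0, 330.0]

combined = consensus([model_a, model_b])
print(combined)
print(rmse_in_interval(observed, combined))
```

In this toy example the third compound melts at 300 °C and is therefore excluded from the interval RMSE, which mirrors how an error reported for [50,250] °C says nothing about compounds outside that range.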

    QSAR approaches to predict human cytochrome P450 inhibition.

    This thesis focuses on several aspects of QSAR modeling of human cytochrome P450 inhibition and proposes a methodology to increase the quality of CYP inhibition models. It is shown that adding newly developed descriptors derived from docking simulations increases the predictive ability of the resulting models. The studies were performed on the OCHEM platform (http://ochem.eu), and all the descriptors, datasets and models are publicly available to the scientific community.

    From descriptors to predicted properties: Experimental design by using applicability domain estimation.

    Reliable methods for representative sub-sampling are crucial for experimental design and risk assessment within the European Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) system. We developed experimental design approaches that use predicted properties and the 'distance to model' parameter to estimate how much certain compounds would benefit the quality of a resulting model. A statistical evaluation of four regression data sets and one classification data set showed that the adaptive concept of iteratively refining the representation of the chemical space contributes to a more efficient and more reliable selection in comparison to traditional approaches. Evaluating compounds with regard to the uncertainty and the correlation of prediction is beneficial, in particular for regression data sets of sufficient size, whereas the use of predicted properties to define the chemical space is beneficial for classification models.

    Modeling of non-additive mixture properties using the Online CHEmical database and Modeling environment (OCHEM).

    The Online Chemical Modeling Environment (OCHEM, http://ochem.eu) is a web-based platform that provides tools for automating the typical steps necessary to create a predictive QSAR/QSPR model. The platform consists of two major subsystems: a database of experimental measurements and a modeling framework. So far, OCHEM has been limited to the processing of individual compounds. In this work, we extended OCHEM with a new ability to store and model properties of binary non-additive mixtures. The developed system is publicly accessible, meaning that any user on the Web can store new data for binary mixtures and develop models to predict their non-additive properties. The database already contains almost 10,000 data points for the density, bubble point, and azeotropic behavior of binary mixtures. For these data, we developed models for both qualitative (azeotrope/zeotrope) and quantitative endpoints (density and bubble points) using different learning methods and specially developed descriptors for mixtures. The prediction performance of the models was similar to or better than results reported in previous studies. Thus, we have developed and made publicly available a powerful system for modeling mixtures of chemical compounds on the Web.

    Online chemical modeling environment


    In silico pKa prediction.

    The biopharmaceutical profile of a compound depends directly on the dissociation constants of its acidic and basic groups, commonly expressed as pKa, the negative decadic logarithm of the acid dissociation constant (Ka). The acid dissociation constant (also protonation or ionization constant) Ka is an equilibrium constant defined as the ratio of the protonated and the deprotonated form of a compound. The pKa value of a compound strongly influences its pharmacokinetic and biochemical properties. Its accurate estimation is therefore of great interest in areas such as biochemistry, medicinal chemistry, pharmaceutical chemistry, and drug development. Aside from the pharmaceutical industry, it also has relevance in environmental ecotoxicology, as well as the agrochemicals and specialty chemicals industries. The literature contains a vast number of different approaches to pKa prediction. These approaches can be divided into two classes. On the one hand, there are direct calculations, so-called ab initio methods, which try to determine the pKa value by quantum chemical or mechanical computation. On the other hand, there are statistical models trained on chemical or structural descriptors. These descriptors can be, for example, of quantum chemical, semi-empirical, graph topological or simple statistical nature. This type of modeling is called QSPR (Quantitative Structure Property Relationship). In our recent work, we developed such a QSPR model using localized molecular descriptors to train multiple linear regression and artificial neural networks to estimate dissociation constants (pKa). The performance of our approach is similar to that of a semi-empirical model based on frontier electron theory as well as a prediction model based on graph kernels. How such a prediction model can be built is shown by an example performed with OCHEM, an online chemical database with an environment for modeling (http://ochem.eu/). It is a publicly accessible database for chemical compound data and predictive models. Further, users can develop, apply, and distribute predictive models, so it is unique in its combination of compound data and predictive models.
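To make the role of pKa concrete, the following minimal sketch converts Ka to pKa and uses the Henderson-Hasselbalch relation to estimate how much of a monoprotic acid is deprotonated at a given pH. It is not part of the QSPR model described in the abstract; the function names and the example values are assumptions chosen for illustration.

```python
import math

def pka_from_ka(ka):
    """pKa is the negative decadic logarithm of Ka."""
    if ka <= 0:
        raise ValueError("Ka must be positive")
    return -math.log10(ka)

def fraction_deprotonated(pka, ph):
    """Fraction of a monoprotic acid HA present as A- at the given pH.

    From Henderson-Hasselbalch: [A-]/[HA] = 10 ** (pH - pKa).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

# Hypothetical acid with Ka = 1e-5, evaluated at a few pH values.
pka = pka_from_ka(1e-5)
for ph in (3.0, 5.0, 7.4):
    print(ph, fraction_deprotonated(pka, ph))
```

When pH equals pKa, half of the compound is deprotonated, so an error of one pKa unit near physiological pH can change the predicted ionized fraction substantially.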

    Chemogenomic approach to increase accuracy of QSAR modeling of inhibition activity against five major P450 isoforms.

    Cytochromes P450 (CYP) are a superfamily of enzymes involved in the metabolism of xenobiotic compounds, including a large number of drugs currently on the market. Therefore, prediction of the CYP inhibition activity of small molecules is an important task, especially in early-stage drug discovery, because of the high risk of drug-drug interactions. It is estimated that CYP enzymes metabolize over 75% of currently marketed drugs. Of these reactions, over 90% are facilitated by CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4. This makes these enzymes particularly interesting targets for in silico inhibition prediction. Accurate prediction of the inhibition activity of small molecules against CYP enzymes is particularly important in the field of personalized medicine discovery. The high promiscuity of the studied cytochromes with respect to substrates limits traditional QSAR methods. Including structural information of the protein is crucial to obtaining predictive models. In this work, the modeling is performed on a set of chemogenomic descriptors obtained from protein-ligand complexes. The quality of the descriptors is benchmarked in QSAR modeling of HTS data for human CYP450 inhibition. The calculation of descriptors involves flexible docking of the molecule to the rigid binding site of the cytochrome (in this study the AutoDock Vina tool was used). The top-ranked conformation is then processed to obtain the descriptors. The training sets for the benchmarked models were obtained from the PubChem BioAssay database (assays AID410, AID883, AID899, AID884 and AID891 for CYP1A2, 2C9, 2C19, 3A4 and 2D6, respectively). The test sets were obtained from the AID1851 assay by excluding all molecules present in the training set. The models presented in the study correctly classified 82-87% of compounds on the validated training set and 65-75% of instances on the test sets. The dramatic difference in model performance between the test and the validated training sets can be explained by structural dissimilarity of the sets. Using applicability domain approaches to select only confident predictions made it possible to reach 90% correctly classified instances on the 20% most confident predictions of the test set. The datasets and the benchmarked models are available on the Online Chemical Modeling Environment (http://ochem.eu).
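The abstract above reports accuracy on the 20% most confident test-set predictions. A minimal sketch of that kind of evaluation follows, assuming each prediction comes with a numeric distance-to-model value where a lower value means higher confidence. The record layout, the function name, and the data are hypothetical and are not taken from the study.

```python
def accuracy_at_coverage(records, coverage):
    """Accuracy on the most confident fraction of predictions.

    records: list of (distance_to_model, predicted_label, true_label);
             a lower distance is treated as more confident (an assumption).
    coverage: fraction of predictions to keep, in (0, 1].
    """
    if not records:
        raise ValueError("records must not be empty")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ranked = sorted(records, key=lambda r: r[0])
    # Keep at least one prediction, rounding the count to the nearest integer.
    keep = max(1, round(len(ranked) * coverage))
    selected = ranked[:keep]
    correct = sum(1 for _, pred, true in selected if pred == true)
    return correct / len(selected)

# Hypothetical predictions: (distance to model, predicted class, true class).
records = [
    (0.1, 1, 1),
    (0.9, 1, 0),
    (0.2, 0, 0),
    (0.5, 0, 1),
    (0.7, 1, 1),
]
print(accuracy_at_coverage(records, 0.4))  # two most confident predictions
print(accuracy_at_coverage(records, 1.0))  # all predictions
```

Sweeping `coverage` from small to large values traces the trade-off between accuracy and the share of compounds for which a prediction is reported.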

    A comparison of different QSAR approaches to modeling CYP450 1A2 inhibition.

    Prediction of the CYP450 inhibition activity of small molecules is an important task because of the high risk of drug-drug interactions. CYP1A2 is an important member of the CYP450 superfamily and accounts for 15% of total CYP450 presence in the human liver. This article compares 80 in silico QSAR models that were created by following the same procedure with different combinations of descriptors and machine learning methods. The training and test sets consist of 3745 and 3741 inhibitors and noninhibitors from the PubChem BioAssay database. A heterogeneous external test set of 160 inhibitors was collected from the literature. The studied descriptor sets are E-state, Dragon and ISIDA SMF descriptors. The machine learning methods are Associative Neural Networks (ASNN), K Nearest Neighbors (kNN), Random Tree (RT), C4.5 Tree (J48), and Support Vector Machines (SVM). The influence of descriptor selection on model accuracy was studied, and the benefits of the "bagging" modeling approach were shown. The applicability domain approach was successfully applied, ways of increasing model accuracy through applicability domain measures were demonstrated, and a fragment-based model interpretation was performed. The most accurate models in this study correctly classified 83% and 68% of instances on the internal and external test sets, respectively. The applicability domain approach increased the prediction accuracy to 90% for 78% of the internal and 17% of the external test sets, respectively. The most accurate models are available online at http://ochem.eu/models/Q5747