4 research outputs found

The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS

Authors: Antony J. Williams, Daniel M. Lowe, Igor V. Tetko
Publication venue: Springer Nature
Publication date: 01/01/2016

BACKGROUND: Melting point (MP) is an important property with regard to the solubility of chemical compounds. Its prediction from chemical structure remains a highly challenging task for quantitative structure-activity relationship studies. Success in this area of research critically depends on the availability of high quality MP data as well as accurate chemical structure representations in order to develop models. Currently, available datasets for MP predictions have been limited to around 50k molecules, while far more data are routinely generated following the synthesis of novel materials. Significant amounts of MP data are freely available within the patent literature and, if they were available in the appropriate form, could potentially be used to develop predictive models.

RESULTS: We have developed a pipeline for the automated extraction and annotation of chemical data from published PATENTS. Almost 300,000 data points have been collected and used to develop models to predict melting and pyrolysis (decomposition) points using tools available on the OCHEM modeling platform (http://ochem.eu). A number of technical challenges were simultaneously solved to develop models based on these data. These included the handling of sparse data matrices with >200,000,000,000 entries and parallel calculations using 32 × 6 cores per task using 13 descriptor sets totaling more than 700,000 descriptors. We showed that models developed using data collected from PATENTS had similar or better prediction accuracy than models built on the highly curated data used in previous publications. Data for chemicals that decomposed rather than melted were separated from data for compounds that underwent a normal melting transition, and models for both pyrolysis and MPs were developed. The accuracy of the consensus MP models for molecules from the drug-like region of chemical space was similar to their estimated experimental accuracy, 32 °C. Last but not least, important structural features related to the pyrolysis of chemicals were identified, and a model to predict whether a compound will decompose instead of melting was developed.

CONCLUSIONS: We have shown that automated tools for the analysis of chemical information have reached a mature stage allowing for the extraction and collection of high quality data to enable the development of structure-activity relationship models. The developed models and data are publicly available at http://ochem.eu/article/99826
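The matrix size quoted in the abstract follows from its own figures: roughly 300,000 compounds times roughly 700,000 descriptors gives about 2.1 × 10^11 entries. The sketch below works through that arithmetic and compares a dense layout with a compressed sparse row (CSR) layout. The 4-byte value size, the index sizes, and the 0.1% fill density are illustrative assumptions and are not reported in the abstract.

```python
# Rough sizing of the descriptor matrix described in the abstract.
# Row and column counts are rounded from the abstract; everything
# else (value size, index sizes, density) is a hypothetical assumption.

N_COMPOUNDS = 300_000      # "almost 300,000 data points"
N_DESCRIPTORS = 700_000    # "more than 700,000 descriptors"
ASSUMED_DENSITY = 0.001    # hypothetical fraction of non-zero entries


def matrix_entries(rows: int, cols: int) -> int:
    """Total number of cells in a rows x cols matrix."""
    return rows * cols


def dense_bytes(rows: int, cols: int, value_size: int = 4) -> int:
    """Memory needed if every cell is stored explicitly."""
    return matrix_entries(rows, cols) * value_size


def csr_bytes(rows: int, cols: int, density: float,
              value_size: int = 4, index_size: int = 4,
              pointer_size: int = 8) -> int:
    """Memory for a CSR layout: values + column indices + row pointers."""
    nnz = int(matrix_entries(rows, cols) * density)
    return nnz * (value_size + index_size) + (rows + 1) * pointer_size


entries = matrix_entries(N_COMPOUNDS, N_DESCRIPTORS)
print(f"entries:      {entries:,}")
print(f"dense (GB):   {dense_bytes(N_COMPOUNDS, N_DESCRIPTORS) / 1e9:.1f}")
print(f"sparse (GB):  {csr_bytes(N_COMPOUNDS, N_DESCRIPTORS, ASSUMED_DENSITY) / 1e9:.2f}")
```

Under these assumptions the dense layout would need hundreds of gigabytes, which is why a sparse representation is the natural choice for a matrix of this shape.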

Springer - Publisher Connector

PubMed Central

PuSH

FigShare

A Computational perspective on the concerted cleavage mechanism of the natural targets of HIV-1 protease.

Author: Lawal Monsurat Motunrayo.
Publication date: 01/01/2018

Doctoral Degree. University of KwaZulu-Natal, Durban. One infectious disease that has had a profound health and cultural impact on the human race in recent decades is the Acquired Immune Deficiency Syndrome (AIDS) caused by the Human Immunodeficiency Virus (HIV). A major breakthrough in the treatment of HIV-1 was the use of drugs inhibiting specific enzymes necessary for the replication of the virus. Among these enzymes is HIV-1 protease (PR), an important degrading enzyme necessary for the proteolytic cleavage of the Gag and Gag-Pol polyproteins, which is required for the development of mature virion proteins. The mechanism of action of the HIV-1 PR on the proteolysis of these polyproteins has been a subject of research over the past three decades. Most investigations on this subject have been dedicated to exploring the reaction mechanism of HIV-1 PR on its targets as a stepwise general acid-base process, with little attention to a concerted model. One of the shortcomings of the stepwise reaction pathway is the existence of more than two transition state (TS) moieties, which has led to varying opinions on the exact rate-determining step of the reaction and the protonation pattern of the catalytic aspartate group at the HIV-1 PR active site. Also, there is no consensus on the actual recognition mechanism of the natural substrates by the HIV-1 PR. By means of concerted TS structural models, the recognition mode and the reaction mechanism of HIV-1 PR with its natural targets were investigated in the present study. The investigation was designed to elucidate the cleavage of natural substrates by HIV-1 PR using the concerted TS model, applying computational methods to unravel the recognition and reaction process, compute activation parameters and elucidate quantum chemical properties of the system. 
Quantum mechanics (QM) methods, including density functional theory (DFT) models and Hartree-Fock (HF), molecular mechanics (MM) and hybrid QM/MM were employed to provide better insight into this topic. Based on experience with concerted TS modelling, a six-membered ring TS structure was proposed. Using a small model system and QM methods (DFT and HF), the enzymatic mechanism of HIV-1 PR was studied as a general acid-base model in which both catalytic aspartate groups participate and a water molecule attacks the natural substrate synchronously. The natural substrate scissile bond strength was also investigated via changes of electronic effects. The proposed concerted six-membered ring TS mechanism of the natural substrate within the entire enzyme was studied using the hybrid QM/MM “Our own N-layered Integrated molecular Orbital and molecular Mechanics” (ONIOM) method. This investigation led to a new perspective in which an acyclic concerted pathway provided a better approach to the subject than the proposed six-membered model. The natural substrate recognition pattern was therefore investigated using concerted acyclic TS modelling and the ONIOM approach to examine whether HIV-1 PRs (South Africa subtype C, C-SA, and subtype B) recognize their substrates in the same manner. A major outcome of the present investigation is the computational modelling of a new, potentially active, substrate-based inhibitor through six-membered concerted cyclic TS modelling and a small system. By modelling the entire enzyme-substrate system using a hybrid QM/MM (ONIOM) method, three different pathways were obtained: (1) a concerted acyclic TS structure, (2) a concerted six-membered cyclic TS model and (3) another six-membered ring TS model involving two water molecules. The activation free energies obtained for the first and the last pathways were in agreement with in vitro HIV-1 PR hydrolysis data. 
The mechanism that provides, by a small margin, the lowest activation barrier involves an acyclic TS model with one water molecule at the HIV-1 PR active site. The outcome of the study provides a plausible theoretical benchmark for the concerted enzymatic mechanism of HIV-1 PRs, which could be applied to related homodimeric proteases and perhaps other enzymatic processes. Applying the one-step concerted acyclic catalytic mechanism to two HIV-1 PR subtypes, the recognition phenomena of both enzyme and substrate were studied. It was observed that the studied HIV-1 PR subtypes (B and C-SA) recognize and cleave at both scissile and non-scissile regions of the natural substrate sequences while maintaining preferential specificity for the scissile bonds, with characteristically lower activation free energies. Future studies on the reaction mechanism of HIV-1 PR and natural substrates should involve the application of advanced computational techniques to provide plausible answers to some unresolved questions. Theoretical investigations of the HIV-1 PR-natural substrate enzymatic mechanism in years to come would likely involve the application of sophisticated computational techniques aimed at exploring more than the energetics of the system. Integrated computational algorithms that do not involve partitioned, restrained, constrained or cropped model systems of the enzyme-substrate mechanism will likely emerge in future to accurately elucidate the HIV-1 PR catalytic process on natural substrates/ligands.
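Comparing a computed activation free energy with in vitro hydrolysis data usually means converting between a barrier and a rate constant. One standard relation for this is the Eyring equation from transition state theory, k = (k_B T / h) exp(-ΔG‡ / RT). The abstract does not state which relation the thesis used, so the sketch below is only an illustration; the transmission coefficient of 1, the temperature of 298.15 K, and the example barrier values are assumptions and are not results from the thesis.

```python
import math

BOLTZMANN = 1.380649e-23          # J/K (exact SI value)
PLANCK = 6.62607015e-34           # J*s (exact SI value)
GAS_CONSTANT = 1.98720425864e-3   # kcal/(mol*K)


def eyring_rate(delta_g_kcal: float, temperature: float = 298.15) -> float:
    """Rate constant (1/s) from an activation free energy in kcal/mol.

    Assumes a transmission coefficient of 1.
    """
    prefactor = BOLTZMANN * temperature / PLANCK
    return prefactor * math.exp(-delta_g_kcal / (GAS_CONSTANT * temperature))


def barrier_from_rate(rate: float, temperature: float = 298.15) -> float:
    """Activation free energy (kcal/mol) implied by a rate constant (1/s)."""
    prefactor = BOLTZMANN * temperature / PLANCK
    return -GAS_CONSTANT * temperature * math.log(rate / prefactor)


# Hypothetical barriers, chosen only to show the exponential sensitivity.
for barrier in (14.0, 15.0, 16.0):
    print(f"dG = {barrier:.1f} kcal/mol -> k = {eyring_rate(barrier):.3g} 1/s")
```

Because the rate depends exponentially on the barrier, a difference of about 1.4 kcal/mol at room temperature corresponds to roughly a tenfold change in rate, which is why small margins between competing pathways matter.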

ResearchSpace@UKZN

Contributions to evaluation of machine learning models. Applicability domain of classification models

Author: Rado Omesaad A.M.
Publication venue: Faculty of Engineering and Informatics
Publication date: 01/01/2019

Artificial intelligence (AI) and machine learning (ML) present application opportunities and challenges that can be framed as learning problems. The performance of machine learning models depends on both the algorithms and the data. Moreover, learning algorithms create a model of reality by learning from and testing on data, and their performance reflects the degree to which the assumed model agrees with reality. ML algorithms have been successfully used in numerous classification problems. With the growing popularity of ML models for many purposes in different domains, more formal validation of such predictive models is now required. There are many studies related to model evaluation, robustness, reliability, and the quality of the data and the data-driven models. However, those studies do not yet consider the concept of the applicability domain (AD). The issue is that in many fields the AD is not well defined, or not defined at all. This work investigates the robustness of ML classification models from the applicability domain perspective. A standard definition of the applicability domain is the space in which the model provides results with a specified reliability. The main aim of this study is to investigate the connection between the applicability domain approach and classification model performance. We examine the usefulness of assessing the AD for a classification model, i.e. the reliability, reuse and robustness of classifiers. The work is implemented using three approaches: first, assessing the applicability domain for the classification model; second, investigating the robustness of the classification model based on the applicability domain approach; third, selecting an optimal model using Pareto optimality. 
The experiments in this work are illustrated with different machine learning algorithms for binary and multi-class classification on healthcare datasets from public benchmark data repositories. In the first approach, the decision tree (DT) algorithm is used in the classification stage, and a feature selection method is applied to choose features for classification. The obtained classifiers are used in the third approach for selection of models using Pareto optimality. The second approach is implemented in three steps: building the classification model, generating synthetic data, and evaluating the obtained results. The results obtained from the study provide an understanding of how the proposed approach can help to define the model’s robustness and applicability domain in order to provide reliable outputs. These approaches open opportunities for classification data and model management. The proposed algorithms are evaluated through a set of experiments on the classification accuracy of instances that fall in the domain of the model. For the first approach, considering all the features, the highest accuracy obtained is 0.98, with an average threshold of 0.34, for the Breast cancer dataset. After applying the recursive feature elimination (RFE) method, the accuracy is 0.96 with an average threshold of 0.27. For the robustness of the classification model based on the applicability domain approach, the minimum accuracy is 0.62 for the Indian Liver Patient data at r=0.10, and the maximum accuracy is 0.99 for the Thyroid dataset at r=0.10. For the selection of an optimal model using Pareto optimality, the optimally selected classifier gives an accuracy of 0.94 with an average threshold of 0.35. This research investigates critical aspects of the applicability domain as related to the robustness of classification ML algorithms. However, the performance of machine learning techniques depends on how reliable the model's predictions are. 
In the literature, the robustness of an ML model is defined as the ability of the model to produce a testing error close to its training error. Moreover, robustness describes the stability of the model's performance when it is tested on new datasets. In conclusion, this thesis introduces the concept of the applicability domain for classifiers and tests its use in case studies on health-related public benchmark datasets. Ministry of Higher Education in Liby
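The abstract's third approach, selecting a model using Pareto optimality, can be sketched as a filter that keeps only non-dominated candidates. The abstract does not say which objectives the thesis traded off, so the two used here (maximise accuracy, minimise the number of selected features) are assumptions, and the candidate names and values are hypothetical and not results from the thesis.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    accuracy: float    # higher is better
    n_features: int    # lower is better (assumed second objective)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both objectives and better on one."""
    no_worse = a.accuracy >= b.accuracy and a.n_features <= b.n_features
    strictly_better = a.accuracy > b.accuracy or a.n_features < b.n_features
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical classifiers, for illustration only.
candidates = [
    Candidate("tree_all_features", 0.98, 30),
    Candidate("tree_rfe_15", 0.96, 15),
    Candidate("tree_rfe_20", 0.95, 20),   # dominated by tree_rfe_15
    Candidate("tree_rfe_5", 0.90, 5),
]

front = pareto_front(candidates)
for c in front:
    print(c.name, c.accuracy, c.n_features)
```

The front usually contains several models, so a final pick still needs a preference between the objectives, for example the smallest model above a minimum accuracy.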

Bradford Scholars

The perspectives of computational chemistry modeling.

Author: Tetko I.V.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Online tools for computational chemistry modeling will be increasingly used in the future. This will bring advantages for both authors and readers.

PuSH