52 research outputs found

    Evolutionary Computation and QSAR Research

    [Abstract] The successful high-throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. Virtual molecular filtering and screening relies greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with a predicted toxic effect and poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning or artificial intelligence. QSAR modeling relies on three main steps: codification of the molecular structure into molecular descriptors, selection of the variables relevant to the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. Thus, this review explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, the selection methods for high-dimensional data in QSAR, the methods to build QSAR models, the current evolutionary feature selection methods and applications in QSAR, and the future trend toward joint or multi-task feature selection methods. Instituto de Salud Carlos III, PIO52048; Instituto de Salud Carlos III, RD07/0067/0005; Ministerio de Industria, Comercio y Turismo, TSI-020110-2009-53; Galicia. Consellería de Economía e Industria, 10SIN105004P
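The variable selection step described in this abstract can be illustrated with a small genetic algorithm over binary descriptor masks. This is only a minimal sketch, not the method of the review: the function name `evolve_feature_mask`, the toy fitness function, and all parameter values are assumptions chosen for illustration. In a real QSAR setting the fitness would typically be a cross-validated model score computed on the selected descriptors.

```python
import random


def evolve_feature_mask(fitness, n_features, pop_size=20, generations=30,
                        mutation_rate=0.1, seed=0):
    """Evolve a 0/1 mask over descriptors (requires n_features >= 2)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        next_population = [ranked[0][:]]  # elitism: keep the best mask unchanged
        while len(next_population) < pop_size:
            # Tournament selection of size 2 for each parent.
            parent_a = max(rng.sample(population, 2), key=fitness)
            parent_b = max(rng.sample(population, 2), key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n_features)
            child = parent_a[:cut] + parent_b[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


def toy_fitness(mask):
    # Hypothetical score: descriptors 0 and 2 are "relevant" (+1 each when
    # selected); every other selected descriptor costs 1.
    return 2 * mask[0] + 2 * mask[2] - sum(mask)


best_mask, best_history = evolve_feature_mask(toy_fitness, n_features=6)
print(best_mask, best_history[-1])
```

Because the best mask of each generation is copied unchanged into the next one, the recorded best fitness never decreases; whether the search reaches the global optimum depends on the seed and the settings.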

    Quantitative models for predicting antioxidant capacity in herbs based on molecular structures and compositions

    Herbs are considered a vital source of natural antioxidants that can neutralise the free radicals that harm the human body. Researchers have found that phenolic compounds are the major phytochemicals in herbs that contribute to their antioxidant capacity. However, even when herbs are grown under the same conditions and in the same geographic origin, the components and composition of phenolic compounds may differ from sample to sample, contributing to different antioxidant capacities. Previous researchers have studied only the interactions between either the molecular structures or the composition of phenolic compounds. The interaction and synergistic effect of the combined components and composition of phenolic compounds on their antioxidant property are still unknown. The aim of this research is to understand the synergistic effect between the structure and composition of phenolic compounds in herbs by developing a quantitative model. Firstly, a Quantitative Structure-Activity Relationship (QSAR) model was developed using three different approaches, namely general, consensus and comprehensive models, with a literature data set of traditional Chinese medicine. Previous research developed QSAR models using all generated molecular descriptors without any classification, which might have overlooked important variables. In this research, the general and consensus models were built using molecular descriptors from the DRAGON software. The general model utilised all the molecular descriptors, while the consensus model classified the molecular descriptors according to the phenolic compound groups. In addition, quantum-chemical descriptors from the GaussView 5.0 and Gaussian 09 software were added to include 3D descriptors; the resulting model is known as the comprehensive model.
Then, a new Quantitative Structure-Composition-Activity Relationship (QSCAR) model was developed using the experimental data set to further correlate the molecular structure (from the QSAR model) with the composition ratio of each significant phenolic compound in Misai Kucing. Three variable selection methods, namely forward stepwise, interval partial least squares (i-PLS) and genetic algorithm, and two multi-linear regression analysis methods were combined to develop all models. The best-performing QSCAR model in terms of robustness, reliability and predictivity was selected, and its results were compared with the QSAR model and the experimental results. The consensus model performed better overall than the general model. The increase in antioxidant activity is correlated with phenolic compound size, measured through the bond indices distance between atoms; with shape, specifically calculated as the proportion of path/walk of length 3 from the molecular Randic shape index; and with the number of bridge edges. A high ratio between EHOMO and ELUMO and low stability and total energy values of the phenolic compounds also increased the antioxidant activity. The QSCAR model predicted the antioxidant capacity 13.88 % more accurately than the QSAR model. The QSCAR model shows that high compositions of apigenin and dalspinosin together with low compositions of caffeic, ferulic and rosmarinic acids increased the antioxidant capacity in Misai Kucing. In conclusion, a quantitative model has been developed to predict the antioxidant capacity in herbs by combining the comprehensive QSAR and QSCAR models. The QSAR model is generic for phenolic compounds, but the QSCAR model needs to be simulated again with the composition ratios of other herbs. Thus, future researchers can use the models to predict antioxidant capacity for other herbs. The research may also be extended to predict other biological activities.

    (Q)SAR Modelling of Nanomaterial Toxicity - A Critical Review

    There is an increasing recognition that nanomaterials pose a risk to human health, and that the novel engineered nanomaterials (ENMs) in the nanotechnology industry and their increasing industrial usage pose the most immediate problem for hazard assessment, as many of them remain untested. The large number of materials and their variants (different sizes and coatings, for instance) that require testing, together with ethical pressure towards non-animal testing, means that expensive animal bioassay is precluded, and the use of (quantitative) structure-activity relationship ((Q)SAR) models as an alternative source of hazard information should be explored. (Q)SAR modelling can be applied to fill the critical knowledge gaps by making the best use of existing data, prioritizing the physicochemical parameters driving toxicity, and providing practical solutions to the risk assessment problems caused by the diversity of ENMs. This paper covers the core components required for successful application of (Q)SAR technologies to ENM toxicity prediction, summarizes the published nano-(Q)SAR studies, and outlines the challenges ahead for nano-(Q)SAR modelling. It provides a critical review of (1) the present status of the availability of ENM characterization/toxicity data, (2) the characterization of nanostructures that meets the needs of (Q)SAR analysis, (3) the published nano-(Q)SAR studies and their limitations, (4) the in silico tools for (Q)SAR screening of nanotoxicity and (5) the prospective directions for the development of nano-(Q)SAR models.

    The conformation-independent QSPR approach for predicting the oxidation rate constant of water micropollutants

    In advanced water treatment processes, the degradation efficiency of contaminants depends on the reactivity of the hydroxyl radical toward a target micropollutant. The present study predicts the hydroxyl radical rate constant in water (kOH) for 118 emerging micropollutants by means of quantitative structure-property relationships (QSPR). The conformation-independent QSPR approach is employed, together with a large set of 15,251 molecular descriptors derived with the freely available PaDEL, Epi Suite, and Mold2 programs. The best multivariable linear regression (MLR) models are found with the replacement method variable subset selection technique. The proposed five-descriptor model has the following statistics for the training set: R²_train = 0.88, RMS_train = 0.21; for the test set: R²_test = 0.87, RMS_test = 0.11. This QSPR serves as a rational guide for predicting oxidation processes of micropollutants. Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas; Facultad de Ciencias Agrarias y Forestale
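For readers unfamiliar with the statistics quoted in this abstract, the following minimal sketch shows one common way to compute a coefficient of determination (R²) and a root-mean-square (RMS) error from observed and predicted values. The exact definitions used in the paper are not given in the abstract (an RMS may, for example, be corrected for degrees of freedom), so the formulas and the sample numbers below are assumptions for illustration only.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rms_error(observed, predicted):
    """Root-mean-square error, dividing by the number of samples."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Hypothetical log kOH values, not taken from the study.
observed = [9.1, 9.5, 9.8, 10.0, 9.3]
predicted = [9.0, 9.6, 9.7, 10.1, 9.4]
print(r_squared(observed, predicted), rms_error(observed, predicted))
```

Reporting both statistics separately for the training and test sets, as the abstract does, makes it possible to see whether a model fits the training data much better than it predicts unseen compounds.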

    Navigating bioactivity space in anti-tubercular drug discovery through the deployment of advanced machine learning models and cheminformatics tools : a molecular modeling based retrospective study

    Mycobacterium tuberculosis is the bacterium that causes tuberculosis (TB). Multidrug-resistant and extensively drug-resistant tuberculosis are significant obstacles to effective treatment. As a result, novel therapies against various strains of M. tuberculosis have been developed. Drug development is a lengthy procedure that includes target protein identification and isolation, preclinical testing of the drug, and the various phases of a clinical trial, so it can take decades for a molecule to reach the market. Computational approaches such as QSAR, molecular docking, and pharmacophore modeling have aided drug development. In this review article, we discuss the various techniques used in tuberculosis drug discovery by briefly introducing them and their importance. The different databases, methods, approaches, and software used in conducting QSAR, pharmacophore modeling, and molecular docking are also discussed. Other targets addressed by these techniques in tuberculosis drug discovery are discussed as well, along with important molecules discovered using these computational approaches. This review article also presents the list of drugs in clinical trials for tuberculosis. Finally, we conclude with the challenges and future perspectives of these techniques in drug discovery. Peer reviewed

    Development of classification and regression based QSAR models to predict rodent carcinogenic potency using oral slope factor

    Carcinogenicity is among the toxicological endpoints posing the highest concern for human health. Oral slope factors (OSFs) are used to estimate quantitatively the carcinogenic potency or the risk associated with exposure to a chemical by the oral route. Regulatory agencies in food and drug administration and environmental protection are employing quantitative structure-activity relationship (QSAR) models to fill the data gaps related to the properties of chemicals affecting the environment and human health. Against this background, we have developed quantitative structure-carcinogenicity regression models for rodents based on the carcinogenic potential of 70 chemicals with a wide diversity of molecular structures, spanning a large number of chemical classes and biological mechanisms. All the developed models have been assessed according to the Organization for Economic Cooperation and Development (OECD) principles for the validation of QSAR models. We have also attempted to develop a carcinogenicity classification model based on Linear Discriminant Analysis (LDA). The developed regression and LDA models are rigorously validated internally as well as externally. Our in silico studies make it possible to obtain a quantitative interpretation of the structural information of carcinogenicity, along with identification of the discriminant functions between lower and higher carcinogenic compounds by LDA. Pharmacological distribution diagrams (PDDs) are used as a visualization technique for the identification and selection of chemicals with lower carcinogenicity. Constructive, informative and comparable interpretations have been observed for both classification and regression based modeling. SK thanks the Department of Science and Technology, Government of India for awarding him a Senior Research fellowship under the INSPIRE scheme. KR thanks the Council of Scientific and Industrial Research (CSIR), New Delhi for awarding a major research project.

    Development and Evaluation of ADME Models Using Proprietary and Opensource Data

    Absorption, Distribution, Metabolism and Elimination (ADME) properties are important factors in the drug discovery pipeline. Literature ADME data are often collected in large chemical databases like ChEMBL, which might be an asset for improving the prediction of ADME properties. Pharmaceutical companies build ADME Quantitative Structure-Property Relationship (QSPR) models using proprietary data, and thus the inclusion of literature data might be a valuable source for the development of predictive models. The aim of this study was to investigate whether merging literature and proprietary data could improve the predictive ability of proprietary models and enlarge their applicability domain (AD). ADME predictive models for Caco-2 (A to B) permeability and LogD7.4 were built with data extracted from the Evotec and ChEMBL databases. Predictive models were developed for each property, and three different training sets were used, based on proprietary compounds (Evotec models), literature compounds (ChEMBL models) and a merged set of proprietary and literature compounds (Evotec+ChEMBL models). Random Forest (RF), Partial Least Squares (PLS) and Support Vector Regression (SVR) were used to develop the models. The performance of the models was evaluated using two types of test sets: a diverse test set (20 % of the available compounds, randomly selected) and a temporal test set (data published after the models were built). The descriptors used were physicochemical descriptors, the structural Molecular Access System (MACCS) descriptors and the partial equalisation of orbital electronegativity – van der Waals surface area (PEOE-VSA) descriptors. The AD of the models was evaluated with four distance-to-model metrics: kNN with Euclidean distance, kNN with Manhattan distance, leverage and Mahalanobis distance. The ability of an existing Evotec Caco-2 permeability model to assess literature compounds (extracted from ChEMBL) was evaluated.
The literature test set was predicted with a higher RMSE than the internal compounds. Additionally, a number of literature compounds were found to be outside the AD of the Evotec model, highlighting an area of improvement for proprietary Evotec models. Furthermore, the effect of including literature data in the existing Caco-2 permeability and LogD7.4 Evotec proprietary models was evaluated. The RF algorithm was the highest-performing method for the development of Caco-2 permeability models and SVR for the LogD7.4 models. In addition, the leverage method proved to be the most appropriate for the evaluation of the models’ AD. The permeability model built by merging literature and proprietary data (Evotec+ChEMBL model) predicted a literature temporal test set with an RMSE of 0.68, while the Evotec model showed an RMSE of 0.74. Even in the case of the Evotec temporal test set, the two models performed similarly, and the AD of the mixed models (incorporating both literature and proprietary data) was enlarged: 86.15% of the compounds in the proprietary temporal test set were within the AD of the Evotec+ChEMBL model, while 76.50% of the compounds of the same test set were within the AD of the Evotec model. Similarly, the LogD7.4 Evotec+ChEMBL model predicted a literature temporal test set with an RMSE of 0.77, while the Evotec model showed an RMSE of 0.83. Again, on the Evotec temporal test set the two models performed similarly, but the AD of the mixed models was enlarged: 94.86% of the compounds in the proprietary temporal test set were within the AD of the Evotec+ChEMBL model, while 88.49% of the compounds of the same test set were within the AD of the Evotec model. This study demonstrated that the inclusion of public ADME data in proprietary models improved their performance and at the same time enlarged their AD.
The methodology presented herein will be applied by Evotec computational scientists to rebuild the Caco-2 and LogD7.4 Evotec proprietary models, taking literature data into account as discussed in this thesis.
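Two of the distance-to-model metrics named in this abstract, kNN with Euclidean distance and kNN with Manhattan distance, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names, the use of the mean distance to the k nearest training compounds, the threshold value, and the toy descriptor vectors are all hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_distance(query, training, k, metric=euclidean):
    """Mean distance from a query compound to its k nearest training compounds."""
    distances = sorted(metric(query, t) for t in training)
    return sum(distances[:k]) / k


def in_domain(query, training, k, threshold, metric=euclidean):
    """Treat a compound as inside the AD if its kNN distance is within the threshold."""
    return knn_distance(query, training, k, metric) <= threshold


# Hypothetical two-descriptor training set.
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(in_domain((0.5, 0.5), training, k=2, threshold=1.0))
print(in_domain((5.0, 5.0), training, k=2, threshold=1.0))
```

In practice the threshold would be derived from the training set itself (for example from the distribution of kNN distances among training compounds), and the fraction of test compounds passing the check gives coverage figures of the kind reported in the abstract.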