59 research outputs found

    Mind the Gap - Deciphering GPCR Pharmacology Using 3D Pharmacophores and Artificial Intelligence

    G protein-coupled receptors (GPCRs) are amongst the most pharmaceutically relevant and well-studied protein targets, yet unanswered questions in the field leave significant gaps in our understanding of their nuanced structure and function. Three-dimensional pharmacophore models are powerful computational tools in in silico drug discovery, presenting myriad opportunities for the integration of GPCR structural biology and cheminformatics. This review highlights success stories in the application of 3D pharmacophore modeling to de novo drug design, the discovery of biased and allosteric ligands, scaffold hopping, QSAR analysis, hit-to-lead optimization, GPCR de-orphanization, mechanistic understanding of GPCR pharmacology and the elucidation of ligand–receptor interactions. Furthermore, advances in the incorporation of dynamics and machine learning are highlighted. The review will analyze challenges in the field of GPCR drug discovery, detailing how 3D pharmacophore modeling can be used to address them. Finally, we will present opportunities afforded by 3D pharmacophore modeling in the advancement of our understanding and targeting of GPCRs.

    Current Mathematical Methods Used in QSAR/QSPR Studies

    This paper gives an overview of the mathematical methods currently used in quantitative structure-activity/property relationship (QSAR/QSPR) studies. The mathematical methods applied to the regression of QSAR/QSPR models have recently been developing rapidly, and new methods such as Gene Expression Programming (GEP), Projection Pursuit Regression (PPR) and Local Lazy Regression (LLR) have appeared on the QSAR/QSPR stage. At the same time, earlier methods, including Multiple Linear Regression (MLR), Partial Least Squares (PLS), Neural Networks (NN) and Support Vector Machines (SVM), are being upgraded to improve their performance in QSAR/QSPR studies. These new and upgraded methods and algorithms are described in detail, and their advantages and disadvantages are evaluated and discussed to show their potential for future QSAR/QSPR studies.

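    Multi-scale artificial intelligence models for the chemoinformatic and pharmaco-epidemiological design of anti-HIV therapies in United States counties (original title: Modelos multi-escala de inteligencia artificial para diseño quimio-informático y fármaco-epidemiológico de terapias anti-VIH en Condados de Estados Unidos)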

    [Abstract] The methods relating chemical structure to biological activity are called "Quantitative Structure-Activity Relationships" (QSAR). Understanding and quantifying the relationship between the structure and the biological activity of potential drugs is essential to studying them efficiently. This kind of study correlates, by means of molecular descriptors, various chemical or physicochemical properties of the molecules in question with biological activity values. Currently, the development of safer and more effective drugs for the treatment of diseases such as AIDS is a goal that requires the joint effort of a large number of specialists from different fields of science, and one in which chance has played a major role. However, it seems reasonable to think that effective and safe drugs will never be obtained by relying on chance alone. To be more efficient in developing new drugs, research into the treatment of diseases requires mechanisms for predicting some biological activities. Models based on "Artificial Neural Networks" (ANNs) are an example of theoretical prediction models, widely used in many areas of science such as medicine, chemistry and biochemistry, as well as in drug development. In the latter, they are very useful for predicting the properties of potential drugs. ANNs approximate the way the human brain operates and can successfully handle real-world data, information and knowledge affected by the so-called "curse of the fourfold I", that is, data that are uncertain, inconsistent, incomplete and imprecise. This characteristic makes such data difficult to manage properly with conventional computational techniques, making the use of Artificial Intelligence (AI) techniques, such as the above-mentioned ANNs, necessary. The most important advantage of these intelligent prediction models is that they avoid unnecessary costs incurred in developing new compounds with therapeutic potential that turn out to be fruitless. Therefore, the main objective of the thesis is the development, using artificial intelligence techniques, of a "multi-scale chemoinformatics" methodology that quantitatively relates chemical and pre-clinical data to epidemiological data, with the aim of performing "pharmaco-epidemiological" predictions, taking into account the practical and legal impossibility of obtaining experimental data in Phase IV of the development process of new compounds.

    Machine Learning Approaches for Improving Prediction Performance of Structure-Activity Relationship Models

    In silico bioactivity prediction studies are designed to complement in vivo and in vitro efforts to assess the activity and properties of small molecules. In silico methods such as Quantitative Structure-Activity/Property Relationship (QSAR) are used to correlate the structure of a molecule to its biological property in drug design and toxicological studies. In this body of work, I started with two in-depth reviews into the application of machine learning based approaches and feature reduction methods to QSAR, and then investigated solutions to three common challenges faced in machine learning based QSAR studies. First, to improve the prediction accuracy of learning from imbalanced data, the Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) algorithms, combined with bagging as an ensemble strategy, were evaluated. Friedman's aligned ranks test and the subsequent Bergmann-Hommel post hoc test showed that this method significantly outperformed other conventional methods. SMOTEENN with bagging became less effective when the imbalance ratio (IR) exceeded a certain threshold (e.g., >40). Second, the ability to separate the few active compounds from the vast number of inactive ones is of great importance in computational toxicology. Deep neural networks (DNN) and random forest (RF), representing deep and shallow learning algorithms, respectively, were chosen to carry out structure-activity relationship-based chemical toxicity prediction. Results suggest that DNN significantly outperformed RF (p < 0.001, ANOVA) by 22-27% for four metrics (precision, recall, F-measure, and AUPRC) and by 11% for another (AUROC). Lastly, current features used for QSAR based machine learning are often very sparse and limited by the logic and mathematical processes used to compute them. Transformer embedding features (TEF) were developed as new continuous vector descriptors/features using the latent space embedding from a multi-head self-attention. The significance of TEF as new descriptors was evaluated by applying them to tasks such as predictive modeling, clustering, and similarity search. An accuracy of 84% on the Ames mutagenicity test indicates that these new features have a correlation to biological activity. Overall, the findings in this study can be applied to improve the performance of machine learning based Quantitative Structure-Activity/Property Relationship (QSAR) efforts for enhanced drug discovery and toxicology assessments.
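The imbalance ratio mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common definition of IR as the majority-class count divided by the minority-class count; the function name `imbalance_ratio` and the toy label list are hypothetical and do not come from the thesis, and 40 is only the example threshold quoted in the abstract.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return majority-class count divided by minority-class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to compute a ratio")
    return max(counts.values()) / min(counts.values())


# Hypothetical toy screen: 5 active compounds (1) among 250 in total.
labels = [1] * 5 + [0] * 245
ir = imbalance_ratio(labels)

# 40 is the example threshold quoted in the abstract, not a universal cut-off.
exceeds_threshold = ir > 40
print(ir, exceeds_threshold)
```

A check like this could be run before choosing a resampling strategy, since the abstract reports that the resampling-plus-bagging approach loses effectiveness on very skewed datasets.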

    Computational prediction of metabolism: sites, products, SAR, P450 enzyme dynamics, and mechanisms.

    Metabolism of xenobiotics remains a central challenge for the discovery and development of drugs, cosmetics, nutritional supplements, and agrochemicals. Metabolic transformations are frequently related to the incidence of toxic effects that may result from the emergence of reactive species, the systemic accumulation of metabolites, or the induction of metabolic pathways. Experimental investigation of the metabolism of small organic molecules is particularly resource demanding; hence, computational methods are of considerable interest to complement experimental approaches. This review provides a broad overview of structure- and ligand-based computational methods for the prediction of xenobiotic metabolism. Current computational approaches to address xenobiotic metabolism are discussed from three major perspectives: (i) prediction of sites of metabolism (SOMs), (ii) elucidation of potential metabolites and their chemical structures, and (iii) prediction of direct and indirect effects of xenobiotics on metabolizing enzymes, where the focus is on the cytochrome P450 (CYP) superfamily of enzymes, the cardinal xenobiotic-metabolizing enzymes. For each of these domains, a variety of approaches and their applications are systematically reviewed, including expert systems, data mining approaches, quantitative structure-activity relationships (QSARs), machine learning-based methods, pharmacophore-based algorithms, shape-focused techniques, molecular interaction fields (MIFs), reactivity-focused techniques, protein-ligand docking, molecular dynamics (MD) simulations, and combinations of methods. Predictive metabolism is a developing area, and there is still enormous potential for improvement. However, it is clear that the combination of rapidly increasing amounts of available ligand- and structure-related experimental data (in particular, quantitative data) with novel and diverse simulation and modeling approaches is accelerating the development of effective tools for prediction of in vivo metabolism, which is reflected by the diverse and comprehensive data sources and methods for metabolism prediction reviewed here. This review attempts to survey the range and scope of computational methods applied to metabolism prediction and also to compare and contrast their applicability and performance. JK, MJW, JT, PJB, AB and RCG thank Unilever for funding.

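    Development of 3D-QSAR-based machine learning models for predicting new drug candidates that inhibit the CCR-5 protein, for the treatment of HIV/AIDS (original title: Desenvolvimento de modelos de machine learning baseados em QSAR-3D para predição de novos candidatos a fármacos inibidores da proteina CCR-5, para o tratamento de HIV/AIDS)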

    Advisor: Prof. Anderson Ara. Monograph (specialization) - Universidade Federal do Paraná, Setor de Ciências Exatas, Specialization Course in Data Science and Big Data. Includes references. Abstract: Introduction. C-C chemokine receptor type 5 (CCR-5) is a protein found on the surface of defense cells (lymphocytes and macrophages). CCR-5 is the structure to which HIV (human immunodeficiency virus) binds to invade the host cell, causing the development of AIDS (acquired immunodeficiency syndrome). In this study, machine learning (ML) models based on quantitative structure-activity relationships (QSAR) were developed to predict compounds with inhibitory bioactivity against the CCR-5 protein for the treatment of HIV. Materials and methods. A non-redundant experimental dataset of 2929 compounds with inhibitory bioactivity values (expressed as IC50) against the CCR-5 protein was collected from the ChEMBL database and used to develop QSAR-based ML models to predict their bioactivity. These 2929 compounds were described using PubChem fingerprints, and 32 different ML algorithms were trained and tested. The performance of the ML models was evaluated using the metrics R2, MSE, RMSE, MAE and training time. The SHAP values method was applied to each of the five best ML models to identify the most important features (descriptors) in predicting the bioactivity of compounds against HIV. Results. The five ML models that performed best in predicting the inhibitory bioactivity against the CCR-5 protein for the treatment of HIV were Random Forest (RF), Histogram-based Gradient Boosting (HGBM), LGBM, Bagging and KNN, whose predictive capacity values (R2) ranged between 82% and 87%. Conclusion. In this study, five ML models (RF, HGBM, LGBM, Bagging and KNN) were developed to predict the inhibitory bioactivity of compounds against the CCR-5 protein for the discovery of new drugs against HIV. These ML models can be used as a selection filter for new molecules, which can then be tested in in vitro and in vivo experiments aimed at discovering new CCR-5 protein inhibitor drugs for the potential treatment of HIV.
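The regression metrics named in the abstract above (R2, MSE, RMSE, MAE) can be sketched with their textbook definitions. This is a minimal illustration, not the monograph's code: the function name `regression_metrics` and the toy activity values are hypothetical, and the log-scale activities are an assumption made only for the example.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R2, MSE, RMSE and MAE from paired observed/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    mse = ss_res / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Hypothetical observed vs. predicted activities (assumed log-scale values).
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.0, 6.5, 8.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

An R2 of 0.9 on this toy data would correspond to "90%" in the percentage style the abstract uses for its 82-87% range.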

    Exploring the potential of Spherical Harmonics and PCVM for compounds activity prediction

    Biologically active chemical compounds may provide remedies for several diseases. Meanwhile, Machine Learning techniques applied to Drug Discovery, which are cheaper and faster than wet-lab experiments, have the capability to more effectively identify molecules with the expected pharmacological activity. Therefore, it is urgent and essential to develop more representative descriptors and reliable classification methods to accurately predict molecular activity. In this paper, we investigate the potential of a novel representation based on Spherical Harmonics fed into a Probabilistic Classification Vector Machines classifier, namely SHPCVM, for the compound activity prediction task. We make use of representation learning to acquire features that describe the molecules as precisely as possible. To verify the performance of SHPCVM, ten-fold cross-validation tests are performed on twenty-one G protein-coupled receptors (GPCRs). Experimental outcomes (accuracy of 0.86), assessed by classification accuracy, precision, recall, Matthews' Correlation Coefficient and Cohen's kappa, reveal that combining our relatively short Spherical Harmonics-based representation with Probabilistic Classification Vector Machines can achieve very satisfactory performance for GPCRs.
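The classification measures listed in the abstract above can be sketched from a binary confusion matrix using their standard definitions. The function name `binary_metrics` and the counts below are hypothetical illustrations and are not results from the paper.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, MCC and Cohen's kappa from confusion counts."""
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Matthews correlation coefficient; defined as 0 when a marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical counts for one cross-validation fold (active = positive class).
scores = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(scores)
```

In a ten-fold cross-validation such as the one the abstract describes, these values would typically be computed per fold and then averaged.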

    Keras/TensorFlow in Drug Design for Immunity Disorders

    Homeostasis of the host immune system is regulated by white blood cells with a variety of cell surface receptors for cytokines. Chemotactic cytokines (chemokines) activate their receptors to evoke the chemotaxis of immune cells in homeostatic migrations or inflammatory conditions towards inflamed tissue or pathogens. Dysregulation of the immune system leading to disorders such as allergies, autoimmune diseases, or cancer requires efficient, fast-acting drugs to minimize the long-term effects of chronic inflammation. Here, we performed structure-based virtual screening (SBVS) assisted by the Keras/TensorFlow neural network (NN) to find novel compound scaffolds acting on three chemokine receptors: CCR2, CCR3, and one CXC receptor, CXCR3. Keras/TensorFlow NN was used here not as a typically used binary classifier but as an efficient multi-class classifier that can discard not only inactive compounds but also low- or medium-activity compounds. Several compounds proposed by SBVS and NN were tested in 100 ns all-atom molecular dynamics simulations to confirm their binding affinity. To improve the basic binding affinity of the compounds, new chemical modifications were proposed. The modified compounds were compared with known antagonists of these three chemokine receptors. Known CXCR3 compounds were among the top predicted compounds; thus, the benefits of using Keras/TensorFlow in drug discovery have been shown in addition to structure-based approaches. Furthermore, we showed that Keras/TensorFlow NN can accurately predict the receptor subtype selectivity of compounds, for which SBVS often fails. We cross-tested chemokine receptor datasets retrieved from ChEMBL and curated datasets for cannabinoid receptors. The NN model trained on the cannabinoid receptor datasets retrieved from ChEMBL was the most accurate in the receptor subtype selectivity prediction. 
Among NN models trained on the chemokine receptor datasets, the CXCR3 model showed the highest accuracy in differentiating the receptor subtype for a given compound dataset.

    Computational Approaches for the Characterization of the Structure and Dynamics of G Protein-Coupled Receptors: Applications to Drug Design

    G Protein-Coupled Receptors (GPCRs) constitute the most pharmacologically relevant superfamily of proteins. In this thesis, a computational pipeline for modelling the structure and dynamics of GPCRs is presented and combined with experimental collaborations for GPCR drug design. These include the discovery of novel scaffolds as potential antipsychotics and the design of a new series of A3 adenosine receptor antagonists, employing successful combinations of structure- and ligand-based approaches. Additionally, the structure of Adenosine Receptors (ARs) was computationally assessed, with implications for ligand affinity and selectivity. The protocol employed for Molecular Dynamics simulations has allowed the characterization of structural determinants of the activation of ARs and the evaluation of the stability of GPCR dimers of the CXCR4 receptor. Finally, the computational pipeline developed here has been integrated into the web server GPCR-ModSim (http://gpcr.usc.es), contributing to its application in biochemical and pharmacological studies on GPCRs.