10 research outputs found

    Addressing docking pose selection with structure-based deep learning: Recent advances, challenges and opportunities

    Molecular docking is a widely used technique in drug discovery for predicting the binding mode of a given ligand to its target. However, identifying the near-native binding pose in docking experiments remains challenging: the scoring functions currently employed by docking programs are parametrized to predict binding affinity and therefore often fail to identify the ligand's native binding conformation. Selecting the correct binding mode is crucial for obtaining meaningful results and for properly optimizing new hit compounds. Deep learning (DL) algorithms have attracted growing interest in this respect because they can extract the relevant information directly from the protein-ligand structure. Our review presents recent advances in the development of DL-based pose selection approaches and discusses their limitations and possible future directions. Moreover, we report a comparison between some classical scoring functions and DL-based methods in terms of their ability to select the correct binding mode. In this regard, we present two novel DL-based pose selectors that we developed.
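
The abstract compares classical scoring functions and DL-based selectors on their ability to pick the correct binding mode but does not name the metric. A commonly used one is the top-n success rate: the fraction of complexes for which at least one of the n best-ranked poses lies within an RMSD cut-off (conventionally 2 Å) of the experimental pose. The sketch below assumes that convention; `Pose`, `top_n_success_rate` and the example data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    score: float  # value assigned by a scoring function or DL pose selector
    rmsd: float   # RMSD (Å) between this pose and the crystallographic ligand pose


def top_n_success_rate(complexes, n=1, rmsd_cutoff=2.0, lower_is_better=True):
    """Fraction of complexes whose n best-ranked poses include a near-native one.

    complexes: mapping of complex ID -> list of Pose objects.
    lower_is_better: True for energy-like docking scores, False for
    probability-like outputs (e.g. a DL classifier's "native-likeness").
    """
    if not complexes:
        raise ValueError("no complexes given")
    hits = 0
    for poses in complexes.values():
        # Rank poses so that the predicted best pose comes first.
        ranked = sorted(poses, key=lambda p: p.score, reverse=not lower_is_better)
        if any(p.rmsd <= rmsd_cutoff for p in ranked[:n]):
            hits += 1
    return hits / len(complexes)


# Hypothetical docking output for two complexes (scores are energy-like).
docking = {
    "complex_A": [Pose(-9.1, 3.4), Pose(-8.7, 1.2), Pose(-7.0, 0.8)],
    "complex_B": [Pose(-10.2, 1.5), Pose(-6.3, 4.0)],
}
print(top_n_success_rate(docking, n=1))  # 0.5: only complex_B's top pose is near-native
print(top_n_success_rate(docking, n=2))  # 1.0
```

Scoring the same poses with a classical function and with a DL selector (passing `lower_is_better=False` for probability outputs) and comparing the two rates is one way such a head-to-head comparison could be set up.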

    CATMoS: Collaborative Acute Toxicity Modeling Suite.

    BACKGROUND: Humans are exposed to tens of thousands of chemical substances that need to be assessed for their potential toxicity. Acute systemic toxicity testing serves as the basis for regulatory hazard classification, labeling, and risk management. However, it is cost- and time-prohibitive to evaluate all new and existing chemicals using traditional rodent acute toxicity tests. In silico models built using existing data facilitate rapid acute toxicity predictions without using animals. OBJECTIVES: The U.S. Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) Acute Toxicity Workgroup organized an international collaboration to develop in silico models for predicting acute oral toxicity based on five different end points: the Lethal Dose 50 (LD50) value, the U.S. Environmental Protection Agency's four hazard categories, the Globally Harmonized System for Classification and Labeling's five hazard categories, very toxic chemicals (LD50 ≤ 50 mg/kg), and nontoxic chemicals (LD50 > 2,000 mg/kg). METHODS: An acute oral toxicity data inventory for 11,992 chemicals was compiled, split into training and evaluation sets, and made available to 35 participating international research groups, which submitted a total of 139 predictive models. Predictions that fell within the applicability domains of the submitted models were evaluated using external validation sets and then combined into consensus models to leverage the strengths of the individual approaches. RESULTS: The resulting consensus predictions form the Collaborative Acute Toxicity Modeling Suite (CATMoS). CATMoS demonstrated high accuracy and robustness when compared with in vivo results. DISCUSSION: CATMoS is being evaluated by regulatory agencies for its utility and applicability as a potential replacement for in vivo rat acute oral toxicity studies.
CATMoS predictions for more than 800,000 chemicals have been made available via the National Toxicology Program's Integrated Chemical Environment (ICE) tools and data sets (ice.ntp.niehs.nih.gov). The models are also implemented in OPERA, a free, standalone, open-source tool that allows predictions to be made for new and untested chemicals. https://doi.org/10.1289/EHP8495
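
All five CATMoS end points are defined on the same LD50 scale, so categorical and binary labels can in principle be derived from a single LD50 value. The sketch below illustrates this. Only the two binary cut-offs (≤ 50 and > 2,000 mg/kg) come from the abstract; the EPA and GHS category boundaries are the standard acute oral thresholds as we understand them and should be checked against the regulatory texts. This is not CATMoS or OPERA code, and the function names are hypothetical.

```python
def epa_category(ld50_mg_per_kg: float) -> str:
    """U.S. EPA acute oral toxicity category (I = most toxic); assumed thresholds."""
    if ld50_mg_per_kg <= 50:
        return "I"
    if ld50_mg_per_kg <= 500:
        return "II"
    if ld50_mg_per_kg <= 5000:
        return "III"
    return "IV"


def ghs_category(ld50_mg_per_kg: float) -> str:
    """GHS acute oral toxicity category (1 = most toxic); assumed thresholds."""
    for category, upper_bound in ((1, 5), (2, 50), (3, 300), (4, 2000), (5, 5000)):
        if ld50_mg_per_kg <= upper_bound:
            return str(category)
    return "not classified"


def derive_endpoints(ld50_mg_per_kg: float) -> dict:
    """Derive the categorical and binary end points from one LD50 value (mg/kg)."""
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be positive")
    return {
        "LD50_mg_per_kg": ld50_mg_per_kg,
        "EPA": epa_category(ld50_mg_per_kg),
        "GHS": ghs_category(ld50_mg_per_kg),
        "very_toxic": ld50_mg_per_kg <= 50,   # cut-off stated in the abstract
        "nontoxic": ld50_mg_per_kg > 2000,    # cut-off stated in the abstract
    }


print(derive_endpoints(250))
# {'LD50_mg_per_kg': 250, 'EPA': 'II', 'GHS': '3', 'very_toxic': False, 'nontoxic': False}
```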

    Comprehensive data analysis and predictive chemoinformatics models for REACH related physicochemical and (eco)toxicity properties

    This thesis concerns the modelling of several environmental fate and (eco)toxicological properties relevant under the European Union Registration, Evaluation, Authorisation and Restriction of Chemicals Regulation (REACH, EC No 1907/2006). Statistical models have been generated using state-of-the-art machine learning methods, such as Support Vector Machines (SVM) and Random Forest, combined with molecular descriptors. The models have been internally and externally validated following internationally recognized guidelines, especially the OECD principles. They are designed to be used as a valid alternative to experimental testing and for data-gap filling under the REACH regulation. The new models possess several advantages over existing ones: (i) noticeably larger training sets; (ii) external validation on a significant number of compounds from an industrial context (the Solvay portfolio); (iii) better accuracy and an extended applicability domain.
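
The abstract credits the new models with an extended applicability domain and reports external validation on the Solvay portfolio, but does not say how the domain was defined. One of the simplest definitions is a descriptor-range (bounding-box) check: a compound is in domain if every descriptor lies within the range seen in training. The sketch assumes that definition (the thesis may use another, e.g. a distance- or leverage-based one); function names and data are hypothetical.

```python
from typing import Sequence


def fit_descriptor_ranges(train: Sequence[Sequence[float]]) -> list[tuple[float, float]]:
    """(min, max) of every descriptor over the training compounds."""
    if not train:
        raise ValueError("empty training set")
    return [(min(column), max(column)) for column in zip(*train)]


def in_domain(x: Sequence[float], ranges: list[tuple[float, float]]) -> bool:
    """True if every descriptor of x lies within its training range."""
    if len(x) != len(ranges):
        raise ValueError("descriptor vector has the wrong length")
    return all(lo <= value <= hi for value, (lo, hi) in zip(x, ranges))


def coverage(test: Sequence[Sequence[float]], ranges: list[tuple[float, float]]) -> float:
    """Fraction of external compounds that fall inside the applicability domain."""
    if not test:
        raise ValueError("empty test set")
    return sum(in_domain(x, ranges) for x in test) / len(test)


# Hypothetical two-descriptor data (e.g. logP and molecular weight).
train = [[1.0, 180.0], [2.5, 250.0], [4.0, 320.0]]
ranges = fit_descriptor_ranges(train)            # [(1.0, 4.0), (180.0, 320.0)]
external = [[3.0, 200.0], [5.2, 300.0]]
print([in_domain(x, ranges) for x in external])  # [True, False]
print(coverage(external, ranges))                # 0.5
```

Under this definition a larger, more diverse training set widens the ranges, which is one plausible way a bigger dataset extends the domain; only in-domain external compounds would then count toward validation statistics.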

    Comparison of structure and ligand-based classification models for hERG liability profiling

    The human ether-à-go-go-related potassium channel (hERG) is a voltage-gated potassium channel involved in the repolarization of the cardiac action potential. Off-target inhibition of hERG is the most frequent cause of drug-induced cardiotoxicity. Therefore, assessing hERG-related cardiotoxicity early in the drug discovery process is crucial to avoid undesired cardiotoxic effects. For this purpose, we developed several machine learning classification models for hERG liability profiling based on the Random Forest algorithm, using the Weka software. The models were trained on a dataset of molecules collected from the public repository ChEMBL (https://www.ebi.ac.uk/chembl/) and the commercial GOSTAR database (https://www.gostardb.com/). The training molecules were encoded by both ligand- and structure-based attributes. The former consist of a set of physicochemical descriptors and fingerprints computed by the RDKit nodes available in KNIME, while the latter comprise different scores obtained from docking and rescoring calculations performed with the LiGen and Rescore+ tools, respectively. The following models are made available: (i) hERG_LB, trained on the ligand-based descriptors; (ii) hERG_LiGen_AV, trained on a set of scores computed on the docking poses yielded by LiGen, taking for each score the mean value over all the computed poses; (iii) hERG_LiGen_AV-LB, trained on the combination of the descriptors used to build the hERG_LB and hERG_LiGen_AV models. The input datasets used for model training and evaluation are also made available.
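
As described above, hERG_LiGen_AV uses the mean of each score over all docking poses, and hERG_LiGen_AV-LB combines those averages with the ligand-based descriptors. A minimal sketch of that feature construction, assuming the per-pose scores are already available as name-value pairs; the score names (`dock_score`, `rescore`) and all values are hypothetical, and `MolLogP`/`TPSA` merely stand in for RDKit-style descriptors. The Random Forest training step (done in Weka in the original work) is not shown.

```python
from statistics import mean


def average_pose_scores(pose_scores: list[dict[str, float]]) -> dict[str, float]:
    """Mean of each docking/rescoring score over all poses of one molecule."""
    if not pose_scores:
        raise ValueError("molecule has no docking poses")
    names = pose_scores[0].keys()
    return {f"{name}_mean": mean(pose[name] for pose in pose_scores) for name in names}


def combined_features(ligand_descriptors: dict[str, float],
                      pose_scores: list[dict[str, float]]) -> dict[str, float]:
    """Ligand-based descriptors plus pose-averaged structure-based scores."""
    features = dict(ligand_descriptors)
    features.update(average_pose_scores(pose_scores))
    return features


# Hypothetical values for one molecule.
lb = {"MolLogP": 3.2, "TPSA": 45.1}
poses = [{"dock_score": -8.0, "rescore": 1.0},
         {"dock_score": -6.0, "rescore": 3.0}]
print(combined_features(lb, poses))
# {'MolLogP': 3.2, 'TPSA': 45.1, 'dock_score_mean': -7.0, 'rescore_mean': 2.0}
```

Averaging over poses gives every molecule a fixed-length structure-based vector regardless of how many poses docking returns, which is what makes concatenation with the ligand-based descriptors straightforward.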

    Accuracy Evaluation of Three Modelling Tools for Occupational Exposure Assessment

    The objective of this study is to evaluate the accuracy and robustness of three exposure-modelling tools [STOFFENMANAGER® v.6, European Centre for Ecotoxicology and Toxicology of Chemicals Targeted Risk Assessment v.3.1 (ECETOC TRA v.3.1), and Advanced REACH Tool (ART v.1.5)] by comparing their estimates with available measured data for exposure to organic solvents and pesticides in occupational exposure scenarios (ESs).
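
The abstract does not list the metrics used to compare tool estimates with measurements. Summaries commonly used in such comparisons are the mean log ratio of estimated to measured exposure (bias), its spread (precision), and the share of scenarios in which the measurement exceeds the estimate (underestimation). The sketch assumes those metrics; `accuracy_summary` and the example values are hypothetical.

```python
import math
from statistics import mean, stdev


def accuracy_summary(estimated, measured):
    """Compare tool estimates with paired measurements (same units, all > 0)."""
    if len(estimated) != len(measured) or len(measured) < 2:
        raise ValueError("need at least two paired values")
    log_ratios = [math.log10(e / m) for e, m in zip(estimated, measured)]
    return {
        # > 0: the tool overestimates on average (conservative); < 0: underestimates
        "mean_log10_ratio": mean(log_ratios),
        # spread of the ratios: smaller means more consistent estimates
        "sd_log10_ratio": stdev(log_ratios),
        # share of scenarios where the measured exposure exceeds the estimate
        "fraction_underestimated": sum(r < 0 for r in log_ratios) / len(log_ratios),
    }


# Hypothetical paired values, e.g. in mg/m3.
print(accuracy_summary(estimated=[10.0, 1.0, 5.0], measured=[1.0, 1.0, 50.0]))
```

For this toy input the log ratios are about 1, 0 and -1, so the mean is close to 0, the sample standard deviation close to 1, and one scenario in three is underestimated: a tool can look unbiased on average while still underestimating individual scenarios, which is why the underestimation rate is usually reported separately.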

    “DompeKeys”: a set of novel substructure-based descriptors for efficient chemical space mapping, development and structural interpretation of machine learning models, and indexing of large databases

    The conversion of chemical structures into computer-readable descriptors able to capture key structural aspects is of pivotal importance in cheminformatics and computer-aided drug design. Molecular fingerprints are a widely employed class of descriptors; however, their generation is time-consuming for large databases and susceptible to bias. Descriptors that accurately detect predefined structural fragments without lengthy generation procedures would therefore be highly desirable. To meet additional needs, such descriptors should also be interpretable by medicinal chemists and suitable for indexing databases with trillions of compounds. To this end, as an integral part of EXSCALATE, Dompé's end-to-end drug discovery platform, we developed the DompeKeys (DK), a new substructure-based descriptor set that encodes the chemical features characterizing compounds of pharmaceutical interest. DK represent an exhaustive collection of curated SMARTS strings defining chemical features at different levels of complexity, from specific functional groups and structural patterns to simpler pharmacophoric points, which correspond to a network of hierarchically interconnected substructures. Because of this extended, hierarchical structure, DK perform well in different kinds of applications. In particular, we demonstrate that they are well suited for effective mapping of chemical space, as well as for substructure search and virtual screening. Notably, incorporating DK yields high-performing machine learning models for predicting both compound activity and metabolic reaction occurrence.
The protocol to generate the DK is freely available at https://dompekeys.exscalate.eu and is fully integrated with the Molecular Anatomy protocol for the generation and analysis of hierarchically interconnected molecular scaffolds and frameworks, thus providing a comprehensive and flexible tool for drug design applications.
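
DK are described as SMARTS-defined fragments organised in a hierarchy, from specific functional groups down to simple pharmacophoric points. A minimal sketch of how such a hierarchy can be turned into a binary descriptor and used for similarity searching, assuming the SMARTS matching has already been done upstream (for example with RDKit's `Chem.MolFromSmarts` and `Mol.HasSubstructMatch`). The key names and parent links are hypothetical placeholders, not actual DompeKeys definitions, and Tanimoto similarity is our assumed choice of metric.

```python
# Hypothetical fragment hierarchy: each key points to the more general key it implies.
PARENT = {
    "carboxylic_acid": "carbonyl",
    "carbonyl": "hb_acceptor",
    "primary_amine": "amine",
    "amine": "positive_ionizable",
}
ALL_KEYS = sorted(set(PARENT) | set(PARENT.values()))


def expand_keys(matched: set[str]) -> set[str]:
    """Add every ancestor of each matched key (a specific match implies the general one)."""
    expanded: set[str] = set()
    for key in matched:
        # Walk up the hierarchy; stop early if this branch was already added.
        while key is not None and key not in expanded:
            expanded.add(key)
            key = PARENT.get(key)
    return expanded


def to_bits(keys: set[str]) -> list[int]:
    """Fixed-order binary vector over ALL_KEYS."""
    return [int(k in keys) for k in ALL_KEYS]


def tanimoto(a: set[str], b: set[str]) -> float:
    """Shared keys divided by keys present in either molecule."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


mol_1 = expand_keys({"carboxylic_acid"})
mol_2 = expand_keys({"carbonyl", "primary_amine"})
print(to_bits(mol_1))          # [0, 1, 1, 1, 0, 0]
print(tanimoto(mol_1, mol_2))  # 0.333...: shared carbonyl and hb_acceptor keys
```

Propagating matches to parent keys lets two molecules with different specific groups still share general pharmacophoric bits, which is one way a hierarchical key set can serve both precise substructure search and coarser chemical space mapping.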
