17 research outputs found

    Consideration of predicted small-molecule metabolites in computational toxicology

    Get PDF
    Xenobiotic metabolism has evolved as a key protective system of organisms against potentially harmful chemicals or compounds typically not present in a particular organism. The system's primary purpose is to chemically transform xenobiotics into metabolites that can be excreted via renal or biliary routes. However, in a minority of cases, the metabolites formed are toxic, sometimes even more toxic than the parent compound. Therefore, the consideration of xenobiotic metabolism clearly is of importance to the understanding of the toxicity of a compound. Nevertheless, most of the existing computational approaches for toxicity prediction do not explicitly take metabolism into account and it is currently not known to what extent the consideration of (predicted) metabolites could lead to an improvement of toxicity prediction. In order to study how predictive metabolism could help to enhance toxicity prediction, we explored a number of different strategies to integrate predictions from a state-of-the-art metabolite structure predictor and from modern machine learning approaches for toxicity prediction. We tested the integrated models on five toxicological endpoints and assays, including in vitro and in vivo genotoxicity assays (AMES and MNT), two organ toxicity endpoints (DILI and DICC) and a skin sensitization assay (LLNA). Overall, the improvements in model performance achieved by including metabolism data were minor (up to +0.04 in the F1 scores and up to +0.06 in MCCs). In general, the best performance was obtained by averaging the probability of toxicity predicted for the parent compound and the maximum probability of toxicity predicted for any metabolite. Moreover, including metabolite structures as further input molecules for model training slightly improved the toxicity predictions obtained by this averaging approach. However, the high complexity of the metabolic system and associated uncertainty about the likely metabolites apparently limits the benefit of considering predicted metabolites in toxicity prediction

    Consideration of predicted small-molecule metabolites in computational toxicology

    No full text
    Exploration of computational approaches for including metabolism information in machine learning models for toxicity prediction.</jats:p

    Predicting the Mitochondrial Toxicity of Small Molecules: Insights from Mechanistic Assays and Cell Painting Data

    No full text
    Mitochondrial toxicity is a significant concern in the drug discovery process, as compounds that disrupt the function of these organelles can lead to serious side effects, including liver injury and cardiotoxicity. Different in vitro assays exist to detect mitochondrial toxicity at varying mechanistic levels: disruption of the respiratory chain, disruption of the membrane potential, or general mitochondrial dysfunction. In parallel, whole cell imaging assays like Cell Painting provide a phenotypic overview of the cellular system upon treatment and enable the assessment of mitochondrial health from cell profiling features. In this study, we aim to establish machine learning models for the prediction of mitochondrial toxicity, making the best use of the available data. For this purpose, we first derived highly curated datasets of mitochondrial toxicity, including subsets for different mechanisms of action. Due to the limited amount of labeled data often associated with toxicological endpoints, we investigated the potential of using morphological features from a large Cell Painting screen to label additional compounds and enrich our dataset. Our results suggest that models incorporating morphological profiles perform better in predicting mitochondrial toxicity than those trained on chemical structures alone (up to +0.08 and +0.09 mean MCC in random and cluster cross-validation, respectively). Toxicity labels derived from Cell Painting images improved the predictions on an external test set up to +0.08 MCC. However, we also found that further research is needed to improve the reliability of Cell Painting image labeling. Overall, our study provides insights into the importance of considering different mechanisms of action when predicting a complex endpoint like mitochondrial disruption as well as into the challenges and opportunities of using Cell Painting data for toxicity prediction

    Predicting the Mitochondrial Toxicity of Small Molecules: Insights from Mechanistic Assays and Cell Painting Data

    No full text
    Mitochondrial toxicity is a significant concern in the drug discovery process, as compounds that disrupt the function of these organelles can lead to serious side effects, including liver injury and cardiotoxicity. Different in vitro assays exist to detect mitochondrial toxicity at varying mechanistic levels: disruption of the respiratory chain, disruption of the membrane potential, or general mitochondrial dysfunction. In parallel, whole cell imaging assays like Cell Painting provide a phenotypic overview of the cellular system upon treatment and enable the assessment of mitochondrial health from cell profiling features. In this study, we aim to establish machine learning models for the prediction of mitochondrial toxicity, making the best use of the available data. For this purpose, we first derived highly curated datasets of mitochondrial toxicity, including subsets for different mechanisms of action. Due to the limited amount of labeled data often associated with toxicological endpoints, we investigated the potential of using morphological features from a large Cell Painting screen to label additional compounds and enrich our dataset. Our results suggest that models incorporating morphological profiles perform better in predicting mitochondrial toxicity than those trained on chemical structures alone (up to +0.08 and +0.09 mean MCC in random and cluster cross-validation, respectively). Toxicity labels derived from Cell Painting images improved the predictions on an external test set up to +0.08 MCC. However, we also found that further research is needed to improve the reliability of Cell Painting image labeling. Overall, our study provides insights into the importance of considering different mechanisms of action when predicting a complex endpoint like mitochondrial disruption as well as into the challenges and opportunities of using Cell Painting data for toxicity prediction

    Characterization of the Chemical Space of Known and Readily Obtainable Natural Products

    No full text
    Natural products remain one of the most productive sources of chemical inspiration for the development of new drugs. The structures of more than 250 000 natural products are available from public databases. At least 10% of these compounds are readily obtainable for experimental testing from commercial vendors and public research institutions. While the physicochemical properties of known natural products have been thoroughly studied and compared to those of drugs and other types of small molecules, the information available on the content, coverage, and relevance of individual virtual and physical natural product libraries is clearly limited. The aim of this study was the development of a detailed understanding of the coverage of chemical space by known and readily obtainable natural products and by individual natural product databases. For this purpose, we compiled comprehensive data sets of known and readily obtainable natural products from 18 virtual databases (including the Dictionary of Natural Products), nine physical libraries, and the Protein Data Bank (PDB). We also developed and employed an algorithm (“SugarBuster”) for the removal of sugars and sugar-like moieties, which are generally not in the focus of interest for drug discovery, from natural products. In addition, we devised a rule-based approach for the automated classification of natural products into natural product classes (alkaloids, steroids, flavonoids, etc.). Among the most important results of this study is the finding that the readily obtainable natural products are highly diverse and populate regions of chemical space that are of high relevance to drug discovery. In some cases, substantial differences in the coverage of natural product classes and chemical space by the individual databases are observed. More than 2000 natural products are identified for which at least one X-ray crystal structure of the compound in complex with a biomacromolecule is available from the PDB

    Studying and Mitigating the Effects of Data Drifts on ML Model Performance at the Example of Chemical Toxicity Data

    No full text
    Abstract Machine learning models are widely applied to predict molecular properties or the biological activity of small molecules on a specific protein. Models can be integrated in a conformal prediction (CP) framework which adds a calibration step to estimate the confidence of the predictions. CP models present the advantage of ensuring a predefined error rate under the assumption that test and calibration set are exchangeable. In cases where the test data have drifted away from the descriptor space of the training data, or where assay setups have changed, this assumption might not be fulfilled and the models are not guaranteed to be valid. In this study, the performance of internally valid CP models when applied to either newer time-split data or to external data was evaluated. In detail, temporal data drifts were analysed based on twelve datasets from the ChEMBL database. In addition, discrepancies between models trained on publicly available data and applied to proprietary data for the liver toxicity and MNT in vivo endpoints were investigated. In most cases, a drastic decrease in the validity of the models was observed when applied to the time-split or external (holdout) test sets. To overcome the decrease in model validity, a strategy for updating the calibration set with data more similar to the holdout set was investigated. Updating the calibration set generally improved the validity, restoring it completely to its expected value in many cases. The restored validity is the first requisite for applying the CP models with confidence. However, the increased validity comes at the cost of a decrease in model efficiency, as more predictions are identified as inconclusive. This study presents a strategy to recalibrate CP models to mitigate the effects of data drifts. Updating the calibration sets without having to retrain the model has proven to be a useful approach to restore the validity of most models.</jats:p

    Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data

    No full text
    AbstractMachine learning models are widely applied to predict molecular properties or the biological activity of small molecules on a specific protein. Models can be integrated in a conformal prediction (CP) framework which adds a calibration step to estimate the confidence of the predictions. CP models present the advantage of ensuring a predefined error rate under the assumption that test and calibration set are exchangeable. In cases where the test data have drifted away from the descriptor space of the training data, or where assay setups have changed, this assumption might not be fulfilled and the models are not guaranteed to be valid. In this study, the performance of internally valid CP models when applied to either newer time-split data or to external data was evaluated. In detail, temporal data drifts were analysed based on twelve datasets from the ChEMBL database. In addition, discrepancies between models trained on publicly-available data and applied to proprietary data for the liver toxicity and MNT in vivo endpoints were investigated. In most cases, a drastic decrease in the validity of the models was observed when applied to the time-split or external (holdout) test sets. To overcome the decrease in model validity, a strategy for updating the calibration set with data more similar to the holdout set was investigated. Updating the calibration set generally improved the validity, restoring it completely to its expected value in many cases. The restored validity is the first requisite for applying the CP models with confidence. However, the increased validity comes at the cost of a decrease in model efficiency, as more predictions are identified as inconclusive. This study presents a strategy to recalibrate CP models to mitigate the effects of data drifts. Updating the calibration sets without having to retrain the model has proven to be a useful approach to restore the validity of most models.</jats:p

    Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data

    No full text
    Abstract Machine learning models are widely applied to predict molecular properties or the biological activity of small molecules on a specific protein. Models can be integrated in a conformal prediction (CP) framework which adds a calibration step to estimate the confidence of the predictions. CP models present the advantage of ensuring a predefined error rate under the assumption that test and calibration set are exchangeable. In cases where the test data have drifted away from the descriptor space of the training data, or where assay setups have changed, this assumption might not be fulfilled and the models are not guaranteed to be valid. In this study, the performance of internally valid CP models when applied to either newer time-split data or to external data was evaluated. In detail, temporal data drifts were analysed based on twelve datasets from the ChEMBL database. In addition, discrepancies between models trained on publicly available data and applied to proprietary data for the liver toxicity and MNT in vivo endpoints were investigated. In most cases, a drastic decrease in the validity of the models was observed when applied to the time-split or external (holdout) test sets. To overcome the decrease in model validity, a strategy for updating the calibration set with data more similar to the holdout set was investigated. Updating the calibration set generally improved the validity, restoring it completely to its expected value in many cases. The restored validity is the first requisite for applying the CP models with confidence. However, the increased validity comes at the cost of a decrease in model efficiency, as more predictions are identified as inconclusive. This study presents a strategy to recalibrate CP models to mitigate the effects of data drifts. Updating the calibration sets without having to retrain the model has proven to be a useful approach to restore the validity of most models.</jats:p

    Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data

    Get PDF
    Machine learning models are widely applied to predict molecular properties or the biological activity of small molecules on a specific protein. Models can be integrated in a conformal prediction (CP) framework which adds a calibration step to estimate the confidence of the predictions. CP models present the advantage of ensuring a predefined error rate under the assumption that test and calibration set are exchangeable. In cases where the test data have drifted away from the descriptor space of the training data, or where assay setups have changed, this assumption might not be fulfilled and the models are not guaranteed to be valid. In this study, the performance of internally valid CP models when applied to either newer time-split data or to external data was evaluated. In detail, temporal data drifts were analysed based on twelve datasets from the ChEMBL database. In addition, discrepancies between models trained on publicly-available data and applied to proprietary data for the liver toxicity and MNT in vivo endpoints were investigated. In most cases, a drastic decrease in the validity of the models was observed when applied to the time-split or external (holdout) test sets. To overcome the decrease in model validity, a strategy for updating the calibration set with data more similar to the holdout set was investigated. Updating the calibration set generally improved the validity, restoring it completely to its expected value in many cases. The restored validity is the first requisite for applying the CP models with confidence. However, the increased validity comes at the cost of a decrease in model efficiency, as more predictions are identified as inconclusive. This study presents a strategy to recalibrate CP models to mitigate the effects of data drifts. Updating the calibration sets without having to retrain the model has proven to be a useful approach to restore the validity of most models
    corecore