17 research outputs found

Consideration of predicted small-molecule metabolites in computational toxicology

Author: Garcia de Lomana Marina
Kirchmair Johannes
Mathea Miriam
Svensson Fredrik
Volkamer Andrea
Publication venue: Royal Society of Chemistry (RSC)
Publication date: 23/02/2022

Xenobiotic metabolism has evolved as a key protective system of organisms against potentially harmful chemicals or compounds typically not present in a particular organism. The system's primary purpose is to chemically transform xenobiotics into metabolites that can be excreted via renal or biliary routes. However, in a minority of cases, the metabolites formed are toxic, sometimes even more toxic than the parent compound. Therefore, the consideration of xenobiotic metabolism clearly is of importance to the understanding of the toxicity of a compound. Nevertheless, most of the existing computational approaches for toxicity prediction do not explicitly take metabolism into account and it is currently not known to what extent the consideration of (predicted) metabolites could lead to an improvement of toxicity prediction. In order to study how predictive metabolism could help to enhance toxicity prediction, we explored a number of different strategies to integrate predictions from a state-of-the-art metabolite structure predictor and from modern machine learning approaches for toxicity prediction. We tested the integrated models on five toxicological endpoints and assays, including in vitro and in vivo genotoxicity assays (AMES and MNT), two organ toxicity endpoints (DILI and DICC) and a skin sensitization assay (LLNA). Overall, the improvements in model performance achieved by including metabolism data were minor (up to +0.04 in the F1 scores and up to +0.06 in MCCs). In general, the best performance was obtained by averaging the probability of toxicity predicted for the parent compound and the maximum probability of toxicity predicted for any metabolite. Moreover, including metabolite structures as further input molecules for model training slightly improved the toxicity predictions obtained by this averaging approach. 
However, the high complexity of the metabolic system and the associated uncertainty about the likely metabolites apparently limit the benefit of considering predicted metabolites in toxicity prediction.
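
The averaging strategy described in the abstract above (mean of the parent compound's predicted probability and the maximum probability over its predicted metabolites) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `combined_toxicity_probability` is hypothetical, and falling back to the parent probability when no metabolites are predicted is an assumption made here.

```python
from typing import Sequence


def combined_toxicity_probability(
    parent_prob: float, metabolite_probs: Sequence[float]
) -> float:
    """Average the parent probability with the highest metabolite probability.

    parent_prob: predicted probability of toxicity for the parent compound.
    metabolite_probs: predicted probabilities for each predicted metabolite.
    """
    if not metabolite_probs:
        # Assumption (not stated in the abstract): with no predicted
        # metabolites, the parent prediction is used unchanged.
        return parent_prob
    # Mean of the parent score and the most toxic-looking metabolite.
    return (parent_prob + max(metabolite_probs)) / 2.0


if __name__ == "__main__":
    # Hypothetical probabilities, for illustration only.
    print(combined_toxicity_probability(0.25, [0.10, 0.75]))
```

Taking the maximum over metabolites means a single metabolite with a high predicted probability raises the combined score, while the average with the parent keeps one uncertain metabolite prediction from dominating the result.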

UCL Discovery

Novel clinical phenotypes, drug categorization, and outcome prediction in drug-induced cholestasis: Analysis of a database of 432 patients developed by literature review and machine learning support

Author: Dirven Hubert
Drees Annika
Gadaleta Domenico
Garcia de Lomana Marina
Jover Ramiro
Luechtefeld Thomas
López-Pascual Ernesto
Moreno-Torres Marta
Quintás Guillermo
Rapisarda Anna
Serrano-Candelas Eva
Steffensen Inger-Lise
Vinken Mathieu
Publication date: 01/01/2024

Folkehelseinstituttet

Consideration of predicted small-molecule metabolites in computational toxicology

Author: Andrea Volkamer
Fredrik Svensson
Johannes Kirchmair
Marina Garcia de Lomana
Miriam Mathea
Publication venue: Royal Society of Chemistry (RSC)
Publication date: 01/01/2022

Exploration of computational approaches for including metabolism information in machine learning models for toxicity prediction.

Crossref

Predicting the Mitochondrial Toxicity of Small Molecules: Insights from Mechanistic Assays and Cell Painting Data

Author: Floriane Montanari
Marina Garcia de Lomana
Paula Andrea Marin Zapata
Publication date: 06/07/2023

Mitochondrial toxicity is a significant concern in the drug discovery process, as compounds that disrupt the function of these organelles can lead to serious side effects, including liver injury and cardiotoxicity. Different in vitro assays exist to detect mitochondrial toxicity at varying mechanistic levels: disruption of the respiratory chain, disruption of the membrane potential, or general mitochondrial dysfunction. In parallel, whole cell imaging assays like Cell Painting provide a phenotypic overview of the cellular system upon treatment and enable the assessment of mitochondrial health from cell profiling features. In this study, we aim to establish machine learning models for the prediction of mitochondrial toxicity, making the best use of the available data. For this purpose, we first derived highly curated datasets of mitochondrial toxicity, including subsets for different mechanisms of action. Due to the limited amount of labeled data often associated with toxicological endpoints, we investigated the potential of using morphological features from a large Cell Painting screen to label additional compounds and enrich our dataset. Our results suggest that models incorporating morphological profiles perform better in predicting mitochondrial toxicity than those trained on chemical structures alone (up to +0.08 and +0.09 mean MCC in random and cluster cross-validation, respectively). Toxicity labels derived from Cell Painting images improved the predictions on an external test set up to +0.08 MCC. However, we also found that further research is needed to improve the reliability of Cell Painting image labeling. Overall, our study provides insights into the importance of considering different mechanisms of action when predicting a complex endpoint like mitochondrial disruption as well as into the challenges and opportunities of using Cell Painting data for toxicity prediction

The Francis Crick Institute

Characterization of the Chemical Space of Known and Readily Obtainable Natural Products

Author: Johannes Kirchmair
Marina Garcia de Lomana
Nils-Ole Friedrich
Ya Chen

Natural products remain one of the most productive sources of chemical inspiration for the development of new drugs. The structures of more than 250 000 natural products are available from public databases. At least 10% of these compounds are readily obtainable for experimental testing from commercial vendors and public research institutions. While the physicochemical properties of known natural products have been thoroughly studied and compared to those of drugs and other types of small molecules, the information available on the content, coverage, and relevance of individual virtual and physical natural product libraries is clearly limited. The aim of this study was the development of a detailed understanding of the coverage of chemical space by known and readily obtainable natural products and by individual natural product databases. For this purpose, we compiled comprehensive data sets of known and readily obtainable natural products from 18 virtual databases (including the Dictionary of Natural Products), nine physical libraries, and the Protein Data Bank (PDB). We also developed and employed an algorithm (“SugarBuster”) for the removal of sugars and sugar-like moieties, which are generally not in the focus of interest for drug discovery, from natural products. In addition, we devised a rule-based approach for the automated classification of natural products into natural product classes (alkaloids, steroids, flavonoids, etc.). Among the most important results of this study is the finding that the readily obtainable natural products are highly diverse and populate regions of chemical space that are of high relevance to drug discovery. In some cases, substantial differences in the coverage of natural product classes and chemical space by the individual databases are observed. More than 2000 natural products are identified for which at least one X-ray crystal structure of the compound in complex with a biomacromolecule is available from the PDB

Crossref

The Francis Crick Institute

Studying and Mitigating the Effects of Data Drifts on ML Model Performance at the Example of Chemical Toxicity Data

Author: Andrea Morger
Andrea Volkamer
Fredrik Svensson
Johannes Kirchmair
Marina Garcia de Lomana
Miriam Mathea
Ulf Norinder
Publication venue: Research Square Platform LLC
Publication date: 14/10/2021

Machine learning models are widely applied to predict molecular properties or the biological activity of small molecules on a specific protein. Models can be integrated in a conformal prediction (CP) framework which adds a calibration step to estimate the confidence of the predictions. CP models present the advantage of ensuring a predefined error rate under the assumption that test and calibration set are exchangeable. In cases where the test data have drifted away from the descriptor space of the training data, or where assay setups have changed, this assumption might not be fulfilled and the models are not guaranteed to be valid. In this study, the performance of internally valid CP models when applied to either newer time-split data or to external data was evaluated. In detail, temporal data drifts were analysed based on twelve datasets from the ChEMBL database. In addition, discrepancies between models trained on publicly available data and applied to proprietary data for the liver toxicity and MNT in vivo endpoints were investigated. In most cases, a drastic decrease in the validity of the models was observed when applied to the time-split or external (holdout) test sets. To overcome the decrease in model validity, a strategy for updating the calibration set with data more similar to the holdout set was investigated. Updating the calibration set generally improved the validity, restoring it completely to its expected value in many cases. The restored validity is the first requisite for applying the CP models with confidence. However, the increased validity comes at the cost of a decrease in model efficiency, as more predictions are identified as inconclusive. This study presents a strategy to recalibrate CP models to mitigate the effects of data drifts. Updating the calibration sets without having to retrain the model has proven to be a useful approach to restore the validity of most models.
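
The recalibration idea in the abstract above can be illustrated with a small sketch of class-conditional (Mondrian) inductive conformal prediction for a binary endpoint. The abstract does not specify the CP variant or the nonconformity measure, so both are assumptions here, as are the function names `calibrate` and `predict_set`. The underlying model is represented only by its predicted class probabilities, which stay fixed; recalibrating means calling `calibrate` again on newer data.

```python
from typing import Dict, List, Sequence, Set, Tuple

Probs = Tuple[float, float]  # (P(class 0), P(class 1)) from a trained model


def calibrate(probs: Sequence[Probs], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Collect per-class nonconformity scores from a calibration set.

    Assumed nonconformity measure: 1 - predicted probability of the true class.
    """
    scores: Dict[int, List[float]] = {0: [], 1: []}
    for p, y in zip(probs, labels):
        scores[y].append(1.0 - p[y])
    return scores


def predict_set(prob: Probs, cal_scores: Dict[int, List[float]], significance: float) -> Set[int]:
    """Return the set of classes whose p-value exceeds the significance level."""
    region: Set[int] = set()
    for c in (0, 1):
        alpha = 1.0 - prob[c]  # nonconformity of the test compound for class c
        n_ge = sum(1 for s in cal_scores[c] if s >= alpha)
        p_value = (n_ge + 1) / (len(cal_scores[c]) + 1)
        if p_value > significance:
            region.add(c)
    return region


if __name__ == "__main__":
    # Hypothetical calibration data; the model itself is never retrained.
    old_cal = calibrate([(0.9, 0.1), (0.8, 0.2), (0.2, 0.8), (0.1, 0.9)], [0, 0, 1, 1])
    # Recalibration: same model, calibration scores from data closer to the holdout set.
    new_cal = calibrate([(0.6, 0.4), (0.7, 0.3), (0.4, 0.6), (0.3, 0.7)], [0, 0, 1, 1])
    for name, cal in (("old", old_cal), ("updated", new_cal)):
        print(name, predict_set((0.65, 0.35), cal, significance=0.4))
```

A prediction set containing both classes or neither corresponds to the inconclusive predictions mentioned in the abstract: after recalibration on drifted data, more test compounds may fall into such sets, which is the efficiency cost paid for restored validity.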

Crossref

Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data

Author: Andrea Morger
Andrea Volkamer
Fredrik Svensson
Johannes Kirchmair
Marina Garcia de Lomana
Miriam Mathea
Ulf Norinder
Publication venue: Springer Science and Business Media LLC
Publication date: 04/05/2022

Crossref

Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data

Author: Andrea Morger
Andrea Volkamer
Fredrik Svensson
Johannes Kirchmair
Marina Garcia de Lomana
Miriam Mathea
Ulf Norinder
Publication venue: Research Square Platform LLC
Publication date: 08/03/2022

Crossref

Studying and mitigating the effects of data drifts on ML model performance at the example of chemical toxicity data

Author: Garcia de Lomana Marina
Kirchmair Johannes
Mathea Miriam
Morger Andrea
Norinder Ulf
Svensson Fredrik
Volkamer Andrea
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2022

Institutional Repository of the Freie Universität Berlin

Publikationer från Uppsala Universitet

PubMed Central

UCL Discovery

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swepub