3,091 research outputs found
Predicting Skin Permeability by means of Computational Approaches : Reliability and Caveats in Pharmaceutical Studies
© 2019 American Chemical Society. The skin is the main barrier between the internal body environment and the external one. The characteristics and properties of this barrier affect drug delivery and chemical toxicity parameters. It is therefore not surprising that the permeability of many different compounds has been measured with several in vitro and in vivo techniques. Moreover, many in silico approaches have been used to identify the correlation between the structure of permeants and their permeability, to reproduce the behavior of the skin, and to predict the ability of specific chemicals to permeate this barrier. A significant number of issues, such as interlaboratory variability, experimental conditions, data set building rationales, and the site of origin and hydration of the skin, still prevent us from obtaining a definitive predictive model of skin permeability. This review presents the main advances and principal approaches in computational methods used to predict this property, highlights the main issues that have arisen, and outlines the challenges for future research. Peer reviewed. Final Accepted Version
Selection by Prediction with Conformal p-values
Decision making or scientific discovery pipelines such as job hiring and drug
discovery often involve multiple stages: before any resource-intensive step,
there is often an initial screening that uses predictions from a machine
learning model to shortlist a few candidates from a large pool. We study
screening procedures that aim to select candidates whose unobserved outcomes
exceed user-specified values. We develop a method that wraps around any
prediction model to produce a subset of candidates while controlling the
proportion of falsely selected units. Building upon the conformal inference
framework, our method first constructs p-values that quantify the statistical
evidence for large outcomes; it then determines the shortlist by comparing the
p-values to a threshold introduced in the multiple testing literature. In many
cases, the procedure selects candidates whose predictions are above a
data-dependent threshold. Our theoretical guarantee holds under mild
exchangeability conditions on the samples, generalizing existing results on
multiple conformal p-values. We demonstrate the empirical performance of our
method via simulations, and apply it to job hiring and drug discovery datasets. Comment: Journal of Machine Learning Research
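
The two steps named in this abstract (conformal p-values, then a multiple-testing threshold) can be illustrated with a minimal sketch. It is not the authors' code: the score `y - prediction`, the function names, and the use of the Benjamini-Hochberg procedure as the threshold rule are assumptions made for illustration only.

```python
import numpy as np

def conformal_pvalues(cal_pred, cal_y, test_pred, thresholds):
    """Conformal p-values for the nulls H_j: Y_j <= c_j.

    Assumed score: V(x, y) = y - prediction(x), which is monotone in y.
    A small p-value is evidence that the unobserved outcome exceeds c_j.
    """
    cal_scores = np.asarray(cal_y, dtype=float) - np.asarray(cal_pred, dtype=float)
    test_scores = np.asarray(thresholds, dtype=float) - np.asarray(test_pred, dtype=float)
    n = len(cal_scores)
    sorted_cal = np.sort(cal_scores)
    # side="left" counts calibration scores strictly below each test score
    counts = np.searchsorted(sorted_cal, test_scores, side="left")
    return (1 + counts) / (n + 1)

def benjamini_hochberg(pvals, q):
    """Return sorted indices selected by the Benjamini-Hochberg procedure at level q."""
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= q * np.arange(1, m + 1) / m
    if not below.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(below)[0]) + 1  # largest rank passing the BH line
    return np.sort(order[:k])

# Toy usage with made-up numbers: four calibration points, two candidates.
p = conformal_pvalues(
    cal_pred=[0, 0, 0, 0], cal_y=[1, 2, 3, 4],
    test_pred=[10, 0], thresholds=[5, 5],
)
selected = benjamini_hochberg(p, q=0.5)
```

In this toy example only the first candidate, whose prediction lies far above its threshold, is shortlisted. This matches the abstract's remark that the procedure often amounts to selecting candidates whose predictions exceed a data-dependent threshold.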
Synergy conformal prediction applied to large-scale bioactivity datasets and in federated learning
Confidence predictors can deliver predictions with the associated confidence required for decision making and can play an important role in drug discovery and toxicity predictions. In this work we investigate a recently introduced version of conformal prediction, synergy conformal prediction, focusing on the predictive performance when applied to bioactivity data. We compare the performance to other variants of conformal predictors for multiple partitioned datasets and demonstrate the utility of synergy conformal predictors for federated learning where data cannot be pooled in one location. Our results show that synergy conformal predictors based on training data randomly sampled with replacement can compete with other conformal setups, while using completely separate training sets often results in worse performance. However, in a federated setup where no method has access to all the data, synergy conformal prediction is shown to give promising results. Based on our study, we conclude that synergy conformal predictors are a valuable addition to the conformal prediction toolbox.
Conformal Drug Property Prediction with Density Estimation under Covariate Shift
In drug discovery, it is vital to confirm the predictions of pharmaceutical
properties from computational models using costly wet-lab experiments. Hence,
obtaining reliable uncertainty estimates is crucial for prioritizing drug
molecules for subsequent experimental validation. Conformal Prediction (CP) is
a promising tool for creating such prediction sets for molecular properties
with a coverage guarantee. However, the exchangeability assumption of CP is
often challenged with covariate shift in drug discovery tasks: Most datasets
contain limited labeled data, which may not be representative of the vast
chemical space from which molecules are drawn. To address this limitation, we
propose a method called CoDrug that employs an energy-based model leveraging
both training data and unlabelled data, and Kernel Density Estimation (KDE) to
assess the densities of a molecule set. The estimated densities are then used
to weigh the molecule samples while building prediction sets and rectifying for
distribution shift. In extensive experiments involving realistic distribution
drifts in various small-molecule drug discovery tasks, we demonstrate the
ability of CoDrug to provide valid prediction sets and its utility in
addressing the distribution shift arising from de novo drug design models. On
average, using CoDrug can reduce the coverage gap by over 35% when compared to
conformal prediction sets not adjusted for covariate shift. Comment: Accepted at NeurIPS 202
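
The idea of weighting calibration samples by estimated densities can be sketched with generic weighted conformal prediction. This is an illustration under stated assumptions, not the CoDrug implementation: the weights are taken as given (for example, an estimated ratio of test density to calibration density), and the function name and arguments are hypothetical.

```python
import numpy as np

def weighted_conformal_threshold(cal_scores, cal_weights, test_weight, alpha):
    """Weighted (1 - alpha) quantile of calibration nonconformity scores.

    cal_weights[i] and test_weight are assumed to be likelihood-ratio
    weights supplied by the caller. The test point's own weight is treated
    as mass at +infinity, as in standard weighted conformal prediction.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_weights = np.asarray(cal_weights, dtype=float)
    total = cal_weights.sum() + test_weight
    order = np.argsort(cal_scores)
    cum = np.cumsum(cal_weights[order]) / total
    # first sorted position where cumulative weight reaches 1 - alpha
    idx = np.searchsorted(cum, 1 - alpha, side="left")
    if idx >= len(cal_scores):
        return np.inf  # not enough calibration mass: the set is unbounded
    return cal_scores[order][idx]

# Toy usage with made-up scores: equal weights recover the unweighted quantile.
unweighted = weighted_conformal_threshold([1, 2, 3, 4], [1, 1, 1, 1], 1, alpha=0.5)
# Up-weighting the highest-score calibration point widens the prediction set.
shifted = weighted_conformal_threshold([1, 2, 3, 4], [1, 1, 1, 5], 1, alpha=0.5)
```

A prediction set then contains every candidate label whose nonconformity score is at most the returned threshold. With equal weights the threshold reduces to the usual split-conformal quantile.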
Using Predicted Bioactivity Profiles to Improve Predictive Modeling
Predictive modeling is a cornerstone in early drug development. Using information from multiple domains or across prediction tasks has the potential to improve the performance of predictive modeling. However, aggregating data often leads to incomplete data matrices that might be limiting for modeling. In line with previous studies, we show that by generating predicted bioactivity profiles, and using these as additional features, prediction accuracy of biological endpoints can be improved. Using conformal prediction, a type of confidence predictor, we present a robust framework for the calculation of these profiles and the evaluation of their impact. We report on the outcomes from several approaches to generate the predicted profiles on 16 datasets in cytotoxicity and bioactivity and show that efficiency is improved the most when including the p-values from conformal prediction as bioactivity profiles.