Synergy conformal prediction applied to large-scale bioactivity datasets and in federated learning
Confidence predictors can deliver predictions with the associated confidence required for decision making and can play an important role in drug discovery and toxicity predictions. In this work we investigate a recently introduced version of conformal prediction, synergy conformal prediction, focusing on its predictive performance when applied to bioactivity data. We compare the performance to other variants of conformal predictors for multiple partitioned datasets and demonstrate the utility of synergy conformal predictors for federated learning, where data cannot be pooled in one location. Our results show that synergy conformal predictors based on training data randomly sampled with replacement can compete with other conformal setups, while using completely separate training sets often results in worse performance. However, in a federated setup where no method has access to all the data, synergy conformal prediction is shown to give promising results. Based on our study, we conclude that synergy conformal predictors are a valuable addition to the conformal prediction toolbox.
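The score-averaging idea at the heart of the synergy approach can be sketched roughly as follows; the partition count, toy scores, and function name are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def synergy_p_value(cal_scores_per_model, test_scores_per_model):
    """Synergy-style conformal p-value (sketch): nonconformity scores from
    several independently trained models are averaged per example before
    the usual inductive conformal calibration step.

    cal_scores_per_model: list of arrays, one per partition/model, each
    scoring the same calibration examples.
    test_scores_per_model: the test point's score from each model.
    """
    cal = np.mean(np.vstack(cal_scores_per_model), axis=0)  # average over models
    test = np.mean(test_scores_per_model)
    # standard conformal p-value, with +1 smoothing for the test point itself
    return (np.sum(cal >= test) + 1) / (len(cal) + 1)

# toy example: three partitions, each producing 100 calibration scores
rng = np.random.default_rng(0)
cal_scores = [rng.random(100) for _ in range(3)]
p = synergy_p_value(cal_scores, [0.9, 0.85, 0.95])
```

Because the test point's averaged score (0.9) sits far above the averaged calibration scores, the p-value comes out small, i.e. the point looks nonconforming.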
Conformal prediction for frequency-severity modeling
We present a nonparametric, model-agnostic framework for building prediction intervals of insurance claims, with finite sample statistical guarantees, extending the technique of split conformal prediction to the domain of two-stage frequency-severity modeling. The effectiveness of the framework is showcased with simulated and real datasets. When the underlying severity model is a random forest, we extend the two-stage split conformal prediction procedure, showing how the out-of-bag mechanism can be leveraged to eliminate the need for a calibration set and to enable the production of prediction intervals with adaptive width.
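As background for the two-stage extension described above, a minimal single-stage split-conformal regression sketch (the toy mean-only model and all names here are hypothetical stand-ins, not the paper's frequency-severity code):

```python
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal (sketch): absolute residuals on a held-out calibration
    set determine one shared (1 - alpha) interval half-width."""
    residuals = np.abs(y_cal - model.predict(X_cal))       # nonconformity scores
    n = len(residuals)
    # finite-sample corrected quantile level, capped at 1
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(residuals, level)
    preds = model.predict(X_test)
    return preds - q, preds + q

class MeanModel:
    """Toy stand-in for a fitted severity model (illustration only)."""
    def __init__(self, mean):
        self.mean = mean
    def predict(self, X):
        return np.full(len(X), self.mean)

rng = np.random.default_rng(1)
y_cal = 5.0 + rng.normal(size=500)                         # synthetic claim sizes
lo, hi = split_conformal_interval(MeanModel(5.0), np.zeros((500, 1)), y_cal,
                                  np.zeros((3, 1)))
```

The out-of-bag variant in the paper replaces the explicit calibration split with each tree's out-of-bag residuals, which is what allows adaptive (per-point) widths.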
KnowTox: pipeline and case study for confident prediction of potential toxic effects of compounds in early phases of development
Risk assessment of newly synthesised chemicals is a prerequisite for regulatory approval. In this context, in silico methods have great potential to reduce time, cost, and ultimately animal testing, as they make use of the ever-growing amount of available toxicity data. Here, KnowTox is presented, a novel pipeline that combines three different in silico toxicology approaches to allow for confident prediction of potentially toxic effects of query compounds, i.e. machine learning models for 88 endpoints, alerts for 919 toxic substructures, and computational support for read-across. It is mainly based on the ToxCast dataset, which after preprocessing contains a sparse matrix of 7912 compounds tested against 985 endpoints. When applying machine learning models, applicability and reliability of predictions for new chemicals are of utmost importance. Therefore, first, the conformal prediction technique was deployed, comprising an additional calibration step and by definition creating internally valid predictors at a given significance level. Second, to further improve validity and information efficiency, two adaptations are suggested, exemplified on the androgen receptor antagonism endpoint. An absolute increase in validity of 23% on the in-house dataset of 534 compounds could be achieved by introducing KNNRegressor normalisation. This increase in validity comes at the cost of efficiency, which could in turn be improved by 20% for the initial ToxCast model by balancing the dataset during model training. Finally, the value of the developed pipeline for risk assessment is discussed using two in-house triazole molecules. Compared to a single toxicity prediction method, complementing the outputs of different approaches can have a higher impact on guiding toxicity testing and de-selecting most likely harmful development-candidate compounds early in the development process.
Deep Learning-Based Conformal Prediction of Toxicity
Predictive modeling for toxicity can help reduce risks in a range of applications and potentially serve as the basis for regulatory decisions. However, the utility of these predictions can be limited if the associated uncertainty is not adequately quantified. With recent studies showing great promise for deep learning-based models for toxicity prediction as well, we investigate the combination of deep learning-based predictors with the conformal prediction framework to generate highly predictive models with well-defined uncertainties. We use a range of deep feedforward neural networks and graph neural networks in a conformal prediction setting and evaluate their performance on data from the Tox21 challenge. We also compare the results from the conformal predictors to those of the underlying machine learning models. The results indicate that highly predictive models can be obtained that result in very efficient conformal predictors even at high confidence levels. Taken together, our results highlight the utility of conformal predictors as a convenient way to deliver toxicity predictions with confidence, adding both statistical guarantees on model performance and better prediction of the minority class compared to the underlying models.
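The class-conditional (Mondrian) calibration that helps with minority classes, as in the Tox21 setting above, can be sketched roughly as follows; the nonconformity score 1 − P(class), the toy probabilities, and the function name are illustrative assumptions, not the study's exact setup:

```python
import numpy as np

def conformal_predict(cal_probs, cal_labels, test_probs, significance=0.2):
    """Mondrian inductive conformal classifier (sketch): calibrate the score
    1 - P(class) separately per class; return, for each test example, the
    set of labels whose p-value exceeds the significance level."""
    regions = []
    for probs in test_probs:
        region = []
        for c in (0, 1):
            scores = 1.0 - cal_probs[cal_labels == c, c]   # class-wise scores
            test_score = 1.0 - probs[c]
            p = (np.sum(scores >= test_score) + 1) / (len(scores) + 1)
            if p > significance:
                region.append(c)
        regions.append(region)
    return regions

# toy calibration set: a well-separated binary problem
cal_probs = np.array([[0.9, 0.1]] * 50 + [[0.1, 0.9]] * 50)
cal_labels = np.array([0] * 50 + [1] * 50)
regions = conformal_predict(cal_probs, cal_labels, np.array([[0.95, 0.05]]))
```

Calibrating each class on its own examples is what gives the per-class validity that benefits a rare toxic class.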
Computationally efficient versions of conformal predictive distributions
Conformal predictive systems are a recent modification of conformal predictors that output, in regression problems, probability distributions for labels of test observations rather than set predictions. The extra information provided by conformal predictive systems may be useful, e.g., in decision-making problems. Conformal predictive systems inherit the relative computational inefficiency of conformal predictors. In this paper we discuss two computationally efficient versions of conformal predictive systems, which we call split conformal predictive systems and cross-conformal predictive systems. The main advantage of split conformal predictive systems is their guaranteed validity, whereas for cross-conformal predictive systems validity only holds empirically and in the absence of excessive randomization. The main advantage of cross-conformal predictive systems is their greater predictive efficiency.
Comment: 31 pages, 14 figures, 1 table. The conference version was published in the Proceedings of COPA 2018, and the journal version is to appear in Neurocomputing.
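A split conformal predictive system can be sketched as the empirical distribution of calibration residuals shifted to the test prediction; the names and the collapsing of the predictive distribution's "thickness" into a single CDF value are simplifying assumptions of this sketch:

```python
import numpy as np

def split_cps_cdf(cal_residuals, test_pred):
    """Split conformal predictive system (sketch): returns a function
    y -> approximate P(label <= y) for one test point. The sorted
    calibration residuals, shifted by the point prediction, form the
    support of the predictive distribution."""
    support = np.sort(test_pred + np.asarray(cal_residuals))
    n = len(support)
    def cdf(y):
        # divide by n + 1 to reserve mass for the test point itself
        return np.searchsorted(support, y, side="right") / (n + 1)
    return cdf

rng = np.random.default_rng(2)
cdf = split_cps_cdf(rng.normal(size=200), test_pred=10.0)
```

With roughly symmetric residuals, the distribution is centred on the point prediction, so `cdf(test_pred)` comes out near 0.5; any quantile of this CDF yields a prediction interval, which is the extra decision-making information the abstract refers to.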
Constructing Prediction Intervals with Neural Networks: An Empirical Evaluation of Bootstrapping and Conformal Inference Methods
Artificial neural networks (ANNs) are popular tools for accomplishing many machine learning tasks, including predicting continuous outcomes. However, the general lack of confidence measures provided with ANN predictions limits their applicability, especially in military settings where accuracy is paramount. Supplementing point predictions with prediction intervals (PIs) is common for other learning algorithms, but the complex structure and training of ANNs renders constructing PIs difficult. This work provides the network design choices and inferential methods for creating better performing PIs with ANNs to enable their adaptation for military use. A two-step experiment is executed across 11 datasets, including an image-based dataset. Two non-parametric methods for constructing PIs, bootstrapping and conformal inference, are considered. The results of the first experimental step reveal that the choices inherent to building an ANN affect PI performance. Guidance is provided for optimizing PI performance with respect to each network feature and PI method. In the second step, 20 algorithms for constructing PIs—each using the principles of bootstrapping or conformal inference—are implemented to determine which provides the best performance while maintaining reasonable computational burden. In general, this trade-off is optimized when implementing the cross-conformal method, which maintained interval coverage and efficiency with decreased computational burden.
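The cross-conformal construction singled out above as the best trade-off can be sketched as follows, with a least-squares fitter standing in for the study's ANNs; the fold count, toy data, and all names are assumptions of this sketch:

```python
import numpy as np

def lstsq_fit(X, y):
    """Toy least-squares fitter standing in for an ANN (illustration only)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda Xn: Xn @ w

def cross_conformal_interval(fit, X, y, X_test, alpha=0.1, n_folds=5):
    """Cross-conformal PIs (sketch): each fold is scored by a model trained
    on the remaining folds, and the pooled out-of-fold residuals calibrate
    one shared interval half-width."""
    idx = np.arange(len(X))
    residuals = []
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        predict = fit(X[train], y[train])
        residuals.extend(np.abs(y[fold] - predict(X[fold])))
    q = np.quantile(residuals, 1 - alpha)
    predict = fit(X, y)                        # final model refit on all data
    preds = predict(X_test)
    return preds - q, preds + q

rng = np.random.default_rng(3)
w_true = np.array([1.0, 2.0, 3.0])
X = rng.normal(size=(200, 3))
y = X @ w_true + 0.1 * rng.normal(size=200)
X_test = rng.normal(size=(100, 3))
y_test = X_test @ w_true + 0.1 * rng.normal(size=100)
lo, hi = cross_conformal_interval(lstsq_fit, X, y, X_test)
```

Training only `n_folds` models (rather than one per candidate label, as in full conformal prediction) is what keeps the computational burden modest while every data point still contributes a calibration residual.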