5 research outputs found
Detecting adversarial examples with inductive Venn-ABERS predictors
Inductive Venn-ABERS predictors (IVAPs) are a type of probabilistic predictor with the theoretical guarantee that their predictions are perfectly calibrated. We propose to exploit this calibration property for the detection of adversarial examples in binary classification tasks. By rejecting predictions if the uncertainty of the IVAP is too high, we obtain an algorithm that is both accurate on the original test set and significantly more robust to adversarial examples. The method appears to be competitive with the state of the art in adversarial defense, both in terms of robustness and scalability.
Detecting adversarial manipulation using inductive Venn-ABERS predictors
Inductive Venn-ABERS predictors (IVAPs) are a type of probabilistic predictor with the theoretical guarantee that their predictions are perfectly calibrated. In this paper, we propose to exploit this calibration property for the detection of adversarial examples in binary classification tasks. By rejecting predictions if the uncertainty of the IVAP is too high, we obtain an algorithm that is both accurate on the original test set and resistant to adversarial examples. This robustness is observed both for adversarial examples crafted against the underlying model and for those generated with the IVAP taken into account. The method appears to offer robustness competitive with the state of the art in adversarial defense, yet it is computationally much more tractable.
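The rejection rule described in these abstracts can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it builds an inductive Venn-ABERS predictor from a held-out calibration set using pool-adjacent-violators isotonic regression, and rejects a test prediction when the width of the interval [p0, p1] exceeds a threshold. The calibration scores, labels, and threshold below are hypothetical toy values.

```python
def pava(values):
    """Pool Adjacent Violators: least-squares nondecreasing fit to `values`."""
    blocks = []  # each block: [mean, weight, count]
    for v in values:
        blocks.append([float(v), 1.0, 1])
        # merge blocks while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] >= blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            w = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / w, w, c1 + c2])
    fitted = []
    for m, _, c in blocks:
        fitted.extend([m] * c)
    return fitted


def ivap_probs(cal_scores, cal_labels, test_score):
    """Return the Venn-ABERS interval (p0, p1) for one test score.

    For each tentative label y in {0, 1}, the test point is added to the
    calibration set with label y, an isotonic regression of labels on
    scores is fitted, and p_y is the fitted value at the test point.
    """
    probs = {}
    for y in (0, 1):
        pts = sorted(zip(cal_scores + [test_score], cal_labels + [y]))
        fitted = pava([label for _, label in pts])
        probs[y] = fitted[pts.index((test_score, y))]
    return probs[0], probs[1]


# Toy calibration set (hypothetical scores from some underlying classifier).
cal_scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
cal_labels = [0, 0, 0, 1, 1, 1]

p0, p1 = ivap_probs(cal_scores, cal_labels, 0.8)
width = p1 - p0                  # uncertainty of the IVAP
THRESHOLD = 0.5                  # hypothetical rejection threshold
accept = width <= THRESHOLD      # reject the prediction if too uncertain
p_merged = p1 / (1.0 - p0 + p1)  # standard merge into a single probability
```

Here `p_merged` is the usual way to collapse the interval into one probability when the prediction is accepted; the calibration guarantee applies to the pair (p0, p1), whose width serves as the rejection signal.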
Calibration of Natural Language Understanding Models with Venn--ABERS Predictors
Transformers, currently the state of the art in natural language understanding (NLU) tasks, are prone to generating uncalibrated predictions or extreme probabilities, which makes it relatively difficult to take decisions based on their output. In this paper we propose to build several inductive Venn--ABERS predictors (IVAPs), which are guaranteed to be well calibrated under minimal assumptions, on top of a selection of pre-trained transformers. We test their performance over a set of diverse NLU tasks and show that they are capable of producing well-calibrated probabilistic predictions that are uniformly spread over the [0,1] interval, all while retaining the original model's predictive accuracy.
Comment: Accepted at the 11th Symposium on Conformal and Probabilistic Prediction with Applications (COPA 2022)