Measuring the Effect of Causal Disentanglement on the Adversarial Robustness of Neural Network Models
Causal Neural Network models have shown high levels of robustness to
adversarial attacks as well as an increased capacity for generalisation tasks
such as few-shot learning and rare-context classification compared to
traditional Neural Networks. This robustness is argued to stem from the
disentanglement of causal and confounder input signals. However, no
quantitative study has yet measured the level of disentanglement achieved by
these types of causal models or assessed how this relates to their adversarial
robustness.
Existing causal disentanglement metrics are not applicable to deterministic
models trained on real-world datasets. We therefore utilise metrics of
content/style disentanglement from the field of Computer Vision to measure
different aspects of causal disentanglement for four state-of-the-art
causal Neural Network models. By re-implementing these models with a common
ResNet18 architecture, we are able to fairly measure their adversarial
robustness on three standard image classification benchmarking datasets under
seven common white-box attacks. We find a strong association (r=0.820, p=0.001)
between the degree to which models decorrelate causal and confounder signals
and their adversarial robustness. Additionally, we find a moderate negative
association between the pixel-level information content of the confounder
signal and adversarial robustness (r=-0.597, p=0.040).
Comment: 12 pages, 3 figures
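As a rough illustration of the reported association analysis, the sketch below computes a Pearson correlation between per-model disentanglement scores and robust accuracies; the variable names and numbers are placeholders, not the paper's data.

```python
# Illustrative sketch only (not the authors' code or data): correlating a
# per-model causal/confounder decorrelation score with robust accuracy.
from scipy.stats import pearsonr

# Hypothetical per-model values under a fixed attack budget.
decorrelation_scores = [0.31, 0.45, 0.62, 0.71]  # placeholder numbers
robust_accuracies = [0.22, 0.29, 0.41, 0.48]     # placeholder numbers

r, p = pearsonr(decorrelation_scores, robust_accuracies)
print(f"Pearson r = {r:.3f}, p = {p:.3f}")
```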
RobustBench: a standardized adversarial robustness benchmark
As a research community, we are still lacking a systematic understanding of
the progress on adversarial robustness, which often makes it hard to identify
the most promising ideas for training robust models. A key challenge in
benchmarking robustness is that its evaluation is often error-prone, leading to
robustness overestimation. Our goal is to establish a standardized benchmark of
adversarial robustness, which as accurately as possible reflects the robustness
of the considered models within a reasonable computational budget. To this end,
we start by considering the image classification task and introduce
restrictions (possibly loosened in the future) on the allowed models. We
evaluate adversarial robustness with AutoAttack, an ensemble of white- and
black-box attacks, which was recently shown in a large-scale study to improve
almost all robustness evaluations compared to the original publications. To
prevent overadaptation of new defenses to AutoAttack, we welcome external
evaluations based on adaptive attacks, especially where AutoAttack flags a
potential overestimation of robustness. Our leaderboard, hosted at
https://robustbench.github.io/, contains evaluations of 120+ models and aims at
reflecting the current state of the art in image classification on a set of
well-defined tasks in ℓ∞- and ℓ2-threat models and on common
corruptions, with possible extensions in the future. Additionally, we
open-source the library https://github.com/RobustBench/robustbench that
provides unified access to 80+ robust models to facilitate their downstream
applications. Finally, based on the collected models, we analyze the impact of
robustness on the performance on distribution shifts, calibration,
out-of-distribution detection, fairness, privacy leakage, smoothness, and
transferability.
Comment: The camera-ready version accepted at the NeurIPS'21 Datasets and
Benchmarks Track: 120+ evaluations, 80+ models, 7 leaderboards (Linf, L2,
common corruptions; CIFAR-10, CIFAR-100, ImageNet), significantly expanded
analysis part (calibration, fairness, privacy leakage, smoothness,
transferability)
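For orientation, a minimal usage sketch of the open-sourced library and of AutoAttack is given below; the model identifier is one example from the model zoo and may not match the current leaderboard entries.

```python
# Hedged sketch of using the open-sourced robustbench library and AutoAttack
# (per the URLs in the abstract); the model name is illustrative.
from robustbench.data import load_cifar10
from robustbench.utils import load_model
from autoattack import AutoAttack

# A small batch of CIFAR-10 test examples.
x_test, y_test = load_cifar10(n_examples=64)

# Load one robust model from the model zoo (identifier is an example).
model = load_model(model_name='Carmon2019Unlabeled',
                   dataset='cifar10', threat_model='Linf').eval()

# Evaluate with AutoAttack, the white-/black-box ensemble used by the benchmark.
adversary = AutoAttack(model, norm='Linf', eps=8 / 255, version='standard')
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=64)
```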
Robustness Verification of Support Vector Machines
We study the problem of formally verifying the robustness to adversarial
examples of support vector machines (SVMs), a major machine learning model for
classification and regression tasks. Following a recent stream of works on
formal robustness verification of (deep) neural networks, our approach relies
on a sound abstract version of a given SVM classifier to be used for checking
its robustness. This methodology is parametric on a given numerical abstraction
of real values and, analogously to the case of neural networks, needs neither
abstract least upper bounds nor widening operators on this abstraction. The
standard interval domain provides a simple instantiation of our abstraction
technique, which is further enhanced with the domain of reduced affine forms,
an efficient abstraction of the zonotope abstract domain. This robustness
verification technique has been fully implemented and experimentally evaluated
on SVMs based on linear and nonlinear (polynomial and radial basis function)
kernels, which have been trained on the popular MNIST dataset of images and on
the recent and more challenging Fashion-MNIST dataset. The experimental results
of our prototype SVM robustness verifier appear encouraging: this
automated verification is fast, scalable, and achieves notably high
percentages of provable robustness on the MNIST test set, in particular when
compared to the analogous provable robustness of neural networks.
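As a simplified illustration of the interval-domain instantiation, the sketch below certifies a linear SVM against an ℓ∞ perturbation ball; the paper's verifier additionally handles polynomial and RBF kernels via reduced affine forms, which this toy example does not cover, and all names here are illustrative.

```python
# Minimal sketch (assumption: linear SVM, L-infinity perturbation ball),
# showing the interval-domain idea behind sound robustness certification.
import numpy as np

def certify_linear_svm(w, b, x, eps):
    """Return True if sign(w.x' + b) is constant for all x' with
    ||x' - x||_inf <= eps (sound, and exact for a linear classifier)."""
    nominal = float(w @ x + b)
    # Worst-case shift of the score over the interval box around x.
    slack = eps * np.abs(w).sum()
    lower, upper = nominal - slack, nominal + slack
    return lower > 0 or upper < 0  # the decision cannot flip

# Toy usage with random weights and a random MNIST-sized point.
rng = np.random.default_rng(0)
w, b = rng.normal(size=784), 0.1
x = rng.uniform(0, 1, size=784)
print(certify_linear_svm(w, b, x, eps=0.01))
```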
When Causal Intervention Meets Adversarial Examples and Image Masking for Deep Neural Networks
Discovering and exploiting causality in deep neural networks (DNNs) are
crucial challenges for understanding and reasoning about causal effects (CE) in
an explainable visual model. "Intervention" has been widely used for recognizing a
causal relation ontologically. In this paper, we propose a causal inference
framework for visual reasoning via do-calculus. To study the intervention
effects on pixel-level features for causal reasoning, we introduce pixel-wise
masking and adversarial perturbation. In our framework, CE is calculated using
features in a latent space and perturbed prediction from a DNN-based model. We
further provide the first look into the characteristics of discovered CE of
adversarially perturbed images generated by gradient-based methods
(code: https://github.com/jjaacckkyy63/Causal-Intervention-AE-wAdvImg).
Experimental results show that CE is a competitive and robust index for
understanding DNNs when compared with conventional methods such as
class-activation mappings (CAMs) on the Chest X-Ray-14 dataset for
reasoning about human-interpretable features (e.g., symptoms). Moreover, CE holds
promises for detecting adversarial examples as it possesses distinct
characteristics in the presence of adversarial perturbations.
Comment: Note that the camera-ready version changed the title; "When Causal
Intervention Meets Adversarial Examples and Image Masking for Deep Neural
Networks" is the official v3 paper title in the IEEE proceedings. Please use it
in your formal reference. Accepted at IEEE ICIP 2019. PyTorch code has been
released at https://github.com/jjaacckkyy63/Causal-Intervention-AE-wAdvImg
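As a loose sketch of the masking and perturbation interventions described above (not the released code, and omitting the latent-space do-calculus machinery), one might compare a model's prediction before and after an intervention as follows; all function names here are illustrative.

```python
# Hedged sketch: comparing a DNN's prediction on an image before and after a
# pixel-level "intervention" (centre masking or an FGSM-style perturbation).
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=8 / 255):
    """One-step gradient-sign perturbation of a batch of images x."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def mask_center(x, size=8):
    """Zero out a centre patch as an alternative intervention."""
    x = x.clone()
    _, _, h, w = x.shape
    x[:, :, h // 2 - size:h // 2 + size, w // 2 - size:w // 2 + size] = 0.0
    return x

def intervention_effect(model, x, x_intervened, target_class):
    """Drop in target-class probability caused by the intervention;
    a simple stand-in for the paper's latent-space causal-effect index."""
    with torch.no_grad():
        p_orig = F.softmax(model(x), dim=1)[:, target_class]
        p_int = F.softmax(model(x_intervened), dim=1)[:, target_class]
    return p_orig - p_int
```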
- …