Achieving Robustness in the Wild via Adversarial Mixing with Disentangled Representations
Recent research has made the surprising finding that state-of-the-art deep
learning models sometimes fail to generalize to small variations of the input.
Adversarial training has been shown to be an effective approach to overcome
this problem. However, its application has been limited to enforcing invariance
to analytically defined transformations like ℓp-norm bounded
perturbations. Such perturbations do not necessarily cover plausible real-world
variations that preserve the semantics of the input (such as a change in
lighting conditions). In this paper, we propose a novel approach to express and
formalize robustness to these kinds of real-world transformations of the input.
The two key ideas underlying our formulation are (1) leveraging disentangled
representations of the input to define different factors of variations, and (2)
generating new input images by adversarially composing the representations of
different images. We use a StyleGAN model to demonstrate the efficacy of this
framework. Specifically, we leverage the disentangled latent representations
computed by a StyleGAN model to generate perturbations of an image that are
similar to real-world variations (like adding make-up, or changing the
skin-tone of a person) and train models to be invariant to these perturbations.
Extensive experiments show that our method improves generalization and reduces
the effect of spurious correlations (reducing the error rate of a "smile"
detector by 21%, for example).
Comment: Accepted at CVPR 2020
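The adversarial-composition step can be pictured with a short sketch. This is a minimal illustration under assumptions, not the paper's implementation: it assumes a pretrained StyleGAN-style generator G that maps a stack of per-layer style codes to an image and a classifier clf, and it learns one adversarial mixing weight per style layer (G, clf, w_orig, w_other are all hypothetical names).

    import torch
    import torch.nn.functional as F

    def adversarial_style_mix(G, clf, w_orig, w_other, y, steps=10, lr=0.1):
        # One mixing weight per style layer, optimized to fool the classifier.
        # w_orig, w_other: (num_layers, dim) latent codes of two images.
        alpha = torch.zeros(w_orig.shape[0], 1, requires_grad=True)
        opt = torch.optim.Adam([alpha], lr=lr)
        for _ in range(steps):
            a = torch.sigmoid(alpha)                # keep weights in [0, 1]
            w_mix = (1 - a) * w_orig + a * w_other  # compose the two codes
            loss = -F.cross_entropy(clf(G(w_mix)), y)  # ascend the class loss
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():
            a = torch.sigmoid(alpha)
            return G((1 - a) * w_orig + a * w_other)

The mixed images would then be fed back into training with the original label, enforcing invariance to the composed factors of variation.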
Decoder-free Robustness Disentanglement without (Additional) Supervision
Adversarial Training (AT) is proposed to alleviate the adversarial
vulnerability of machine learning models by extracting only robust features
from the input, which, however, inevitably leads to severe accuracy reduction
as it discards the non-robust yet useful features. This motivates us to
preserve both robust and non-robust features and separate them with
disentangled representation learning. Our proposed Adversarial Asymmetric
Training (AAT) algorithm can reliably disentangle robust and non-robust
representations without additional supervision on robustness. Empirical results
show that our method not only preserves accuracy by combining the two
representations, but also achieves much better disentanglement than previous
work.
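The asymmetry can be illustrated with a schematic training step. This sketch only conveys the flavour of separating robust from non-robust features and is not the paper's exact objective: a "robust" encoder is only ever updated on adversarially perturbed inputs, a "standard" encoder only on clean ones, and a shared head classifies from both (enc_rob, enc_std, head, and attack are hypothetical components).

    import torch
    import torch.nn.functional as F

    def asymmetric_step(enc_rob, enc_std, head, attack, x, y, opt):
        # Craft adversarial examples against the full model (attack is any
        # adversarial-example routine, e.g. PGD).
        x_adv = attack(lambda t: head(torch.cat([enc_rob(t), enc_std(t)], 1)),
                       x, y)
        # Asymmetry: robust features come from the attacked input,
        # non-robust features from the clean input.
        z = torch.cat([enc_rob(x_adv), enc_std(x)], dim=1)
        loss = F.cross_entropy(head(z), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()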
Training Generative Adversarial Networks by Solving Ordinary Differential Equations
The instability of Generative Adversarial Network (GAN) training has
frequently been attributed to gradient descent. Consequently, recent methods
have aimed to tailor the models and training procedures to stabilise the
discrete updates. In contrast, we study the continuous-time dynamics induced by
GAN training. Both theory and toy experiments suggest that these dynamics are
in fact surprisingly stable. From this perspective, we hypothesise that
instabilities in training GANs arise from the integration error in discretising
the continuous dynamics. We experimentally verify that well-known ODE solvers
(such as Runge-Kutta) can stabilise training when combined with a regulariser
that controls the integration error. Our approach represents a radical
departure from previous methods which typically use adaptive optimisation and
stabilisation techniques that constrain the functional space (e.g. Spectral
Normalisation). Evaluation on CIFAR-10 and ImageNet shows that our method
outperforms several strong baselines, demonstrating its efficacy.
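The core idea is compact enough to sketch. Below is a minimal illustration of one Heun (second-order Runge-Kutta) step of the GAN ODE dtheta/dt = v(theta), where v returns each player's negative loss gradient; the paper's regulariser that controls the integration error is omitted here (heun_step, vector_field, and h are placeholder names).

    def heun_step(params, vector_field, h):
        # params: list of parameter tensors for generator and discriminator.
        # vector_field(params) -> list of update directions (negative
        # gradients of each player's loss w.r.t. its own parameters).
        v1 = vector_field(params)                       # slope at current point
        pred = [p + h * v for p, v in zip(params, v1)]  # Euler predictor
        v2 = vector_field(pred)                         # slope at predicted point
        return [p + 0.5 * h * (a + b)                   # trapezoidal corrector
                for p, a, b in zip(params, v1, v2)]

The second vector-field evaluation is what shrinks the local integration error from O(h^2) to O(h^3) relative to plain Euler (i.e. gradient descent), which is exactly the error the paper hypothesises is the source of instability.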
Out-of-Distribution Generalization via Risk Extrapolation (REx)
Distributional shift is one of the major obstacles when transferring machine
learning prediction systems from the lab to the real world. To tackle this
problem, we assume that variation across training domains is representative of
the variation we might encounter at test time, but also that shifts at test
time may be more extreme in magnitude. In particular, we show that reducing
differences in risk across training domains can reduce a model's sensitivity to
a wide range of extreme distributional shifts, including the challenging
setting where the input contains both causal and anti-causal elements. We
motivate this approach, Risk Extrapolation (REx), as a form of robust
optimization over a perturbation set of extrapolated domains (MM-REx), and
propose a penalty on the variance of training risks (V-REx) as a simpler
variant. We prove that variants of REx can recover the causal mechanisms of the
targets, while also providing some robustness to changes in the input
distribution ("covariate shift"). By appropriately trading-off robustness to
causally induced distributional shifts and covariate shift, REx is able to
outperform alternative methods such as Invariant Risk Minimization in
situations where these types of shift co-occur.
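The V-REx variant in particular reduces to a one-line penalty. The sketch below computes it from per-domain empirical risks; beta is the robustness/accuracy trade-off hyperparameter (its value here is arbitrary).

    import torch

    def vrex_loss(per_domain_risks, beta=10.0):
        # per_domain_risks: list of scalar risks, one per training domain,
        # each computed by a forward pass on that domain's minibatch.
        risks = torch.stack(per_domain_risks)
        # Mean risk plus a penalty on how unevenly the domains are fit.
        return risks.mean() + beta * risks.var()

Driving the variance of training risks toward zero is what extrapolates: a model whose risk is equal across domains has little incentive to rely on domain-varying (spurious) features.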
Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming
Convex relaxations have emerged as a promising approach for verifying
desirable properties of neural networks like robustness to adversarial
perturbations. Widely used Linear Programming (LP) relaxations only work well
when networks are trained to facilitate verification. This precludes
applications that involve verification-agnostic networks, i.e., networks not
specially trained for verification. On the other hand, semidefinite programming
(SDP) relaxations have successfully been applied to verification-agnostic
networks, but do not currently scale beyond small networks due to poor time and
space asymptotics. In this work, we propose a first-order dual SDP algorithm
that (1) requires memory only linear in the total number of network
activations, and (2) requires only a fixed number of forward/backward passes
through the network per iteration. By exploiting iterative eigenvector methods,
we express all solver operations in terms of forward and backward passes
through the network, enabling efficient use of hardware like GPUs/TPUs. For two
verification-agnostic networks on MNIST and CIFAR-10, we significantly improve
ℓ∞ verified robust accuracy from 1% to 88% and from 6% to 40%, respectively. We
also demonstrate tight verification of a quadratic stability specification for
the decoder of a variational autoencoder.
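The enabling primitive is that the dual solver only ever touches its (huge) matrix through matrix-vector products, each realized by a fixed number of forward/backward passes. A minimal sketch under assumptions: power iteration recovers a leading eigenpair from matvec calls alone (the paper uses Lanczos-style iterative eigenvector methods; matvec, loss_fn, and params are placeholders, and params is assumed to be a single tensor with requires_grad=True).

    import torch

    def leading_eigenpair(matvec, dim, iters=100):
        # Power iteration over an implicit symmetric matrix: the matrix is
        # never materialized, only matvec(v) is called.
        v = torch.randn(dim)
        v = v / v.norm()
        for _ in range(iters):
            w = matvec(v)
            v = w / w.norm()
        return torch.dot(v, matvec(v)), v  # Rayleigh quotient, eigenvector

    def hessian_matvec(loss_fn, params, v):
        # Example of an implicit matvec: a Hessian-vector product computed
        # with one extra backward pass (double backprop), no explicit matrix.
        # The paper's dual matrix differs but is accessed the same way.
        g, = torch.autograd.grad(loss_fn(params), params, create_graph=True)
        hv, = torch.autograd.grad(g, params, grad_outputs=v)
        return hv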
Learning perturbation sets for robust machine learning
Although much progress has been made towards robust deep learning, a
significant gap in robustness remains between real-world perturbations and more
narrowly defined sets typically studied in adversarial defenses. In this paper,
we aim to bridge this gap by learning perturbation sets from data, in order to
characterize real-world effects for robust training and evaluation.
Specifically, we use a conditional generator that defines the perturbation set
over a constrained region of the latent space. We formulate desirable
properties that measure the quality of a learned perturbation set, and
theoretically prove that a conditional variational autoencoder naturally
satisfies these criteria. Using this framework, our approach can generate a
variety of perturbations at different complexities and scales, ranging from
baseline spatial transformations, through common image corruptions, to lighting
variations. We measure the quality of our learned perturbation sets both
quantitatively and qualitatively, finding that our models are capable of
producing a diverse set of meaningful perturbations beyond the limited data
seen during training. Finally, we leverage our learned perturbation sets to
train models which are empirically and certifiably robust to adversarial image
corruptions and adversarial lighting variations, while improving generalization
on non-adversarial data. All code and configuration files for reproducing the
experiments as well as pretrained model weights can be found at
https://github.com/locuslab/perturbation_learning
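Once the generator is trained, attacking over the learned perturbation set is projected gradient ascent in latent space rather than pixel space. A sketch under assumed interfaces: g(z, x) is the conditional-VAE decoder conditioned on the clean image, clf is a classifier, and latent_dim is the known latent size (all hypothetical names).

    import torch
    import torch.nn.functional as F

    def latent_space_pgd(g, clf, x, y, latent_dim, eps=1.0, steps=20, lr=0.1):
        # The perturbation set is S(x) = { g(z, x) : ||z|| <= eps }.
        z = torch.zeros(x.shape[0], latent_dim, requires_grad=True)
        for _ in range(steps):
            loss = F.cross_entropy(clf(g(z, x)), y)
            grad, = torch.autograd.grad(loss, z)
            with torch.no_grad():
                z += lr * grad                                      # ascend
                z *= eps / z.norm(dim=1, keepdim=True).clamp(min=eps)  # project
        return g(z.detach(), x)

The projection step leaves z untouched inside the eps-ball and rescales it onto the boundary otherwise, so the attack stays within the learned perturbation set by construction.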
Model Patching: Closing the Subgroup Performance Gap with Data Augmentation
Classifiers in machine learning are often brittle when deployed. Particularly
concerning are models with inconsistent performance on specific subgroups of a
class, e.g., exhibiting disparities in skin cancer classification in the
presence or absence of a spurious bandage. To mitigate these performance
differences, we introduce model patching, a two-stage framework for improving
robustness that encourages the model to be invariant to subgroup differences,
and to focus on class information shared by subgroups. Model patching first models
subgroup features within a class and learns semantic transformations between
them, and then trains a classifier with data augmentations that deliberately
manipulate subgroup features. We instantiate model patching with CAMEL, which
(1) uses a CycleGAN to learn the intra-class, inter-subgroup augmentations, and
(2) balances subgroup performance using a theoretically-motivated subgroup
consistency regularizer, accompanied by a new robust objective. We demonstrate
CAMEL's effectiveness on 3 benchmark datasets, with reductions in robust error
of up to 33% relative to the best baseline. Lastly, CAMEL successfully patches
a model that fails due to spurious features on a real-world skin cancer
dataset.
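The consistency part of the objective is compact enough to sketch. This is an illustrative stand-in, not CAMEL's exact regulariser: a symmetric KL divergence asking the classifier to predict the same distribution on an image and on its CycleGAN translation into the other subgroup (clf and x_translated are assumed inputs).

    import torch.nn.functional as F

    def subgroup_consistency(clf, x, x_translated):
        # Predictions on an image and on its intra-class, inter-subgroup
        # translation should agree, pushing the model to ignore subgroup
        # features and keep only shared class information.
        p = F.log_softmax(clf(x), dim=1)
        q = F.log_softmax(clf(x_translated), dim=1)
        return 0.5 * (F.kl_div(p, q, reduction="batchmean", log_target=True)
                      + F.kl_div(q, p, reduction="batchmean", log_target=True))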
Defending Against Image Corruptions Through Adversarial Augmentations
Modern neural networks excel at image classification, yet they remain
vulnerable to common image corruptions such as blur, speckle noise or fog.
Recent methods that focus on this problem, such as AugMix and DeepAugment,
introduce defenses that operate in expectation over a distribution of image
corruptions. In contrast, the literature on -norm bounded perturbations
focuses on defenses against worst-case corruptions. In this work, we reconcile
both approaches by proposing AdversarialAugment, a technique which optimizes
the parameters of image-to-image models to generate adversarially corrupted
augmented images. We theoretically motivate our method and give sufficient
conditions for the consistency of its idealized version as well as that of
DeepAugment. Our classifiers improve upon the state-of-the-art on common image
corruption benchmarks conducted in expectation on CIFAR-10-C and improve
worst-case performance against ℓp-norm bounded perturbations on both
CIFAR-10 and ImageNet.
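The reconciliation of the two views boils down to an inner maximization over corruption parameters rather than over pixels. A minimal sketch under assumptions: corrupt(x, params) stands for a differentiable image-to-image corruption model whose parameters are being attacked (the paper optimizes the parameters of such image-to-image models), and clf is the classifier being trained.

    import torch
    import torch.nn.functional as F

    def adversarial_corruption(corrupt, clf, x, y, init_params,
                               steps=5, lr=0.05):
        params = init_params.detach().clone().requires_grad_(True)
        for _ in range(steps):
            # Ascend the classification loss w.r.t. corruption parameters.
            loss = F.cross_entropy(clf(corrupt(x, params)), y)
            grad, = torch.autograd.grad(loss, params)
            with torch.no_grad():
                params += lr * grad.sign()
        return corrupt(x, params.detach())  # worst-case corrupted batch

Training then proceeds on these adversarially corrupted images, covering the worst case while AugMix/DeepAugment-style random sampling covers the in-expectation case.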