Curriculum Adversarial Training
Recently, deep learning has been applied to many security-sensitive
applications, such as facial authentication. The existence of adversarial
examples hinders such applications. The state-of-the-art result on defense
shows that adversarial training can be applied to train a robust model on MNIST
against adversarial examples, but it fails to achieve high empirical
worst-case accuracy on more complex tasks such as CIFAR-10 and SVHN. In our
work, we propose curriculum adversarial training (CAT) to resolve this issue.
The basic idea is to develop a curriculum of adversarial examples generated by
attacks with a wide range of strengths. With two techniques to mitigate the
forgetting and the generalization issues, we demonstrate that CAT can improve
the prior art's empirical worst-case accuracy by a large margin of 25% on
CIFAR-10 and 35% on SVHN. At the same time, the model's performance on
non-adversarial inputs is comparable to that of state-of-the-art models.
Comment: IJCAI 2018
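The curriculum idea lends itself to a compact sketch. Below is a minimal, hedged illustration in PyTorch (not the authors' code): attack strength is taken to be the number of PGD steps, and each batch samples a strength from the curriculum so far, a simple stand-in for the paper's forgetting mitigation; the schedule and helper names are illustrative.

    import random
    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
        # Standard L-inf PGD: ascend the loss, then project onto the eps-ball.
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        return x_adv.detach()

    def cat_train(model, loader, optimizer, epochs, max_strength=10):
        k = 0  # current curriculum strength (number of PGD steps)
        for epoch in range(epochs):
            for x, y in loader:
                # Sample from {0, ..., k} so weaker attacks keep appearing,
                # an illustrative stand-in for the forgetting mitigation.
                steps = random.randint(0, k)
                x_in = pgd_attack(model, x, y, steps=steps) if steps > 0 else x
                optimizer.zero_grad()
                F.cross_entropy(model(x_in), y).backward()
                optimizer.step()
            k = min(k + 1, max_strength)  # illustrative: +1 strength per epoch
        return model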
Defending Against Physically Realizable Attacks on Image Classification
We study the problem of defending deep neural network approaches for image
classification from physically realizable attacks. First, we demonstrate that
the two most scalable and effective methods for learning robust models,
adversarial training with PGD attacks and randomized smoothing, exhibit very
limited effectiveness against three of the highest profile physical attacks.
Next, we propose a new abstract adversarial model, rectangular occlusion
attacks, in which an adversary places a small adversarially crafted rectangle
in an image, and develop two approaches for efficiently computing the resulting
adversarial examples. Finally, we demonstrate that adversarial training using
our new attack yields image classification models that exhibit high robustness
against the physically realizable attacks we study, offering the first
effective generic defense against such attacks. Comment: camera-ready
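As a rough illustration of the rectangular occlusion idea (a hedged sketch, not the paper's implementation; rectangle size, grid stride, and the inner optimizer are placeholder choices), one can first grid-search the rectangle position that most increases the loss, then optimize the pixels inside it:

    import torch
    import torch.nn.functional as F

    def roa_attack(model, x, y, h=20, w=20, stride=10, pgd_steps=30, lr=0.1):
        B, C, H, W = x.shape
        best_loss, best_pos = None, (0, 0)
        # Stage 1: exhaustive grid search for the most damaging position,
        # probing with a gray rectangle.
        with torch.no_grad():
            for top in range(0, H - h + 1, stride):
                for left in range(0, W - w + 1, stride):
                    x_try = x.clone()
                    x_try[:, :, top:top + h, left:left + w] = 0.5
                    loss = F.cross_entropy(model(x_try), y)
                    if best_loss is None or loss > best_loss:
                        best_loss, best_pos = loss, (top, left)
        # Stage 2: gradient ascent on the rectangle's contents only.
        top, left = best_pos
        patch = torch.full((B, C, h, w), 0.5, device=x.device,
                           requires_grad=True)
        for _ in range(pgd_steps):
            x_adv = x.clone()
            x_adv[:, :, top:top + h, left:left + w] = patch
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, patch)[0]
            patch = (patch.detach() + lr * grad.sign()).clamp(0, 1)
            patch.requires_grad_(True)
        x_adv = x.clone()
        x_adv[:, :, top:top + h, left:left + w] = patch.detach()
        return x_adv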
An Optimal Control View of Adversarial Machine Learning
I describe an optimal control view of adversarial machine learning, where the
dynamical system is the machine learner, the inputs are adversarial actions, and
the control costs are defined by the adversary's goals to do harm and be hard
to detect. This view encompasses many types of adversarial machine learning,
including test-item attacks, training-data poisoning, and adversarial reward
shaping. The view encourages adversarial machine learning researchers to
utilize advances in control theory and reinforcement learning.
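Concretely, the description suggests a finite-horizon formulation along these lines (notation illustrative, not lifted from the paper):

    \min_{u_0, \dots, u_{T-1}} \; g_T(x_T) + \sum_{t=0}^{T-1} g_t(x_t, u_t)
    \quad \text{subject to} \quad x_{t+1} = f(x_t, u_t),

where x_t is the learner's state (e.g., its parameters), u_t is the adversary's action at step t (a test item, a poisoned training point, or a shaped reward), f is the learning update rule, and the costs g_t encode the twin goals of doing harm and remaining hard to detect.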
SAT: Improving Adversarial Training via Curriculum-Based Loss Smoothing
Adversarial training (AT) has become a popular choice for training robust
networks. However, it tends to sacrifice clean accuracy heavily in favor of
robustness and suffers from a large generalization error. To address these
concerns, we propose Smooth Adversarial Training (SAT), guided by our analysis
on the eigenspectrum of the loss Hessian. We find that curriculum learning, a
scheme that emphasizes starting "easy" and gradually ramping up the
"difficulty" of training, smooths the adversarial loss landscape for a suitably
chosen difficulty metric. We present a general formulation for curriculum
learning in the adversarial setting and propose two difficulty metrics based on
the maximal Hessian eigenvalue (H-SAT) and the softmax probability (P-SAT). We
demonstrate that SAT stabilizes network training even for a large perturbation
norm and allows the network to operate at a better clean accuracy versus
robustness trade-off curve compared to AT. This leads to a significant
improvement in both clean accuracy and robustness compared to AT, TRADES, and
other baselines. To highlight a few results, our best model improves normal and
robust accuracy by 6% and 1%, respectively, on CIFAR-100 compared to AT. On
Imagenette, a ten-class subset of ImageNet, our model outperforms AT by 23% and
3% on normal and robust accuracy, respectively. Comment: Published at AISec '21: Proceedings of the 14th ACM Workshop on
Artificial Intelligence and Security. ACM DL link:
https://dl.acm.org/doi/abs/10.1145/3474369.348687
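A hedged sketch of how a softmax-probability difficulty metric can gate attack strength (simplified here to a batch-level early stop; the paper's exact P-SAT criterion and its annealing schedule may differ):

    import torch
    import torch.nn.functional as F

    def curriculum_pgd(model, x, y, eps=8/255, alpha=2/255, steps=10, rho=0.5):
        # Run PGD, but stop early once the mean softmax probability of the
        # true class drops below the curriculum threshold rho. Annealing rho
        # toward 0 over training makes attacks start "easy" and end at
        # full-strength PGD.
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            logits = model(x_adv)
            p_true = F.softmax(logits, dim=1).gather(1, y.unsqueeze(1)).mean()
            if p_true < rho:  # batch already "hard enough" for this stage
                break
            loss = F.cross_entropy(logits, y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        return x_adv.detach()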
Recent Advances in Adversarial Training for Adversarial Robustness
Adversarial training is one of the most effective approaches for defending
deep learning models against adversarial examples. Unlike other defense
strategies, adversarial training aims to promote the robustness of models
intrinsically. During the last few years, adversarial training has been studied
and discussed from various aspects. A variety of improvements and developments
of adversarial training have been proposed, yet they have been neglected in
existing surveys. In this survey, we systematically review, for the first time,
the recent progress on adversarial training for adversarial robustness with a
novel taxonomy. Then we discuss the generalization problems in adversarial
training from three perspectives. Finally, we highlight the challenges which
are not fully tackled and present potential future directions.
Improving the affordability of robustness training for DNNs
Projected Gradient Descent (PGD) based adversarial training has become one of
the most prominent methods for building robust deep neural network models.
However, the computational complexity associated with this approach, due to the
maximization of the loss function when finding adversaries, is a longstanding
problem and may be prohibitive when using larger and more complex models. In
this paper, we show that the initial phase of adversarial training is redundant
and can be replaced with natural training, which significantly improves the
computational efficiency. We demonstrate that this efficiency gain can be
achieved without any loss in accuracy on natural and adversarial test samples.
We support our argument with insights on the nature of the adversaries and
their relative strength during the training process. We show that our proposed
method can reduce the training time by a factor of up to 2.5 with comparable or
better model test accuracy and generalization on various strengths of
adversarial attacks.
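The proposed schedule reduces, in essence, to a switch in the training loop. A minimal sketch, assuming a pgd_attack helper like the one sketched earlier; the switch point is a tunable, not a value from the paper:

    import torch.nn.functional as F

    def delayed_adv_train(model, loader, optimizer, epochs, switch_epoch):
        for epoch in range(epochs):
            for x, y in loader:
                # Natural training during the (argued redundant) initial
                # phase, then standard PGD adversarial training afterwards.
                if epoch >= switch_epoch:
                    x = pgd_attack(model, x, y)
                optimizer.zero_grad()
                F.cross_entropy(model(x), y).backward()
                optimizer.step()
        return model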
Monge blunts Bayes: Hardness Results for Adversarial Training
The last few years have seen a staggering number of empirical studies of the
robustness of neural networks in a model of adversarial perturbations of their
inputs. Most rely on an adversary that carries out local modifications within
prescribed balls. None, however, has so far questioned the broader picture: how
to frame a resource-bounded adversary so that it can be severely detrimental to
learning, a non-trivial problem which entails, at a minimum, the choice of loss
and classifiers.
We suggest a formal answer for losses that satisfy the minimal statistical
requirement of being proper. We pin down a simple sufficient property for any
given class of adversaries to be detrimental to learning, involving a central
measure of "harmfulness" which generalizes the well-known class of integral
probability metrics. A key feature of our result is that it holds for all
proper losses, and for a popular subset of these, the optimisation of this
central measure appears to be independent of the loss. When classifiers are
Lipschitz (a now popular approach in adversarial training), this
optimisation resorts to optimal transport to make a low-budget compression of
class marginals. Toy experiments reveal a finding also observed in recent,
independent work: training against a sufficiently budgeted adversary of this
kind improves generalization.
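For reference, the integral probability metrics that this measure of "harmfulness" generalizes have the standard textbook form

    d_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \big| \mathbb{E}_{X \sim P}[f(X)] - \mathbb{E}_{X \sim Q}[f(X)] \big|,

where \mathcal{F} is a class of witness functions; taking \mathcal{F} to be the 1-Lipschitz functions recovers the Wasserstein-1 distance, which is the optimal-transport connection alluded to above. (The paper's own measure is a generalization; this is only the classical definition.)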
Bag of Tricks for Adversarial Training
Adversarial training (AT) is one of the most effective strategies for
promoting model robustness. However, recent benchmarks show that most of the
proposed improvements on AT are less effective than simply early stopping the
training procedure. This counter-intuitive fact motivates us to investigate the
implementation details of tens of AT methods. Surprisingly, we find that the
basic settings (e.g., weight decay and training schedule) used in these
methods are highly inconsistent. In this work, we provide comprehensive
evaluations on CIFAR-10, focusing on the effects of mostly overlooked training
tricks and hyperparameters for adversarially trained models. Our empirical
observations suggest that adversarial robustness is much more sensitive to some
basic training settings than we thought. For example, a slightly different
value of weight decay can reduce a model's robust accuracy by more than 7%,
which is likely to outweigh the potential improvement induced by the proposed
methods. We distill a baseline training setting and re-implement previous
defenses to achieve new state-of-the-art results. These facts also call for
greater attention to the overlooked confounders when benchmarking defenses.
Comment: ICLR 2021
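To make the sensitivity concrete, this is the kind of hyperparameter bundle such an evaluation pins down. The values below follow common public CIFAR-10 adversarial-training recipes and are illustrative, not the paper's exact table:

    # Illustrative CIFAR-10 adversarial-training settings of the kind the
    # paper audits; values follow common public recipes, not the paper's table.
    baseline = {
        "architecture": "PreActResNet-18",
        "epochs": 110,
        "batch_size": 128,
        "optimizer": "SGD",
        "lr": 0.1,               # stepped down by 10x late in training
        "momentum": 0.9,
        "weight_decay": 5e-4,    # robustness is reported to be highly
                                 # sensitive to this value
        "attack": "PGD-10",
        "epsilon": 8 / 255,
        "step_size": 2 / 255,
        "early_stopping": True,  # checkpoint on robust validation accuracy
    }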
Towards Robust General Medical Image Segmentation
The reliability of Deep Learning systems depends on their accuracy but also
on their robustness against adversarial perturbations to the input data.
Several attacks and defenses have been proposed to improve the performance of
Deep Neural Networks under the presence of adversarial noise in the natural
image domain. However, robustness in computer-aided diagnosis for volumetric
data has only been explored for specific tasks and with limited attacks. We
propose a new framework to assess the robustness of general medical image
segmentation systems. Our contributions are two-fold: (i) we propose a new
benchmark to evaluate robustness in the context of the Medical Segmentation
Decathlon (MSD) by extending the recent AutoAttack natural image classification
framework to the domain of volumetric data segmentation, and (ii) we present a
novel lattice architecture for RObust Generic medical image segmentation (ROG).
Our results show that ROG is capable of generalizing across different tasks of
the MSD and largely surpasses the state-of-the-art under sophisticated
adversarial attacks. Comment: Accepted at MICCAI 2021
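Extending a classification attack to volumetric segmentation mainly means aggregating the loss over voxels. A hedged sketch of the PGD-style core only (AutoAttack proper is an ensemble of parameter-free attacks, none of whose specifics are reproduced here):

    import torch
    import torch.nn.functional as F

    def pgd_segmentation_3d(model, vol, mask, eps=8/255, alpha=2/255, steps=10):
        # vol: (B, C, D, H, W) input volume; mask: (B, D, H, W) voxel labels.
        # The attack loss is cross-entropy averaged over every voxel, so the
        # perturbation degrades the segmentation as a whole.
        adv = vol.clone().detach()
        for _ in range(steps):
            adv.requires_grad_(True)
            logits = model(adv)                   # (B, num_classes, D, H, W)
            loss = F.cross_entropy(logits, mask)  # mean over all voxels
            grad = torch.autograd.grad(loss, adv)[0]
            adv = adv.detach() + alpha * grad.sign()
            adv = torch.min(torch.max(adv, vol - eps), vol + eps)
        return adv.detach()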
Robust Face Verification via Disentangled Representations
We introduce a robust algorithm for face verification, i.e., deciding whether
two images are of the same person or not. Our approach is a novel take on the
idea of using deep generative networks for adversarial robustness. We use the
generative model during training as an online augmentation method instead of a
test-time purifier that removes adversarial noise. Our architecture uses a
contrastive loss term and a disentangled generative model to sample negative
pairs. Instead of randomly pairing two real images, we pair an image with its
class-modified counterpart while keeping its content (pose, head tilt, hair,
etc.) intact. This enables us to efficiently sample hard negative pairs for the
contrastive loss. We experimentally show that, when coupled with adversarial
training, the proposed scheme converges with a weak inner solver and has a
higher clean and robust accuracy than state-of-the-art methods when evaluated
against white-box physical attacks. Comment: Preprint
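The training signal combines a standard contrastive loss with generator-made hard negatives. A hedged sketch; the generator and code-extraction helpers named in the comment are hypothetical stand-ins for the paper's disentangled model:

    import torch
    import torch.nn.functional as F

    def contrastive_loss(emb_a, emb_b, same_person, margin=1.0):
        # Classic contrastive loss: pull matching pairs together, push
        # non-matching pairs at least `margin` apart in embedding space.
        d = F.pairwise_distance(emb_a, emb_b)
        pos = same_person * d.pow(2)
        neg = (1.0 - same_person) * F.relu(margin - d).pow(2)
        return (pos + neg).mean()

    # Hard-negative sampling, schematically: rather than pairing two random
    # identities, pair an image with a generated counterpart whose identity
    # is swapped while content (pose, head tilt, hair) is kept, e.g.
    #   x_neg = generator(content_code(x), identity_code(x_other))
    # (generator, content_code, identity_code are hypothetical names.)
    # Such negatives differ only in identity, which makes them "hard".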