884 research outputs found
Towards Debugging and Improving Adversarial Robustness Evaluations
Despite exhibiting unprecedented success in many application domains, machine-learning models have been shown to be vulnerable to adversarial examples, i.e., maliciously perturbed inputs able to subvert their predictions at test time. Rigorous testing against such perturbations requires enumerating all possible outputs for all possible inputs, and despite impressive results in this field, these methods remain difficult to scale to modern deep learning systems. For these reasons, empirical methods are often used instead. These adversarial perturbations are optimized via gradient descent, minimizing a loss function that aims to increase the probability of misleading the model's predictions. To understand the sensitivity of the model to such attacks, and to counter their effects, machine-learning model designers craft worst-case adversarial perturbations and test them against the model they are evaluating. However, many of the proposed defenses have been shown to provide a false sense of security due to failures of the attacks, rather than actual improvements in the models' robustness; indeed, they have been broken under more rigorous evaluations. Although guidelines and best practices have been suggested to improve current adversarial robustness evaluations, the lack of automatic testing and debugging tools makes it difficult to apply these recommendations in a systematic and automated manner.
To this end, we tackle three different challenges: (1) we investigate how adversarial robustness evaluations can be performed efficiently, by proposing a novel attack that can be used to find minimum-norm adversarial perturbations; (2) we propose a framework for debugging adversarial robustness evaluations, by defining metrics that reveal faulty evaluations as well as mitigations to patch the detected problems; and (3) we show how to employ a surrogate model to improve the success of transfer-based attacks, which are useful when gradient-based attacks fail due to problems in the gradient information. To improve the quality of robustness evaluations, we propose a novel attack, referred to as the Fast Minimum-Norm (FMN) attack, which competes with state-of-the-art attacks in terms of solution quality while outperforming them in computational complexity and robustness to sub-optimal configurations of the attack hyperparameters. These are all desirable characteristics of attacks used in robustness evaluations, as the aforementioned problems often arise from the use of sub-optimal attack hyperparameters, including, e.g., the number of attack iterations, the step size, and the use of an inappropriate loss function. The correct refinement of these variables is often neglected, hence we designed a novel framework that helps debug the optimization process of adversarial examples by means of quantitative indicators that unveil common problems and failures during the attack optimization, e.g., in the configuration of the hyperparameters. Commonly accepted best practices suggest further validating the target model with alternative strategies; among these, using a surrogate model to craft adversarial examples that are then transferred to the model under evaluation is useful for detecting gradient obfuscation.
However, creating transferable adversarial examples effectively is not easy, as many factors influence the success of this strategy. In the context of this research, we use a first-order model to show the main underlying phenomena that affect transferability and suggest best practices for creating adversarial examples that transfer well to the target models.
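The gradient-based optimization described in the abstract can be sketched as a projected gradient ascent loop over the input. The snippet below is a minimal illustration on a toy linear model, not the FMN attack itself; the `pgd_attack` helper, the linear loss, and all parameter values are hypothetical:

```python
import numpy as np

def pgd_attack(x, y, grad_fn, eps=0.3, step=0.05, n_iter=40):
    """Projected gradient ascent on the loss, constrained to an L-inf ball of radius eps."""
    x_adv = x.copy()
    for _ in range(n_iter):
        g = grad_fn(x_adv, y)                      # gradient of the loss w.r.t. the input
        x_adv = x_adv + step * np.sign(g)          # ascent step: increase the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project back into the eps-ball
    return x_adv

# Toy linear "model": loss = -y * (w . x), so the input gradient is -y * w.
w = np.array([1.0, -2.0, 0.5])
grad = lambda x, y: -y * w

x0 = np.array([0.2, 0.1, -0.3])
x_adv = pgd_attack(x0, y=1.0, grad_fn=grad)
# The perturbation stays within the eps-ball while the margin y*(w . x) decreases.
```

A minimum-norm attack such as FMN instead searches for the smallest perturbation that flips the prediction, rather than the worst perturbation within a fixed budget, but the inner gradient step is of the same flavor.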
Gradient-free activation maximization for identifying effective stimuli
A fundamental question for understanding brain function is what types of
stimuli drive neurons to fire. In visual neuroscience, this question has also
been posed as characterizing the receptive field of a neuron. The search for
effective stimuli has traditionally been based on a combination of insights
from previous studies, intuition, and luck. Recently, the same question has
emerged in the study of units in convolutional neural networks (ConvNets), and
together with this question a family of solutions was developed that is
generally referred to as "feature visualization by activation maximization."
We sought to bring in tools and techniques developed for studying ConvNets to
the study of biological neural networks. However, one key difference that
impedes direct translation of tools is that gradients can be obtained from
ConvNets using backpropagation, but such gradients are not available from the
brain. To circumvent this problem, we developed a method for gradient-free
activation maximization by combining a generative neural network with a genetic
algorithm. We termed this method XDream (EXtending DeepDream with real-time
evolution for activation maximization), and we have shown that this method can
reliably create strong stimuli for neurons in the macaque visual cortex (Ponce
et al., 2019). In this paper, we describe extensive experiments characterizing
the XDream method by using ConvNet units as in silico models of neurons. We
show that XDream is applicable across network layers, architectures, and
training sets; examine design choices in the algorithm; and provide practical
guides for choosing hyperparameters in the algorithm. XDream is an efficient
algorithm for uncovering neuronal tuning preferences in black-box networks
using a vast and diverse stimulus space.
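As an illustration of the gradient-free principle, the toy sketch below evolves latent codes against a synthetic black-box "neuron" using only its responses, never its gradients. The `neuron_response` function and all hyperparameters are invented for illustration; XDream itself additionally routes the codes through a generative network to produce images:

```python
import numpy as np

rng = np.random.default_rng(0)

def neuron_response(codes):
    """Black-box 'neuron': responds most strongly to a hidden preferred code."""
    target = np.linspace(-1.0, 1.0, codes.shape[1])
    return -np.sum((codes - target) ** 2, axis=1)

def evolve(pop_size=32, dim=16, n_gen=60, sigma=0.1):
    """Genetic algorithm: select the fittest codes, mutate them, repeat."""
    pop = rng.normal(size=(pop_size, dim))
    for _ in range(n_gen):
        fitness = neuron_response(pop)                        # responses only, no gradients
        parents = pop[np.argsort(fitness)[-pop_size // 2:]]   # keep the top half (elitism)
        children = parents + sigma * rng.normal(size=parents.shape)  # mutate
        pop = np.vstack([parents, children])
    return pop[np.argmax(neuron_response(pop))]

best = evolve()
print(neuron_response(best[None])[0])  # approaches 0, the maximum possible response
```

Because parents are carried over unchanged, the best response never decreases across generations, which is one of the design choices the paper's hyperparameter analysis examines.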
Semantics-Preserving Adversarial Training
Master's thesis, Seoul National University, Department of Computer Science and Engineering, College of Engineering, February 2021. Advisor: Sang-goo Lee.
Adversarial training is a defense technique that improves the adversarial robustness of a deep neural network (DNN) by including adversarial examples in the training data. In this paper, we identify an overlooked problem of adversarial training: these adversarial examples often have different semantics than the original data, introducing unintended biases into the model. We hypothesize that such non-semantics-preserving (and consequently ambiguous) adversarial data harm the robustness of the target models. To mitigate such unintended semantic changes of adversarial examples, we propose semantics-preserving adversarial training (SPAT), which encourages perturbation of the pixels that are shared among all classes when generating adversarial examples in the training stage. Experimental results show that SPAT improves adversarial robustness and achieves state-of-the-art results on CIFAR-10, CIFAR-100, and STL-10.
Chapter 1 Introduction
Chapter 2 Preliminaries
Chapter 3 Related Works
Chapter 4 Semantics-Preserving Adversarial Training
4.1 Problem of PGD-training
4.2 Semantics-Preserving Adversarial Training
4.3 Combining with Adversarial Training Variants
Chapter 5 Analysis of Adversarial Examples
5.1 Visualizing Various Adversarial Examples
5.2 Comparing the Attack Success Rate
Chapter 6 Experiments & Results
6.1 Evaluating Robustness
6.1.1 CIFAR-10 & CIFAR-100
6.1.2 CIFAR-10 with 500K Unlabeled Data
6.1.3 STL-10
6.2 Effect of Label Smoothing Hyperparameter α
Chapter 7 Conclusion & Future Work
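The abstract's idea of perturbing only pixels shared among all classes can be illustrated with a mask-restricted attack step. The sketch below is a simplification under that assumption, not the thesis's actual SPAT procedure; the `masked_perturbation` helper, the mask, and all values are hypothetical:

```python
import numpy as np

def masked_perturbation(x, grad, mask, eps=0.1):
    """One sign-gradient step applied only where mask == 1 (pixels shared across classes)."""
    return x + eps * mask * np.sign(grad)

# Toy 4x4 "image" with a hypothetical shared-pixel mask (e.g., background pixels).
x = np.zeros((4, 4))
grad = np.ones((4, 4))                      # stand-in for the loss gradient w.r.t. the input
mask = np.zeros((4, 4))
mask[0, :] = 1.0                            # only the top row is "shared among all classes"

x_adv = masked_perturbation(x, grad, mask)
print(np.count_nonzero(x_adv - x))          # prints 4: only the masked pixels move
```

Restricting the perturbation this way keeps the class-discriminative pixels intact, which is the intuition behind preserving the semantics of the adversarial training data.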
- …