884 research outputs found

    Towards Debugging and Improving Adversarial Robustness Evaluations

    Get PDF
    Despite exhibiting unprecedented success in many application domains, machine-learning models have been shown to be vulnerable to adversarial examples, i.e., maliciously perturbed inputs that are able to subvert their predictions at test time. Rigorous testing against such perturbations would require enumerating all possible outputs for all possible inputs, and despite impressive results in this field, such methods remain difficult to scale to modern deep-learning systems. For this reason, empirical methods are often used instead: to understand the sensitivity of a model to such attacks, and to counter their effects, model designers craft worst-case adversarial perturbations and test them against the model they are evaluating. These perturbations are optimized via gradient descent, minimizing a loss function that aims to increase the probability of misleading the model's predictions. However, many of the proposed defenses have been shown to provide a false sense of security due to failures of the attacks, rather than actual improvements in the machine-learning models' robustness; indeed, they have been broken under more rigorous evaluations. Although guidelines and best practices have been suggested to improve current adversarial robustness evaluations, the lack of automatic testing and debugging tools makes it difficult to apply these recommendations in a systematic and automated manner.

    To this end, we tackle three different challenges: (1) we investigate how adversarial robustness evaluations can be performed efficiently, by proposing a novel attack that can be used to find minimum-norm adversarial perturbations; (2) we propose a framework for debugging adversarial robustness evaluations, by defining metrics that reveal faulty evaluations as well as mitigations to patch the detected problems; and (3) we show how to employ a surrogate model to improve the success of transfer-based attacks, which are useful when gradient-based attacks fail due to problems in the gradient information.

    To improve the quality of robustness evaluations, we propose a novel attack, referred to as the Fast Minimum-Norm (FMN) attack, which competes with state-of-the-art attacks in terms of quality of the solution while outperforming them in terms of computational complexity and robustness to sub-optimal configurations of the attack hyperparameters. These are all desirable characteristics of attacks used in robustness evaluations, as the aforementioned problems often arise from the use of sub-optimal attack hyperparameters, including, e.g., the number of attack iterations, the step size, and the use of an inappropriate loss function. The correct refinement of these variables is often neglected, hence we design a novel framework that helps debug the optimization of adversarial examples, by means of quantitative indicators that unveil common problems and failures during the attack optimization process, e.g., in the configuration of the hyperparameters.

    Commonly accepted best practices suggest further validating the target model with alternative strategies, among which the use of a surrogate model to craft adversarial examples that transfer to the model being evaluated is useful for checking for gradient obfuscation. However, effectively creating transferable adversarial examples is not easy, as many factors influence the success of this strategy. In this work, we use a first-order model to show which underlying phenomena affect transferability, and we suggest best practices to create adversarial examples that transfer well to the target models.
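    The attack and debugging steps described above can be made concrete with a short Python sketch (PyTorch assumed). This is not the FMN attack or the proposed debugging framework: it is a generic L-infinity projected-gradient attack on a hypothetical classifier `model`, instrumented to record the loss at every iteration, the kind of quantitative indicator that reveals a faulty evaluation (a flat or oscillating loss curve points to a bad step size, too few iterations, or an uninformative loss function).

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8 / 255, step=2 / 255, iters=10):
        """Generic L-inf projected-gradient attack (illustrative sketch, not FMN).

        Returns the adversarial examples and the per-iteration loss curve,
        which can be inspected to detect a failing attack optimization.
        """
        x_adv = x.clone().detach()
        loss_curve = []
        for _ in range(iters):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)        # loss to be increased
            loss_curve.append(loss.item())
            grad, = torch.autograd.grad(loss, x_adv)
            with torch.no_grad():
                x_adv = x_adv + step * grad.sign()          # ascend the loss
                x_adv = x + (x_adv - x).clamp(-eps, eps)    # project into the eps-ball
                x_adv = x_adv.clamp(0.0, 1.0)               # keep a valid image
        return x_adv.detach(), loss_curve

    The same routine also supports the transfer-based check mentioned above: run it against a surrogate model and evaluate the returned examples on the target model; if gradient-based attacks fail on the target while transferred examples succeed, obfuscated gradients are a likely cause.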

    Gradient-free activation maximization for identifying effective stimuli

    Full text link
    A fundamental question for understanding brain function is what types of stimuli drive neurons to fire. In visual neuroscience, this question has also been posed as characterizing the receptive field of a neuron. The search for effective stimuli has traditionally been based on a combination of insights from previous studies, intuition, and luck. Recently, the same question has emerged in the study of units in convolutional neural networks (ConvNets), and together with it a family of solutions was developed that is generally referred to as "feature visualization by activation maximization." We sought to bring tools and techniques developed for studying ConvNets to the study of biological neural networks. However, one key difference that impedes direct translation of these tools is that gradients can be obtained from ConvNets using backpropagation, but no such gradients are available from the brain. To circumvent this problem, we developed a method for gradient-free activation maximization by combining a generative neural network with a genetic algorithm. We termed this method XDream (EXtending DeepDream with real-time evolution for activation maximization), and we have shown that it can reliably create strong stimuli for neurons in the macaque visual cortex (Ponce et al., 2019). In this paper, we describe extensive experiments characterizing the XDream method by using ConvNet units as in silico models of neurons. We show that XDream is applicable across network layers, architectures, and training sets; examine design choices in the algorithm; and provide practical guides for choosing its hyperparameters. XDream is an efficient algorithm for uncovering neuronal tuning preferences in black-box networks using a vast and diverse stimulus space. (Comment: 16 pages, 8 figures, 3 tables.)
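    The recipe in the abstract, a generative network searched by a genetic algorithm against a black-box response, can be sketched in a few lines of Python/NumPy. This is not the XDream implementation: it assumes a hypothetical `generator` that maps latent codes to images and a black-box `score` callable standing in for a ConvNet unit or a recorded neuron, and it uses arbitrary default hyperparameters.

    import numpy as np

    def evolve_stimulus(generator, score, latent_dim=256, pop_size=32,
                        n_generations=100, elite_frac=0.25, sigma=0.5, seed=0):
        """Gradient-free activation maximization (illustrative sketch, not XDream).

        generator: callable mapping a (latent_dim,) code to an image array.
        score: black-box callable returning a scalar response for an image.
        """
        rng = np.random.default_rng(seed)
        codes = rng.standard_normal((pop_size, latent_dim))
        n_elite = max(1, int(elite_frac * pop_size))
        for _ in range(n_generations):
            fitness = np.array([score(generator(c)) for c in codes])
            elite = codes[np.argsort(fitness)[-n_elite:]]                    # selection
            parents = elite[rng.integers(0, n_elite, size=(pop_size, 2))]
            children = parents.mean(axis=1)                                  # recombination
            codes = children + sigma * rng.standard_normal(children.shape)   # mutation
        best = max(codes, key=lambda c: score(generator(c)))
        return generator(best)

    No gradient of `score` is ever required, which is what makes this family of methods applicable to biological neurons, where backpropagation is unavailable.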

    Semantics-Preserving Adversarial Training

    Get PDF
    Thesis (Master's) -- Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, February 2021. Advisor: Sang-goo Lee.

    Adversarial training is a defense technique that improves the adversarial robustness of a deep neural network (DNN) by including adversarial examples in the training data. In this paper, we identify an overlooked problem of adversarial training: the adversarial examples often have different semantics from the original data, introducing unintended biases into the model. We hypothesize that such non-semantics-preserving (and consequently ambiguous) adversarial data harm the robustness of the target models. To mitigate such unintended semantic changes of adversarial examples, we propose semantics-preserving adversarial training (SPAT), which encourages perturbation of the pixels that are shared among all classes when generating adversarial examples in the training stage. Experimental results show that SPAT improves adversarial robustness and achieves state-of-the-art results on CIFAR-10, CIFAR-100, and STL-10.

    Table of contents:
    Chapter 1  Introduction
    Chapter 2  Preliminaries
    Chapter 3  Related Works
    Chapter 4  Semantics-Preserving Adversarial Training
        4.1  Problem of PGD-training
        4.2  Semantics-Preserving Adversarial Training
        4.3  Combining with Adversarial Training Variants
    Chapter 5  Analysis of Adversarial Examples
        5.1  Visualizing Various Adversarial Examples
        5.2  Comparing the Attack Success Rate
    Chapter 6  Experiments & Results
        6.1  Evaluating Robustness
            6.1.1  CIFAR-10 & CIFAR-100
            6.1.2  CIFAR-10 with 500K Unlabeled Data
            6.1.3  STL-10
        6.2  Effect of Label Smoothing Hyperparameter α
    Chapter 7  Conclusion & Future Work
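    The abstract does not specify how the pixels "shared among all classes" are identified, so the Python sketch below (PyTorch assumed) only illustrates the general idea rather than the authors' SPAT procedure: a hypothetical `shared_pixel_mask` callable marks the allowed pixels, and the training-time perturbation is restricted to that mask so the adversarial example keeps the semantics of the original image.

    import torch
    import torch.nn.functional as F

    def masked_adv_training_step(model, x, y, shared_pixel_mask, optimizer,
                                 eps=8 / 255, step=2 / 255, iters=7):
        """One adversarial-training step with a semantics-preserving mask (illustrative).

        shared_pixel_mask(x) is a hypothetical callable returning a {0, 1} tensor
        of x's shape; perturbations outside the mask are zeroed out.
        """
        mask = shared_pixel_mask(x)
        delta = torch.zeros_like(x)
        for _ in range(iters):
            delta.requires_grad_(True)
            loss = F.cross_entropy(model(x + delta), y)
            grad, = torch.autograd.grad(loss, delta)
            with torch.no_grad():
                delta = (delta + step * grad.sign()).clamp(-eps, eps) * mask
                delta = (x + delta).clamp(0.0, 1.0) - x    # keep valid pixel values
        # standard training update on the masked adversarial example
        optimizer.zero_grad()
        F.cross_entropy(model(x + delta.detach()), y).backward()
        optimizer.step()

    Plugging such a masked inner maximization into PGD-based adversarial training would also be the natural way to combine the idea with existing adversarial-training variants (cf. Section 4.3 in the table of contents above).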