884 research outputs found

    Towards Debugging and Improving Adversarial Robustness Evaluations

    Get PDF
    Despite exhibiting unprecedented success in many application domains, machine-learning models have been shown to be vulnerable to adversarial examples, i.e., maliciously perturbed inputs that are able to subvert their predictions at test time. Rigorous testing against such perturbations would require enumerating all possible outputs for all possible inputs, and despite impressive results in this field, such methods remain difficult to scale to modern deep-learning systems. For this reason, empirical methods are often used instead: to understand the sensitivity of a model to such attacks, and to counter their effects, model designers craft worst-case adversarial perturbations and test them against the model they are evaluating. These perturbations are optimized via gradient descent, minimizing a loss function that aims to increase the probability of misleading the model's predictions. However, many of the proposed defenses have been shown to provide a false sense of security due to failures of the attacks, rather than actual improvements in the machine-learning models' robustness; indeed, they have been broken under more rigorous evaluations. Although guidelines and best practices have been suggested to improve current adversarial robustness evaluations, the lack of automatic testing and debugging tools makes it difficult to apply these recommendations in a systematic and automated manner.

    To this end, we tackle three different challenges: (1) we investigate how adversarial robustness evaluations can be performed efficiently, by proposing a novel attack that can be used to find minimum-norm adversarial perturbations; (2) we propose a framework for debugging adversarial robustness evaluations, by defining metrics that reveal faulty evaluations as well as mitigations to patch the detected problems; and (3) we show how to employ a surrogate model to improve the success of transfer-based attacks, which are useful when gradient-based attacks fail due to problems in the gradient information.

    To improve the quality of robustness evaluations, we propose a novel attack, referred to as the Fast Minimum-Norm (FMN) attack, which competes with state-of-the-art attacks in terms of quality of the solution while outperforming them in terms of computational complexity and robustness to sub-optimal configurations of the attack hyperparameters. These are all desirable characteristics of attacks used in robustness evaluations, as the aforementioned problems often arise from the use of sub-optimal attack hyperparameters, including, e.g., the number of attack iterations, the step size, and the use of an inappropriate loss function. The correct refinement of these variables is often neglected, hence we design a novel framework that helps debug the optimization of adversarial examples, by means of quantitative indicators that unveil common problems and failures during the attack optimization process, e.g., in the configuration of the hyperparameters.

    Commonly accepted best practices suggest further validating the target model with alternative strategies, among which the use of a surrogate model to craft adversarial examples that transfer to the model being evaluated is useful for checking for gradient obfuscation. However, effectively creating transferable adversarial examples is not easy, as many factors influence the success of this strategy. In this work, we use a first-order model to show which underlying phenomena affect transferability, and we suggest best practices to create adversarial examples that transfer well to the target models.
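    The attack and debugging steps described above can be made concrete with a short Python sketch (PyTorch assumed). This is not the FMN attack or the proposed debugging framework: it is a generic L-infinity projected-gradient attack on a hypothetical classifier `model`, instrumented to record the loss at every iteration, the kind of quantitative indicator that reveals a faulty evaluation (a flat or oscillating loss curve points to a bad step size, too few iterations, or an uninformative loss function).

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8 / 255, step=2 / 255, iters=10):
        """Generic L-inf projected-gradient attack (illustrative sketch, not FMN).

        Returns the adversarial examples and the per-iteration loss curve,
        which can be inspected to detect a failing attack optimization.
        """
        x_adv = x.clone().detach()
        loss_curve = []
        for _ in range(iters):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)        # loss to be increased
            loss_curve.append(loss.item())
            grad, = torch.autograd.grad(loss, x_adv)
            with torch.no_grad():
                x_adv = x_adv + step * grad.sign()          # ascend the loss
                x_adv = x + (x_adv - x).clamp(-eps, eps)    # project into the eps-ball
                x_adv = x_adv.clamp(0.0, 1.0)               # keep a valid image
        return x_adv.detach(), loss_curve

    The same routine also supports the transfer-based check mentioned above: run it against a surrogate model and evaluate the returned examples on the target model; if gradient-based attacks fail on the target while transferred examples succeed, obfuscated gradients are a likely cause.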

    Gradient-free activation maximization for identifying effective stimuli

    Full text link
    A fundamental question for understanding brain function is what types of stimuli drive neurons to fire. In visual neuroscience, this question has also been posed as characterizing the receptive field of a neuron. The search for effective stimuli has traditionally been based on a combination of insights from previous studies, intuition, and luck. Recently, the same question has emerged in the study of units in convolutional neural networks (ConvNets), and together with it a family of solutions was developed that is generally referred to as "feature visualization by activation maximization." We sought to bring tools and techniques developed for studying ConvNets to the study of biological neural networks. However, one key difference that impedes direct translation of these tools is that gradients can be obtained from ConvNets using backpropagation, but no such gradients are available from the brain. To circumvent this problem, we developed a method for gradient-free activation maximization by combining a generative neural network with a genetic algorithm. We termed this method XDream (EXtending DeepDream with real-time evolution for activation maximization), and we have shown that it can reliably create strong stimuli for neurons in the macaque visual cortex (Ponce et al., 2019). In this paper, we describe extensive experiments characterizing the XDream method by using ConvNet units as in silico models of neurons. We show that XDream is applicable across network layers, architectures, and training sets; examine design choices in the algorithm; and provide practical guides for choosing its hyperparameters. XDream is an efficient algorithm for uncovering neuronal tuning preferences in black-box networks using a vast and diverse stimulus space. (Comment: 16 pages, 8 figures, 3 tables.)
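    The recipe in the abstract, a generative network searched by a genetic algorithm against a black-box response, can be sketched in a few lines of Python/NumPy. This is not the XDream implementation: it assumes a hypothetical `generator` that maps latent codes to images and a black-box `score` callable standing in for a ConvNet unit or a recorded neuron, and it uses arbitrary default hyperparameters.

    import numpy as np

    def evolve_stimulus(generator, score, latent_dim=256, pop_size=32,
                        n_generations=100, elite_frac=0.25, sigma=0.5, seed=0):
        """Gradient-free activation maximization (illustrative sketch, not XDream).

        generator: callable mapping a (latent_dim,) code to an image array.
        score: black-box callable returning a scalar response for an image.
        """
        rng = np.random.default_rng(seed)
        codes = rng.standard_normal((pop_size, latent_dim))
        n_elite = max(1, int(elite_frac * pop_size))
        for _ in range(n_generations):
            fitness = np.array([score(generator(c)) for c in codes])
            elite = codes[np.argsort(fitness)[-n_elite:]]                    # selection
            parents = elite[rng.integers(0, n_elite, size=(pop_size, 2))]
            children = parents.mean(axis=1)                                  # recombination
            codes = children + sigma * rng.standard_normal(children.shape)   # mutation
        best = max(codes, key=lambda c: score(generator(c)))
        return generator(best)

    No gradient of `score` is ever required, which is what makes this family of methods applicable to biological neurons, where backpropagation is unavailable.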

    Semantics-Preserving Adversarial Training

    Get PDF
    Thesis (Master's) -- Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, February 2021. Advisor: Sang-goo Lee.

    Adversarial training is a defense technique that improves the adversarial robustness of a deep neural network (DNN) by including adversarial examples in the training data. In this paper, we identify an overlooked problem of adversarial training: the adversarial examples often have different semantics from the original data, introducing unintended biases into the model. We hypothesize that such non-semantics-preserving (and consequently ambiguous) adversarial data harm the robustness of the target models. To mitigate such unintended semantic changes of adversarial examples, we propose semantics-preserving adversarial training (SPAT), which encourages perturbation of the pixels that are shared among all classes when generating adversarial examples in the training stage. Experimental results show that SPAT improves adversarial robustness and achieves state-of-the-art results on CIFAR-10, CIFAR-100, and STL-10.

    Table of contents:
    Chapter 1  Introduction
    Chapter 2  Preliminaries
    Chapter 3  Related Works
    Chapter 4  Semantics-Preserving Adversarial Training
        4.1  Problem of PGD-training
        4.2  Semantics-Preserving Adversarial Training
        4.3  Combining with Adversarial Training Variants
    Chapter 5  Analysis of Adversarial Examples
        5.1  Visualizing Various Adversarial Examples
        5.2  Comparing the Attack Success Rate
    Chapter 6  Experiments & Results
        6.1  Evaluating Robustness
            6.1.1  CIFAR-10 & CIFAR-100
            6.1.2  CIFAR-10 with 500K Unlabeled Data
            6.1.3  STL-10
        6.2  Effect of Label Smoothing Hyperparameter α
    Chapter 7  Conclusion & Future Work
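    The abstract does not specify how the pixels "shared among all classes" are identified, so the Python sketch below (PyTorch assumed) only illustrates the general idea rather than the authors' SPAT procedure: a hypothetical `shared_pixel_mask` callable marks the allowed pixels, and the training-time perturbation is restricted to that mask so the adversarial example keeps the semantics of the original image.

    import torch
    import torch.nn.functional as F

    def masked_adv_training_step(model, x, y, shared_pixel_mask, optimizer,
                                 eps=8 / 255, step=2 / 255, iters=7):
        """One adversarial-training step with a semantics-preserving mask (illustrative).

        shared_pixel_mask(x) is a hypothetical callable returning a {0, 1} tensor
        of x's shape; perturbations outside the mask are zeroed out.
        """
        mask = shared_pixel_mask(x)
        delta = torch.zeros_like(x)
        for _ in range(iters):
            delta.requires_grad_(True)
            loss = F.cross_entropy(model(x + delta), y)
            grad, = torch.autograd.grad(loss, delta)
            with torch.no_grad():
                delta = (delta + step * grad.sign()).clamp(-eps, eps) * mask
                delta = (x + delta).clamp(0.0, 1.0) - x    # keep valid pixel values
        # standard training update on the masked adversarial example
        optimizer.zero_grad()
        F.cross_entropy(model(x + delta.detach()), y).backward()
        optimizer.step()

    Plugging such a masked inner maximization into PGD-based adversarial training would also be the natural way to combine the idea with existing adversarial-training variants (cf. Section 4.3 in the table of contents above).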