7 research outputs found

Adversarially Robust Training through Structured Gradient Regularization

Authors: Hofmann Thomas; Lucchi Aurelien; Nowozin Sebastian; Roth Kevin
Publication date: 22/05/2018

We propose a novel data-dependent structured gradient regularizer to increase the robustness of neural networks vis-a-vis adversarial perturbations. Our regularizer can be derived as a controlled approximation from first principles, leveraging the fundamental link between training with noise and regularization. It adds very little computational overhead during learning and is simple to implement generically in standard deep learning frameworks. Our experiments provide strong evidence that structured gradient regularization can act as an effective first line of defense against attacks based on low-level signal corruption.

arXiv.org e-Print Archive

Scaleable input gradient regularization for adversarial robustness

Authors: Finlay Chris; Oberman Adam M
Publication date: 04/10/2019

In this work we revisit gradient regularization for adversarial robustness with some new ingredients. First, we derive new per-image theoretical robustness bounds based on local gradient information. These bounds strongly motivate input gradient regularization. Second, we implement a scaleable version of input gradient regularization which avoids double backpropagation: adversarially robust ImageNet models are trained in 33 hours on four consumer-grade GPUs. Finally, we show experimentally and through theoretical certification that input gradient regularization is competitive with adversarial training. Moreover, we demonstrate that gradient regularization does not lead to gradient obfuscation or gradient masking.

arXiv.org e-Print Archive
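
The abstract of the entry above mentions an input gradient penalty that avoids double backpropagation but does not say how. One common way to do this, shown here only as an illustrative sketch and not necessarily the authors' implementation, is to estimate the squared gradient norm with a finite difference along the normalized gradient direction. The toy logistic model, the function names, and the step size `h` are all assumptions made for this example.

```python
import math

def loss(w, x, y):
    """Logistic loss of a hypothetical linear model; y is +1 or -1."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(-margin))

def input_gradient(w, x, y):
    """Analytic gradient of the loss with respect to the input x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    scale = -y / (1.0 + math.exp(margin))
    return [scale * wi for wi in w]

def fd_gradient_penalty(w, x, y, h=1e-3):
    """Finite-difference estimate of ||grad_x loss||^2.

    The unit direction d is computed once and then treated as a constant,
    so in an autodiff framework only first-order backward passes through
    the two loss evaluations would be needed (assumption of this sketch).
    """
    g = input_gradient(w, x, y)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return 0.0
    d = [gi / norm for gi in g]
    x_shift = [xi + h * di for xi, di in zip(x, d)]
    directional = (loss(w, x_shift, y) - loss(w, x, y)) / h
    return directional ** 2

# Illustrative values only.
w = [0.5, -1.2, 2.0]
x = [1.0, 0.3, -0.7]
y = 1
exact = sum(gi * gi for gi in input_gradient(w, x, y))
approx = fd_gradient_penalty(w, x, y)
print(exact, approx)
```

On this toy problem the two printed values should agree closely; the gap shrinks as `h` decreases, until floating-point rounding takes over.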

Reparameterized Variational Divergence Minimization for Stable Imitation

Authors: Agarwal Alekh; Arumugam Dilip; Celikyilmaz Asli; Dey Debadeepta; Dolan Bill; Nouri Elnaz
Publication date: 18/06/2020

While recent state-of-the-art results for adversarial imitation-learning algorithms are encouraging, recent works exploring the imitation learning from observation (ILO) setting, where trajectories only contain expert observations, have not been met with the same success. Inspired by recent investigations of f-divergence manipulation for the standard imitation learning setting (Ke et al., 2019; Ghasemipour et al., 2019), we here examine the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms. We unfortunately find that f-divergence minimization through reinforcement learning is susceptible to numerical instabilities. We contribute a reparameterization trick for adversarial imitation learning to alleviate the optimization challenges of the promising f-divergence minimization framework. Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.

arXiv.org e-Print Archive

Proximal Mapping for Deep Regularization

Authors: Li Mao; Ma Yingyi; Zhang Xinhua
Publication date: 14/06/2020

Underpinning the success of deep learning are effective regularizations that allow a variety of priors in data to be modeled, for example robustness to adversarial perturbations and correlations between multiple modalities. However, most regularizers are specified in terms of hidden layer outputs, which are not themselves optimization variables. In contrast to prevalent methods that optimize them indirectly through model weights, we propose inserting proximal mapping as a new layer in the deep network, which directly and explicitly produces well-regularized hidden layer outputs. The resulting technique is shown to be well connected to kernel warping and dropout, and novel algorithms were developed for robust temporal learning and multiview modeling, both outperforming state-of-the-art methods.

Comment: 24 pages, 7 figures

arXiv.org e-Print Archive

Improving performance of deep learning models with axiomatic attribution priors and expected gradients

Authors: Erion Gabriel; Janizek Joseph D.; Lee Su-In; Lundberg Scott; Sturmfels Pascal
Publication date: 11/11/2020

Recent research has demonstrated that feature attribution methods for deep networks can themselves be incorporated into training; these attribution priors optimize for a model whose attributions have certain desirable properties -- most frequently, that particular features are important or unimportant. These attribution priors are often based on attribution methods that are not guaranteed to satisfy desirable interpretability axioms, such as completeness and implementation invariance. Here, we introduce attribution priors to optimize for higher-level properties of explanations, such as smoothness and sparsity, enabled by a fast new attribution method formulation called expected gradients that satisfies many important interpretability axioms. This improves model performance on many real-world tasks where previous attribution priors fail. Our experiments show that the gains from combining higher-level attribution priors with expected gradients attributions are consistent across image, gene expression, and health care data sets. We believe this work motivates and provides the necessary tools to support the widespread adoption of axiomatic attribution priors in many areas of applied machine learning. The implementations and our results have been made freely available to academic communities.

Comment: Updated after submission to Nature Machine Intelligence

arXiv.org e-Print Archive
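
The abstract of the entry above names expected gradients and the completeness axiom without giving a formula. As a rough sketch only: expected gradients is usually described as averaging (x - x') times the gradient of the model at a random point on the line between a background sample x' and the input x. The toy model `f`, its analytic gradient, the background set, and the sample count below are all hypothetical choices for illustration, not the authors' released implementation.

```python
import random

def f(x):
    """Toy differentiable model (hypothetical): f(x) = x0^2 + 3*x1."""
    return x[0] ** 2 + 3.0 * x[1]

def grad_f(x):
    """Analytic gradient of the toy model."""
    return [2.0 * x[0], 3.0]

def expected_gradients(x, background, n_samples=20000, seed=0):
    """Monte Carlo estimate of expected-gradients attributions.

    Each sample draws a reference from the background set and an
    interpolation coefficient alpha uniformly from [0, 1).
    """
    rng = random.Random(seed)
    attributions = [0.0] * len(x)
    for _ in range(n_samples):
        ref = rng.choice(background)
        alpha = rng.random()
        point = [r + alpha * (xi - r) for xi, r in zip(x, ref)]
        g = grad_f(point)
        for i in range(len(x)):
            attributions[i] += (x[i] - ref[i]) * g[i] / n_samples
    return attributions

# Illustrative values only.
background = [[0.0, 0.0], [1.0, -1.0], [-2.0, 0.5]]
x = [2.0, 1.0]
attr = expected_gradients(x, background)
baseline = sum(f(r) for r in background) / len(background)
print(sum(attr), f(x) - baseline)
```

Completeness here means the attributions should sum, up to sampling noise, to the model output at `x` minus the average output over the background set, which is what the two printed numbers compare.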

Adversarial Examples - A Complete Characterisation of the Phenomenon

Authors: Poll Erik; Serban Alexandru Constantin; Visser Joost
Publication date: 17/02/2019

We provide a complete characterisation of the phenomenon of adversarial examples - inputs intentionally crafted to fool machine learning models. We aim to cover all the important concerns in this field of study: (1) the conjectures on the existence of adversarial examples, (2) the security, safety and robustness implications, (3) the methods used to generate and (4) protect against adversarial examples, and (5) the ability of adversarial examples to transfer between different machine learning models. We provide ample background information in an effort to make this document self-contained. Therefore, this document can be used as a survey, a tutorial, or a catalog of attacks and defences using adversarial examples.

arXiv.org e-Print Archive

Adversarial Examples on Object Recognition: A Comprehensive Survey

Authors: Poll Erik; Serban Alex; Visser Joost
Publication date: 03/09/2020

Deep neural networks are at the forefront of machine learning research. However, despite achieving impressive performance on complex tasks, they can be very sensitive: small perturbations of inputs can be sufficient to induce incorrect behavior. Such perturbations, called adversarial examples, are intentionally designed to test the network's sensitivity to distribution drifts. Given their surprisingly small size, a wide body of literature conjectures on their existence and how this phenomenon can be mitigated. In this article we discuss the impact of adversarial examples on security, safety, and robustness of neural networks. We start by introducing the hypotheses behind their existence, the methods used to construct or protect against them, and the capacity to transfer adversarial examples between different machine learning models. Altogether, the goal is to provide a comprehensive and self-contained survey of this growing field of research.

Comment: Published in ACM CSUR. arXiv admin note: text overlap with arXiv:1810.0118

arXiv.org e-Print Archive