7 research outputs found
Adversarially Robust Training through Structured Gradient Regularization
We propose a novel data-dependent structured gradient regularizer to increase
the robustness of neural networks vis-a-vis adversarial perturbations. Our
regularizer can be derived as a controlled approximation from first principles,
leveraging the fundamental link between training with noise and regularization.
It adds very little computational overhead during learning and is simple to
implement generically in standard deep learning frameworks. Our experiments
provide strong evidence that structured gradient regularization can act as an
effective first line of defense against attacks based on low-level signal
corruption.
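A minimal PyTorch sketch of the general idea, using a plain (unstructured) squared input-gradient penalty added to the cross-entropy loss; the paper's structured, data-dependent regularizer is more elaborate, and the function name and `reg_weight` are illustrative:

    import torch
    import torch.nn.functional as F

    def loss_with_input_gradient_penalty(model, x, y, reg_weight=0.01):
        # Generic input-gradient regularizer (not the paper's structured form).
        x = x.detach().requires_grad_(True)
        ce = F.cross_entropy(model(x), y)
        # create_graph=True so the penalty itself can be backpropagated
        # through when updating the model parameters.
        (grad_x,) = torch.autograd.grad(ce, x, create_graph=True)
        penalty = grad_x.pow(2).flatten(1).sum(dim=1).mean()
        return ce + reg_weight * penalty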
Scaleable input gradient regularization for adversarial robustness
In this work we revisit gradient regularization for adversarial robustness
with some new ingredients. First, we derive new per-image theoretical
robustness bounds based on local gradient information. These bounds strongly
motivate input gradient regularization. Second, we implement a scalable
version of input gradient regularization that avoids double backpropagation:
adversarially robust ImageNet models are trained in 33 hours on four
consumer-grade GPUs. Finally, we show experimentally and through theoretical
certification that input gradient regularization is competitive with
adversarial training. Moreover, we demonstrate that gradient regularization
does not lead to gradient obfuscation or gradient masking.
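The claim of avoiding double backpropagation can be illustrated with a finite-difference estimate of the gradient-norm penalty; the sketch below shows that general idea under the assumption of a squared-norm penalty, not necessarily the paper's exact estimator, and the step size `h` is illustrative:

    import torch
    import torch.nn.functional as F

    def finite_difference_grad_penalty(model, x, y, h=0.01):
        # Approximate ||d loss / d x||^2 with one extra forward pass instead
        # of differentiating through the gradient (double backprop).
        x = x.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        (grad_x,) = torch.autograd.grad(loss, x, retain_graph=True)
        d = grad_x.detach()
        d = d / (d.flatten(1).norm(dim=1).view(-1, *([1] * (x.dim() - 1))) + 1e-12)
        loss_shifted = F.cross_entropy(model(x.detach() + h * d), y)
        # Squared directional derivative along the normalized gradient
        # approximates the squared gradient norm.
        return ((loss_shifted - loss) / h) ** 2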
Reparameterized Variational Divergence Minimization for Stable Imitation
While recent state-of-the-art results for adversarial imitation-learning
algorithms are encouraging, works exploring the imitation learning from
observation (ILO) setting, where trajectories contain only expert
observations, have not met with the same success. Inspired by recent
investigations of f-divergence manipulation for the standard imitation
learning setting (Ke et al., 2019; Ghasemipour et al., 2019), we examine
the extent to which variations in the choice of probabilistic divergence may
yield more performant ILO algorithms. Unfortunately, we find that f-divergence
minimization through reinforcement learning is susceptible to numerical
instabilities. We contribute a reparameterization trick for adversarial
imitation learning to alleviate the optimization challenges of the promising
f-divergence minimization framework. Empirically, we demonstrate that our
design choices allow for ILO algorithms that outperform baseline approaches and
more closely match expert performance in low-dimensional continuous-control
tasks.
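The abstract does not spell out the reparameterization; the sketch below only illustrates the kind of numerical instability involved, assuming a GAIL-style discriminator whose rewards are computed from raw logits rather than saturated sigmoid probabilities (function and argument names are hypothetical, and the paper's actual f-divergence reparameterization may differ):

    import torch
    import torch.nn.functional as F

    def gail_style_rewards(logits: torch.Tensor):
        # Computing log D(s, a) or -log(1 - D(s, a)) after a sigmoid underflows
        # once the discriminator saturates; the same quantities computed
        # directly from the logits stay finite.
        reward_log_d = F.logsigmoid(logits)   # log D(s, a)
        reward_survival = F.softplus(logits)  # -log(1 - D(s, a))
        return reward_log_d, reward_survival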
Proximal Mapping for Deep Regularization
Underpinning the success of deep learning are effective regularizations that
allow a variety of priors in the data to be modeled, for example robustness to
adversarial perturbations and correlations between multiple modalities.
However, most regularizers are specified in terms of hidden-layer outputs,
which are not themselves optimization variables. In contrast to prevalent
methods that optimize them indirectly through model weights, we propose
inserting a proximal mapping as a new layer of the deep network, which
directly and explicitly produces well-regularized hidden-layer outputs. The
resulting technique is shown to be well connected to kernel warping and
dropout, and novel algorithms are developed for robust temporal learning and
multiview modeling,
both outperforming state-of-the-art methods.
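As a concrete illustration of a proximal-mapping layer, the sketch below applies the closed-form prox of an L1 prior (soft-thresholding) to hidden activations; the paper's proximal maps for robustness and multiview priors are more involved, and `lam` is an illustrative constant:

    import torch
    import torch.nn as nn

    class ProxL1(nn.Module):
        # Outputs prox_{lam * ||.||_1}(h), i.e. soft-thresholded activations,
        # so the layer directly produces sparsity-regularized hidden outputs.
        def __init__(self, lam: float = 0.1):
            super().__init__()
            self.lam = lam

        def forward(self, h: torch.Tensor) -> torch.Tensor:
            return torch.sign(h) * torch.clamp(h.abs() - self.lam, min=0.0)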
Improving performance of deep learning models with axiomatic attribution priors and expected gradients
Recent research has demonstrated that feature attribution methods for deep
networks can themselves be incorporated into training; these attribution priors
optimize for a model whose attributions have certain desirable properties --
most frequently, that particular features are important or unimportant. These
attribution priors are often based on attribution methods that are not
guaranteed to satisfy desirable interpretability axioms, such as completeness
and implementation invariance. Here, we introduce attribution priors to
optimize for higher-level properties of explanations, such as smoothness and
sparsity, enabled by a fast new attribution method formulation called expected
gradients that satisfies many important interpretability axioms. This improves
model performance on many real-world tasks where previous attribution priors
fail. Our experiments show that the gains from combining higher-level
attribution priors with expected gradients attributions are consistent across
image, gene expression, and health care data sets. We believe this work
motivates and provides the necessary tools to support the widespread adoption
of axiomatic attribution priors in many areas of applied machine learning. The
implementations and our results have been made freely available to academic
communities.
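A minimal Monte Carlo sketch of expected-gradients attributions for a classifier, averaging integrated-gradients-style terms over baselines drawn from a background set and interpolation coefficients alpha ~ U(0, 1); argument names such as `background` and `target_class` are illustrative, not the authors' API:

    import torch

    def expected_gradients(model, x, background, target_class, n_samples=16):
        attributions = torch.zeros_like(x)
        for _ in range(n_samples):
            # Draw a random baseline for each input and a random interpolation point.
            idx = torch.randint(0, background.shape[0], (x.shape[0],))
            baseline = background[idx]
            alpha = torch.rand(x.shape[0], *([1] * (x.dim() - 1)), device=x.device)
            point = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
            score = model(point)[:, target_class]
            # Gradient of the target-class score at the interpolated point.
            (grads,) = torch.autograd.grad(score.sum(), point)
            attributions += (x - baseline) * grads
        return attributions / n_samples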
Adversarial Examples - A Complete Characterisation of the Phenomenon
We provide a complete characterisation of the phenomenon of adversarial
examples - inputs intentionally crafted to fool machine learning models. We aim
to cover all the important concerns in this field of study: (1) the conjectures
on the existence of adversarial examples, (2) the security, safety and
robustness implications, (3) the methods used to generate adversarial
examples, (4) the methods used to protect against them, and (5) the ability of adversarial examples to
transfer between different machine learning models. We provide ample background
information in an effort to make this document self-contained. Therefore, this
document can be used as a survey, a tutorial, or a catalog of attacks and
defences using adversarial examples.
Adversarial Examples on Object Recognition: A Comprehensive Survey
Deep neural networks are at the forefront of machine learning research.
However, despite achieving impressive performance on complex tasks, they can be
very sensitive: Small perturbations of inputs can be sufficient to induce
incorrect behavior. Such perturbations, called adversarial examples, are
intentionally designed to test the network's sensitivity to distribution
drifts. Given their surprisingly small size, a wide body of literature
conjectures on their existence and how this phenomenon can be mitigated. In
this article we discuss the impact of adversarial examples on security, safety,
and robustness of neural networks. We start by introducing the hypotheses
behind their existence, the methods used to construct or protect against them,
and the capacity to transfer adversarial examples between different machine
learning models. Altogether, the goal is to provide a comprehensive and
self-contained survey of this growing field of research.