Adversarial Robustness Across Representation Spaces
Adversarial robustness corresponds to the susceptibility of deep neural
networks to imperceptible perturbations made at test time. In the context of
image tasks, many algorithms have been proposed to make neural networks robust
to adversarial perturbations made to the input pixels. These perturbations are
typically measured in an $\ell_p$ norm. However, robustness often holds only
for the specific attack used during training. In this work we extend the above
setting to consider the problem of training deep neural networks that can be
made simultaneously robust to perturbations applied in multiple natural
representation spaces. For the case of image data, examples include the
standard pixel representation as well as the representation in the discrete
cosine transform (DCT) basis. We design a theoretically sound algorithm with
formal guarantees for the above problem. Furthermore, our guarantees also hold
when the goal is to require robustness with respect to multiple $\ell_p$ norm
based attacks. We then derive an efficient practical implementation and
demonstrate the effectiveness of our approach on standard datasets for image
classification.
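To make the notion of an attack in a non-pixel representation concrete, the sketch below takes an FGSM-style sign step on DCT coefficients instead of pixels. It is an illustration, not the paper's algorithm: the FGSM step, the epsilon value, and the externally supplied `pixel_grad` (which would come from backpropagating a model's loss) are all assumptions. Because the orthonormal DCT is linear, the gradient with respect to the coefficients is simply the DCT of the pixel-space gradient.

```python
# A minimal sketch, assuming 2-D grayscale images and an FGSM-style step;
# `pixel_grad` stands in for a gradient obtained from a trained classifier.
import numpy as np
from scipy.fftpack import dctn, idctn

def dct_space_fgsm(image, pixel_grad, eps=0.03):
    """Take an epsilon-bounded sign step in the DCT basis."""
    coeffs = dctn(image, norm="ortho")            # pixels -> DCT coefficients
    coeff_grad = dctn(pixel_grad, norm="ortho")   # chain rule through a linear map
    coeffs_adv = coeffs + eps * np.sign(coeff_grad)
    return idctn(coeffs_adv, norm="ortho")        # back to pixel space

# Toy usage with random arrays standing in for a real image and gradient.
img = np.random.rand(32, 32)
grad = np.random.randn(32, 32)
adv = dct_space_fgsm(img, grad)
```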
Identify Susceptible Locations in Medical Records via Adversarial Attacks on Deep Predictive Models
The surging availability of electronic health records (EHR) has led to
increased research interest in medical predictive modeling. Recently, many deep
learning based predictive models have been developed for EHR data and have
demonstrated impressive performance. However, a series of recent studies showed
that these deep models are not safe: they suffer from certain vulnerabilities.
In short, a well-trained deep network can be extremely sensitive to inputs with
negligible changes. These inputs are referred to as adversarial examples. In
the context of medical informatics, such attacks could alter the result of a
high performance deep predictive model by slightly perturbing a patient's
medical records. Such instability not only reflects the weakness of deep
architectures; more importantly, it offers guidance for detecting susceptible
parts of the inputs. In this paper, we propose an efficient and effective framework
that learns a time-preferential minimum attack targeting the LSTM model with
EHR inputs, and we leverage this attack strategy to screen medical records of
patients and identify susceptible events and measurements. This efficient
screening procedure can assist decision makers in paying extra attention to
locations that can cause severe consequences if not measured correctly. We
conduct extensive empirical studies on a real-world urgent care cohort and
demonstrate the effectiveness of the proposed screening approach.
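As a rough illustration of screening EHR inputs with a gradient-based attack, the sketch below scores every (timestep, measurement) location of a synthetic record by the magnitude of the loss gradient under a toy PyTorch LSTM classifier. The model, shapes, and plain gradient-magnitude scoring are assumptions; the paper's time-preferential minimum attack optimizes a more refined objective.

```python
# A minimal sketch, assuming a toy LSTM over records shaped
# [batch, time, features] and gradient-magnitude scoring.
import torch
import torch.nn as nn

class EHRClassifier(nn.Module):
    def __init__(self, n_features=20, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: [batch, time, features]
        out, _ = self.lstm(x)
        return self.head(out[:, -1])       # classify from the last hidden state

def susceptibility_map(model, record, label):
    """Score each (timestep, measurement) by loss-gradient magnitude."""
    record = record.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(record), label)
    loss.backward()
    return record.grad.abs().squeeze(0)    # [time, features]

model = EHRClassifier()
record = torch.randn(1, 48, 20)            # one synthetic 48-step record
scores = susceptibility_map(model, record, torch.tensor([1]))
t, f = divmod(scores.argmax().item(), scores.shape[1])
print(f"most susceptible location: timestep {t}, measurement {f}")
```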
Biologically inspired protection of deep networks from adversarial attacks
Inspired by biophysical principles underlying nonlinear dendritic computation
in neural circuits, we develop a scheme to train deep neural networks to make
them robust to adversarial attacks. Our scheme generates highly nonlinear,
saturated neural networks that achieve state-of-the-art performance on
gradient-based adversarial examples on MNIST, despite never being exposed to
adversarially chosen examples during training. Moreover, these networks exhibit
unprecedented robustness to targeted, iterative schemes for generating
adversarial examples, including second-order methods. We further identify
principles governing how these networks achieve their robustness, drawing on
methods from information geometry. We find these networks progressively create
highly flat and compressed internal representations that are sensitive to very
few input dimensions, while still solving the task. Moreover, they employ
highly kurtotic weight distributions, also found in the brain, and we
demonstrate how such kurtosis can protect even linear classifiers from
adversarial attack.
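A minimal sketch of the idea of driving units into saturation, assuming a hard-clamped activation and a simple penalty that rewards pre-activations outside the linear region; the paper's actual architecture and saturating scheme may differ.

```python
# A minimal sketch, not the paper's exact training scheme.
import torch
import torch.nn as nn

class SaturatingMLP(nn.Module):
    def __init__(self, d_in=784, hidden=256, n_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, hidden)
        self.fc2 = nn.Linear(hidden, n_classes)

    def forward(self, x):
        z = self.fc1(x)
        h = torch.clamp(z, -1.0, 1.0)             # hard-saturating nonlinearity
        # Zero once |z| >= 1: minimizing it pushes units into the flat,
        # saturated regime, starving gradient-based attacks of signal.
        sat_penalty = torch.relu(1.0 - z.abs()).mean()
        return self.fc2(h), sat_penalty

model = SaturatingMLP()
x, y = torch.randn(8, 784), torch.randint(0, 10, (8,))
logits, penalty = model(x)
loss = nn.functional.cross_entropy(logits, y) + 0.1 * penalty
loss.backward()
```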
Are adversarial examples inevitable?
A wide range of defenses have been proposed to harden neural networks against
adversarial attacks. However, a pattern has emerged in which the majority of
adversarial defenses are quickly broken by new attacks. Given the lack of
success at generating robust defenses, we are led to ask a fundamental
question: Are adversarial attacks inevitable? This paper analyzes adversarial
examples from a theoretical perspective, and identifies fundamental bounds on
the susceptibility of a classifier to adversarial attacks. We show that, for
certain classes of problems, adversarial examples are inescapable. Using
experiments, we explore the implications of theoretical guarantees for
real-world problems and discuss how factors such as dimensionality and image
complexity limit a classifier's robustness against adversarial examples.
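To give a feel for the kind of bound involved (the paper's precise statements and constants differ; this is a generic concentration-of-measure sketch): if data lie on the unit sphere $S^{d-1}$ and a classifier's decision region $A$ occupies at least half the probability mass, Lévy's isoperimetric inequality gives, for some absolute constant $C$,

$$\mu(A_\epsilon) \;\ge\; 1 - C\, e^{-(d-1)\epsilon^2/2},$$

where $A_\epsilon$ is the geodesic $\epsilon$-expansion of $A$. So for any fixed perturbation budget $\epsilon$, once the dimension $d$ is large, all but an exponentially small fraction of points from the complementary class sit within distance $\epsilon$ of $A$ and can be perturbed across the decision boundary, regardless of how the classifier was trained.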
DeepCorrect: Correcting DNN models against Image Distortions
In recent years, the widespread use of deep neural networks (DNNs) has
facilitated great improvements in performance for computer vision tasks like
image classification and object recognition. In most realistic computer vision
applications, an input image undergoes some form of image distortion such as
blur and additive noise during image acquisition or transmission. Deep networks
trained on pristine images perform poorly when tested on such distortions. In
this paper, we evaluate the effect of image distortions like Gaussian blur and
additive noise on the activations of pre-trained convolutional filters. We
propose a metric to identify the most noise-susceptible convolutional filters
and rank them in order of the highest gain in classification accuracy upon
correction. In our proposed approach, called DeepCorrect, we apply small stacks
of convolutional layers with residual connections at the outputs of these
ranked filters and train them to correct the worst distortion-affected filter
activations, whilst leaving the rest of the pre-trained filter outputs in the
network unchanged. Performance results show that applying DeepCorrect models
for common vision tasks like image classification (ImageNet), object
recognition (Caltech-101, Caltech-256) and scene classification (SUN-397),
significantly improves the robustness of DNNs against distorted images and
outperforms other alternative approaches.
Comment: Accepted to IEEE Transactions on Image Processing, April 2019. For
associated code, see https://github.com/tsborkar/DeepCorrec
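A sketch of what a correction unit of this kind might look like in PyTorch: a small residual stack of convolutions applied only to a chosen subset of a frozen pre-trained layer's output channels. The layer sizes and channel indices are arbitrary placeholders; this illustrates the idea rather than reproducing the released DeepCorrect code.

```python
# A minimal sketch; sizes and susceptible channel indices are placeholders.
import torch
import torch.nn as nn

class CorrectionUnit(nn.Module):
    """Residual conv stack applied to a subset of filter activations."""
    def __init__(self, n_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(n_channels, n_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(n_channels, n_channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.block(x)               # residual correction

class CorrectedLayer(nn.Module):
    def __init__(self, pretrained_layer, susceptible_idx):
        super().__init__()
        self.layer = pretrained_layer
        for p in self.layer.parameters():
            p.requires_grad = False            # pre-trained filters stay fixed
        self.idx = susceptible_idx
        self.correct = CorrectionUnit(len(susceptible_idx))

    def forward(self, x):
        out = self.layer(x)
        corrected = out.clone()                # leave unranked channels as-is
        corrected[:, self.idx] = self.correct(out[:, self.idx])
        return corrected

layer = CorrectedLayer(nn.Conv2d(3, 64, 3, padding=1), susceptible_idx=[0, 5, 9])
feat = layer(torch.randn(1, 3, 32, 32))        # only the correction unit trains
```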
Universal Lipschitz Approximation in Bounded Depth Neural Networks
Adversarial attacks against machine learning models are a rather hefty
obstacle to our increasing reliance on these models. Due to this, provably
robust (certified) machine learning models are a major topic of interest.
Lipschitz continuous models present a promising approach to solving this
problem. By leveraging the expressive power of a variant of neural networks
which maintain low Lipschitz constants, we prove that three-layer neural
networks using the FullSort activation function are Universal Lipschitz
function Approximators (ULAs). This both explains experimental results and
paves the way for the creation of better certified models going forward. We
conclude by presenting experimental results which suggest that ULAs are not
just a novelty but a competitive approach to providing certified classifiers,
and we use these results to motivate several potential topics for further
research.
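To make the construction concrete, the sketch below combines a FullSort activation (sorting each activation vector, which is a 1-Lipschitz operation) with a simple row-wise L1 clipping step that keeps each linear layer 1-Lipschitz with respect to the l-infinity norm. The projection scheme and layer sizes are illustrative assumptions, not the paper's exact training procedure.

```python
# A minimal sketch; the weight-constraint scheme is an assumption.
import torch
import torch.nn as nn

class FullSort(nn.Module):
    """Sort each activation vector: a gradient-norm-preserving, 1-Lipschitz op."""
    def forward(self, x):
        return torch.sort(x, dim=-1).values

def linf_project(linear):
    """Scale down any row of W whose l1 norm exceeds 1, so the layer is
    1-Lipschitz as a map between l-infinity balls."""
    with torch.no_grad():
        norms = linear.weight.abs().sum(dim=1, keepdim=True).clamp(min=1.0)
        linear.weight.div_(norms)

net = nn.Sequential(
    nn.Linear(10, 32), FullSort(),
    nn.Linear(32, 32), FullSort(),
    nn.Linear(32, 1),
)
for m in net:
    if isinstance(m, nn.Linear):
        linf_project(m)        # would be re-applied after every optimizer step

y = net(torch.randn(4, 10))    # the composed network is 1-Lipschitz (l-inf)
```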
Controlling Over-generalization and its Effect on Adversarial Examples Generation and Detection
Convolutional Neural Networks (CNNs) significantly improve the
state-of-the-art for many applications, especially in computer vision. However,
CNNs still suffer from a tendency to confidently classify out-distribution
samples from unknown classes into pre-defined known classes. Further, they are
also vulnerable to adversarial examples. We relate these two issues
through the tendency of CNNs to over-generalize for areas of the input space
not covered well by the training set. We show that a CNN augmented with an
extra output class can act as a simple yet effective end-to-end model for
controlling over-generalization. As an appropriate training set for the extra
class, we introduce two resources that are computationally efficient to obtain:
a representative natural out-distribution set and interpolated in-distribution
samples. To help select a representative natural out-distribution set among
available ones, we propose a simple measurement to assess an out-distribution
set's fitness. We also demonstrate that training such an augmented CNN with
representative out-distribution natural datasets and some interpolated samples
allows it to better handle a wide range of unseen out-distribution samples and
black-box adversarial examples without training it on any adversaries. Finally,
we show that generating white-box adversarial attacks against our proposed
augmented CNN can become harder, as the attack algorithms must get around the
rejection regions when generating actual adversaries.
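A minimal sketch of the augmented-CNN idea: a toy network with one extra "dustbin" logit, trained so that interpolated in-distribution samples map to the extra class. The architecture and mixing scheme are illustrative assumptions, and the interpolated batch stands in here for both training resources; real natural out-distribution data would be loaded separately.

```python
# A minimal sketch; the CNN and mixing scheme are toy stand-ins.
import torch
import torch.nn as nn

K = 10                                     # known classes; index K = dustbin
model = nn.Sequential(                     # toy CNN for 32x32 RGB inputs
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, K + 1),                  # K known classes + 1 extra class
)

def interpolate_batch(x):
    """Convex mixes of in-distribution samples, labeled as the extra class."""
    lam = torch.rand(x.size(0), 1, 1, 1)
    return lam * x + (1 - lam) * x.roll(1, dims=0)

x_in = torch.randn(8, 3, 32, 32)
y_in = torch.randint(0, K, (8,))
x_out = interpolate_batch(x_in)            # plus natural out-dist data in practice
y_out = torch.full((8,), K)                # route to the rejection class

loss = nn.functional.cross_entropy(
    model(torch.cat([x_in, x_out])), torch.cat([y_in, y_out]))
loss.backward()
```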
Interpretable Explanations of Black Boxes by Meaningful Perturbation
As machine learning algorithms are increasingly applied to high-impact yet
high-risk tasks, such as medical diagnosis or autonomous driving, it is
critical that researchers can explain how such algorithms arrived at their
predictions. In recent years, a number of image saliency methods have been
developed to summarize where highly complex neural networks "look" in an image
for evidence for their predictions. However, these techniques are limited by
their heuristic nature and architectural constraints. In this paper, we make
two main contributions: First, we propose a general framework for learning
different kinds of explanations for any black box algorithm. Second, we
specialise the framework to find the part of an image most responsible for a
classifier decision. Unlike previous works, our method is model-agnostic and
testable because it is grounded in explicit and interpretable image
perturbations.
Comment: Final camera-ready paper published at ICCV 2017. (Supplementary
materials:
http://openaccess.thecvf.com/content_ICCV_2017/supplemental/Fong_Interpretable_Explanations_of_ICCV_2017_supplemental.pdf)
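A condensed sketch of learning such a perturbation mask: optimize a mask that blends the image with a blurred copy so as to drop the classifier's score on the target class, with an area penalty keeping the mask small. The blur kernel, learning rate, and the omission of the paper's total-variation term and jitter are simplifications.

```python
# A condensed sketch, assuming a blur-based "deletion" perturbation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def explain(model, image, target, steps=150, lam=0.05):
    """Learn a mask whose deleted region most reduces the target-class score."""
    blurred = F.avg_pool2d(image, 11, stride=1, padding=5)   # reference image
    mask = torch.zeros(1, 1, *image.shape[-2:], requires_grad=True)
    opt = torch.optim.Adam([mask], lr=0.1)
    for _ in range(steps):
        m = torch.sigmoid(mask)
        perturbed = m * blurred + (1 - m) * image    # m = 1 means "deleted"
        score = model(perturbed).softmax(-1)[0, target]
        loss = score + lam * m.mean()                # drop score, keep mask small
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask).detach()

# Toy usage with an untrained classifier standing in for a real model.
toy = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
saliency = explain(toy, torch.randn(1, 3, 64, 64), target=3)
```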
Convolutional Neural Networks with Transformed Input based on Robust Tensor Network Decomposition
Tensor network decomposition, which originated in quantum physics to model
entangled many-particle quantum systems, turns out to be a promising
mathematical technique for efficiently representing and processing big data in
a parsimonious manner. In this study, we show that tensor networks can
systematically partition structured data, e.g. color images, for distributed
storage and communication in a privacy-preserving manner. Leveraging the sea of
big data and metadata privacy, empirical results show that neighbouring
subtensors with implicit information stored in tensor network formats cannot be
identified for data reconstruction. This technique complements existing
encryption and randomization techniques, which store an explicit data
representation in one place and are highly susceptible to adversarial attacks
such as side-channel attacks and de-anonymization. Furthermore, we propose a
theory of adversarial examples that mislead convolutional neural networks into
misclassification, using subspace analysis based on singular value decomposition
(SVD). The theory is extended to analyze higher-order tensors using
tensor-train SVD (TT-SVD); it helps to explain the level of susceptibility of
different datasets to adversarial attacks, the structural similarity of
different adversarial attacks including global and localized attacks, and the
efficacy of different adversarial defenses based on input transformation. An
efficient and adaptive algorithm based on robust TT-SVD is then developed to
detect strong and static adversarial attacks.
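As a toy illustration of SVD-based subspace analysis of perturbations, the function below measures how much of a perturbation's energy falls inside an image's leading left singular subspace. The rank cutoff and the interpretation are assumptions, and the paper's TT-SVD machinery for higher-order tensors is not reproduced.

```python
# A toy sketch; the rank cutoff is arbitrary.
import numpy as np

def leading_subspace_energy(image, perturbation, rank=8):
    """Fraction of perturbation energy inside the image's top-`rank`
    left singular subspace."""
    u, _, _ = np.linalg.svd(image, full_matrices=False)
    basis = u[:, :rank]                            # leading singular vectors
    projected = basis @ (basis.T @ perturbation)
    return np.linalg.norm(projected) ** 2 / np.linalg.norm(perturbation) ** 2

img = np.random.rand(32, 32)
noise = 0.01 * np.random.randn(32, 32)
print(f"energy in leading subspace: {leading_subspace_energy(img, noise):.3f}")
```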
Neural Networks in Adversarial Setting and Ill-Conditioned Weight Space
Recently, neural networks have seen a huge surge in adoption due to their
ability to provide high accuracy on various tasks. On the other hand, the
existence of adversarial examples has raised suspicions regarding the
generalization capabilities of neural networks. In this work, we focus on the
weight matrix learnt by the neural network and hypothesize that an
ill-conditioned weight matrix is one of the contributing factors to a neural
network's susceptibility to adversarial examples. To ensure that the learnt
weight matrix's condition number remains sufficiently low, we suggest using an
orthogonal regularizer. We show that this indeed helps in increasing
adversarial accuracy on the MNIST and F-MNIST datasets.
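A minimal sketch of one common form of orthogonal regularizer, the soft penalty ||W W^T - I||_F^2 added to the task loss: driving W toward row-orthogonality pushes its singular values, and hence its condition number, toward 1. The exact regularizer and weighting used in the paper are assumptions here.

```python
# A minimal sketch, assuming the soft penalty ||W W^T - I||_F^2.
import torch
import torch.nn as nn

def orthogonal_penalty(weight):
    """Frobenius distance of W W^T from the identity."""
    gram = weight @ weight.T
    eye = torch.eye(gram.size(0), device=weight.device)
    return ((gram - eye) ** 2).sum()

model = nn.Linear(784, 10)
x, y = torch.randn(16, 784), torch.randint(0, 10, (16,))
loss = nn.functional.cross_entropy(model(x), y) \
       + 1e-4 * orthogonal_penalty(model.weight)
loss.backward()
```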