758 research outputs found
Variational Inference with Latent Space Quantization for Adversarial Resilience
Despite their tremendous success in modelling high-dimensional data
manifolds, deep neural networks suffer from the threat of adversarial attacks:
perceptually valid, input-like samples, obtained through careful perturbation,
that degrade the performance of the underlying model. Major concerns with
existing defense mechanisms include non-generalizability across different
attacks and models, and large inference time.
In this paper, we propose a generalized defense mechanism capitalizing on the
expressive power of regularized latent space based generative models. We design
an adversarial filter that requires no access to the classifier or the adversary,
which makes it usable in tandem with any classifier. The basic idea is to learn a
Lipschitz constrained mapping from the data manifold, incorporating adversarial
perturbations, to a quantized latent space and re-map it to the true data
manifold. Specifically, we simultaneously auto-encode the data manifold and its
perturbations implicitly through the perturbations of the regularized and
quantized generative latent space, realized using variational inference. We
demonstrate the efficacy of the proposed formulation in providing resilience
against multiple attack types (black-box and white-box) and methods, while being
almost real-time. Our experiments show that the proposed method surpasses the
state-of-the-art techniques in several cases.
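To make the filtering idea concrete, a minimal sketch of such a purification step is given below (an illustration under assumed interfaces, not the authors' implementation; `encoder`, `decoder`, `codebook`, and `classifier` are hypothetical modules):

```python
import torch

def quantize(z, codebook):
    # Snap each latent vector to its nearest codebook entry (vector quantization).
    # z: (batch, dim), codebook: (num_codes, dim)
    dists = torch.cdist(z, codebook)          # pairwise distances to code vectors
    idx = dists.argmin(dim=1)                 # index of nearest code per latent
    return codebook[idx]

def purify_and_classify(x, encoder, decoder, codebook, classifier):
    # Adversarial filter: re-map a (possibly perturbed) input onto the learned
    # data manifold via the quantized latent space, then classify the result.
    with torch.no_grad():
        z = encoder(x)                        # map input to the latent space
        z_q = quantize(z, codebook)           # project onto quantized latent codes
        x_clean = decoder(z_q)                # re-map to the true data manifold
        return classifier(x_clean)            # any downstream classifier
```

In the variational formulation above, the encoder, decoder, and quantized latent space would be trained jointly; the filter is then used in front of any classifier without retraining.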
Adversarial Attack Type I: Cheat Classifiers by Significant Changes
Despite the great success of deep neural networks, the adversarial attack can
cheat some well-trained classifiers by small perturbations. In this paper, we
propose another type of adversarial attack that can cheat classifiers by
significant changes. For example, we can significantly change a face but
well-trained neural networks still recognize the adversarial and the original
example as the same person. Statistically, the existing adversarial attack
increases the Type II error, while the proposed one targets the Type I error;
they are hence named Type II and Type I adversarial attacks, respectively. The
two types of attack are equally important but essentially different, which we
explain intuitively and evaluate numerically. To implement the proposed attack,
a supervised variational autoencoder is designed, and the classifier is then
attacked by updating the latent variables using gradient information. Besides,
with pre-trained generative models, Type I attacks on latent spaces are
investigated as well. Experimental results show that our method is practical
and effective for generating Type I adversarial examples on large-scale image
datasets. Most of the generated examples can pass detectors designed to defend
against Type II attacks, and the strengthening strategy is only effective
against a specific attack type, both implying that the underlying reasons for
Type I and Type II attacks are different.
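A rough sketch of the latent-space attack loop described above might look as follows (a hedged illustration only; `vae`, `classifier`, and the weighting `alpha` are hypothetical, and the paper's exact objective may differ):

```python
import torch
import torch.nn.functional as F

def type1_attack(x, y, vae, classifier, steps=200, lr=0.05, alpha=1.0):
    # Type I attack sketch: move the latent code so the decoded image changes
    # significantly, while the classifier still predicts the original label y.
    z = vae.encode(x).detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x_adv = vae.decode(z)
        keep_label = F.cross_entropy(classifier(x_adv), y)   # keep the prediction fixed
        change_input = -F.mse_loss(x_adv, x)                 # push the image away from x
        loss = keep_label + alpha * change_input
        opt.zero_grad()
        loss.backward()
        opt.step()
    return vae.decode(z).detach()
```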
On The Utility of Conditional Generation Based Mutual Information for Characterizing Adversarial Subspaces
Recent studies have found that deep learning systems are vulnerable to
adversarial examples; e.g., visually unrecognizable adversarial images can
easily be crafted to result in misclassification. The robustness of neural
networks has been studied extensively in the context of adversary detection,
which compares a metric that exhibits strong discriminative power between natural
and adversarial examples. In this paper, we propose to characterize the
adversarial subspaces through the lens of mutual information (MI) approximated
by conditional generation methods. We use MI as an information-theoretic metric
to strengthen existing defenses and improve the performance of adversary
detection. Experimental results on MagNet defense demonstrate that our proposed
MI detector can strengthen its robustness against powerful adversarial attacks.
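One way to picture the detector is as a thresholded pointwise-MI proxy computed from generative lower bounds. The sketch below is an assumed simplification, not the paper's estimator; `cvae`, `vae`, and `classifier` are hypothetical modules exposing per-example `elbo` methods:

```python
import torch

def mi_score(x, cvae, vae, classifier):
    # Rough pointwise MI proxy: I(x; y) ~ log p(x | y_hat) - log p(x), with both
    # densities approximated by generative lower bounds (ELBOs).
    with torch.no_grad():
        y_hat = classifier(x).argmax(dim=1)
        cond_elbo = cvae.elbo(x, y_hat)   # conditional lower bound on log p(x | y_hat)
        marg_elbo = vae.elbo(x)           # unconditional lower bound on log p(x)
        return cond_elbo - marg_elbo

def detect_adversarial(x, cvae, vae, classifier, threshold):
    # Flag inputs whose MI proxy falls below a threshold tuned on held-out natural data.
    return mi_score(x, cvae, vae, classifier) < threshold
```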
Purifying Adversarial Perturbation with Adversarially Trained Auto-encoders
Machine learning models are vulnerable to adversarial examples. Iterative
adversarial training has shown promising results against strong white-box
attacks. However, adversarial training is very expensive, and every time a
model needs to be protected, this expensive training scheme must be performed
again. In this paper, we propose to apply the iterative adversarial training
scheme to an external auto-encoder, which, once trained, can be used to protect
other models directly. We empirically show that our model outperforms other
purifying-based methods against white-box attacks, and transfers well to
directly protect other base models with different architectures.
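A minimal sketch of one training step for such a purifier, assuming a PGD-style inner attack against a surrogate classifier and an MSE purification loss (`ae` and `surrogate_clf` are hypothetical modules, and the actual schedule in the paper may differ):

```python
import torch
import torch.nn.functional as F

def train_purifier_step(ae, surrogate_clf, x, y, opt, eps=8/255, pgd_steps=10, pgd_lr=2/255):
    # Craft adversarial inputs against surrogate_clf(ae(.)), then train ae to map
    # them back to the clean inputs. The surrogate is only needed at training time;
    # once trained, ae can be placed in front of other models.
    x_adv = x.clone().detach()
    for _ in range(pgd_steps):                              # PGD-style inner attack
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(surrogate_clf(ae(x_adv)), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = (x_adv + pgd_lr * grad.sign()).detach()
        x_adv = x + (x_adv - x).clamp(-eps, eps)            # project into the eps-ball
        x_adv = x_adv.clamp(0, 1)
    recon_loss = F.mse_loss(ae(x_adv), x)                   # purify adversarial back to clean
    opt.zero_grad()
    recon_loss.backward()
    opt.step()
    return recon_loss.item()
```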
Generalizable Adversarial Attacks with Latent Variable Perturbation Modelling
Adversarial attacks on deep neural networks traditionally rely on a
constrained optimization paradigm, where an optimization procedure is used to
obtain a single adversarial perturbation for a given input example. In this
work we frame the problem as learning a distribution of adversarial
perturbations, enabling us to generate diverse adversarial examples given
an unperturbed input. We show that this framework is domain-agnostic in that
the same framework can be employed to attack different input domains with
minimal modification. Across three diverse domains---images, text, and
graphs---our approach generates whitebox attacks with success rates that are
competitive with or superior to existing approaches, with a new
state-of-the-art achieved in the graph domain. Finally, we demonstrate that our
framework can efficiently generate a diverse set of attacks for a single given
input, and is even capable of attacking unseen test instances in a
zero-shot manner, exhibiting attack generalization.
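As a hedged illustration of perturbation-distribution learning, one generator update could look roughly like this (`gen` is a hypothetical network taking the input and a noise vector; the paper's architecture and objective are richer):

```python
import torch
import torch.nn.functional as F

def attack_generator_step(gen, classifier, x, y, opt, eps=8/255, noise_dim=64):
    # Learn a *distribution* of perturbations: a generator maps the input plus
    # random noise to a bounded perturbation that fools a frozen classifier.
    z = torch.randn(x.size(0), noise_dim, device=x.device)  # latent noise -> diverse attacks
    delta = eps * torch.tanh(gen(x, z))                     # keep perturbation in the eps-ball
    logits = classifier((x + delta).clamp(0, 1))
    loss = -F.cross_entropy(logits, y)                      # maximize classification error
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Sampling different noise vectors z at test time then yields diverse perturbations for the same input, which is what enables the zero-shot behaviour described above.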
Dual Motion GAN for Future-Flow Embedded Video Prediction
Future frame prediction in videos is a promising avenue for unsupervised
video representation learning. Video frames are naturally generated by the
inherent pixel flows from preceding frames based on the appearance and motion
dynamics in the video. However, existing methods focus on directly
hallucinating pixel values, resulting in blurry predictions. In this paper, we
develop a dual motion Generative Adversarial Net (GAN) architecture, which
learns to explicitly enforce future-frame predictions to be consistent with the
pixel-wise flows in the video through a dual-learning mechanism. The primal
future-frame prediction and dual future-flow prediction form a closed loop,
generating informative feedback signals to each other for better video
prediction. To make both synthesized future frames and flows indistinguishable
from reality, a dual adversarial training method is proposed to ensure that the
future-flow prediction is able to help infer realistic future-frames, while the
future-frame prediction in turn leads to realistic optical flows. Our dual
motion GAN also handles natural motion uncertainty in different pixel locations
with a new probabilistic motion encoder, which is based on variational
autoencoders. Extensive experiments demonstrate that the proposed dual motion
GAN significantly outperforms state-of-the-art approaches on synthesizing new
video frames and predicting future flows. Our model generalizes well across
diverse visual scenes and shows superiority in unsupervised video
representation learning.
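The coupling between the frame and flow predictions can be illustrated with a simple flow-warping consistency term (a sketch only; the dual discriminators and probabilistic motion encoder are omitted, and the exact losses are assumptions):

```python
import torch
import torch.nn.functional as F

def warp_with_flow(frame, flow):
    # Warp the previous frame with a predicted optical-flow field
    # (flow in pixels, shape (B, 2, H, W)).
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=frame.device),
                            torch.arange(w, device=frame.device), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    grid_x = 2.0 * grid[:, 0] / (w - 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / (h - 1) - 1.0
    norm_grid = torch.stack((grid_x, grid_y), dim=-1)
    return F.grid_sample(frame, norm_grid, align_corners=True)

def flow_consistency_loss(prev_frame, pred_frame, pred_flow):
    # Dual-consistency term: the hallucinated future frame should agree with the
    # previous frame warped by the predicted future flow.
    return F.l1_loss(pred_frame, warp_with_flow(prev_frame, pred_flow))
```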
Provably robust deep generative models
Recent work in adversarial attacks has developed provably robust methods for
training deep neural network classifiers. However, although they are often
mentioned in the context of robustness, deep generative models themselves have
received relatively little attention in terms of formally analyzing their
robustness properties. In this paper, we propose a method for training provably
robust generative models, specifically a provably robust version of the
variational auto-encoder (VAE). To do so, we first formally define a
(certifiably) robust lower bound on the variational lower bound of the
likelihood, and then show how this bound can be optimized during training to
produce a robust VAE. We evaluate the method on simple examples, and show that
it is able to produce generative models that are substantially more robust to
adversarial attacks (i.e., an adversary trying to perturb inputs so as to
drastically lower their likelihood under the model).
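As an empirical stand-in for the certified objective (not the paper's provable bound), one could approximate the worst-case ELBO over a perturbation ball with projected gradient steps; `vae.elbo` is an assumed per-example interface:

```python
import torch

def worst_case_elbo(vae, x, eps=0.1, steps=10, lr=0.02):
    # Heuristic surrogate: minimize the ELBO over an eps-ball around x with
    # projected gradient steps. NOTE: the paper derives a *certified* lower bound;
    # this PGD version is only an illustrative approximation.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        elbo = vae.elbo(x + delta)            # variational lower bound at the perturbed input
        grad, = torch.autograd.grad(elbo.sum(), delta)
        with torch.no_grad():
            delta -= lr * grad.sign()         # descend the ELBO (the adversary's objective)
            delta.clamp_(-eps, eps)           # stay inside the perturbation ball
    return vae.elbo(x + delta.detach())
```

Training the VAE on this worst-case quantity, rather than the nominal ELBO, is the informal analogue of the robust objective described above.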
Revisit Lmser and its further development based on convolutional layers
Proposed in 1991, Least Mean Square Error Reconstruction for self-organizing
network, Lmser for short, was a further development of the traditional
auto-encoder (AE): it folds the architecture with respect to the central coding
layer, which leads to symmetric weights and neurons as well as jointly
supervised and unsupervised learning. However, its advantages were only
demonstrated in a one-hidden-layer implementation due to
the lack of computing resources and big data at that time. In this paper, we
revisit Lmser from the perspective of deep learning, develop an Lmser network
based on multiple convolutional layers, which is more suitable for
image-related tasks, and confirm several Lmser functions with preliminary
demonstrations on image recognition, reconstruction, association recall, and so
on. Experiments demonstrate that Lmser indeed works as indicated in the
original paper and has promising performance in various applications.
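The "folding" idea can be sketched as a convolutional auto-encoder whose decoder reuses the encoder's weights through transposed convolutions (a minimal illustration, not the authors' full architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvLmserSketch(nn.Module):
    # The decoder reuses (transposes) the encoder's convolutional weights,
    # so weights are symmetric about the central coding layer.
    def __init__(self, channels=(1, 32, 64)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1)
            for c_in, c_out in zip(channels[:-1], channels[1:])
        )

    def forward(self, x):
        # Encode: standard convolutional stack down to the central coding layer.
        h = x
        for conv in self.convs:
            h = F.relu(conv(h))
        code = h
        # Decode: run the same weights backwards with transposed convolutions.
        for conv in reversed(self.convs):
            h = F.relu(F.conv_transpose2d(h, conv.weight, stride=2, padding=1))
        return code, h
```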
Performing Co-Membership Attacks Against Deep Generative Models
In this paper we propose a new membership attack method called co-membership
attacks against deep generative models including Variational Autoencoders
(VAEs) and Generative Adversarial Networks (GANs). Specifically, a membership
attack aims to check whether a given instance x was used in the training data
or not. A co-membership attack checks whether a given bundle of n instances was
in the training data, with the prior knowledge that the bundle was either used
entirely in training or not at all. Successful membership attacks can
compromise the privacy of training data when the generative model is published.
Our main idea is to cast membership inference of target data x as the
optimization of another neural network (called the attacker network) to search
for the latent encoding to reproduce x. The final reconstruction error is used
directly to conclude whether x was in the training data or not. We conduct
extensive experiments on a variety of datasets and generative models showing
that: our attacker network outperforms prior membership attacks; co-membership
attacks can be substantially more powerful than single attacks; and VAEs are
more susceptible to membership attacks than GANs.
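A simplified membership score in this spirit can be sketched by directly optimizing a latent code to reproduce x (note that the paper instead trains an attacker network to perform this search; `generator` and `latent_dim` here are assumptions):

```python
import torch
import torch.nn.functional as F

def membership_score(x, generator, latent_dim=128, steps=500, lr=0.05):
    # Search the generator's latent space for a code that reproduces x; a low
    # final reconstruction error suggests x was part of the training data.
    z = torch.randn(x.size(0), latent_dim, device=x.device, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = F.mse_loss(generator(z), x)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return F.mse_loss(generator(z), x).item()   # lower error => more likely a training member
```

For a co-membership attack, the same search would be shared across the bundle of n instances, which pools evidence and makes the attack stronger, as reported above.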
WAIC, but Why? Generative Ensembles for Robust Anomaly Detection
Machine learning models encounter Out-of-Distribution (OoD) errors when the
data seen at test time are generated from a different stochastic generator than
the one used to generate the training data. One proposal to scale OoD detection
to high-dimensional data is to learn a tractable likelihood approximation of
the training distribution, and use it to reject unlikely inputs. However,
likelihood models on natural data are themselves susceptible to OoD errors, and
even assign large likelihoods to samples from other datasets. To mitigate this
problem, we propose Generative Ensembles, which robustify density-based OoD
detection by way of estimating epistemic uncertainty of the likelihood model.
We present a puzzling observation in need of an explanation: although
likelihood measures cannot account for the typical set of a distribution, and
therefore should not be suitable on their own for OoD detection, WAIC performs
surprisingly well in practice.
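For reference, the per-example WAIC combines the ensemble mean and variance of the log-likelihood; a minimal sketch, assuming each ensemble member exposes a per-example `log_prob` method:

```python
import torch

def waic_score(x, ensemble):
    # WAIC(x) = E_theta[log p(x | theta)] - Var_theta[log p(x | theta)], estimated
    # over an ensemble of independently trained density models. The variance term
    # penalizes inputs on which the ensemble disagrees (high epistemic uncertainty).
    log_probs = torch.stack([m.log_prob(x) for m in ensemble], dim=0)  # (n_models, batch)
    return log_probs.mean(dim=0) - log_probs.var(dim=0)

def is_out_of_distribution(x, ensemble, threshold):
    # Flag inputs whose WAIC falls below a threshold chosen on held-out in-distribution data.
    return waic_score(x, ensemble) < threshold
```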
- …