A Spectral View of Adversarially Robust Features
Given the apparent difficulty of learning models that are robust to
adversarial perturbations, we propose tackling the simpler problem of
developing adversarially robust features. Specifically, given a dataset and
metric of interest, the goal is to return a function (or multiple functions)
that 1) is robust to adversarial perturbations, and 2) has significant
variation across the datapoints. We establish strong connections between
adversarially robust features and a natural spectral property of the geometry
of the dataset and metric of interest. This connection can be leveraged to
provide both robust features, and a lower bound on the robustness of any
function that has significant variance across the dataset. Finally, we provide
empirical evidence that the adversarially robust features given by this
spectral approach can be fruitfully leveraged to learn a robust (and accurate)
model.
Comment: To appear at NIPS 201
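A minimal sketch of the spectral idea above, shown as code (an illustration under assumptions, not the authors' implementation): build a similarity graph over the dataset under the metric of interest and read off a low-frequency eigenvector of its Laplacian as a candidate robust feature. The Gaussian kernel and the bandwidth sigma are illustrative choices.

```python
# A low-frequency Laplacian eigenvector varies across the datapoints but
# changes slowly over the similarity graph, which is what makes it hard to
# perturb. Kernel and bandwidth are illustrative assumptions.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def spectral_robust_feature(X, sigma=1.0):
    """Return one feature value per datapoint from the dataset's graph Laplacian."""
    D = cdist(X, X)                            # pairwise distances in the metric of interest
    W = np.exp(-(D ** 2) / (2 * sigma ** 2))   # similarity weights
    L = np.diag(W.sum(axis=1)) - W             # unnormalized graph Laplacian
    eigvals, eigvecs = eigh(L)                 # eigenvalues in ascending order
    return eigvecs[:, 1]                       # skip the constant first eigenvector

# features = spectral_robust_feature(np.random.randn(200, 10))
```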
Diverse Knowledge Distillation (DKD): A Solution for Improving The Robustness of Ensemble Models Against Adversarial Attacks
This paper proposes an ensemble learning model that is resistant to
adversarial attacks. To build resilience, we introduce a training process
where each member learns a radically distinct latent space. Member models are
added one at a time to the ensemble. Simultaneously, the loss function is
regulated by a reverse knowledge distillation, forcing the new member to learn
different features and map to a latent space safely distanced from those of
existing members. We assessed the security and performance of the proposed
solution on image classification tasks using the CIFAR-10 and MNIST datasets
and showed security and performance improvements compared to state-of-the-art
defense methods.
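A hedged sketch of the reverse-distillation term described above (not the paper's exact loss): the new member is trained with cross-entropy plus a penalty on the similarity between its latent features and those of the frozen, previously trained members. The cosine-similarity penalty and the weight lambda_div are assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def diversity_loss(new_feats, new_logits, targets, frozen_feats_list, lambda_div=1.0):
    ce = F.cross_entropy(new_logits, targets)
    div = torch.zeros((), device=new_feats.device)
    for feats in frozen_feats_list:          # latent features of existing members
        # driving cosine similarity toward zero distances the latent spaces
        div = div + F.cosine_similarity(new_feats, feats.detach(), dim=1).abs().mean()
    return ce + lambda_div * div
```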
Robust Encodings: A Framework for Combating Adversarial Typos
Despite excellent performance on many tasks, NLP systems are easily fooled by
small adversarial perturbations of inputs. Existing procedures to defend
against such perturbations are either (i) heuristic in nature and susceptible
to stronger attacks or (ii) provide guaranteed robustness to worst-case
attacks, but are incompatible with state-of-the-art models like BERT. In this
work, we introduce robust encodings (RobEn): a simple framework that confers
guaranteed robustness, without making compromises on model architecture. The
core component of RobEn is an encoding function, which maps sentences to a
smaller, discrete space of encodings. Systems using these encodings as a
bottleneck confer guaranteed robustness with standard training, and the same
encodings can be used across multiple tasks. We identify two desiderata to
construct robust encoding functions: perturbations of a sentence should map to
a small set of encodings (stability), and models using encodings should still
perform well (fidelity). We instantiate RobEn to defend against a large family
of adversarial typos. Across six tasks from GLUE, our instantiation of RobEn
paired with BERT achieves an average robust accuracy of 71.3% against all
adversarial typos in the family considered, while previous work using a
typo-corrector achieves only 35.3% accuracy against a simple greedy attack.
Comment: ACL 202
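A minimal illustration of the encoding bottleneck (not the RobEn construction itself): tokens are mapped to canonical cluster representatives so that a token and its typo variants share one encoding, which is what gives stability. The token_to_cluster dictionary below stands in for clusters that would be built offline from the attack family; out-of-vocabulary tokens collapse to a single UNK symbol.

```python
def encode_sentence(sentence, token_to_cluster, unk="<UNK>"):
    """Map a sentence into the smaller, discrete space of encodings."""
    return [token_to_cluster.get(tok, unk) for tok in sentence.split()]

# If "good", "goood" and "gud" fall in one cluster, the clean and perturbed
# sentences have identical encodings, so a model trained on encodings is
# unaffected by those typos.
clusters = {"good": "good", "goood": "good", "gud": "good", "movie": "movie"}
assert encode_sentence("goood movie", clusters) == encode_sentence("good movie", clusters)
```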
Adversarial Neural Pruning with Latent Vulnerability Suppression
Despite the remarkable performance of deep neural networks on various
computer vision tasks, they are known to be susceptible to adversarial
perturbations, which makes it challenging to deploy them in real-world
safety-critical applications. In this paper, we conjecture that the leading
cause of adversarial vulnerability is distortion in the latent feature
space, and provide methods to suppress it effectively. Explicitly, we define
the vulnerability of each latent feature and then propose a new loss for
adversarial learning, the Vulnerability Suppression (VS) loss, that aims to
minimize the feature-level vulnerability during training. We further propose a
Bayesian framework to prune features with high vulnerability to reduce both
vulnerability and loss on adversarial samples. We validate our
Adversarial Neural Pruning with Vulnerability Suppression (ANP-VS)
method on multiple benchmark datasets, on which it not only obtains
state-of-the-art adversarial robustness but also improves the performance on
clean examples, using only a fraction of the parameters used by the full
network. Further qualitative analysis suggests that the improvements come from
the suppression of feature-level vulnerability.
Comment: Accepted to ICML 2020. Code available at
https://github.com/divyam3897/ANP_V
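A hedged PyTorch sketch of the feature-level vulnerability idea (see the repository above for the actual ANP-VS code): vulnerability is taken here as the mean absolute change of latent activations under an adversarial perturbation, added to the adversarial classification loss. The callables model_features and model_logits and the weight beta are placeholders for this illustration.

```python
import torch.nn.functional as F

def vs_style_loss(model_features, model_logits, x_clean, x_adv, targets, beta=1.0):
    feats_clean = model_features(x_clean)            # latent features on clean inputs
    feats_adv = model_features(x_adv)                # latent features on adversarial inputs
    vulnerability = (feats_adv - feats_clean).abs().mean()
    adv_ce = F.cross_entropy(model_logits(x_adv), targets)
    return adv_ce + beta * vulnerability             # suppress distortion while training
```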
Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization
Training machine learning models that are robust against adversarial inputs
poses seemingly insurmountable challenges. To better understand adversarial
robustness, we consider the underlying problem of learning robust
representations. We develop a notion of representation vulnerability that
captures the maximum change of mutual information between the input and output
distributions, under the worst-case input perturbation. Then, we prove a
theorem that establishes a lower bound on the minimum adversarial risk that can
be achieved for any downstream classifier based on its representation
vulnerability. We propose an unsupervised learning method for obtaining
intrinsically robust representations by maximizing the worst-case mutual
information between the input and output distributions. Experiments on
downstream classification tasks support the robustness of the representations
found using unsupervised learning with our training principle.
Comment: ICML 202
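A sketch of the worst-case mutual information principle, assuming mutual information is approximated by an InfoNCE-style lower bound (the authors' estimator may differ): an inner PGD loop searches for the perturbation that most reduces the bound, and the encoder is then updated to maximize the returned worst-case value. Step sizes, step counts, and the encoder interface are illustrative.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE-style lower bound between two batches of representations."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return -F.cross_entropy(logits, labels)          # larger means more shared information

def worst_case_mi(encoder, x, eps=8 / 255, alpha=2 / 255, steps=5):
    """Inner minimization: find the perturbation that most reduces the MI bound."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        bound = info_nce(encoder(x + delta), encoder(x))
        grad, = torch.autograd.grad(bound, delta)
        delta = (delta - alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return info_nce(encoder(x + delta), encoder(x))  # maximize this w.r.t. the encoder
```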
A Closer Look at Accuracy vs. Robustness
Current methods for training robust networks lead to a drop in test accuracy,
which has led prior works to posit that a robustness-accuracy tradeoff may be
inevitable in deep learning. We take a closer look at this phenomenon and first
show that real image datasets are actually separated. With this property in
mind, we then prove that robustness and accuracy should both be achievable for
benchmark datasets through locally Lipschitz functions, and hence, there should
be no inherent tradeoff between robustness and accuracy. Through extensive
experiments with robustness methods, we argue that the gap between theory and
practice arises from two limitations of current methods: either they fail to
impose local Lipschitzness or they are insufficiently generalized. We explore
combining dropout with robust training methods and obtain better
generalization. We conclude that achieving robustness and accuracy in practice
may require using methods that impose local Lipschitzness and augmenting them
with deep learning generalization techniques. Code available at
https://github.com/yangarbiter/robust-local-lipschit
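A hedged sketch of how local Lipschitzness can be probed empirically, in the spirit of the analysis above (not the paper's evaluation code): for each input, search its eps-ball for a point that maximizes the output change, then report the ratio of output change to input change. The attack-style search parameters are illustrative.

```python
import torch

def empirical_local_lipschitz(model, x, eps=8 / 255, alpha=2 / 255, steps=10):
    """Estimate max ||f(x') - f(x)||_1 / ||x' - x||_inf over the eps-ball around x."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).detach().requires_grad_(True)
    for _ in range(steps):
        obj = (model(x_adv) - model(x)).abs().sum()  # push the outputs apart
        grad, = torch.autograd.grad(obj, x_adv)
        x_adv = (x_adv + alpha * grad.sign()).clamp(x - eps, x + eps)
        x_adv = x_adv.detach().requires_grad_(True)
    num = (model(x_adv) - model(x)).abs().flatten(1).sum(dim=1)
    den = (x_adv - x).abs().flatten(1).max(dim=1).values + 1e-12
    return (num / den).mean()   # lower values indicate a locally smoother model
```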
Securing Connected & Autonomous Vehicles: Challenges Posed by Adversarial Machine Learning and The Way Forward
Connected and autonomous vehicles (CAVs) will form the backbone of future
next-generation intelligent transportation systems (ITS), providing travel
comfort and road safety along with a number of value-added services. Such a
transformation, which will be fuelled by concomitant advances in technologies
for machine learning (ML) and wireless communications, will enable a future
vehicular ecosystem that is better featured and more efficient. However, there
are lurking security problems related to the use of ML in such a critical
setting, where an incorrect ML decision may not only be a nuisance but can
also lead to the loss of precious lives. In this paper, we present an in-depth overview of
the various challenges associated with the application of ML in vehicular
networks. In addition, we formulate the ML pipeline of CAVs and present various
potential security issues associated with the adoption of ML methods. In
particular, we focus on the perspective of adversarial ML attacks on CAVs and
outline a solution to defend against adversarial attacks in multiple settings.
Extracting robust and accurate features via a robust information bottleneck
We propose a novel strategy for extracting features in supervised learning
that can be used to construct a classifier which is more robust to small
perturbations in the input space. Our method builds upon the idea of the
information bottleneck by introducing an additional penalty term that
encourages the Fisher information of the extracted features to be small, when
parametrized by the inputs. By tuning the regularization parameter, we can
explicitly trade off the opposing desiderata of robustness and accuracy when
constructing a classifier. We derive the optimal solution to the robust
information bottleneck when the inputs and outputs are jointly Gaussian,
proving that the optimally robust features are also jointly Gaussian in that
setting. Furthermore, we propose a method for optimizing a variational bound on
the robust information bottleneck objective in general settings using
stochastic gradient descent, which may be implemented efficiently in neural
networks. Our experimental results for synthetic and real data sets show that
the proposed feature extraction method indeed produces classifiers with
increased robustness to perturbations.
Comment: A version of this paper was submitted to IEEE Journal on Selected
Areas in Information Theory (JSAIT).
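A hedged sketch of the penalty described above, not the paper's estimator: the squared Frobenius norm of the feature Jacobian with respect to the inputs, estimated by a Hutchinson-style random projection, is used as a tractable stand-in for the Fisher information of the extracted features, with beta playing the role of the regularization parameter that trades robustness against accuracy. The encoder and classifier names are placeholder modules.

```python
import torch
import torch.nn.functional as F

def robust_ib_style_loss(encoder, classifier, x, y, beta=0.1):
    x = x.clone().detach().requires_grad_(True)
    z = encoder(x)                                   # extracted features
    task_loss = F.cross_entropy(classifier(z), y)
    v = torch.randn_like(z)                          # random probe vector
    grads, = torch.autograd.grad((z * v).sum(), x, create_graph=True)
    jac_penalty = grads.pow(2).flatten(1).sum(dim=1).mean()   # E_v ||J^T v||^2 = ||J||_F^2
    return task_loss + beta * jac_penalty
```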
Adversarially Robust Low Dimensional Representations
Many machine learning systems are vulnerable to small perturbations made to
the input either at test time or at training time. This has received much
recent interest on the empirical front due to several applications where
reliability and security are critical, and the emergence of paradigms such as
low-precision machine learning. However, our theoretical understanding of the
design of adversarially robust algorithms for the above settings is limited.
In this work we focus on Principal Component Analysis (PCA), a ubiquitous
algorithmic primitive in machine learning. We formulate a natural robust
variant of PCA, where the goal is to find a low dimensional subspace to
represent the given data with minimum projection error, and that is in addition
robust to small perturbations measured in the $\ell_q$ norm (say $q = \infty$).
Unlike PCA which is solvable in polynomial time, our formulation is
computationally intractable to optimize as it captures the well-studied sparse
PCA objective as a special case. We show various algorithmic and statistical
results including:
- A polynomial-time algorithm that is constant-factor competitive in the
worst case, with respect to the best subspace, in terms of both the projection
error and the robustness criterion. We also show that our algorithmic
techniques can be made robust to corruptions in the training data, in
addition to yielding representations that are robust at test time.
- We prove that our formulation (and algorithms) also enjoys significant
statistical benefits in terms of sample complexity over standard PCA, on
account of a "regularization effect" that is formalized using the well-studied
spiked covariance model.
- We illustrate the broad applicability of our algorithmic techniques in
addressing robustness to adversarial perturbations, both at training time and
at test time.
Comment: 68 pages including reference
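A rough numerical illustration of the two criteria in the formulation above (a sketch of the objective, not the paper's algorithm): a candidate subspace can be scored by its usual projection error together with a simple upper bound on how far an l_inf-bounded perturbation can move a point's projection. The scoring function and eps below are illustrative.

```python
import numpy as np

def score_subspace(X, U, eps):
    """X: n x d data matrix, U: d x k orthonormal basis of a candidate subspace."""
    P = U @ U.T                                      # projection onto the subspace
    proj_error = np.linalg.norm(X - X @ P, "fro") ** 2
    # For ||delta||_inf <= eps:  ||P delta||_2 <= eps * sqrt(sum_i ||row_i(P)||_1^2)
    robustness_bound = eps * np.sqrt((np.abs(P).sum(axis=1) ** 2).sum())
    return proj_error, robustness_bound

# Comparing the top-k PCA subspace against alternatives exposes the tradeoff:
# a slightly larger projection error can buy a much smaller distortion bound.
```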