Exploring the Relationship between Architecture and Adversarially Robust Generalization
Adversarial training has been demonstrated to be one of the most effective
remedies for defending against adversarial examples, yet it often suffers from a large
robust generalization gap on unseen test adversaries, known as the
adversarially robust generalization problem. Despite preliminary efforts to
understand adversarially robust generalization, little is known
from the architectural perspective. To bridge this gap, this paper systematically
investigates, for the first time, the relationship between adversarially robust
generalization and architectural design. In particular, we comprehensively
evaluate the 20 most representative adversarially trained architectures on the
ImageNette and CIFAR-10 datasets against multiple ℓ_p-norm adversarial attacks.
Based on these extensive experiments, we find that, under aligned settings,
Vision Transformers (e.g., PVT, CoAtNet) often yield better adversarially
robust generalization, while CNNs tend to overfit to specific attacks and fail
to generalize across multiple adversaries. To better understand why, we conduct
a theoretical analysis through the lens of Rademacher complexity. We reveal
that higher weight sparsity contributes significantly to the better
adversarially robust generalization of Transformers, and that it can often be
achieved by their specially designed attention blocks. We hope our paper helps
to better understand the mechanisms behind designing robust DNNs.
Our model weights can be found at http://robust.art
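As a rough, self-contained illustration of the weight-sparsity observation above (this is not the paper's protocol; the threshold, the stand-in model, and the function name are our own choices), the following PyTorch sketch measures per-layer weight sparsity as the fraction of near-zero weights:

```python
# Illustrative sketch only: per-layer weight sparsity of a trained model,
# measured as the fraction of weights whose magnitude is below a small threshold.
import torch.nn as nn
import torchvision.models as models

def weight_sparsity(model: nn.Module, threshold: float = 1e-3) -> dict:
    """Return the fraction of near-zero weights for each weight matrix/tensor."""
    sparsity = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:  # skip biases and normalization parameters
            continue
        sparsity[name] = (param.detach().abs() < threshold).float().mean().item()
    return sparsity

if __name__ == "__main__":
    # Stand-in architecture; in practice one would load an adversarially trained checkpoint.
    model = models.resnet18(weights=None)
    per_layer = weight_sparsity(model)
    print(f"mean per-layer sparsity: {sum(per_layer.values()) / len(per_layer):.4f}")
```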
Improved Generalization Bounds for Robust Learning
We consider a model of robust learning in an adversarial environment. The
learner gets uncorrupted training data with access to possible corruptions that
may be applied by the adversary during testing. The learner's goal is to build
a robust classifier that would be tested on future adversarial examples. We use
a zero-sum game between the learner and the adversary as our game theoretic
framework. The adversary is limited to possible corruptions for each input.
Our model is closely related to the adversarial examples model of Schmidt et
al. (2018); Madry et al. (2017).
Our main results consist of generalization bounds for binary and
multi-class classification, as well as the real-valued case (regression). For
the binary classification setting, we both tighten the generalization bound of
Feige, Mansour, and Schapire (2015) and are also able to handle an infinite
hypothesis class. The sample complexity is improved over their bound. Additionally, we
extend the algorithm and generalization bound from the binary to the multi-class
and real-valued cases. Along the way, we obtain results on the fat-shattering
dimension and Rademacher complexity of k-fold maxima over function classes;
these may be of independent interest.
For binary classification, the algorithm of Feige et al. (2015) uses a regret
minimization algorithm and an ERM oracle as a blackbox; we adapt it for the
multi-class and regression settings. The algorithm provides us with
near-optimal policies for the players on a given training sample.
Comment: Appearing at the 30th International Conference on Algorithmic Learning Theory (ALT 2019).
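As a minimal sketch of the robust objective described above, assuming the adversary may replace each input by one of finitely many corruptions, the worst-case empirical loss could be written as follows (the function names, toy data, and corruption set are illustrative and are not the paper's algorithm):

```python
# Minimal sketch: robust empirical loss when each input has a finite corruption set
# and the adversary picks the worst corruption per input.
import numpy as np

def robust_empirical_loss(h, X, y, corruptions, loss):
    """Average over examples of the worst-case loss over each input's corruption set.

    h           : hypothesis, mapping an input to a prediction
    corruptions : function mapping x to a finite list of corrupted versions of x
    loss        : per-example loss, loss(prediction, label)
    """
    total = 0.0
    for x, label in zip(X, y):
        total += max(loss(h(z), label) for z in corruptions(x))
    return total / len(X)

# Toy usage: 1-D threshold classifier; corruptions shift the input by up to 0.3.
X = np.array([-1.0, -0.3, 0.2, 0.9])
y = np.array([-1, -1, 1, 1])
h = lambda x: 1 if x > 0 else -1
corruptions = lambda x: [x - 0.3, x, x + 0.3]
zero_one = lambda pred, label: float(pred != label)
print(robust_empirical_loss(h, X, y, corruptions, zero_one))  # 0.25 in this toy case
```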
A Kernel Perspective for Regularizing Deep Neural Networks
We propose a new point of view for regularizing deep neural networks by using
the norm of a reproducing kernel Hilbert space (RKHS). Even though this norm
cannot be computed, it admits upper and lower approximations leading to various
practical strategies. Specifically, this perspective (i) provides a common
umbrella for many existing regularization principles, including spectral norm
and gradient penalties as well as adversarial training, (ii) leads to new effective
regularization penalties, and (iii) suggests hybrid strategies combining lower
and upper bounds to get better approximations of the RKHS norm. We
experimentally show this approach to be effective when learning on small
datasets or when seeking adversarially robust models.
Comment: ICML
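As one concrete instance of the "lower bound" strategies mentioned above, a generic input-gradient penalty can be added to the training loss, as in the following PyTorch sketch (the specific penalty, the weight lam, and the toy model are our own choices, not necessarily the exact variant studied in the paper):

```python
# Sketch: cross-entropy loss plus a penalty on the squared norm of the
# loss gradient with respect to the input (a generic gradient penalty).
import torch
import torch.nn as nn
import torch.nn.functional as F

def loss_with_gradient_penalty(model, x, y, lam=0.1):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x, create_graph=True)
    penalty = grad.flatten(1).norm(dim=1).pow(2).mean()
    return loss + lam * penalty

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
x = torch.randn(8, 1, 28, 28)
y = torch.randint(0, 10, (8,))
total = loss_with_gradient_penalty(model, x, y)
total.backward()  # parameter gradients now include the penalty term
```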
PAC-Bayesian Spectrally-Normalized Bounds for Adversarially Robust Generalization
Deep neural networks (DNNs) are vulnerable to adversarial attacks. It is
found empirically that adversarially robust generalization is crucial in
establishing defense algorithms against adversarial attacks. Therefore, it is
interesting to study the theoretical guarantee of robust generalization. This
paper focuses on norm-based complexity, based on a PAC-Bayes approach
(Neyshabur et al., 2017). The main challenge lies in extending the key
ingredient, which is a weight perturbation bound in standard settings, to the
robust settings. Existing attempts heavily rely on additional strong
assumptions, leading to loose bounds. In this paper, we address this issue and
provide a spectrally-normalized robust generalization bound for DNNs. Compared
to existing bounds, our bound offers two significant advantages: Firstly, it
does not depend on additional assumptions. Secondly, it is considerably
tighter, aligning with the bounds of standard generalization. Therefore, our
result provides a different perspective on understanding robust generalization:
The mismatch terms between standard and robust generalization bounds shown in
previous studies do not contribute to the poor robust generalization; instead,
these disparities are solely due to mathematical issues. Finally, we extend the
main result to adversarial robustness against general non-ℓ_p attacks and
other neural network architectures.
Comment: NeurIPS 2023
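For orientation, the standard-setting spectrally-normalized PAC-Bayes margin bound of Neyshabur et al. (2017), with which the robust bound above is said to align, has roughly the following shape for a depth-d network with weight matrices W_1, ..., W_d, width h, margin γ, input norm bound B, and m samples (notation ours; constants and logarithmic factors are suppressed):

$$
L_{0}(f_{W}) - \widehat{L}_{\gamma}(f_{W})
\;\le\;
\mathcal{O}\!\left(
\sqrt{\frac{
B^{2} d^{2} h \ln(dh)\,
\prod_{i=1}^{d}\|W_{i}\|_{2}^{2}\,
\sum_{i=1}^{d}\frac{\|W_{i}\|_{F}^{2}}{\|W_{i}\|_{2}^{2}}
\,\Big/\, \gamma^{2}
\;+\;
\ln\frac{dm}{\delta}
}{m}}
\right),
$$

where $\|\cdot\|_{2}$ and $\|\cdot\|_{F}$ denote the spectral and Frobenius norms. The abstract's claim is that the robust generalization gap admits a bound of this same spectrally-normalized form, without the extra mismatch terms that appeared in earlier robust analyses.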
Adversarial Rademacher Complexity of Deep Neural Networks
Deep neural networks are vulnerable to adversarial attacks. Ideally, a robust
model shall perform well on both the perturbed training data and the unseen
perturbed test data. It is found empirically that fitting perturbed training
data is not hard, but generalizing to perturbed test data is quite difficult.
To better understand adversarial generalization, it is of great interest to
study the adversarial Rademacher complexity (ARC) of deep neural networks.
However, how to bound ARC in the multi-layer case is largely unclear due to the
difficulty of analyzing the adversarial loss in the definition of ARC. There have
been two types of attempts to bound ARC. One is to provide upper bounds on ARC in
the linear and one-hidden-layer cases; however, these approaches seem hard to
extend to multi-layer cases. Another is to modify the adversarial loss and
provide upper bounds of Rademacher complexity on such surrogate losses in
the multi-layer case; however, such variants of Rademacher complexity are not
guaranteed to bound the meaningful robust generalization gap (RGG). In
this paper, we provide a solution to this unsolved problem. Specifically, we
provide the first bound on the adversarial Rademacher complexity of deep neural
networks. Our approach is based on covering numbers. We provide a method to
handle the robustified function classes of DNNs so that we can calculate their
covering numbers. Finally, we provide experiments to study the empirical
implications of our bounds and an analysis of poor adversarial generalization.
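To make the quantity concrete, the adversarial Rademacher complexity discussed above is commonly defined by placing the worst-case perturbation inside the standard Rademacher complexity; in our notation, for a sample $S=\{(x_i,y_i)\}_{i=1}^{m}$, loss $\ell$, hypothesis class $\mathcal{F}$, $\ell_p$ perturbation radius $\epsilon$, and i.i.d. Rademacher variables $\sigma_i \in \{\pm 1\}$:

$$
\widetilde{\mathcal{R}}_{S}(\ell \circ \mathcal{F})
= \mathbb{E}_{\sigma}\left[\,\sup_{f \in \mathcal{F}}
\frac{1}{m}\sum_{i=1}^{m} \sigma_{i}
\max_{\|x_{i}' - x_{i}\|_{p} \le \epsilon}
\ell\big(f(x_{i}'), y_{i}\big)\right].
$$

The difficulty emphasized in the abstract is that the inner maximum has no closed form for multi-layer networks, which is what the covering-number argument works around.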
Towards More Scalable and Robust Machine Learning
For many data-intensive real-world applications, such as recognizing objects from images, detecting spam emails, and recommending items on retail websites, the most successful current approaches involve learning rich prediction rules from large datasets. There are many challenges in these machine learning tasks. For example, as the size of the datasets and the complexity of these prediction rules increase, there is a significant challenge in designing scalable methods that can effectively exploit the availability of distributed computing units. As another example, in many machine learning applications, there can be data corruptions, communication errors, and even adversarial attacks during training and testing. Therefore, to build reliable machine learning models, we also have to tackle the challenge of robustness in machine learning.

In this dissertation, we study several topics on scalability and robustness in large-scale learning, with a focus on establishing solid theoretical foundations for these problems, and demonstrate recent progress towards the ambitious goal of building more scalable and robust machine learning models. We start with the speedup saturation problem in distributed stochastic gradient descent (SGD) algorithms with large mini-batches. We introduce the notion of gradient diversity, a metric of the dissimilarity between concurrent gradient updates, and show its key role in the convergence and generalization performance of mini-batch SGD. We then move on to Byzantine distributed learning, a topic that involves both scalability and robustness in distributed learning. In the Byzantine setting that we consider, a fraction of the distributed worker machines can have arbitrary or even adversarial behavior. We design statistically and computationally efficient algorithms to defend against Byzantine failures in distributed optimization with convex and non-convex objectives. Lastly, we discuss the adversarial example phenomenon. We provide a theoretical analysis of the adversarially robust generalization properties of machine learning models through the lens of Rademacher complexity.
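As an illustration of the gradient-diversity notion mentioned above, the following sketch computes the ratio between the sum of squared per-example gradient norms and the squared norm of their sum (this is our reading of the quantity; the dissertation's exact normalization may differ):

```python
# Sketch of a gradient-diversity measure: identical gradients give a small value,
# orthogonal gradients give a large one, which relates to how much mini-batching helps.
import numpy as np

def gradient_diversity(per_example_grads: np.ndarray) -> float:
    """per_example_grads: array of shape (n, d), one flattened gradient per example."""
    sum_of_squared_norms = np.sum(np.linalg.norm(per_example_grads, axis=1) ** 2)
    squared_norm_of_sum = np.linalg.norm(per_example_grads.sum(axis=0)) ** 2
    return sum_of_squared_norms / squared_norm_of_sum

# Toy usage: identical gradients vs. mutually orthogonal gradients.
grads_similar = np.tile(np.array([1.0, 0.0]), (4, 1))  # four identical gradients
grads_diverse = np.eye(4)                               # four orthogonal gradients
print(gradient_diversity(grads_similar))  # 0.25 (low diversity)
print(gradient_diversity(grads_diverse))  # 1.0  (high diversity)
```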