On the Robustness of Bayesian Neural Networks to Adversarial Attacks
Vulnerability to adversarial attacks is one of the principal hurdles to the
adoption of deep learning in safety-critical applications. Despite significant
efforts, both practical and theoretical, training deep learning models robust
to adversarial attacks is still an open problem. In this paper, we analyse the
geometry of adversarial attacks in the large-data, overparameterized limit for
Bayesian Neural Networks (BNNs). We show that, in the limit, vulnerability to
gradient-based attacks arises as a result of degeneracy in the data
distribution, i.e., when the data lies on a lower-dimensional submanifold of
the ambient space. As a direct consequence, we demonstrate that in this limit
BNN posteriors are robust to gradient-based adversarial attacks. Crucially, we
prove that the expected gradient of the loss with respect to the BNN posterior
distribution is vanishing, even when each neural network sampled from the
posterior is vulnerable to gradient-based attacks. Experimental results on the
MNIST, Fashion MNIST, and half moons datasets, representing the finite data
regime, with BNNs trained with Hamiltonian Monte Carlo and Variational
Inference, support this line of argument, showing that BNNs can display both
high accuracy on clean data and robustness to both gradient-based and
gradient-free adversarial attacks.
Comment: arXiv admin note: text overlap with arXiv:2002.0435
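The key mechanism, that the expected input gradient under the posterior can vanish even though each sampled network has a non-trivial gradient, can be sketched numerically. The toy below is our own illustration, not the paper's setup: a one-neuron tanh model with a hand-made sample set that is exactly symmetric about zero, standing in for HMC/VI posterior samples.

```python
import numpy as np

rng = np.random.default_rng(0)

def input_grad(w, x):
    """Gradient of a one-neuron net f_w(x) = tanh(w . x) with respect to the input x."""
    return (1.0 - np.tanh(w @ x) ** 2) * w

# Toy stand-in for posterior samples (the paper uses HMC and VI): draws made
# exactly symmetric about zero, so input gradients cancel in expectation.
samples = [rng.normal(size=3) for _ in range(1000)]
samples += [-w for w in samples]

x = np.ones(3)
expected_grad = np.mean([input_grad(w, x) for w in samples], axis=0)
mean_sample_norm = np.mean([np.linalg.norm(input_grad(w, x)) for w in samples])

print(np.linalg.norm(expected_grad))   # ~0: the posterior-averaged attack direction vanishes
print(mean_sample_norm)                # > 0: each sampled net is individually attackable
```

A gradient-based attack on the posterior predictive uses `expected_grad`, which is (numerically) zero here, while an attack on any single sampled network has a direction of nonzero norm to follow.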
RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets
In this paper, we propose a class of robust stochastic subgradient methods
for distributed learning from heterogeneous datasets in the presence of an unknown
number of Byzantine workers. The Byzantine workers, during the learning
process, may send arbitrary incorrect messages to the master due to data
corruptions, communication failures or malicious attacks, and consequently bias
the learned model. The key to the proposed methods is a regularization term
incorporated with the objective function so as to robustify the learning task
and mitigate the negative effects of Byzantine attacks. The resultant
subgradient-based algorithms are termed Byzantine-Robust Stochastic Aggregation
methods, justifying our acronym RSA used henceforth. In contrast to most of the
existing algorithms, RSA does not rely on the assumption that the data are
independent and identically distributed (i.i.d.) across the workers, and hence
fits a wider class of applications. Theoretically, we show that: i) RSA
converges to a near-optimal solution with the learning error dependent on the
number of Byzantine workers; ii) the convergence rate of RSA under Byzantine
attacks is the same as that of the stochastic gradient descent method, which is
free of Byzantine attacks. Numerically, experiments on real datasets corroborate
the competitive performance of RSA and a complexity reduction compared to
state-of-the-art alternatives.
Comment: To appear in AAAI 201
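The regularization idea can be sketched in a few lines. The toy example below is our own illustration, not the paper's experiments: squared-loss mean estimation where an l1 penalty ties each worker model to the master model, so every worker, Byzantine or not, enters the master update only through a bounded sign(.) subgradient.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: each honest worker fits its own heterogeneous data; the
# first n_byz workers are Byzantine and report arbitrary models each round.
n_workers, n_byz, dim = 10, 2, 5
means = rng.normal(size=(n_workers, dim))
target = means[n_byz:].mean(axis=0)              # honest-only optimum

lam = 0.5                                        # l1 penalty weight
x0 = np.zeros(dim)                               # master model
xi = np.zeros((n_workers, dim))                  # worker models

for t in range(3000):
    eta = 1.0 / (t + 50)                         # decaying step size
    for i in range(n_workers):
        if i < n_byz:
            xi[i] = rng.normal(0.0, 100.0, size=dim)         # arbitrary message
        else:
            grad = xi[i] - means[i]                          # grad of squared loss
            xi[i] -= eta * (grad + lam * np.sign(xi[i] - x0))
    # Master update: each worker contributes only sign(x0 - xi), so a Byzantine
    # worker's per-step influence is bounded by eta * lam regardless of what it sends.
    x0 -= eta * lam * np.sum(np.sign(x0 - xi), axis=0)

print(np.linalg.norm(x0 - target))               # RSA-style master error stays small
print(np.linalg.norm(xi.mean(axis=0) - target))  # naive averaging is thrown far off
```

The contrast in the last two lines is the point: averaging worker models lets two Byzantine workers move the estimate arbitrarily, while the sign-bounded subgradient caps their influence per step.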
Towards More Scalable and Robust Machine Learning
For many data-intensive real-world applications, such as recognizing objects in images, detecting spam emails, and recommending items on retail websites, the most successful current approaches involve learning rich prediction rules from large datasets. These machine learning tasks pose many challenges. For example, as the size of the datasets and the complexity of the prediction rules increase, designing scalable methods that can effectively exploit distributed computing units becomes a significant challenge. As another example, many machine learning applications face data corruptions, communication errors, and even adversarial attacks during training and testing. Therefore, to build reliable machine learning models, we must also tackle the challenge of robustness in machine learning.

In this dissertation, we study several topics on scalability and robustness in large-scale learning, with a focus on establishing solid theoretical foundations for these problems, and demonstrate recent progress towards the ambitious goal of building more scalable and robust machine learning models. We start with the speedup saturation problem in distributed stochastic gradient descent (SGD) algorithms with large mini-batches. We introduce the notion of gradient diversity, a metric of the dissimilarity between concurrent gradient updates, and show its key role in the convergence and generalization performance of mini-batch SGD. We then move to Byzantine distributed learning, a topic that involves both scalability and robustness in distributed learning. In the Byzantine setting we consider, a fraction of the distributed worker machines can exhibit arbitrary or even adversarial behavior. We design statistically and computationally efficient algorithms to defend against Byzantine failures in distributed optimization with convex and non-convex objectives. Lastly, we discuss the adversarial example phenomenon.
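The gradient diversity metric mentioned above can be written down concretely. A minimal sketch, using the ratio sum_i ||g_i||^2 / ||sum_i g_i||^2 (the form this quantity takes in the gradient-diversity line of work): larger values mean the concurrent gradients are more dissimilar, leaving more room for large mini-batches before speedup saturates.

```python
import numpy as np

def gradient_diversity(grads):
    """sum_i ||g_i||^2 / ||sum_i g_i||^2 for a collection of per-example gradients."""
    grads = np.asarray(grads, dtype=float)
    return np.sum(grads ** 2) / np.sum(grads.sum(axis=0) ** 2)

aligned = [np.array([1.0, 0.0])] * 4           # identical gradients
orthogonal = [np.eye(4)[i] for i in range(4)]  # mutually orthogonal gradients

# Identical gradients give the minimum value 1/n: a large mini-batch adds
# little new information. Orthogonal gradients give 1: batching scales well.
print(gradient_diversity(aligned))     # 0.25
print(gradient_diversity(orthogonal))  # 1.0
```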
We provide a theoretical analysis of the adversarially robust generalization properties of machine learning models through the lens of Rademacher complexity.
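For reference, the empirical Rademacher complexity of a function class $\mathcal{F}$ on a sample $S = (x_1, \dots, x_n)$ is the standard quantity

$$\hat{\mathcal{R}}_S(\mathcal{F}) = \mathbb{E}_{\sigma}\left[\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i)\right],$$

where $\sigma_1, \dots, \sigma_n$ are i.i.d. uniform $\pm 1$ (Rademacher) variables. Robust generalization analyses of the kind described here study this quantity with $f(x_i)$ replaced by a worst-case loss over a perturbation set around $x_i$.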