Efficient Robustness Certificates for Discrete Data: Sparsity-Aware Randomized Smoothing for Graphs, Images and More
Existing techniques for certifying the robustness of models for discrete data
either work only for a small class of models or are general at the expense of
efficiency or tightness. Moreover, they do not account for sparsity in the
input which, as our findings show, is often essential for obtaining non-trivial
guarantees. We propose a model-agnostic certificate based on the randomized
smoothing framework which subsumes earlier work and is tight, efficient, and
sparsity-aware. Its computational complexity does not depend on the number of
discrete categories or the dimension of the input (e.g. the graph size), making
it highly scalable. We show the effectiveness of our approach on a wide variety
of models, datasets, and tasks -- specifically highlighting its use for Graph
Neural Networks. So far, obtaining provable guarantees for GNNs has been
difficult due to the discrete and non-i.i.d. nature of graph data. Our method
can certify any GNN and handles perturbations to both the graph structure and
the node attributes.
Comment: Proceedings of the 37th International Conference on Machine Learning (ICML 2020)
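A minimal sketch of the sparsity-aware smoothing step the certificate builds on, assuming a black-box classifier `model`, asymmetric flip probabilities, and a plain Monte Carlo majority vote; the names and probability values are illustrative, not the paper's implementation.

```python
# Sparsity-aware randomized smoothing for binary inputs: flip zeros and ones
# with different probabilities, then take a majority vote over samples.
import numpy as np

def smooth_predict(model, x, p_plus=0.01, p_minus=0.6, n_samples=1000, rng=None):
    """Majority vote of `model` under asymmetric bit-flip noise.

    x       : binary (0/1) vector, e.g. a flattened adjacency matrix
    p_plus  : probability of flipping a 0 to a 1 (small, to preserve sparsity)
    p_minus : probability of flipping a 1 to a 0
    """
    rng = rng or np.random.default_rng()
    votes = {}
    for _ in range(n_samples):
        u = rng.random(x.shape)
        flip = np.where(x == 0, u < p_plus, u < p_minus)  # per-bit flip decision
        y = model(np.where(flip, 1 - x, x))
        votes[y] = votes.get(y, 0) + 1
    return max(votes, key=votes.get)
```

Keeping p_plus small means a sparse input (e.g. a sparse graph) stays sparse under smoothing, which is what makes the resulting guarantees non-trivial.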
Are GATs Out of Balance?
While the expressive power and computational capabilities of graph neural
networks (GNNs) have been theoretically studied, their optimization and
learning dynamics, in general, remain largely unexplored. Our study examines
the Graph Attention Network (GAT), a popular GNN architecture in which a node's
neighborhood aggregation is weighted by parameterized attention coefficients.
We derive a conservation law of GAT gradient flow dynamics, which explains why
a large fraction of the parameters in GATs with standard initialization struggle to
change during training. This effect is amplified in deeper GATs, which perform
significantly worse than their shallow counterparts. To alleviate this problem,
we devise an initialization scheme that balances the GAT network. Our approach
i) allows more effective propagation of gradients and in turn enables
trainability of deeper networks, and ii) attains a considerable speedup in
training and convergence time in comparison to the standard initialization. Our
main theorem serves as a stepping stone to studying the learning dynamics of
positive homogeneous models with attention mechanisms.
Comment: 25 pages. To be published in Advances in Neural Information Processing Systems (NeurIPS), 2023
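The balancing idea can be illustrated generically for bias-free networks with 1-homogeneous activations such as LeakyReLU (used in GATs): rescaling every layer to the geometric mean of their norms uses scales that multiply to one, so the end-to-end function of a sequential network is unchanged while the layer norms are equalized. The sketch below shows only this generic rebalancing; the paper's actual scheme is GAT-specific and also involves the attention parameters.

```python
# Hedged sketch: equalize layer norms without changing the network function.
import torch

@torch.no_grad()
def balance_norms(weights):
    """Rescale each weight tensor so all share the same Frobenius norm.

    For a bias-free chain of 1-homogeneous layers, scaling layer l by c_l
    scales the output by prod(c_l); the scales g/n below multiply to 1.
    """
    norms = torch.stack([w.norm() for w in weights])
    g = norms.log().mean().exp()  # geometric mean of the layer norms
    for w, n in zip(weights, norms):
        w.mul_(g / n)  # product of all scales g/n is exactly 1
```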
Adversarial Weight Perturbation Improves Generalization in Graph Neural Networks
A large body of theoretical and empirical evidence shows that flatter local
minima tend to improve generalization. Adversarial Weight Perturbation (AWP) is
an emerging technique for efficiently and effectively finding such minima. In AWP,
we minimize the loss w.r.t. a bounded worst-case perturbation of the model
parameters, thereby favoring local minima with a small loss in a neighborhood
around them. The benefits of AWP, and more generally the connections between
flatness and generalization, have been extensively studied for i.i.d. data such
as images. In this paper, we extensively study this phenomenon for graph data.
Along the way, we first derive a generalization bound for non-i.i.d. node
classification tasks. Then we identify a vanishing-gradient issue with all
existing formulations of AWP and we propose a new Weighted Truncated AWP
(WT-AWP) to alleviate this issue. We show that regularizing graph neural
networks with WT-AWP consistently improves both natural and robust
generalization across many different graph learning tasks and models.
Comment: AAAI 2023
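For intuition, here is a minimal sketch of one plain AWP training step in PyTorch: perturb the weights along the loss gradient within a relative budget, take the gradient at the perturbed point, then restore the weights before updating. The budget `gamma` and the per-layer gradient normalization are common conventions assumed here, not necessarily the paper's exact formulation; WT-AWP additionally weights the perturbed loss against the natural loss and truncates which layers are perturbed.

```python
import torch

def awp_step(model, loss_fn, x, y, optimizer, gamma=0.01):
    params = [p for p in model.parameters() if p.requires_grad]

    # 1) Ascent direction: gradient of the loss w.r.t. the current weights.
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)

    # 2) Add a bounded worst-case-style perturbation, layer by layer.
    deltas = []
    with torch.no_grad():
        for p, g in zip(params, grads):
            delta = gamma * p.norm() * g / (g.norm() + 1e-12)
            p.add_(delta)
            deltas.append(delta)

    # 3) Take the gradient of the loss at the perturbed weights.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    # 4) Remove the perturbation, then update the original weights.
    with torch.no_grad():
        for p, delta in zip(params, deltas):
            p.sub_(delta)
    optimizer.step()
```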
Hierarchical Randomized Smoothing
Real-world data is complex and often consists of objects that can be
decomposed into multiple entities (e.g. images into pixels, graphs into
interconnected nodes). Randomized smoothing is a powerful framework for making
models provably robust against small changes to their inputs: it guarantees
robustness of the majority vote obtained by randomly adding noise before
classification. Yet, certifying robustness on such complex data via randomized
smoothing is challenging when adversaries do not arbitrarily perturb entire
objects (e.g. images) but only a subset of their entities (e.g. pixels). As a
solution, we introduce hierarchical randomized smoothing: we partially smooth
objects by adding random noise only to a randomly selected subset of their
entities. By adding noise in a more targeted manner than existing methods, we
obtain stronger robustness guarantees while maintaining high accuracy. We
instantiate hierarchical smoothing with different noising distributions,
yielding novel robustness certificates for discrete and continuous domains. We
experimentally demonstrate the importance of hierarchical smoothing in image
and node classification, where it yields superior robustness-accuracy
trade-offs. Overall, hierarchical smoothing is an important contribution
towards models that are both certifiably robust to perturbations and
accurate.
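A minimal sketch of the two-level sampling this describes: an upper level randomly selects which entities to perturb, and a lower level adds noise only to the selected ones. The Bernoulli selection, Gaussian noise, and all names are illustrative assumptions; the paper instantiates several such combinations for discrete and continuous domains.

```python
import numpy as np

def hierarchical_sample(x, p_select=0.3, sigma=0.5, rng=None):
    """x: (n_entities, n_features) array, e.g. node attributes or pixels."""
    rng = rng or np.random.default_rng()
    selected = rng.random(x.shape[0]) < p_select  # upper level: which entities
    noise = rng.normal(0.0, sigma, size=x.shape)  # lower level: how to perturb
    return np.where(selected[:, None], x + noise, x)

def smooth_predict(model, x, n_samples=1000, **noise_kwargs):
    """Majority vote of `model` under hierarchical noise."""
    votes = {}
    for _ in range(n_samples):
        y = model(hierarchical_sample(x, **noise_kwargs))
        votes[y] = votes.get(y, 0) + 1
    return max(votes, key=votes.get)
```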
Localized Randomized Smoothing for Collective Robustness Certification
Models for image segmentation, node classification and many other tasks map a
single input to multiple labels. By perturbing this single shared input (e.g.
the image), an adversary can manipulate several predictions (e.g. misclassify
several pixels). Collective robustness certification is the task of provably
bounding the number of robust predictions under this threat model. The only
dedicated method that goes beyond certifying each output independently is
limited to strictly local models, where each prediction is associated with a
small receptive field. We propose a more general collective robustness
certificate for all types of models. We further show that this approach is
beneficial for the larger class of softly local models, where each output
depends on the entire input but assigns different levels of importance to
different input regions (e.g. based on their proximity in the image). The
certificate is based on our novel localized randomized smoothing approach,
where the random perturbation strength for different input regions is
proportional to their importance for the outputs. Localized smoothing
Pareto-dominates existing certificates on both image segmentation and node
classification tasks, simultaneously offering higher accuracy and stronger
certificates.
Comment: Accepted at ICLR 2023
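To illustrate the localized idea, the sketch below draws anisotropic noise whose per-region strength is set from an importance score for the output being certified. We assume less noise on the regions most important to that output, which preserves accuracy where it matters; the paper's exact scaling may differ, and all names and parameters are illustrative.

```python
import numpy as np

def localized_noise(x, importance, sigma_min=0.1, sigma_max=1.0, rng=None):
    """x: (n_regions, n_features); importance: (n_regions,) scores in [0, 1]."""
    rng = rng or np.random.default_rng()
    sigma = sigma_max - importance * (sigma_max - sigma_min)  # important -> low noise
    return x + rng.normal(size=x.shape) * sigma[:, None]

def smoothed_output(model, x, importance, output_idx, n_samples=500):
    """Majority vote for one of the model's outputs under localized noise."""
    votes = {}
    for _ in range(n_samples):
        y = model(localized_noise(x, importance))[output_idx]
        votes[y] = votes.get(y, 0) + 1
    return max(votes, key=votes.get)
```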