Expressive Power of Invariant and Equivariant Graph Neural Networks
Various classes of Graph Neural Networks (GNN) have been proposed and shown
to be successful in a wide range of applications with graph structured data. In
this paper, we propose a theoretical framework able to compare the expressive
power of these GNN architectures. The current universality theorems only apply
to intractable classes of GNNs. Here, we prove the first approximation
guarantees for practical GNNs, paving the way for a better understanding of
their generalization. Our theoretical results are proved for invariant GNNs
computing a graph embedding (permutation of the nodes of the input graph does
not affect the output) and equivariant GNNs computing an embedding of the nodes
(permutation of the input permutes the output). We show that Folklore Graph
Neural Networks (FGNN), which are tensor-based GNNs augmented with matrix
multiplication, are the most expressive architectures proposed so far for a
given tensor order. We illustrate our results on the Quadratic Assignment
Problem (an NP-hard combinatorial problem) by showing that FGNNs are able to
learn how to solve the problem, leading to much better average performance
than existing algorithms (based on spectral, SDP, or other GNN architectures).
On the practical side, we also implement masked tensors to handle batches of
graphs of varying sizes.
Comment: Appears in: Proceedings of the 9th International Conference on Learning Representations, ICLR 2021. 39 pages.
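As a rough illustration of the tensor-based update described in this abstract, the sketch below applies one folklore-style layer to an edge-feature tensor, with the matrix multiplication realized as a contraction over the intermediate node. It is an assumption-laden simplification, not the paper's implementation: the name fgnn_layer, the use of single linear maps W1, W2, W3 in place of learned MLPs, and the plain numpy skip-plus-product combination are all illustrative choices.

```python
import numpy as np

def fgnn_layer(H, W1, W2, W3):
    """One folklore-style update on an edge-feature tensor H of shape (n, n, d).

    Sketch only: each learned MLP is replaced by a single linear map
    (W1, W2, W3 of shape (d, d)), and the "matrix multiplication" from the
    abstract is the channel-wise contraction over the intermediate node k.
    """
    A = H @ W2                                 # transform left factor, (n, n, d)
    B = H @ W3                                 # transform right factor, (n, n, d)
    prod = np.einsum('ikd,kjd->ijd', A, B)     # matmul over node k, per channel
    return H @ W1 + prod                       # skip term plus product term

# Toy usage on a random 4-node graph with 8 edge features.
n, d = 4, 8
H = np.random.randn(n, n, d)
W1, W2, W3 = (np.random.randn(d, d) for _ in range(3))
print(fgnn_layer(H, W1, W2, W3).shape)  # (4, 4, 8)
```

The masked tensors mentioned at the end of the abstract would, under the same kind of assumption, amount to zero-padding each graph's tensor to a common size and carrying a boolean node mask through such layers.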
The Last-Iterate Convergence Rate of Optimistic Mirror Descent in Stochastic Variational Inequalities
In this paper, we analyze the local convergence rate of optimistic mirror
descent methods in stochastic variational inequalities, a class of optimization
problems with important applications to learning theory and machine learning.
Our analysis reveals an intricate relation between the algorithm's rate of
convergence and the local geometry induced by the method's underlying Bregman
function. We quantify this relation by means of the Legendre exponent, a notion
that we introduce to measure the growth rate of the Bregman divergence relative
to the ambient norm near a solution. We show that this exponent determines both
the optimal step-size policy of the algorithm and the optimal rates attained,
explaining in this way the differences observed for some popular Bregman
functions (Euclidean projection, negative entropy, fractional power, etc.).
Comment: 31 pages, 3 figures, 1 table; to be presented at the 34th Annual Conference on Learning Theory (COLT 2021).
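For reference, the Bregman divergence whose growth the Legendre exponent measures is the standard one recalled below; the second display is only a schematic reading of "growth rate of the Bregman divergence relative to the ambient norm near a solution", and the exact normalization of the exponent β is our assumption rather than the paper's definition.

```latex
% Bregman divergence induced by a distance-generating function h
% (standard textbook definition).
\[
  D_h(p, x) \;=\; h(p) - h(x) - \langle \nabla h(x),\, p - x \rangle .
\]
% Schematically, the Legendre exponent records how this divergence scales
% against the ambient norm near a solution x^*, e.g. an exponent \beta with
\[
  D_h(x^\ast, x) \;\sim\; \|x - x^\ast\|^{\,2(1-\beta)}
  \quad \text{as } x \to x^\ast ,
\]
% so that \beta = 0 corresponds to quadratic (Euclidean-like) growth, while
% boundary solutions of entropy-like setups typically yield \beta > 0.
```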
The rate of convergence of Bregman proximal methods: Local geometry vs. regularity vs. sharpness
We examine the last-iterate convergence rate of Bregman proximal methods -
from mirror descent to mirror-prox and its optimistic variants - as a function
of the local geometry induced by the prox-mapping defining the method. For
generality, we focus on local solutions of constrained, non-monotone
variational inequalities, and we show that the convergence rate of a given
method depends sharply on its associated Legendre exponent, a notion that
measures the growth rate of the underlying Bregman function (Euclidean,
entropic, or other) near a solution. In particular, we show that boundary
solutions exhibit a stark separation of regimes between methods with a zero and
non-zero Legendre exponent: the former converge at a linear rate, while the
latter converge, in general, sublinearly. This dichotomy becomes even more
pronounced in linearly constrained problems where methods with entropic
regularization achieve a linear convergence rate along sharp directions,
compared to convergence in a finite number of steps under Euclidean
regularization.
Comment: 31 pages, 2 tables, 2 figures.
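As background on the family of methods this abstract ranges over, the display below recalls the textbook prox-mapping and the mirror-prox update written in terms of it; the symbols (operator V, step-size \gamma_t, prox-mapping P_x) are our notation rather than the paper's, and D_h denotes the Bregman divergence recalled after the previous abstract.

```latex
% Prox-mapping induced by a Bregman function h over the feasible set X
% (standard definition; notation is ours).
\[
  P_x(y) \;=\; \operatorname*{arg\,min}_{x' \in \mathcal{X}}
  \bigl\{ \langle y, x' \rangle + D_h(x', x) \bigr\}.
\]
% Mirror-prox step for a variational inequality with operator V and step-size \gamma_t:
\[
  X_{t+1/2} \;=\; P_{X_t}\!\bigl(\gamma_t V(X_t)\bigr),
  \qquad
  X_{t+1} \;=\; P_{X_t}\!\bigl(\gamma_t V(X_{t+1/2})\bigr).
\]
% Mirror descent keeps only the first half-step; optimistic variants reuse the
% previous operator evaluation in place of the extra call at X_t.
```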
Exact Generalization Guarantees for (Regularized) Wasserstein Distributionally Robust Models
Wasserstein distributionally robust estimators have emerged as powerful models for prediction and decision-making under uncertainty. These estimators provide attractive generalization guarantees: the robust objective obtained from the training distribution is an exact upper bound on the true risk with high probability. However, existing guarantees either suffer from the curse of dimensionality, are restricted to specific settings, or lead to spurious error terms. In this paper, we show that these generalization guarantees actually hold on general classes of models, do not suffer from the curse of dimensionality, and can even cover distribution shifts at testing. We also prove that these results carry over to the newly introduced regularized versions of Wasserstein distributionally robust problems.
Comment: 49 pages, 2 figures; to be presented at the 37th Annual Conference on Neural Information Processing Systems (NeurIPS 2023).
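The guarantee described in this abstract has the schematic form below; the notation (loss \ell, Wasserstein radius \rho, empirical distribution \hat P_n) is ours and is meant only to fix ideas, not to reproduce the paper's statement.

```latex
% Wasserstein distributionally robust objective built from the training sample
% (schematic; notation is ours).
\[
  \widehat{R}_\rho(\theta)
  \;=\;
  \sup_{Q \,:\, W(Q, \hat P_n) \le \rho}
  \mathbb{E}_{\xi \sim Q}\bigl[\ell(\theta, \xi)\bigr].
\]
% The generalization guarantee discussed in the abstract says that, with high
% probability over the n training samples,
\[
  \mathbb{E}_{\xi \sim P}\bigl[\ell(\theta, \xi)\bigr]
  \;\le\;
  \widehat{R}_\rho(\theta)
  \qquad \text{for all } \theta,
\]
% i.e. the robust empirical objective upper-bounds the true risk uniformly.
```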