Model Consistency of Partly Smooth Regularizers
This paper studies least-squares regression penalized with partly smooth
convex regularizers. This class of functions is very large and versatile,
allowing one to promote solutions conforming to some notion of low complexity.
Indeed, they force solutions of variational problems to belong to a
low-dimensional manifold (the so-called model) which is stable under small
perturbations of the function. This property is crucial to make the underlying
low-complexity model robust to small noise. We show that a generalized
"irrepresentable condition" implies stable model selection under small noise
perturbations in the observations and the design matrix, when the
regularization parameter is tuned proportionally to the noise level. This
condition is shown to be almost necessary. We then show that this
condition implies model consistency of the regularized estimator. That is, with
a probability tending to one as the number of measurements increases, the
regularized estimator belongs to the correct low-dimensional model manifold.
This work unifies and generalizes several previous ones, where model
consistency is known to hold for sparse, group sparse, total variation and
low-rank regularizations.
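The ℓ1 norm (the Lasso) is the simplest partly smooth regularizer. For it, the generalized irrepresentable condition reduces to the classical one: ||X_{S^c}^T X_S (X_S^T X_S)^{-1} s_S||_∞ < 1, where S is the support and s_S is the sign vector of the true coefficients. The following is a minimal sketch of that quantity. It assumes S and s_S are known and X_S has full column rank. The function name `irrepresentable_margin` is hypothetical, and this is not code from the paper.

```python
import numpy as np


def irrepresentable_margin(X, support, signs):
    """Return ||X_{S^c}^T X_S (X_S^T X_S)^{-1} s_S||_inf for the Lasso.

    The (strict) irrepresentable condition holds when the value is < 1.
    Assumes X_S = X[:, support] has full column rank.
    """
    X = np.asarray(X, dtype=float)
    support = np.asarray(support)
    off_support = np.setdiff1d(np.arange(X.shape[1]), support)
    if off_support.size == 0:
        return 0.0
    X_S, X_Sc = X[:, support], X[:, off_support]
    # Solve (X_S^T X_S) u = s_S instead of forming the inverse explicitly.
    u = np.linalg.solve(X_S.T @ X_S, np.asarray(signs, dtype=float))
    return float(np.max(np.abs(X_Sc.T @ X_S @ u)))


# Third column is correlated with both support columns.
X = np.array([[1.0, 0.0, 1 / np.sqrt(2)],
              [0.0, 1.0, 1 / np.sqrt(2)]])
print(irrepresentable_margin(X, [0, 1], [1, 1]))   # about 1.414: condition fails
print(irrepresentable_margin(X, [0, 1], [1, -1]))  # 0.0: condition holds
```

The same design passes or fails depending on the sign pattern. This is why the condition is stated in terms of the model (support and signs), not the design alone.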
The Geometry of Sparse Analysis Regularization
Analysis sparsity is a common prior in inverse problems and machine learning, including special cases such as Total Variation regularization, Edge Lasso and Fused Lasso. We study the geometry of the solution set (a polyhedron) of analysis ℓ1-regularization (with an ℓ2 data fidelity term) when it is not reduced to a singleton, without any assumption on the analysis dictionary or the degradation operator. In contrast with most theoretical work, we do not focus on giving uniqueness and/or stability results. Instead, we describe a worst-case scenario where the solution set can be large in terms of dimension. Leveraging a fine analysis of the sub-level set of the regularizer itself, we draw a connection between the support of a solution and the minimal face containing it. In particular, we prove that extremal points can be recovered thanks to an algebraic test. Moreover, we draw a connection between the sign pattern of a solution and the ambient dimension of the smallest face containing it. Finally, we show that any sub-polyhedron of the level set can be seen as the solution set of a sparse analysis regularization problem with explicit parameters.
What functions can Graph Neural Networks compute on random graphs? The role of Positional Encoding
We aim to deepen the theoretical understanding of Graph Neural Networks
(GNNs) on large graphs, with a focus on their expressive power. Existing
analyses either relate this notion to the graph isomorphism problem, which is mostly
relevant for graphs of small sizes, or study graph classification or
regression tasks, while prediction tasks on nodes are far more relevant on
large graphs. Recently, several works showed that, on very general random
graph models, GNNs converge to certain functions as the number of nodes
grows. In this paper, we provide a more complete and intuitive description of
the function space generated by equivariant GNNs for node tasks, through
general notions of convergence that encompass several previous examples. We
emphasize the role of input node features, and study the impact of node
Positional Encodings (PEs), a recent line of work that has been shown to yield
state-of-the-art results in practice. Through the study of several examples of
PEs on large random graphs, we extend previously known universality results to
significantly more general models. Our theoretical results hint at some
normalization tricks, which are shown numerically to have a positive impact on
GNN generalization on synthetic and real data. Our proofs contain new
concentration inequalities of independent interest.
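The role of input node features can be seen in a small experiment. With constant inputs, a degree-normalized mean-aggregation layer sends the same message to every node, so all outputs coincide. A positional encoding breaks this symmetry. This is a minimal sketch under illustrative assumptions:

- The graph is drawn from a graphon W(x, y) = (x + y) / 2.
- The layer is a single tanh layer with random weights.
- The normalized degree serves as the PE.

These choices are not the PEs, architectures, or normalizations studied in the paper, and the names `sample_graph` and `mean_layer` are hypothetical.

```python
import numpy as np


def sample_graph(n, rng):
    """Random graph from an illustrative graphon W(x, y) = (x + y) / 2."""
    x = rng.uniform(size=n)                          # latent node positions
    probs = (x[:, None] + x[None, :]) / 2.0
    upper = np.triu(rng.uniform(size=(n, n)) < probs, k=1)
    return x, (upper | upper.T).astype(float)        # symmetric, no self-loops


def mean_layer(A, H, W_self, W_neigh):
    """One message passing layer with degree-normalized mean aggregation."""
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
    return np.tanh(H @ W_self + ((A @ H) / deg) @ W_neigh)


rng = np.random.default_rng(0)
n = 500
x, A = sample_graph(n, rng)
W_self, W_neigh = rng.normal(size=(1, 4)), rng.normal(size=(1, 4))

# Constant input features: every node computes exactly the same output.
out_const = mean_layer(A, np.ones((n, 1)), W_self, W_neigh)

# Normalized degree as a simple positional encoding (illustrative choice only);
# on this graphon it is strongly correlated with the latent position x.
pe = A.sum(axis=1, keepdims=True) / n
out_pe = mean_layer(A, pe, W_self, W_neigh)
```

Without informative inputs, the layer can only represent constant functions of the nodes. With the PE, its output varies with the latent position.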
The Degrees of Freedom of the Group Lasso
This paper studies the sensitivity to the observations of the block/group
Lasso solution to an overdetermined linear regression model. Such a
regularization is known to promote sparsity patterns structured as
nonoverlapping groups of coefficients. Our main contribution provides a local
parameterization of the solution with respect to the observations. As a
byproduct, we give an unbiased estimate of the degrees of freedom of the group
Lasso. Among other applications of such results, one can choose in a principled
and objective way the regularization parameter of the Lasso through model
selection criteria.
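The local parameterization yields the degrees of freedom as the trace of the Jacobian of the prediction μ(y) = Xβ̂(y). This is a minimal sketch under the following assumptions:

- The groups are non-overlapping.
- The active columns X_I have full column rank.
- A plain proximal-gradient solver is used, chosen here for simplicity.

Under these assumptions, differentiating the first-order optimality condition on the active groups gives dμ/dy = X_I (X_I^T X_I + λM)^{-1} X_I^T. Here M is block diagonal, with blocks (Id − β_g β_g^T/||β_g||²)/||β_g||. The function names are hypothetical, and this is not the authors' code.

```python
import numpy as np


def block_soft_threshold(v, groups, t):
    """Proximal operator of t * sum_g ||v_g||_2 for non-overlapping groups."""
    out = np.zeros_like(v)
    for g in groups:
        norm = np.linalg.norm(v[g])
        if norm > t:
            out[g] = (1.0 - t / norm) * v[g]
    return out


def group_lasso(X, y, groups, lam, n_iter=5000):
    """Solve min_b 0.5*||y - X b||^2 + lam * sum_g ||b_g|| by proximal gradient."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2          # 1 / Lipschitz constant
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        b = block_soft_threshold(b - step * X.T @ (X @ b - y), groups, step * lam)
    return b


def group_lasso_df(X, b, groups, lam, tol=1e-10):
    """Trace of d(X b)/dy computed on the active groups (assumed stable locally)."""
    active = [g for g in groups if np.linalg.norm(b[g]) > tol]
    if not active:
        return 0.0
    idx = np.concatenate(active)
    X_I = X[:, idx]
    # Block-diagonal curvature of the penalty: (Id - b_g b_g^T / ||b_g||^2) / ||b_g||.
    M = np.zeros((idx.size, idx.size))
    start = 0
    for g in active:
        bg, k = b[g], len(g)
        nrm = np.linalg.norm(bg)
        M[start:start + k, start:start + k] = (np.eye(k) - np.outer(bg, bg) / nrm**2) / nrm
        start += k
    jac = X_I @ np.linalg.solve(X_I.T @ X_I + lam * M, X_I.T)
    return float(np.trace(jac))
```

For Gaussian noise with known level σ, ||y − Xβ̂||² − nσ² + 2σ²·df is Stein's unbiased risk estimate. Minimizing it over λ is one example of the model selection criteria mentioned above.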
Robust sparse analysis regularization
ABSTRACT
This work studies some properties of ℓ1-analysis regularization for the resolution of linear inverse problems. Analysis regularization minimizes the ℓ1 norm of the correlations between the signal and the atoms in the dictionary. The corresponding variational problem includes several well-known regularizations such as the discrete total variation and the fused lasso. We give sufficient conditions under which analysis regularization is robust to noise.

ANALYSIS VERSUS SYNTHESIS
Regularization through variational analysis is a popular way to compute an approximation of $x_0 \in \mathbb{R}^N$ from the measurements $y \in \mathbb{R}^Q$. These measurements are defined by an inverse problem $y = \Phi x_0 + w$, where $w$ is some additive noise and $\Phi$ is a linear operator, for instance a super-resolution or an inpainting operator. A dictionary $D = (d_i)_{i=1}^{P}$ is a collection of $P$ atoms $d_i \in \mathbb{R}^N$ used to synthesize a signal. Common examples of dictionaries in signal processing include the wavelet transform and a finite-difference operator. Synthesis regularization corresponds to the following minimization problem:

$$\min_{\alpha \in \mathbb{R}^P} \frac{1}{2}\|y - \Psi\alpha\|_2^2 + \lambda\|\alpha\|_1,$$

where $\Psi = \Phi D$, $x = D\alpha$ and $\lambda > 0$ is a regularization parameter. Properties of the synthesis prior have been studied intensively, see for instance … Analysis regularization corresponds to the following minimization problem:

$$\min_{x \in \mathbb{R}^N} \frac{1}{2}\|y - \Phi x\|_2^2 + \lambda\|D^* x\|_1.$$

In the noiseless case, $w = 0$, one uses the constrained optimization problem

$$\min_{x \in \mathbb{R}^N} \|D^* x\|_1 \quad \text{subject to} \quad \Phi x = y.$$

This prior has been less studied than the synthesis prior, see for instance …

UNION OF SUBSPACES MODEL
It is natural to keep track of the support of the correlation vector $D^* x$, as done in the following definition. A signal $x$ such that $D^* x$ is sparse lives in a cospace $G_J$ of small dimension, where $G_J$ is defined as follows.

Definition 2. Given a dictionary $D$ and a subset $J$ of $\{1, \dots, P\}$, the cospace $G_J$ is defined as

$$G_J = \operatorname{Ker} D_J^*,$$

where $D_J$ is the subdictionary whose columns are indexed by $J$. The signal space can thus be decomposed as a union of subspaces of increasing dimensions:

$$\Theta_k = \bigcup_{J \subseteq \{1, \dots, P\},\ \dim G_J = k} G_J.$$

For the 1-D total variation prior, $\Theta_k$ is the set of piecewise constant signals with $k - 1$ steps.
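The statement about 1-D total variation can be checked numerically. The sketch below computes the cosupport J of D^*x (the indices where the correlations vanish) and then dim G_J = N − rank(D_J^*). It assumes that D^* is the forward finite-difference operator (D^*x)_i = x_{i+1} − x_i and that zero correlations are detected with a small tolerance. The function names are hypothetical.

```python
import numpy as np


def finite_difference_analysis(N):
    """D^* for 1-D total variation: (D^* x)_i = x_{i+1} - x_i, shape (N-1, N)."""
    return np.diff(np.eye(N), axis=0)


def cospace_dimension(Dstar, x, tol=1e-12):
    """dim G_J = dim Ker D_J^*, with J the cosupport of D^* x."""
    corr = Dstar @ x
    J = np.flatnonzero(np.abs(corr) <= tol)    # indices of zero correlations
    if J.size == 0:
        return Dstar.shape[1]                  # G_J is the whole signal space
    return Dstar.shape[1] - np.linalg.matrix_rank(Dstar[J])


Dstar = finite_difference_analysis(7)
x = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 3.0, 3.0])   # 2 steps, 3 constant pieces
print(cospace_dimension(Dstar, x))                   # 3, so x lies in Theta_3
```

A signal with k − 1 jumps has exactly k constant pieces, which is why it lands in a cospace of dimension k.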
Accelerated Alternating Descent Methods for Dykstra-like problems
This paper extends recent results by the first author and T. Pock (ICG, TU Graz, Austria) on the acceleration of alternating minimization techniques for quadratic plus nonsmooth objectives depending on two variables. We discuss here the strongly convex situation, and how ‘fast’ methods can be derived by adapting the overrelaxation strategy of Nesterov for projected gradient descent. We also investigate slightly more general alternating descent methods, where several descent steps in each variable are performed alternately.
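Nesterov's overrelaxation for projected gradient descent can be illustrated on its own. The sketch below applies the generic FISTA-style momentum sequence to box-constrained least squares. It is not the paper's alternating scheme for two variables. The function name, the box constraint, and the test problem are illustrative assumptions.

```python
import numpy as np


def accelerated_projected_gradient(A, b, lo, hi, n_iter=500):
    """Minimize 0.5 * ||A x - b||^2 subject to lo <= x <= hi (elementwise).

    Projected gradient descent with Nesterov-style overrelaxation
    (FISTA momentum sequence t_k).
    """
    L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    z = x.copy()                                  # overrelaxed (extrapolated) point
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ z - b)
        x_new = np.clip(z - grad / L, lo, hi)     # gradient step, then projection
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

The only change from plain projected gradient is that the gradient is evaluated at the extrapolated point z rather than at the last iterate.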
Convergence of Message Passing Graph Neural Networks with Generic Aggregation On Large Random Graphs
We study the convergence of message passing graph neural networks on random
graph models to their continuous counterpart as the number of nodes tends to
infinity. Until now, this convergence was only known for architectures with
aggregation functions in the form of degree-normalized means. We extend such
results to a very large class of aggregation functions, that encompasses all
classically used message passing graph neural networks, such as attention-based
message passing or max convolutional message passing, in addition to
(degree-normalized) convolutional message passing. Under mild assumptions, we
give non-asymptotic bounds, holding with high probability, that quantify this convergence.
Our main result is based on the McDiarmid inequality. Interestingly, we treat
the case where the aggregation is a coordinate-wise maximum separately, as it
necessitates a very different proof technique and yields a qualitatively
different convergence rate.
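Convergence to a continuous counterpart can be observed empirically. This is a minimal sketch under illustrative assumptions:

- Graphs are sampled from the graphon W(x, y) = (x + y) / 2.
- The node feature is f(x_i) = x_i.
- Aggregation is a single step, with no learned weights.

For degree-normalized mean aggregation, the continuous limit is ∫W(x, y) y dy / ∫W(x, y) dy = (3x + 2)/(3(2x + 1)). For coordinate-wise max aggregation, the limit in this toy is the supremum of f on the support of W(x, ·), which is 1. Function names are hypothetical.

```python
import numpy as np


def sample_latent_graph(n, rng):
    """Random graph from an illustrative graphon W(x, y) = (x + y) / 2."""
    x = rng.uniform(size=n)
    probs = (x[:, None] + x[None, :]) / 2.0
    upper = np.triu(rng.uniform(size=(n, n)) < probs, k=1)
    return x, upper | upper.T                  # boolean adjacency, no self-loops


def mean_aggregation(adj, f):
    """Degree-normalized mean of the neighbours' features."""
    deg = np.maximum(adj.sum(axis=1), 1)
    return (adj.astype(float) @ f) / deg


def max_aggregation(adj, f):
    """Coordinate-wise maximum of the neighbours' features (-inf if isolated)."""
    return np.where(adj, f[None, :], -np.inf).max(axis=1)


def continuous_mean(x):
    """Continuous limit of mean_aggregation for f(y) = y under this graphon."""
    return (3.0 * x + 2.0) / (3.0 * (2.0 * x + 1.0))


rng = np.random.default_rng(0)
for n in (250, 1000, 2000):
    x, adj = sample_latent_graph(n, rng)
    err_mean = np.max(np.abs(mean_aggregation(adj, x) - continuous_mean(x)))
    # W(x, y) > 0 for all y > 0, so the continuous max of f(y) = y is 1 for every x.
    err_max = np.max(np.abs(max_aggregation(adj, x) - 1.0))
    print(f"n={n}: mean error {err_mean:.4f}, max error {err_max:.4f}")
```

Printing both errors for growing n shows how differently the two aggregations approach their limits. This is consistent with the max case requiring its own analysis.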
A Lower Bound and a Near-Optimal Algorithm for Bilevel Empirical Risk Minimization
Bilevel optimization problems, in which two optimization
problems are nested, have a growing number of applications in machine learning. In
many practical cases, the upper and the lower objectives correspond to
empirical risk minimization problems and therefore have a sum structure. In
this context, we propose a bilevel extension of the celebrated SARAH algorithm.
We demonstrate that the algorithm requires $\mathcal{O}((n+m)^{1/2}\varepsilon^{-1})$
gradient computations to achieve $\varepsilon$-stationarity, with $n+m$ the total number of samples, which
improves over all previous bilevel algorithms. Moreover, we provide a lower
bound on the number of oracle calls required to get an approximate stationary
point of the objective function of the bilevel problem. This lower bound is
attained by our algorithm, which is therefore optimal in terms of sample
complexity.
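The proposed method extends SARAH to the bilevel setting. The bilevel machinery is beyond a short example, so the sketch below shows only the single-level SARAH recursive gradient estimator, on a least-squares finite sum. It is not the paper's bilevel algorithm. The following are all assumptions:

- a step size of 0.5 / max_i ||a_i||²,
- an inner loop of length n,
- returning the last iterate of each epoch.

```python
import numpy as np


def sarah_least_squares(A, b, step, n_epochs=50, inner=None, seed=0):
    """SARAH on F(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2 (single level)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    inner = n if inner is None else inner
    x = np.zeros(d)

    def grad_i(z, i):
        # Gradient of the i-th term 0.5 * (a_i^T z - b_i)^2.
        return A[i] * (A[i] @ z - b[i])

    for _epoch in range(n_epochs):
        v = A.T @ (A @ x - b) / n              # full gradient at the start of the epoch
        x_prev, x = x, x - step * v
        for _t in range(inner):
            i = rng.integers(n)
            # Recursive estimator: v_t = grad_i(x_t) - grad_i(x_{t-1}) + v_{t-1}.
            v = grad_i(x, i) - grad_i(x_prev, i) + v
            x_prev, x = x, x - step * v
    return x
```

Each inner step costs two component gradients instead of a full pass over the data. A full gradient is recomputed only once per epoch.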