
    Model Consistency of Partly Smooth Regularizers

    This paper studies least-squares regression penalized with partly smooth convex regularizers. This class of functions is very large and versatile, and allows one to promote solutions conforming to some notion of low complexity. Indeed, such regularizers force solutions of variational problems to belong to a low-dimensional manifold (the so-called model), which is stable under small perturbations of the function. This property is crucial to make the underlying low-complexity model robust to small noise. We show that a generalized "irrepresentable condition" implies stable model selection under small noise perturbations in the observations and the design matrix, when the regularization parameter is tuned proportionally to the noise level. This condition is shown to be almost necessary. We then show that this condition implies model consistency of the regularized estimator: with probability tending to one as the number of measurements increases, the regularized estimator belongs to the correct low-dimensional model manifold. This work unifies and generalizes several previous ones, where model consistency is known to hold for sparse, group sparse, total variation and low-rank regularizations.
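
    The simplest instance of this setting is the Lasso, where the model manifold is the support of the vector and the generalized criterion reduces to the classical irrepresentable condition. The sketch below is only a hedged illustration of that special case, not the paper's general criterion; the design Phi, sparse vector x0 and support S are made up for the example.

        import numpy as np

        rng = np.random.default_rng(0)
        n, p, s = 50, 20, 3
        Phi = rng.standard_normal((n, p)) / np.sqrt(n)    # design matrix
        x0 = np.zeros(p)
        x0[:s] = rng.standard_normal(s)                   # sparse ground truth
        S = np.flatnonzero(x0)                            # support = low-dimensional model
        Sc = np.setdiff1d(np.arange(p), S)

        # Classical Lasso irrepresentable condition:
        # || Phi_Sc^T Phi_S (Phi_S^T Phi_S)^{-1} sign(x0_S) ||_inf < 1
        Phi_S, Phi_Sc = Phi[:, S], Phi[:, Sc]
        ic = np.linalg.norm(
            Phi_Sc.T @ Phi_S @ np.linalg.inv(Phi_S.T @ Phi_S) @ np.sign(x0[S]),
            ord=np.inf)
        print("irrepresentable condition satisfied:", ic < 1)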

    The Geometry of Sparse Analysis Regularization

    Analysis sparsity is a common prior in inverse problems and machine learning, including special cases such as Total Variation regularization, the Edge Lasso and the Fused Lasso. We study the geometry of the solution set (a polyhedron) of analysis l1-regularization (with an l2 data fidelity term) when it is not reduced to a singleton, without any assumption on the analysis dictionary or the degradation operator. In contrast with most theoretical work, we do not focus on giving uniqueness and/or stability results, but rather describe a worst-case scenario where the solution set can be large in terms of dimension. Leveraging a fine analysis of the sub-level sets of the regularizer itself, we draw a connection between the support of a solution and the minimal face containing it, and in particular prove that extremal points can be recovered thanks to an algebraic test. Moreover, we draw a connection between the sign pattern of a solution and the ambient dimension of the smallest face containing it. Finally, we show that any arbitrary sub-polyhedron of the level set can be seen as the solution set of a sparse analysis regularization with explicit parameters.
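
    As a concrete illustration (a minimal sketch only, with made-up data, a finite-difference analysis dictionary standing in for the 1-D total variation, and an arbitrary regularization parameter lam), the analysis l1 problem with l2 data fidelity can be solved with an off-the-shelf convex solver and the sign pattern of D x inspected at the computed solution.

        import numpy as np
        import cvxpy as cp

        rng = np.random.default_rng(1)
        n, q, lam = 30, 15, 0.5
        Phi = rng.standard_normal((q, n))       # degradation operator
        y = rng.standard_normal(q)              # observations
        D = np.diff(np.eye(n), axis=0)          # analysis operator: finite differences

        x = cp.Variable(n)
        objective = 0.5 * cp.sum_squares(Phi @ x - y) + lam * cp.norm1(D @ x)
        cp.Problem(cp.Minimize(objective)).solve()

        # The sign pattern of D x at a solution is related to the smallest
        # face of the solution polyhedron containing it.
        print(np.sign(np.round(D @ x.value, 6)))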

    What functions can Graph Neural Networks compute on random graphs? The role of Positional Encoding

    We aim to deepen the theoretical understanding of Graph Neural Networks (GNNs) on large graphs, with a focus on their expressive power. Existing analyses relate this notion to the graph isomorphism problem, which is mostly relevant for graphs of small size, or study graph classification or regression tasks, while prediction tasks on nodes are far more relevant on large graphs. Recently, several works showed that, on very general random graph models, GNNs converge to certain functions as the number of nodes grows. In this paper, we provide a more complete and intuitive description of the function space generated by equivariant GNNs for node tasks, through general notions of convergence that encompass several previous examples. We emphasize the role of input node features, and study the impact of node Positional Encodings (PEs), a recent line of work that has been shown to yield state-of-the-art results in practice. Through the study of several examples of PEs on large random graphs, we extend previously known universality results to significantly more general models. Our theoretical results hint at some normalization tricks, which are shown numerically to have a positive impact on GNN generalization on synthetic and real data. Our proofs contain new concentration inequalities of independent interest.
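
    One widely used family of node Positional Encodings consists of Laplacian eigenvectors appended to the input node features. The snippet below is only an illustrative sketch on a small Erdős-Rényi graph; the graph model, feature dimension d and number of eigenvectors k are placeholders, not the paper's setting.

        import numpy as np

        rng = np.random.default_rng(2)
        n, d, k, p = 300, 3, 4, 0.1
        A = np.triu((rng.random((n, n)) < p).astype(float), 1)
        A = A + A.T                               # symmetric adjacency, no self-loops
        L = np.diag(A.sum(1)) - A                 # combinatorial graph Laplacian
        evals, evecs = np.linalg.eigh(L)
        pe = evecs[:, 1:k + 1]                    # skip the constant eigenvector
        X = rng.standard_normal((n, d))           # raw node features
        X_pe = np.concatenate([X, pe], axis=1)    # features augmented with PEs
        print(X_pe.shape)                         # (n, d + k)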

    The Degrees of Freedom of the Group Lasso

    This paper studies the sensitivity to the observations of the block/group Lasso solution to an overdetermined linear regression model. Such a regularization is known to promote sparsity patterns structured as nonoverlapping groups of coefficients. Our main contribution provides a local parameterization of the solution with respect to the observations. As a byproduct, we give an unbiased estimate of the degrees of freedom of the group Lasso. Among other applications of such results, one can choose in a principled and objective way the regularization parameter of the group Lasso through model selection criteria.
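
    For reference, the group Lasso estimator whose sensitivity is analyzed here can be computed by proximal gradient descent with block soft-thresholding. The sketch below is a hedged illustration with made-up data, non-overlapping groups and an arbitrary regularization parameter lam; it does not implement the paper's degrees-of-freedom formula.

        import numpy as np

        def group_soft_threshold(v, groups, t):
            """Prox of t * sum_g ||v_g||_2 over disjoint groups g."""
            out = np.zeros_like(v)
            for g in groups:
                norm = np.linalg.norm(v[g])
                if norm > t:
                    out[g] = (1.0 - t / norm) * v[g]
            return out

        rng = np.random.default_rng(3)
        n, p, lam = 100, 12, 5.0
        groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
        X = rng.standard_normal((n, p))
        y = rng.standard_normal(n)

        beta = np.zeros(p)
        step = 1.0 / np.linalg.norm(X, 2) ** 2    # 1 / Lipschitz constant of the gradient
        for _ in range(500):
            grad = X.T @ (X @ beta - y)
            beta = group_soft_threshold(beta - step * grad, groups, step * lam)
        print([np.linalg.norm(beta[g]) for g in groups])   # active vs zeroed groups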

    Robust sparse analysis regularization

    This work studies some properties of l1-analysis regularization for the resolution of linear inverse problems. Analysis regularization minimizes the l1 norm of the correlations between the signal and the atoms in the dictionary. The corresponding variational problem includes several well-known regularizations such as the discrete total variation and the fused lasso. We give sufficient conditions under which analysis regularization is robust to noise.

    Analysis versus synthesis. Regularization through variational analysis is a popular way to compute an approximation of x_0 ∈ R^N from the measurements y ∈ R^Q defined by an inverse problem y = Φ x_0 + w, where w is some additive noise and Φ is a linear operator, for instance a super-resolution or an inpainting operator. A dictionary D ∈ R^{N×P} is used to synthesize a signal x = D α from a coefficient vector α ∈ R^P; common examples of dictionaries in signal processing include the wavelet transform and a finite-difference operator. Synthesis regularization corresponds to the minimization problem min_{α ∈ R^P} (1/2) ||y − Ψ α||_2^2 + λ ||α||_1, where Ψ = Φ D and x = D α. Properties of the synthesis prior have been studied intensively. Analysis regularization corresponds instead to the minimization problem min_{x ∈ R^N} (1/2) ||y − Φ x||_2^2 + λ ||D^* x||_1. In the noiseless case, w = 0, one uses the constrained formulation min_{x ∈ R^N} ||D^* x||_1 subject to Φ x = y. This prior has been less studied than the synthesis prior.

    Union of subspaces model. It is natural to keep track of the support of the correlation vector D^* x, as done in the following definition. A signal x such that D^* x is sparse lives in a cospace G_J of small dimension, where G_J is defined as follows. Definition 2. Given a dictionary D and a subset J of {1, ..., P}, the cospace G_J is defined as G_J = Ker D_J^*, where D_J is the subdictionary whose columns are indexed by J. The signal space can thus be decomposed as a union of subspaces Θ_k of increasing dimension. For the 1-D total variation prior, Θ_k is the set of piecewise constant signals with k − 1 steps.
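
    As a small numerical companion to Definition 2 (purely illustrative: the signal and dictionary below are made up), the cospace G_J of a piecewise constant signal under the finite-difference dictionary can be computed from a null space, and its dimension equals the number of constant pieces.

        import numpy as np
        from scipy.linalg import null_space

        N = 8
        Dstar = np.diff(np.eye(N), axis=0)               # rows compute x[i+1] - x[i]
        x = np.array([1., 1., 1., 4., 4., 4., 4., 2.])   # piecewise constant, 2 steps
        J = np.flatnonzero(np.abs(Dstar @ x) < 1e-12)    # cosupport of D^* x
        G_J = null_space(Dstar[J, :])                    # basis of the cospace Ker D_J^*
        print(G_J.shape[1])                              # dimension 3 = number of pieces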

    Accelerated Alternating Descent Methods for Dykstra-like problems

    This paper extends recent results by the first author and T. Pock (ICG, TU Graz, Austria) on the acceleration of alternating minimization techniques for quadratic plus nonsmooth objectives depending on two variables. We discuss here the strongly convex situation, and how ‘fast’ methods can be derived by adapting the overrelaxation strategy of Nesterov for projected gradient descent. We also investigate slightly more general alternating descent methods, where several descent steps in each variable are alternately performed.
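
    For context, the 'Dykstra-like' baseline being accelerated is alternating minimization with correction terms, as in the classical Dykstra algorithm for projecting onto an intersection of convex sets. The sketch below shows only that classical, non-accelerated scheme on two illustrative sets (a nonnegative orthant and a halfspace); the paper's overrelaxed variants are not reproduced.

        import numpy as np

        def proj_orthant(v):                      # projection onto {x >= 0}
            return np.maximum(v, 0.0)

        def proj_halfspace(v, a, c):              # projection onto {x : <a, x> <= c}
            excess = a @ v - c
            return v - (max(excess, 0.0) / (a @ a)) * a

        z = np.array([2.0, -1.0, 3.0])            # point to project
        a, c = np.ones(3), 1.0                    # halfspace sum(x) <= 1

        x, p, q = z.copy(), np.zeros(3), np.zeros(3)
        for _ in range(200):                      # Dykstra's alternating updates
            y = proj_orthant(x + p); p = x + p - y
            x = proj_halfspace(y + q, a, c); q = y + q - x
        print(x)                                  # projection of z onto the intersection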

    Convergence of Message Passing Graph Neural Networks with Generic Aggregation On Large Random Graphs

    We study the convergence of message passing graph neural networks on random graph models to their continuous counterpart as the number of nodes tends to infinity. Until now, this convergence was only known for architectures with aggregation functions in the form of degree-normalized means. We extend such results to a very large class of aggregation functions that encompasses all classically used message passing graph neural networks, such as attention-based message passing or max convolutional message passing on top of (degree-normalized) convolutional message passing. Under mild assumptions, we give non-asymptotic bounds with high probability to quantify this convergence. Our main result is based on the McDiarmid inequality. Interestingly, we treat the case where the aggregation is a coordinate-wise maximum separately, as it necessitates a very different proof technique and yields a qualitatively different convergence rate.
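
    A minimal sketch of the two aggregation regimes mentioned above (degree-normalized mean versus coordinate-wise max) on a sparse Erdős-Rényi graph; the features, weights and graph model are placeholders, not the paper's construction.

        import numpy as np

        rng = np.random.default_rng(4)
        n, d, p = 500, 4, 0.1
        A = np.triu((rng.random((n, n)) < p).astype(float), 1)
        A = A + A.T                               # random adjacency matrix
        M = rng.standard_normal((n, d)) @ rng.standard_normal((d, d))  # node messages

        deg = np.maximum(A.sum(1, keepdims=True), 1.0)
        mean_agg = (A @ M) / deg                  # degree-normalized mean aggregation
        max_agg = np.array([M[A[i] > 0].max(axis=0) if A[i].any() else np.zeros(d)
                            for i in range(n)])   # coordinate-wise max aggregation
        print(mean_agg.shape, max_agg.shape)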

    A Lower Bound and a Near-Optimal Algorithm for Bilevel Empirical Risk Minimization

    Bilevel optimization problems, in which two optimization problems are nested, have an increasing number of applications in machine learning. In many practical cases, the upper and the lower objectives correspond to empirical risk minimization problems and therefore have a sum structure. In this context, we propose a bilevel extension of the celebrated SARAH algorithm. We demonstrate that the algorithm requires O((n+m)^{1/2} ε^{-1}) gradient computations to achieve ε-stationarity, where n + m is the total number of samples, which improves over all previous bilevel algorithms. Moreover, we provide a lower bound on the number of oracle calls required to get an approximate stationary point of the objective function of the bilevel problem. This lower bound is attained by our algorithm, which is therefore optimal in terms of sample complexity.
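
    For intuition, the single-level SARAH estimator on which the proposed bilevel extension builds maintains a recursive gradient estimate updated with stochastic gradient differences. The sketch below shows that single-level recursion on a made-up least-squares finite sum (step size and loop lengths are arbitrary); the bilevel algorithm and its complexity analysis are not reproduced.

        import numpy as np

        rng = np.random.default_rng(5)
        n, d = 200, 10
        A = rng.standard_normal((n, d))
        b = rng.standard_normal(n)

        def grad_i(w, i):                         # gradient of the i-th sample loss
            return (A[i] @ w - b[i]) * A[i]

        w, eta = np.zeros(d), 0.05
        for _ in range(20):                       # outer loops
            v = A.T @ (A @ w - b) / n             # full gradient at the snapshot
            w_prev, w = w, w - eta * v
            for _ in range(n):                    # inner recursive SARAH updates
                i = rng.integers(n)
                v = grad_i(w, i) - grad_i(w_prev, i) + v
                w_prev, w = w, w - eta * v
        print(np.linalg.norm(A.T @ (A @ w - b) / n))   # final gradient norm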