Extrapolated proximal subgradient algorithms for nonconvex and nonsmooth fractional programs
In this paper, we consider a broad class of nonsmooth and nonconvex fractional programs that encompasses many important modern optimization problems from diverse areas, such as the recently proposed scale-invariant sparse signal reconstruction problem in signal processing. We propose a proximal subgradient algorithm with extrapolations for solving this optimization model and show that the sequence of iterates generated by the algorithm is bounded and that each of its limit points is a stationary point of the model problem. The choice of extrapolation parameter is flexible and includes the popular parameter adopted in the restarted fast iterative shrinkage-thresholding algorithm (FISTA). By providing a unified analysis framework for descent methods, we establish convergence of the full sequence under the assumption that a suitable merit function satisfies the Kurdyka–Łojasiewicz property. Our algorithm exhibits linear convergence for the scale-invariant sparse signal reconstruction problem and for the Rayleigh quotient problem over a spherical constraint. When the denominator is the maximum of finitely many continuously differentiable weakly convex functions, we also propose another extrapolated proximal subgradient algorithm with guaranteed convergence to a stronger notion of stationary point of the model problem. Finally, we illustrate the proposed methods with both analytical and simulated numerical examples. Copyright: © 2021 INFORMS
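The extrapolation-plus-proximal-step structure can be sketched on the Rayleigh quotient example mentioned in the abstract. The following is a minimal illustration, not the paper's exact scheme: the step size, the FISTA-style extrapolation schedule, and the 2x2 test matrix are all illustrative assumptions.

```python
# Schematic extrapolated projected (proximal) gradient sketch for the
# Rayleigh quotient q(x) = (x^T A x) / (x^T x) over the unit sphere.
# Step size, extrapolation schedule, and test matrix are illustrative.

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(x):
    # projection onto the unit sphere, i.e. the prox of its indicator
    n = dot(x, x) ** 0.5
    return [xi / n for xi in x]

def rayleigh_min(A, x0, steps=400, lr=0.1):
    x_prev = x_cur = normalize(x0)
    t_prev = 1.0
    for _ in range(steps):
        t_cur = (1.0 + (1.0 + 4.0 * t_prev ** 2) ** 0.5) / 2.0
        beta = (t_prev - 1.0) / t_cur          # FISTA-style extrapolation
        y = [xc + beta * (xc - xp) for xc, xp in zip(x_cur, x_prev)]
        Ay = matvec(A, y)
        q = dot(y, Ay) / dot(y, y)             # Rayleigh quotient at y
        g = [2.0 * (ay - q * yi) / dot(y, y) for ay, yi in zip(Ay, y)]
        x_prev = x_cur
        x_cur = normalize([yi - lr * gi for yi, gi in zip(y, g)])
        t_prev = t_cur
    return x_cur

A = [[2.0, 1.0], [1.0, 2.0]]   # symmetric, eigenvalues 1 and 3
x = rayleigh_min(A, [0.8, 0.6])
print(dot(x, matvec(A, x)))    # ≈ 1.0, the smallest eigenvalue (x has unit norm)
```

Gradient descent on the quotient drives the iterate toward the eigenvector of the smallest eigenvalue; the extrapolation step only accelerates this, it does not change the limit.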
Nonsmooth optimization techniques with applications in automatic control and contact mechanics
Nonsmooth optimization is an active branch of modern nonlinear programming in which the objective and constraints are continuous but not necessarily differentiable functions. Generalized subgradients serve as a substitute for the missing derivative information and are used within the framework of descent algorithms to approximate locally optimal solutions. Under practically realistic hypotheses, we prove convergence certificates to local optima or critical points from an arbitrary starting point.
In this thesis we develop, in particular, nonsmooth optimization techniques of bundle type, where the challenge is to prove convergence certificates without convexity hypotheses. Satisfactory results are obtained for two classes of nonsmooth functions that are important in applications, lower- and upper-C1 functions. Our methods are applied to design problems in control system theory and in unilateral contact mechanics, in particular to destructive mechanical testing for delamination of composite materials. We show how these fields lead to typical nonsmooth optimization problems, and we develop bundle algorithms suited to addressing them successfully.
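As a toy illustration of the descent framework described above, a basic subgradient method on a nonsmooth convex function looks as follows. This is not a bundle method (which would additionally aggregate past subgradients into a cutting-plane model); the objective and the diminishing step rule are illustrative assumptions.

```python
# Subgradient descent on the nonsmooth function f(x) = |x - 3|, using
# an element of the subdifferential where f is not differentiable.
# Problem and step rule are illustrative, not from the thesis.

def f(x):
    return abs(x - 3.0)

def subgrad(x):
    # an element of the (Clarke) subdifferential of f at x
    if x > 3.0:
        return 1.0
    if x < 3.0:
        return -1.0
    return 0.0  # 0 lies in the subdifferential [-1, 1] at the kink

x = 0.0
best = f(x)
for k in range(1, 5001):
    x -= (1.0 / k) * subgrad(x)   # diminishing steps: divergent sum, vanishing size
    best = min(best, f(x))

print(best)  # close to 0: the iterates approach the minimizer x* = 3
```

A bundle method would replace the single-subgradient step with a step computed from a piecewise-linear model built from all collected subgradients, which is what makes convergence certificates without convexity delicate.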
Global Convergence of Model Function Based Bregman Proximal Minimization Algorithms
Lipschitz continuity of the gradient mapping of a continuously differentiable function plays a crucial role in designing various optimization algorithms. However, many functions arising in practical applications, such as low-rank matrix factorization or deep neural network problems, do not have a Lipschitz continuous gradient. This led to the development of a generalized notion known as the L-smad property, which is based on generalized proximity measures called Bregman distances. However, the L-smad property cannot handle nonsmooth functions: even simple nonsmooth functions like |x^4 - 1|, and also many practical composite problems, are out of scope. We fix this issue by proposing the MAP property, which generalizes the L-smad property and is also valid for a large class of nonconvex nonsmooth composite problems. Based on the proposed MAP property, we propose a globally convergent algorithm called Model BPG, which unifies several existing algorithms. The convergence analysis is based on a new Lyapunov function. We also numerically illustrate the superior performance of Model BPG on standard phase retrieval problems, robust phase retrieval problems, and Poisson linear inverse problems, compared to a state-of-the-art optimization method valid for generic nonconvex nonsmooth optimization problems.
Comment: 44 pages, 22 figures
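The flavour of a Bregman proximal step can be seen in a one-dimensional special case. Below, the smooth part f(x) = x - b*log(x) is b-smooth relative to the Burg entropy kernel h(x) = -log(x) (since f'' = b*h''), and the Bregman step h'(x_new) = h'(x_k) - tau*f'(x_k) has a closed form. The concrete f, h, and step size are illustrative choices, not taken from the paper.

```python
# Bregman proximal gradient step with the Burg entropy kernel
# h(x) = -log(x) on x > 0, applied to f(x) = x - b*log(x). Since
# f'' = b * h'', f is b-smooth relative to h (an L-smad-type condition
# with L = b). All concrete choices here are illustrative.

def bpg_burg(b, x0, tau, steps):
    x = x0
    for _ in range(steps):
        grad = 1.0 - b / x                 # f'(x)
        # solve h'(x_new) = h'(x) - tau*grad, where h'(x) = -1/x:
        x = 1.0 / (1.0 / x + tau * grad)   # stays positive for tau <= 1/b
    return x

b = 2.0
x = bpg_burg(b, x0=0.5, tau=1.0 / (2.0 * b), steps=60)
print(x)  # ≈ 2.0, the minimizer x* = b of f
```

With the kernel matched to f in this way, no global Lipschitz constant of f' is needed, which is exactly the point of the L-smad (and, more generally, MAP) framework.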
Geometric numerical integration for optimisation
In this thesis, we study geometric numerical integration for the optimisation of various classes of functionals. Numerical integration and the study of systems of differential equations have received increased attention within the optimisation community in the last decade, both as a means of devising new optimisation schemes and of improving our understanding of the dynamics of existing ones. Discrete gradient methods from geometric numerical integration preserve structures of first-order gradient systems, including the dissipative structure of schemes such as gradient flows, and thus yield iterative methods that are unconditionally dissipative, i.e. that decrease the objective function value for any choice of time step.
We look at discrete gradient methods for optimisation in several settings. First, we provide a comprehensive study of discrete gradient methods for optimisation of continuously differentiable functions. In particular, we prove properties such as well-posedness of the discrete gradient update equation, convergence rates, convergence of the iterates, and propose methods for solving the discrete gradient update equation with superior stability and convergence rates. Furthermore, we present results from numerical experiments which support the theory.
Second, motivated by the existence of derivative-free discrete gradients, and seeking to solve nonsmooth optimisation problems and, more generally, black-box problems such as parameter optimisation problems, we propose derivative-free methods based on the Itoh–Abe discrete gradient method for solving nonconvex, nonsmooth optimisation problems. In this setting, we prove well-posedness of the method and convergence guarantees within the nonsmooth, nonconvex Clarke subdifferential framework for locally Lipschitz continuous functions. The analysis is shown to hold in various settings, namely unconstrained and constrained problems, including epi-Lipschitzian constraints, and both stochastic and deterministic optimisation methods.
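The Itoh–Abe discrete gradient only ever evaluates the objective, never its derivatives: each coordinate update solves a scalar implicit equation built from a difference quotient. The following is a minimal sketch; the fixed-point inner solver, step size, and quadratic test function are illustrative choices, none taken from the thesis.

```python
# Itoh-Abe discrete gradient step: coordinates are updated one at a
# time by solving s = -tau * (f(y + s*e_i) - f(y)) / s, a derivative-
# free implicit equation whose difference quotient replaces df/dx_i.
# Inner solver, tau, and the test function are illustrative choices.

def itoh_abe_step(f, x, tau, inner=30, h=1e-6):
    y = list(x)
    for i in range(len(y)):
        base = f(y)

        def delta(s, i=i):
            z = list(y)
            z[i] += s
            return f(z) - base

        s = -tau * delta(h) / h          # secant warm start
        for _ in range(inner):
            if abs(s) < 1e-12:           # coordinate already stationary
                break
            s = -tau * delta(s) / s      # fixed-point iteration on the implicit equation
        y[i] += s                        # accept the coordinate update
    return y

f = lambda v: (v[0] - 1.0) ** 2 + 2.0 * (v[1] + 0.5) ** 2
x = [0.0, 0.0]
for _ in range(100):
    x = itoh_abe_step(f, x, tau=0.2)
print(x)  # ≈ [1.0, -0.5], the minimizer of f
```

Because only function values enter the update, the same loop applies verbatim to nonsmooth or black-box objectives, which is what motivates the Clarke subdifferential analysis in the thesis.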
Building on the work on derivative-free discrete gradient methods and the concept of structure preservation in geometric numerical integration, we consider discrete gradient methods applied to other differential systems with dissipative structures. In particular, we study the inverse scale space flow, linked to the well-known Bregman methods, which are central to variational optimisation problems and regularisation methods for inverse problems. In this setting, we propose and implement derivative-free schemes that exploit structures such as sparsity to achieve superior convergence rates in numerical experiments, and we prove convergence guarantees for these methods in the nonsmooth, nonconvex setting. Furthermore, these schemes can be seen as generalisations of the Gauss–Seidel method and successive over-relaxation.
Finally, we return to parameter optimisation problems, namely nonsmooth bilevel optimisation problems, and propose a framework for employing first-order methods on these problems when the underlying variational optimisation problem admits a nonsmooth structure in the partial smoothness framework. In this setting, we prove piecewise differentiability of the parameter-dependent solution mapping and study algorithmic differentiation approaches to evaluating the derivatives. Furthermore, we prove that the algorithmic derivatives converge to the implicit derivatives. Thus we demonstrate that, although some parameter tuning problems must inevitably be treated as black-box optimisation problems, for a large number of variational problems one can exploit the structure of nonsmoothness to perform gradient-based bilevel optimisation.
Bregman proximal minimization algorithms, analysis and applications
In this thesis, we tackle the optimization of several nonsmooth and nonconvex objectives that arise in practice. The classical results in the context of proximal gradient algorithms rely on the so-called Lipschitz continuous gradient property. Such conditions do not hold for many objectives in practice, including those arising in matrix factorization, deep neural networks, phase retrieval, image denoising and many others. A recent development, namely the L-smad property, allows us to deal with such objectives via so-called Bregman distances, which generalize the Euclidean distance. Based on the L-smad property, the Bregman Proximal Gradient (BPG) algorithm is already well known. In our work, we propose an inertial variant of BPG, namely CoCaIn BPG, which incorporates adaptive inertia based on the function's local behavior. Moreover, we prove global convergence of the sequence generated by CoCaIn BPG to a critical point of the function. CoCaIn BPG outperforms BPG by a significant margin, which is attributed to the proposed non-standard double backtracking technique. A major challenge in working with BPG-based methods is designing a Bregman distance suitable for the objective. In this regard, we propose Bregman distances suitable for three applications: matrix factorization, deep matrix factorization and deep neural networks. We start with the matrix factorization setting and propose the relevant Bregman distances, then tackle the deep matrix factorization and deep neural network settings. In all these settings, we also derive closed-form update steps for BPG-based methods, which is crucial for practical application, as well as a closed-form inertia suitable for the efficient application of CoCaIn BPG. However, up to this point the setting is restricted to additive composite problems, and generic composite problems, such as the objectives that arise in robust phase retrieval, are out of scope.
In order to tackle generic composite problems, the L-smad property needs to be generalized even further. In this regard, we propose the MAP property, on which we base the Model BPG algorithm. Classical convergence-analysis techniques based on the function value proved too restrictive, so we propose a novel Lyapunov function suitable for the global convergence analysis. We later unify Model BPG and CoCaIn BPG into Model CoCaIn BPG, for which we provide global convergence results. We supplement all our theoretical results with relevant empirical observations that show the competitive performance of our methods compared to existing state-of-the-art optimization methods.
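The common building block behind all of these methods is the Bregman distance D_h(x, y) = h(x) - h(y) - <grad h(y), x - y> generated by a convex kernel h. The sketch below evaluates it for a quartic-plus-quadratic kernel of the polynomial type used for factorization-style objectives and checks its basic properties; the specific kernel coefficients are illustrative assumptions.

```python
# Bregman distance D_h(x, y) = h(x) - h(y) - <grad h(y), x - y> for
# the kernel h(x) = ||x||^4 / 4 + ||x||^2 / 2, a convex quartic-plus-
# quadratic kernel of the polynomial type used for factorization-style
# problems; the coefficients here are illustrative.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def h(x):
    n2 = dot(x, x)
    return 0.25 * n2 ** 2 + 0.5 * n2

def grad_h(x):
    n2 = dot(x, x)
    return [(n2 + 1.0) * xi for xi in x]   # gradient of h

def bregman(x, y):
    gy = grad_h(y)
    return h(x) - h(y) - dot(gy, [a - b for a, b in zip(x, y)])

x, y = [1.0, -2.0], [0.5, 0.3]
d = bregman(x, y)
print(d)              # > 0, since h is strictly convex and x != y
print(bregman(x, x))  # 0.0 exactly: D_h(x, x) = 0 for any x
```

In a BPG-style update this distance replaces the squared Euclidean proximity term, so choosing h to match the growth of the objective (quartic kernels for quartic objectives, for instance) is what makes the closed-form update steps possible.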