International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book
The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions.
This book comprises the full conference program. It contains, in particular, the scientific program both in survey style and in full detail, as well as information on the social program, the venue, special meetings, and more.
A trust region-type normal map-based semismooth Newton method for nonsmooth nonconvex composite optimization
We propose a novel trust region method for solving a class of nonsmooth and
nonconvex composite-type optimization problems. The approach embeds inexact
semismooth Newton steps for finding zeros of a normal map-based stationarity
measure for the problem in a trust region framework. Based on a new merit
function and acceptance mechanism, global convergence and transition to fast
local q-superlinear convergence are established under standard conditions. In
addition, we verify that the proposed trust region globalization is compatible
with the Kurdyka-Łojasiewicz (KL) inequality, yielding finer convergence
results. We further derive new normal map-based representations of the
associated second-order optimality conditions that have direct connections to
the local assumptions required for fast convergence. Finally, we study the
behavior of our algorithm when the Hessian matrix of the smooth part of the
objective function is approximated by BFGS updates. We successfully link the KL
theory, properties of the BFGS approximations, and a Dennis-Moré-type
condition to show superlinear convergence of the quasi-Newton version of our
method. Numerical experiments on sparse logistic regression and image
compression illustrate the efficiency of the proposed algorithm.
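For a composite objective f(x) + φ(x), a standard normal map-based stationarity measure (following Robinson's normal map) is the residual F(z) = ∇f(prox_{τφ}(z)) + (z − prox_{τφ}(z))/τ, whose zeros correspond to stationary points. The following NumPy sketch assumes φ = μ‖·‖₁, as in sparse logistic regression-type problems; the step length τ and the test data are illustrative, not taken from the paper:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def normal_map_residual(z, grad_f, tau, mu):
    """Normal map F(z) = grad_f(prox(z)) + (z - prox(z)) / tau
    for the composite problem min_x f(x) + mu * ||x||_1."""
    x = soft_threshold(z, tau * mu)          # prox of tau * mu * ||.||_1
    return grad_f(x) + (z - x) / tau

# Illustrative instance: f(x) = 0.5 * ||x - b||^2, whose composite
# minimizer has the closed form x* = soft_threshold(b, mu).
b = np.array([2.0, -0.3, 1.5, 0.1])
mu, tau = 0.5, 0.8
grad_f = lambda x: x - b

x_star = soft_threshold(b, mu)           # closed-form minimizer
z_star = x_star - tau * grad_f(x_star)   # corresponding zero of the normal map
print(np.linalg.norm(normal_map_residual(z_star, grad_f, tau, mu)))  # ~0
```

The point z* = x* − τ∇f(x*) satisfies prox(z*) = x*, so both terms of F(z*) cancel; a semismooth Newton method seeks such zeros of F.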
Convergence of Successive Linear Programming Algorithms for Noisy Functions
Gradient-based methods have been highly successful for solving a variety of
both unconstrained and constrained nonlinear optimization problems. In
real-world applications such as optimal control or machine learning, however,
the necessary function and derivative information may be corrupted by noise.
Sun and Nocedal (2022) recently proposed a remedy for smooth unconstrained
problems by means of a stabilized acceptance criterion for computed iterates,
which leads to convergence of the iterates of a trust-region method to a
region of criticality.
We extend their analysis to the successive linear programming algorithm of
Byrd et al. (2023a, 2023b) for unconstrained optimization problems with
objectives that can be characterized as the composition of a polyhedral
function with a smooth function, where the latter and its gradient may be
corrupted by noise. This gives the flexibility to cover, for example,
(sub)problems arising in image reconstruction or in constrained optimization
algorithms.
We provide computational examples that illustrate the findings and point to
possible strategies for practical determination of the stabilization parameter
that balances the size of the critical region with a relaxation of the
acceptance criterion (or descent property) of the algorithm.
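The stabilization idea can be illustrated as follows: if objective evaluations are corrupted by noise of magnitude at most ε_f, the standard acceptance test is relaxed by a constant proportional to ε_f, so that steps are not wrongly rejected on account of noise alone. This is a hedged sketch; the relaxation constant 2ε_f and the threshold η below are illustrative choices, not necessarily the exact constants of the cited papers:

```python
def accept_step(f_old, f_new, predicted_reduction, eps_f, eta=0.1):
    """Noise-relaxed acceptance test for a trust-region-type step.

    f_old, f_new: noisy objective values at the current iterate and the
        trial point; each may be off from the true value by at most eps_f.
    predicted_reduction: model decrease predicted for the trial step (> 0).
    Accepts when actual_reduction + 2*eps_f >= eta * predicted_reduction,
    i.e. the classical test with the numerator shifted by the noise budget.
    """
    actual_reduction = f_old - f_new
    return actual_reduction + 2.0 * eps_f >= eta * predicted_reduction

# Without a noise allowance this step is rejected (apparent increase) ...
print(accept_step(1.0, 1.3, predicted_reduction=1.0, eps_f=0.0))   # False
# ... but with noise level 0.25 the apparent increase may be pure noise:
print(accept_step(1.0, 1.3, predicted_reduction=1.0, eps_f=0.25))  # True
```

The parameter ε_f is exactly the stabilization parameter discussed above: a larger value avoids spurious rejections but enlarges the region of criticality to which the iterates can be guaranteed to converge.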
Convex Optimization and Extensions, with a View Toward Large-Scale Problems
Machine learning is a major source of optimization problems of current interest. These problems tend to be challenging because of their enormous scale, which makes it difficult to apply traditional optimization algorithms. We explore three avenues to designing algorithms suited to handling these challenges, with a view toward large-scale ML tasks. The first is to develop better general methods for unconstrained minimization. The second is to tailor methods to the features of modern systems, namely the availability of distributed computing. The third is to use specialized algorithms to exploit specific problem structure.
Chapters 2 and 3 focus on improving quasi-Newton methods, a mainstay of unconstrained optimization. In Chapter 2, we analyze an extension of quasi-Newton methods wherein we use block updates, which add curvature information to the Hessian approximation on a higher-dimensional subspace. This defines a family of methods, Block BFGS, that form a spectrum between the classical BFGS method and Newton's method, in terms of the amount of curvature information used. We show that by adding a correction step, the Block BFGS method inherits the convergence guarantees of BFGS for deterministic problems, most notably a Q-superlinear convergence rate for strongly convex problems. To explore the tradeoff between reduced iterations and greater work per iteration of block methods, we present a set of numerical experiments.
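A common form of such a block update, given a matrix S whose s columns are search directions and a curvature matrix Y (for a quadratic with Hessian A, Y = A S), replaces the rank-two BFGS update with a rank-2s one. The sketch below shows this generic block form under those assumptions; the thesis's exact variant, including its correction step, may differ:

```python
import numpy as np

def block_bfgs_update(B, S, Y):
    """Block BFGS update of a Hessian approximation B.

    S: n x s matrix of directions; Y: n x s matrix of curvature pairs.
    Subtracts B's curvature on span(S) and inserts the new curvature;
    reduces to the classical BFGS update when s = 1.
    """
    BS = B @ S
    term_old = BS @ np.linalg.solve(S.T @ BS, BS.T)  # remove old curvature
    term_new = Y @ np.linalg.solve(S.T @ Y, Y.T)     # insert new curvature
    return B - term_old + term_new

# On a quadratic, one block update with a full set of n independent
# directions recovers the exact Hessian:
rng = np.random.default_rng(0)
A = np.diag([1.0, 2.0, 5.0])             # exact Hessian of the quadratic
S = rng.standard_normal((3, 3))          # n independent directions
B_new = block_bfgs_update(np.eye(3), S, A @ S)
print(np.allclose(B_new, A))             # True
```

This illustrates the spectrum mentioned above: with s = 1 the update is classical BFGS, while larger blocks absorb more curvature per iteration at the cost of the s x s solves.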
In Chapter 3, we focus on the problem of step size determination. To obviate the need for line searches, and for pre-computing fixed step sizes, we derive an analytic step size, which we call curvature-adaptive, for self-concordant functions. This adaptive step size allows us to generalize the damped Newton method of Nesterov to other iterative methods, including gradient descent and quasi-Newton methods. We provide simple proofs of convergence, including superlinear convergence for adaptive BFGS, allowing us to obtain superlinear convergence without line searches.
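For reference, Nesterov's damped Newton method, which the chapter generalizes, scales the Newton step by 1/(1 + λ(x)), where λ(x) = (∇f(x)ᵀ∇²f(x)⁻¹∇f(x))^{1/2} is the Newton decrement. A one-dimensional sketch on the self-concordant function f(x) = x − log x (an illustrative example, not one taken from the thesis):

```python
import math

def damped_newton_1d(grad, hess, x, iters=50):
    """Damped Newton iteration with step size 1/(1 + lambda(x)), where
    lambda(x) = |f'(x)| / sqrt(f''(x)) is the Newton decrement in 1D."""
    for _ in range(iters):
        g, h = grad(x), hess(x)
        lam = abs(g) / math.sqrt(h)              # Newton decrement
        x = x - (1.0 / (1.0 + lam)) * g / h      # damped Newton step
    return x

# f(x) = x - log(x) on x > 0 is self-concordant with minimizer x* = 1.
grad = lambda x: 1.0 - 1.0 / x
hess = lambda x: 1.0 / x ** 2
print(damped_newton_1d(grad, hess, 5.0))         # ≈ 1.0
```

The analytic step size removes the need for a line search: no trial evaluations of f are made, which is the property the curvature-adaptive generalization carries over to gradient and quasi-Newton directions.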
In Chapter 4, we move from general algorithms to hardware-influenced algorithms. We consider a form of distributed stochastic gradient descent that we call Leader SGD (LSGD), which is inspired by the Elastic Averaging SGD (EASGD) method. These methods are intended for distributed settings where communication between machines may be expensive, making the design of their consensus mechanism important. We show that LSGD avoids an issue with spurious stationary points that affects EASGD, and provide a convergence analysis of LSGD. In the stochastic strongly convex setting, LSGD converges at the rate O(1/k) with diminishing step sizes, matching other distributed methods. We also analyze the impact of varying communication delays and of stochasticity in the selection of the leader points, and study under what conditions LSGD may produce better search directions than the gradient alone.
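The consensus mechanism can be sketched in a toy single-process, deterministic setting: each worker takes a gradient step plus a pull toward the current leader, chosen as the worker with the lowest objective value. All names, the pull strength, and the quadratic objective below are illustrative; the actual LSGD method is stochastic and runs across communicating machines:

```python
def leader_sgd_toy(grads, objective, xs, lr=0.1, pull=0.5, iters=200):
    """Toy leader-based consensus descent: each worker follows its own
    (here deterministic) gradient and is additionally pulled toward the
    best-performing worker, the "leader"."""
    xs = list(xs)
    for _ in range(iters):
        leader = min(xs, key=objective)  # worker with the lowest loss
        xs = [x - lr * (g(x) + pull * (x - leader))
              for x, g in zip(xs, grads)]
    return xs

# All four workers share the objective f(x) = 0.5 * (x - 3)^2 here.
objective = lambda x: 0.5 * (x - 3.0) ** 2
grads = [lambda x: x - 3.0] * 4
xs = leader_sgd_toy(grads, objective, [-5.0, 0.0, 2.0, 10.0])
print(xs)  # all workers near 3.0
```

The pull term plays the role of the consensus mechanism discussed above: it trades off exploration by individual workers against attraction to the current best iterate, in place of EASGD's attraction to a running average.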
In Chapter 5, we switch focus again, to algorithms that exploit specific problem structure. Specifically, we consider problems whose variables satisfy multiaffine constraints, which motivates us to apply the Alternating Direction Method of Multipliers (ADMM). Problems that can be formulated with such a structure include representation learning (e.g., with dictionaries) and deep learning. We show that ADMM can be applied directly to multiaffine problems. By extending the theory of nonconvex ADMM, we prove that ADMM converges on multiaffine problems satisfying certain assumptions and, more broadly, analyze the theoretical properties of ADMM for general problems, investigating the effect of different types of structure.
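As a point of reference for the linearly constrained case that the multiaffine analysis generalizes, here is a minimal ADMM sketch for the splitting min_x 0.5‖x − b‖² + μ‖z‖₁ subject to x − z = 0. The penalty ρ and the data are illustrative; in the multiaffine setting the linear constraint x − z = 0 is replaced by constraints that are affine in each block of variables separately:

```python
import numpy as np

def soft(v, t):
    """Soft thresholding: prox of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_l1(b, mu, rho=1.0, iters=200):
    """ADMM for min_x 0.5*||x - b||^2 + mu*||z||_1  s.t.  x - z = 0."""
    x = z = u = np.zeros_like(b)
    for _ in range(iters):
        x = (b + rho * (z - u)) / (1.0 + rho)  # x-update: quadratic solve
        z = soft(x + u, mu / rho)              # z-update: prox of the l1 term
        u = u + x - z                          # scaled dual update
    return z

b = np.array([2.0, -0.3, 1.5, 0.1])
mu = 0.5
print(admm_l1(b, mu))  # ≈ soft(b, mu), the closed-form minimizer
```

Each subproblem touches only one block of variables, which is exactly the property that makes ADMM attractive for the structured (dictionary learning, deep learning) formulations mentioned above.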