A variational derivation of a class of BFGS-like methods
We provide a maximum entropy derivation of a new family of BFGS-like methods.
Similar results are then derived for block BFGS methods. This also yields an
independent proof of a result of Fletcher (1991) and its generalisation to the
block case.
Comment: 10 pages
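For orientation, one standard variational characterization of the classical BFGS update (textbook background; both Fletcher's 1991 result and the maximum entropy derivation above use different variational measures) states that the updated inverse-Hessian approximation is the symmetric matrix closest to the current one, in a weighted Frobenius norm, among all matrices satisfying the secant equation:

\[
H_{+} = \operatorname*{arg\,min}_{\bar H = \bar H^{T},\ \bar H y = s} \|\bar H - H\|_{W},
\qquad
H_{+} = (I - \rho\, s y^{T})\, H\, (I - \rho\, y s^{T}) + \rho\, s s^{T},
\quad \rho = \frac{1}{y^{T} s},
\]

where $s$ is the step, $y$ the corresponding gradient difference, and $W$ any weight matrix with $W s = y$.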
Objective acceleration for unconstrained optimization
Acceleration schemes can dramatically improve existing optimization
procedures. In most of the work on these schemes, such as nonlinear Generalized
Minimal Residual (N-GMRES), acceleration is based on minimizing the
norm of some target on subspaces of $\mathbb{R}^n$. There are many numerical
examples that show how accelerating general purpose and domain-specific
optimizers with N-GMRES results in large improvements. We propose a natural
modification to N-GMRES, which significantly improves the performance in a
testing environment originally used to advocate N-GMRES. Our proposed approach,
which we refer to as O-ACCEL (Objective Acceleration), is novel in that it
minimizes an approximation to the \emph{objective function} on subspaces of
$\mathbb{R}^n$. We prove that O-ACCEL reduces to the Full Orthogonalization
Method for linear systems when the objective is quadratic, which differentiates
our proposed approach from existing acceleration methods. Comparisons with
L-BFGS and N-CG indicate the competitiveness of O-ACCEL. As it can be combined
with domain-specific optimizers, it may also be beneficial in areas where
L-BFGS or N-CG are not suitable.
Comment: 18 pages, 6 figures, 5 tables
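As a rough, illustrative sketch of the idea (not the authors' implementation; all names and the regularization constant are assumptions), one can accelerate a sequence of iterates by minimizing a quadratic model of the objective over the subspace spanned by previous steps, with gradient differences standing in for Hessian-vector products:

    import numpy as np

    def o_accel_like_step(X, G, reg=1e-10):
        """One acceleration step in the spirit of O-ACCEL (illustrative sketch).

        X : (n, m+1) array of recent iterates; the last column is the current x_k.
        G : (n, m+1) array of the corresponding gradients.
        """
        xk, gk = X[:, -1], G[:, -1]
        D = xk[:, None] - X[:, :-1]      # subspace directions d_i = x_k - x_{k-i}
        Y = gk[:, None] - G[:, :-1]      # y_i = g_k - g_{k-i}, surrogate for H d_i
        A = 0.5 * (D.T @ Y + Y.T @ D)    # symmetrized curvature of the model
        b = D.T @ gk                     # linear term of the model
        alpha = np.linalg.solve(A + reg * np.eye(len(b)), -b)
        return xk + D @ alpha            # accelerated trial point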
Limited Memory BFGS method for Sparse and Large-Scale Nonlinear Optimization
Optimization-based control systems are used in many areas of application, including aerospace engineering, economics, robotics and automotive engineering. This work was motivated by the demand for a large-scale sparse solver for this problem class. The sparsity of the problem is exploited for computational efficiency, both in runtime and in memory consumption. This includes efficient storage of the occurring matrices and vectors and an appropriate approximation of the Hessian matrix, which is the main subject of this work. To this end, a limited memory BFGS method has been developed and implemented in WORHP, a software library for solving nonlinear optimization problems. Its solving performance has been tested on different optimal control problems and test sets.
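For context (standard textbook material, not WORHP's actual implementation), the core of a limited memory BFGS method is the two-loop recursion, which applies the inverse-Hessian approximation implicitly from the most recent correction pairs:

    import numpy as np

    def lbfgs_direction(grad, s_list, y_list):
        """L-BFGS two-loop recursion (textbook form, illustrative).

        s_list : recent steps s_i = x_{i+1} - x_i (oldest first)
        y_list : gradient differences y_i = g_{i+1} - g_i (oldest first)
        Returns the search direction -H_k @ grad.
        """
        q = np.array(grad, dtype=float)
        coeffs = []
        for s, y in zip(reversed(s_list), reversed(y_list)):  # first loop: newest to oldest
            rho = 1.0 / (y @ s)
            a = rho * (s @ q)
            q -= a * y
            coeffs.append((rho, a))
        if s_list:                                            # initial scaling H_0 = gamma * I
            s, y = s_list[-1], y_list[-1]
            q *= (s @ y) / (y @ y)
        for (rho, a), s, y in zip(reversed(coeffs), s_list, y_list):  # second loop: oldest to newest
            b = rho * (y @ q)
            q += (a - b) * s
        return -q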
Symmetric Rank-$k$ Methods
This paper proposes a novel class of block quasi-Newton methods for convex
optimization which we call symmetric rank-$k$ (SR-$k$) methods. Each iteration
of SR-$k$ incorporates curvature information from $k$ Hessian-vector
products obtained by a greedy or random strategy. We prove that SR-$k$ methods
have a local superlinear convergence rate of
$\mathcal{O}\big((1-k/d)^{t(t-1)/2}\big)$ for minimizing smooth and strongly
self-concordant functions, where $d$ is the problem dimension and $t$ is the
iteration counter. This is the first explicit superlinear convergence rate for
block quasi-Newton methods, and it helps explain why block quasi-Newton
methods converge faster than standard quasi-Newton methods in practice.
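To make the block update concrete, the sketch below shows a symmetric rank-$k$ (block SR1-style) update of a Hessian approximation from $k$ Hessian-vector products; the precise update and the greedy direction selection used in the paper may differ, so treat this only as an illustration with assumed names.

    import numpy as np

    def sr_k_update(B, U, HU, eps=1e-12):
        """Symmetric rank-k (block SR1-style) update of B (illustrative sketch).

        B  : (n, n) current Hessian approximation
        U  : (n, k) update directions (e.g. chosen greedily or at random)
        HU : (n, k) exact Hessian-vector products H @ U
        The result satisfies the block secant condition B_plus @ U = HU.
        """
        R = HU - B @ U                       # residual of the secant condition
        M = R.T @ U                          # k x k coupling matrix
        if np.linalg.cond(M) > 1.0 / eps:    # skip a nearly singular update
            return B
        return B + R @ np.linalg.solve(M, R.T)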
Sub-Sampled Matrix Approximations
Matrix approximations are widely used to accelerate many numerical algorithms. Current methods sample row (or column) spaces to reduce their computational footprint and approximate a matrix A with an appropriate embedding of the sampled data. This work introduces a novel family of randomized iterative algorithms which use significantly less data per iteration than current methods by sampling input and output spaces simultaneously. The data footprint of the algorithms can be tuned (independently of the underlying matrix dimension) to the available hardware. Convergence of the algorithms, which are referred to as sub-sampled, is proven in terms of error bounds that are also tested numerically. A heuristic accelerated scheme is developed and compared to current algorithms on a substantial test suite of matrices. The sub-sampled algorithms provide a lightweight framework for constructing more useful inverse and low-rank matrix approximations. Modifying the sub-sampled algorithms gives families of methods which iteratively approximate the inverse of a matrix, whose accelerated variant is comparable to current state-of-the-art methods. Inserting a compression step in the algorithms gives low-rank approximations whose accelerated variants have fixed computational as well as storage footprints.
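As background on the family being extended (this is a basic one-sided construction, not the sub-sampled algorithms described above, which also sample the output space; names and the Gaussian sampling are assumptions), a sketch-and-project iteration for approximating the inverse of a symmetric positive definite matrix enforces the inverse equation on a random low-dimensional subspace at each step:

    import numpy as np

    def sketched_inverse(A, iters=500, k=5, seed=0):
        """Iteratively approximate the inverse of an SPD matrix A (illustrative).

        Each step enforces the sketched equation S.T @ A @ X = S.T, i.e. the
        equation A @ X = I restricted to a random k-dimensional subspace.
        """
        rng = np.random.default_rng(seed)
        n = A.shape[0]
        X = np.zeros((n, n))
        for _ in range(iters):
            S = rng.standard_normal((n, k))              # random sketching matrix
            AS = A @ S
            X += S @ np.linalg.solve(S.T @ AS, S.T - AS.T @ X)
        return X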
Convex Optimization and Extensions, with a View Toward Large-Scale Problems
Machine learning is a major source of interesting optimization problems of current interest. These problems tend to be challenging because of their enormous scale, which makes it difficult to apply traditional optimization algorithms. We explore three avenues to designing algorithms suited to handling these challenges, with a view toward large-scale ML tasks. The first is to develop better general methods for unconstrained minimization. The second is to tailor methods to the features of modern systems, namely the availability of distributed computing. The third is to use specialized algorithms to exploit specific problem structure.
Chapters 2 and 3 focus on improving quasi-Newton methods, a mainstay of unconstrained optimization. In Chapter 2, we analyze an extension of quasi-Newton methods wherein we use block updates, which add curvature information to the Hessian approximation on a higher-dimensional subspace. This defines a family of methods, Block BFGS, that forms a spectrum between the classical BFGS method and Newton's method in terms of the amount of curvature information used. We show that by adding a correction step, the Block BFGS method inherits the convergence guarantees of BFGS for deterministic problems, most notably a Q-superlinear convergence rate for strongly convex problems. To explore the tradeoff between fewer iterations and greater work per iteration in block methods, we present a set of numerical experiments.
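A minimal sketch of the kind of block BFGS update this chapter builds on (without the correction step or any safeguards from the chapter; names are illustrative) is:

    import numpy as np

    def block_bfgs_update(B, S, Y):
        """Block BFGS update of a Hessian approximation B (illustrative sketch).

        S : (n, q) block of directions
        Y : (n, q) curvature information on the block, e.g. Y = H @ S
        Reduces to the classical BFGS update when q = 1.
        """
        BS = B @ S
        curv_out = BS @ np.linalg.solve(S.T @ BS, BS.T)  # remove old curvature on span(S)
        curv_in = Y @ np.linalg.solve(Y.T @ S, Y.T)      # insert new curvature from Y
        return B - curv_out + curv_in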
In Chapter 3, we focus on the problem of step size determination. To obviate the need for line searches, and for pre-computing fixed step sizes, we derive an analytic step size, which we call curvature-adaptive, for self-concordant functions. This adaptive step size allows us to generalize the damped Newton method of Nesterov to other iterative methods, including gradient descent and quasi-Newton methods. We provide simple proofs of convergence, including superlinear convergence for adaptive BFGS, allowing us to obtain superlinear convergence without line searches.
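Concretely, for a descent direction $d_k$ at $x_k$, the curvature-adaptive step size takes (up to notational differences) the closed form

\[
t_k = \frac{\rho_k}{\delta_k(\rho_k + \delta_k)},
\qquad
\rho_k = -\nabla f(x_k)^{T} d_k,
\quad
\delta_k = \bigl(d_k^{T} \nabla^{2} f(x_k)\, d_k\bigr)^{1/2},
\]

which for the Newton direction $d_k = -\nabla^{2} f(x_k)^{-1} \nabla f(x_k)$ gives $\rho_k = \delta_k^{2}$ and hence $t_k = 1/(1+\delta_k)$, i.e. Nesterov's damped Newton step.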
In Chapter 4, we move from general algorithms to hardware-influenced algorithms. We consider a form of distributed stochastic gradient descent that we call Leader SGD (LSGD), which is inspired by the Elastic Averaging SGD (EASGD) method. These methods are intended for distributed settings where communication between machines may be expensive, which makes the design of their consensus mechanism important. We show that LSGD avoids an issue with spurious stationary points that affects EASGD, and we provide a convergence analysis of LSGD. In the stochastic strongly convex setting, LSGD converges at the rate O(1/k) with diminishing step sizes, matching other distributed methods. We also analyze the impact of varying communication delays and of stochasticity in the selection of the leader points, and we investigate under what conditions LSGD may produce better search directions than the gradient alone.
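A rough sketch of the leader-based consensus idea (an illustration of the mechanism only, not the method's exact specification; all names, the pull strength, and the leader rule are assumptions) is:

    import numpy as np

    def lsgd_round(workers, stoch_grads, f, lr=0.01, pull=0.1):
        """One communication round of a leader-pull SGD scheme (illustrative sketch).

        workers     : list of parameter vectors, one per machine
        stoch_grads : list of callables returning a stochastic gradient at a point
        f           : objective value used to pick the leader (lowest value wins)
        """
        leader = min(workers, key=f)                 # consensus target: best point
        return [x - lr * g(x) - pull * (x - leader)  # SGD step plus pull toward the leader
                for x, g in zip(workers, stoch_grads)]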
In Chapter 5, we switch again to focus on algorithms that exploit problem structure. Specifically, we consider problems whose variables satisfy multiaffine constraints, which motivates us to apply the Alternating Direction Method of Multipliers (ADMM). Problems that can be formulated with such a structure include representation learning (e.g., with dictionaries) and deep learning. We show that ADMM can be applied directly to multiaffine problems. By extending the theory of nonconvex ADMM, we prove that ADMM is convergent on multiaffine problems satisfying certain assumptions and, more broadly, analyze the theoretical properties of ADMM for general problems, investigating the effect of different types of structure.
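For orientation (standard background, not the chapter's multiaffine algorithm), two-block ADMM for $\min_{x,z} f(x) + g(z)$ subject to $Ax + Bz = c$, with augmented Lagrangian $L_{\rho}(x,z,w) = f(x) + g(z) + \langle w,\, Ax + Bz - c\rangle + \tfrac{\rho}{2}\|Ax + Bz - c\|^{2}$, iterates

\[
x^{k+1} = \operatorname*{arg\,min}_{x} L_{\rho}(x, z^{k}, w^{k}),
\qquad
z^{k+1} = \operatorname*{arg\,min}_{z} L_{\rho}(x^{k+1}, z, w^{k}),
\qquad
w^{k+1} = w^{k} + \rho\,(A x^{k+1} + B z^{k+1} - c);
\]

the multiaffine setting replaces the linear coupling $Ax + Bz = c$ with constraints that are multiaffine in the blocks while keeping the same alternating minimization and dual update structure.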
Exploring novel designs of NLP solvers: Architecture and Implementation of WORHP
Mathematical Optimization in general, and Nonlinear Programming in particular, is applied in many fields, such as the automotive sector, the aerospace industry, and space agencies. With some established NLP solvers having been available for decades, and with the mathematical community being rather conservative in this respect, many of their programming standards are severely outdated. It is safe to assume that such usability shortcomings impede the wider use of NLP methods; a representative example is the use of static workspaces by legacy FORTRAN codes. This dissertation gives an account of the construction of the European NLP solver WORHP, using and combining software standards and techniques that have not previously been applied to mathematical software to this extent. Examples include automatic code generation, a consistent reverse communication architecture, and the elimination of static workspaces. The result is a novel, industrial-grade NLP solver that overcomes many technical weaknesses of established NLP solvers and other mathematical software.
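To illustrate what a reverse communication architecture means in practice (a generic, self-contained toy example, not WORHP's actual interface), the solver never calls user code; instead it hands control back to the caller whenever it needs a problem evaluation:

    import numpy as np

    class ReverseCommSolver:
        """Toy gradient-descent solver in reverse-communication style (illustrative)."""

        def __init__(self, x0, lr=0.1, tol=1e-8, max_iter=1000):
            self.x = np.asarray(x0, dtype=float)
            self.lr, self.tol, self.max_iter = lr, tol, max_iter
            self.grad, self.it = None, 0
            self.status = "need_gradient"        # ask the caller for an evaluation

        def set_gradient(self, g):
            self.grad = np.asarray(g, dtype=float)

        def step(self):
            if np.linalg.norm(self.grad) < self.tol or self.it >= self.max_iter:
                self.status = "finished"
            else:
                self.x -= self.lr * self.grad    # take a step, then hand control back
                self.it += 1
                self.status = "need_gradient"
            return self.status

    # Caller-side driver loop: the user supplies evaluations only on request.
    solver = ReverseCommSolver(x0=[3.0, -4.0])
    while solver.status == "need_gradient":
        solver.set_gradient(2.0 * solver.x)      # gradient of f(x) = ||x||^2
        solver.step()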