A variational derivation of a class of BFGS-like methods
We provide a maximum entropy derivation of a new family of BFGS-like methods.
Similar results are then derived for block BFGS methods. This also yields an
independent proof of a result of Fletcher (1991) and its generalisation to the
block case.
Comment: 10 pages
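For orientation, one standard variational characterization of the classical BFGS update (textbook background; both Fletcher's 1991 result and the maximum entropy derivation above use different variational measures) states that the updated inverse-Hessian approximation is the symmetric matrix closest to the current one, in a weighted Frobenius norm, among all matrices satisfying the secant equation:

\[
H_{+} = \operatorname*{arg\,min}_{\bar H = \bar H^{T},\ \bar H y = s} \|\bar H - H\|_{W},
\qquad
H_{+} = (I - \rho\, s y^{T})\, H\, (I - \rho\, y s^{T}) + \rho\, s s^{T},
\quad \rho = \frac{1}{y^{T} s},
\]

where $s$ is the step, $y$ the corresponding gradient difference, and $W$ any weight matrix with $W s = y$.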
Objective acceleration for unconstrained optimization
Acceleration schemes can dramatically improve existing optimization
procedures. In most of the work on these schemes, such as nonlinear Generalized
Minimal Residual (N-GMRES), acceleration is based on minimizing the
norm of some target on subspaces of $\mathbb{R}^n$. There are many numerical
examples that show how accelerating general purpose and domain-specific
optimizers with N-GMRES results in large improvements. We propose a natural
modification to N-GMRES, which significantly improves the performance in a
testing environment originally used to advocate N-GMRES. Our proposed approach,
which we refer to as O-ACCEL (Objective Acceleration), is novel in that it
minimizes an approximation to the \emph{objective function} on subspaces of
$\mathbb{R}^n$. We prove that O-ACCEL reduces to the Full Orthogonalization
Method for linear systems when the objective is quadratic, which differentiates
our proposed approach from existing acceleration methods. Comparisons with
L-BFGS and N-CG indicate the competitiveness of O-ACCEL. As it can be combined
with domain-specific optimizers, it may also be beneficial in areas where
L-BFGS or N-CG are not suitable.
Comment: 18 pages, 6 figures, 5 tables
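As a rough, illustrative sketch of the idea (not the authors' implementation; all names and the regularization constant are assumptions), one can accelerate a sequence of iterates by minimizing a quadratic model of the objective over the subspace spanned by previous steps, with gradient differences standing in for Hessian-vector products:

    import numpy as np

    def o_accel_like_step(X, G, reg=1e-10):
        """One acceleration step in the spirit of O-ACCEL (illustrative sketch).

        X : (n, m+1) array of recent iterates; the last column is the current x_k.
        G : (n, m+1) array of the corresponding gradients.
        """
        xk, gk = X[:, -1], G[:, -1]
        D = xk[:, None] - X[:, :-1]      # subspace directions d_i = x_k - x_{k-i}
        Y = gk[:, None] - G[:, :-1]      # y_i = g_k - g_{k-i}, surrogate for H d_i
        A = 0.5 * (D.T @ Y + Y.T @ D)    # symmetrized curvature of the model
        b = D.T @ gk                     # linear term of the model
        alpha = np.linalg.solve(A + reg * np.eye(len(b)), -b)
        return xk + D @ alpha            # accelerated trial point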
Limited Memory BFGS method for Sparse and Large-Scale Nonlinear Optimization
Optimization-based control systems are used in many areas of application, including aerospace engineering, economics, robotics and automotive engineering. This work was motivated by the demand for a large-scale sparse solver for this problem class. The sparsity of the problem is exploited for computational efficiency, both in runtime and in memory consumption. This includes efficient storage of the occurring matrices and vectors and an appropriate approximation of the Hessian matrix, which is the main subject of this work. To this end, a limited memory BFGS method has been developed and implemented in WORHP, a software library for solving nonlinear optimization problems. Its solving performance has been tested on different optimal control problems and test sets.
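For context (standard textbook material, not WORHP's actual implementation), the core of a limited memory BFGS method is the two-loop recursion, which applies the inverse-Hessian approximation implicitly from the most recent correction pairs:

    import numpy as np

    def lbfgs_direction(grad, s_list, y_list):
        """L-BFGS two-loop recursion (textbook form, illustrative).

        s_list : recent steps s_i = x_{i+1} - x_i (oldest first)
        y_list : gradient differences y_i = g_{i+1} - g_i (oldest first)
        Returns the search direction -H_k @ grad.
        """
        q = np.array(grad, dtype=float)
        coeffs = []
        for s, y in zip(reversed(s_list), reversed(y_list)):  # first loop: newest to oldest
            rho = 1.0 / (y @ s)
            a = rho * (s @ q)
            q -= a * y
            coeffs.append((rho, a))
        if s_list:                                            # initial scaling H_0 = gamma * I
            s, y = s_list[-1], y_list[-1]
            q *= (s @ y) / (y @ y)
        for (rho, a), s, y in zip(reversed(coeffs), s_list, y_list):  # second loop: oldest to newest
            b = rho * (y @ q)
            q += (a - b) * s
        return -q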
Symmetric Rank-$k$ Methods
This paper proposes a novel class of block quasi-Newton methods for convex
optimization which we call symmetric rank-$k$ (SR-$k$) methods. Each iteration
of SR-$k$ incorporates curvature information from $k$ Hessian-vector
products obtained by a greedy or random strategy. We prove that SR-$k$ methods
have a local superlinear convergence rate of
$\mathcal{O}\big((1-k/d)^{t(t-1)/2}\big)$ for minimizing smooth and strongly
self-concordant functions, where $d$ is the problem dimension and $t$ is the
iteration counter. This is the first explicit superlinear convergence rate for
block quasi-Newton methods, and it helps explain why block quasi-Newton
methods converge faster than standard quasi-Newton methods in practice.
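To make the block update concrete, the sketch below shows a symmetric rank-$k$ (block SR1-style) update of a Hessian approximation from $k$ Hessian-vector products; the precise update and the greedy direction selection used in the paper may differ, so treat this only as an illustration with assumed names.

    import numpy as np

    def sr_k_update(B, U, HU, eps=1e-12):
        """Symmetric rank-k (block SR1-style) update of B (illustrative sketch).

        B  : (n, n) current Hessian approximation
        U  : (n, k) update directions (e.g. chosen greedily or at random)
        HU : (n, k) exact Hessian-vector products H @ U
        The result satisfies the block secant condition B_plus @ U = HU.
        """
        R = HU - B @ U                       # residual of the secant condition
        M = R.T @ U                          # k x k coupling matrix
        if np.linalg.cond(M) > 1.0 / eps:    # skip a nearly singular update
            return B
        return B + R @ np.linalg.solve(M, R.T)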
Sub-Sampled Matrix Approximations
Matrix approximations are widely used to accelerate many numerical algorithms. Current methods sample row (or column) spaces to reduce their computational footprint and approximate a matrix A with an appropriate embedding of the sampled data. This work introduces a novel family of randomized iterative algorithms which use significantly less data per iteration than current methods by sampling input and output spaces simultaneously. The data footprint of the algorithms can be tuned (independently of the underlying matrix dimension) to the available hardware. Convergence of the algorithms, which are referred to as sub-sampled, is proven in terms of error bounds that are also tested numerically. A heuristic accelerated scheme is developed and compared to current algorithms on a substantial test suite of matrices. The sub-sampled algorithms provide a lightweight framework for constructing more useful inverse and low-rank matrix approximations. Modifying the sub-sampled algorithms gives families of methods which iteratively approximate the inverse of a matrix, whose accelerated variant is comparable to current state-of-the-art methods. Inserting a compression step in the algorithms gives low-rank approximations whose accelerated variants have fixed computational as well as storage footprints.
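As background on the family being extended (this is a basic one-sided construction, not the sub-sampled algorithms described above, which also sample the output space; names and the Gaussian sampling are assumptions), a sketch-and-project iteration for approximating the inverse of a symmetric positive definite matrix enforces the inverse equation on a random low-dimensional subspace at each step:

    import numpy as np

    def sketched_inverse(A, iters=500, k=5, seed=0):
        """Iteratively approximate the inverse of an SPD matrix A (illustrative).

        Each step enforces the sketched equation S.T @ A @ X = S.T, i.e. the
        equation A @ X = I restricted to a random k-dimensional subspace.
        """
        rng = np.random.default_rng(seed)
        n = A.shape[0]
        X = np.zeros((n, n))
        for _ in range(iters):
            S = rng.standard_normal((n, k))              # random sketching matrix
            AS = A @ S
            X += S @ np.linalg.solve(S.T @ AS, S.T - AS.T @ X)
        return X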
Convex Optimization and Extensions, with a View Toward Large-Scale Problems
Machine learning is a major source of interesting optimization problems of current interest. These problems tend to be challenging because of their enormous scale, which makes it difficult to apply traditional optimization algorithms. We explore three avenues to designing algorithms suited to handling these challenges, with a view toward large-scale ML tasks. The first is to develop better general methods for unconstrained minimization. The second is to tailor methods to the features of modern systems, namely the availability of distributed computing. The third is to use specialized algorithms to exploit specific problem structure.
Chapters 2 and 3 focus on improving quasi-Newton methods, a mainstay of unconstrained optimization. In Chapter 2, we analyze an extension of quasi-Newton methods wherein we use block updates, which add curvature information to the Hessian approximation on a higher-dimensional subspace. This defines a family of methods, Block BFGS, that forms a spectrum between the classical BFGS method and Newton's method in terms of the amount of curvature information used. We show that by adding a correction step, the Block BFGS method inherits the convergence guarantees of BFGS for deterministic problems, most notably a Q-superlinear convergence rate for strongly convex problems. To explore the tradeoff between fewer iterations and greater work per iteration in block methods, we present a set of numerical experiments.
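A minimal sketch of the kind of block BFGS update this chapter builds on (without the correction step or any safeguards from the chapter; names are illustrative) is:

    import numpy as np

    def block_bfgs_update(B, S, Y):
        """Block BFGS update of a Hessian approximation B (illustrative sketch).

        S : (n, q) block of directions
        Y : (n, q) curvature information on the block, e.g. Y = H @ S
        Reduces to the classical BFGS update when q = 1.
        """
        BS = B @ S
        curv_out = BS @ np.linalg.solve(S.T @ BS, BS.T)  # remove old curvature on span(S)
        curv_in = Y @ np.linalg.solve(Y.T @ S, Y.T)      # insert new curvature from Y
        return B - curv_out + curv_in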
In Chapter 3, we focus on the problem of step size determination. To obviate the need for line searches, and for pre-computing fixed step sizes, we derive an analytic step size, which we call curvature-adaptive, for self-concordant functions. This adaptive step size allows us to generalize the damped Newton method of Nesterov to other iterative methods, including gradient descent and quasi-Newton methods. We provide simple proofs of convergence, including superlinear convergence for adaptive BFGS, allowing us to obtain superlinear convergence without line searches.
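Concretely, for a descent direction $d_k$ at $x_k$, the curvature-adaptive step size takes (up to notational differences) the closed form

\[
t_k = \frac{\rho_k}{\delta_k(\rho_k + \delta_k)},
\qquad
\rho_k = -\nabla f(x_k)^{T} d_k,
\quad
\delta_k = \bigl(d_k^{T} \nabla^{2} f(x_k)\, d_k\bigr)^{1/2},
\]

which for the Newton direction $d_k = -\nabla^{2} f(x_k)^{-1} \nabla f(x_k)$ gives $\rho_k = \delta_k^{2}$ and hence $t_k = 1/(1+\delta_k)$, i.e. Nesterov's damped Newton step.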
In Chapter 4, we move from general algorithms to hardware-influenced algorithms. We consider a form of distributed stochastic gradient descent that we call Leader SGD (LSGD), which is inspired by the Elastic Averaging SGD (EASGD) method. These methods are intended for distributed settings where communication between machines may be expensive, which makes the design of their consensus mechanism important. We show that LSGD avoids an issue with spurious stationary points that affects EASGD, and we provide a convergence analysis of LSGD. In the stochastic strongly convex setting, LSGD converges at the rate O(1/k) with diminishing step sizes, matching other distributed methods. We also analyze the impact of varying communication delays and of stochasticity in the selection of the leader points, and we investigate under what conditions LSGD may produce better search directions than the gradient alone.
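A rough sketch of the leader-based consensus idea (an illustration of the mechanism only, not the method's exact specification; all names, the pull strength, and the leader rule are assumptions) is:

    import numpy as np

    def lsgd_round(workers, stoch_grads, f, lr=0.01, pull=0.1):
        """One communication round of a leader-pull SGD scheme (illustrative sketch).

        workers     : list of parameter vectors, one per machine
        stoch_grads : list of callables returning a stochastic gradient at a point
        f           : objective value used to pick the leader (lowest value wins)
        """
        leader = min(workers, key=f)                 # consensus target: best point
        return [x - lr * g(x) - pull * (x - leader)  # SGD step plus pull toward the leader
                for x, g in zip(workers, stoch_grads)]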
In Chapter 5, we switch again to focus on algorithms that exploit problem structure. Specifically, we consider problems whose variables satisfy multiaffine constraints, which motivates us to apply the Alternating Direction Method of Multipliers (ADMM). Problems that can be formulated with such a structure include representation learning (e.g., with dictionaries) and deep learning. We show that ADMM can be applied directly to multiaffine problems. By extending the theory of nonconvex ADMM, we prove that ADMM is convergent on multiaffine problems satisfying certain assumptions and, more broadly, analyze the theoretical properties of ADMM for general problems, investigating the effect of different types of structure.
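For orientation (standard background, not the chapter's multiaffine algorithm), two-block ADMM for $\min_{x,z} f(x) + g(z)$ subject to $Ax + Bz = c$, with augmented Lagrangian $L_{\rho}(x,z,w) = f(x) + g(z) + \langle w,\, Ax + Bz - c\rangle + \tfrac{\rho}{2}\|Ax + Bz - c\|^{2}$, iterates

\[
x^{k+1} = \operatorname*{arg\,min}_{x} L_{\rho}(x, z^{k}, w^{k}),
\qquad
z^{k+1} = \operatorname*{arg\,min}_{z} L_{\rho}(x^{k+1}, z, w^{k}),
\qquad
w^{k+1} = w^{k} + \rho\,(A x^{k+1} + B z^{k+1} - c);
\]

the multiaffine setting replaces the linear coupling $Ax + Bz = c$ with constraints that are multiaffine in the blocks while keeping the same alternating minimization and dual update structure.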
Exploring novel designs of NLP solvers: Architecture and Implementation of WORHP
Mathematical Optimization in general, and Nonlinear Programming in particular, is applied in many fields, such as the automotive sector, the aerospace industry, and space agencies. With some established NLP solvers having been available for decades, and with the mathematical community being rather conservative in this respect, many of their programming standards are severely outdated. It is safe to assume that such usability shortcomings impede the wider use of NLP methods; a representative example is the use of static workspaces by legacy FORTRAN codes. This dissertation gives an account of the construction of the European NLP solver WORHP, using and combining software standards and techniques that have not previously been applied to mathematical software to this extent. Examples include automatic code generation, a consistent reverse communication architecture, and the elimination of static workspaces. The result is a novel, industrial-grade NLP solver that overcomes many technical weaknesses of established NLP solvers and other mathematical software.
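To illustrate what a reverse communication architecture means in practice (a generic, self-contained toy example, not WORHP's actual interface), the solver never calls user code; instead it hands control back to the caller whenever it needs a problem evaluation:

    import numpy as np

    class ReverseCommSolver:
        """Toy gradient-descent solver in reverse-communication style (illustrative)."""

        def __init__(self, x0, lr=0.1, tol=1e-8, max_iter=1000):
            self.x = np.asarray(x0, dtype=float)
            self.lr, self.tol, self.max_iter = lr, tol, max_iter
            self.grad, self.it = None, 0
            self.status = "need_gradient"        # ask the caller for an evaluation

        def set_gradient(self, g):
            self.grad = np.asarray(g, dtype=float)

        def step(self):
            if np.linalg.norm(self.grad) < self.tol or self.it >= self.max_iter:
                self.status = "finished"
            else:
                self.x -= self.lr * self.grad    # take a step, then hand control back
                self.it += 1
                self.status = "need_gradient"
            return self.status

    # Caller-side driver loop: the user supplies evaluations only on request.
    solver = ReverseCommSolver(x0=[3.0, -4.0])
    while solver.status == "need_gradient":
        solver.set_gradient(2.0 * solver.x)      # gradient of f(x) = ||x||^2
        solver.step()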