On global minimizers of quadratic functions with cubic regularization
In this paper, we analyze some theoretical properties of the problem of minimizing a quadratic function with a cubic regularization term, which arises in many methods for unconstrained and constrained optimization proposed in recent years. First, we show that, given any stationary point that is not a global solution, a new point with a smaller objective function value can be computed in closed form. Then, we prove that a global minimizer can be obtained by computing a finite number of stationary points. Finally, we extend these results to the case where the stationarity conditions are only approximately satisfied, and discuss some possible algorithmic applications.
Comment: Optimization Letters (2018)
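For reference, the cubic-regularized model referred to above has the form m(x) = c^T x + (1/2) x^T Q x + (sigma/3)||x||^3, whose stationary points solve (Q + sigma ||x|| I) x = -c. Below is a minimal sketch of this model and its gradient; the names Q, c, and sigma are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of a cubic-regularized quadratic model (illustrative names, not the paper's):
#   m(x) = c^T x + 0.5 x^T Q x + (sigma / 3) * ||x||^3
def cubic_model(x, Q, c, sigma):
    return c @ x + 0.5 * x @ Q @ x + (sigma / 3.0) * np.linalg.norm(x) ** 3

def cubic_model_grad(x, Q, c, sigma):
    # grad m(x) = c + Q x + sigma * ||x|| * x, so stationary points solve
    # (Q + sigma * ||x|| * I) x = -c
    return c + Q @ x + sigma * np.linalg.norm(x) * x

# Tiny usage example with a random symmetric Q
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
Q, c, sigma = (A + A.T) / 2, rng.standard_normal(3), 1.0
x = rng.standard_normal(3)
print(cubic_model(x, Q, c, sigma), np.linalg.norm(cubic_model_grad(x, Q, c, sigma)))
```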
Limited Memory Steepest Descent Methods for Nonlinear Optimization
This dissertation concerns the development of limited memory steepest descent (LMSD) methods for solving unconstrained nonlinear optimization problems. In particular, we focus on the class of LMSD methods recently proposed by Fletcher, which he has shown to be competitive with well-known quasi-Newton methods such as L-BFGS. However, much work remains to be done in the design of such methods. First, Fletcher proved only a convergence result for LMSD methods when minimizing strongly convex quadratics, with no convergence rate result. In addition, his method focuses mainly on minimizing strongly convex quadratics and general convex objectives; for nonconvex objectives, open questions remain about how to deal effectively with nonpositive curvature. Furthermore, Fletcher's method relies on access to exact gradients, which can be a limitation when computing exact gradients is too expensive. The focus of this dissertation is the design and analysis of algorithms intended to address these issues.

In the first part of the new results in this dissertation, a convergence rate result for an LMSD method is proved. For context, we note that a basic LMSD method is an extension of the Barzilai-Borwein "two-point stepsize" strategy for steepest descent methods for solving unconstrained optimization problems. It is known that the Barzilai-Borwein strategy yields a method with an R-linear rate of convergence when it is employed to minimize a strongly convex quadratic. Our contribution is to extend this analysis to LMSD, also for strongly convex quadratics. In particular, it is shown that, under reasonable assumptions, the method is R-linearly convergent for any choice of the history length parameter. Results of numerical experiments are also provided to illustrate behaviors of the method that are revealed through the theoretical analysis.

The second part proposes an LMSD method for solving unconstrained nonconvex optimization problems. As a steepest descent method, the step computation in each iteration requires only the evaluation of a gradient of the objective function and the calculation of a scalar stepsize. When employed to solve certain convex problems, our method reduces to a variant of the LMSD method proposed by Fletcher, which means that, when the history length parameter is set to one, it reduces to a steepest descent method inspired by that of Barzilai and Borwein. However, our method is novel in that we propose new algorithmic features for cases in which nonpositive curvature is encountered; that is, our method is particularly suited for solving nonconvex problems. With a nonmonotone line search, we ensure global convergence for a variant of our method. We also illustrate with numerical experiments that our approach often yields superior performance when employed to solve nonconvex problems.

In the third part, we propose a limited memory stochastic gradient (LMSG) method for solving optimization problems arising in machine learning. As a start, we focus on problems that are strongly convex. When the dataset is so large that computing full gradients is too expensive, our method computes stepsizes and iterates based on (mini-batch) stochastic gradients. Although a best-tuned fixed stepsize or a diminishing stepsize is most widely used in stochastic gradient (SG) methods, such choices can be inefficient in practice. Our method adopts a cubic model and always guarantees a positive, meaningful stepsize, even when nonpositive curvature is encountered (which can happen when using stochastic gradients, even when the problem is convex). Our approach is based on the LMSD method with cubic regularization proposed in the second part of this dissertation. With a projection of stepsizes onto an interval, we ensure convergence to a neighborhood of the optimal solution when the interval is fixed, and convergence to the optimal solution when the interval is diminishing. We also illustrate with numerical experiments that our approach can outperform an SG method with a fixed stepsize.
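For reference, the Barzilai-Borwein two-point stepsize strategy mentioned above chooses alpha_k = s_{k-1}^T s_{k-1} / (s_{k-1}^T y_{k-1}), where s_{k-1} = x_k - x_{k-1} and y_{k-1} = g_k - g_{k-1}. The sketch below shows this classical strategy only, not Fletcher's LMSD method or the algorithms developed in the dissertation; the fallback stepsize used when nonpositive curvature is detected is an illustrative assumption, included to show where the nonconvex difficulty arises.

```python
import numpy as np

# Minimal sketch of the Barzilai-Borwein "two-point stepsize" strategy that
# LMSD generalizes (classical BB method; the fallback stepsize is illustrative).
def bb_steepest_descent(grad, x0, alpha0=1e-3, max_iter=500, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    alpha = alpha0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        x_new = x - alpha * g          # steepest descent step
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sy = s @ y
        # BB1 stepsize; if curvature along s is nonpositive (possible for
        # nonconvex objectives), fall back to the initial stepsize.
        alpha = (s @ s) / sy if sy > 0 else alpha0
        x, g = x_new, g_new
    return x

# Usage on a strongly convex quadratic f(x) = 0.5 x^T Q x - b^T x
Q = np.diag([1.0, 10.0, 100.0])
b = np.ones(3)
x_star = bb_steepest_descent(lambda x: Q @ x - b, np.zeros(3))
```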
A Novel Gradient Methodology with Economical Objective Function Evaluations for Data Science Applications
Gradient methods are experiencing a growth in methodological and theoretical developments owing to the challenges of optimization problems arising in data science. Focusing on data science applications with expensive objective function evaluations yet inexpensive gradient function evaluations, gradient methods that never make objective function evaluations are either being rejuvenated or actively developed. However, as we show, such gradient methods are all susceptible to catastrophic divergence under realistic conditions for data science applications. In light of this, gradient methods that make use of objective function evaluations become more appealing, yet, as we show, they can result in an exponential increase in objective evaluations between accepted iterates. As a result, existing gradient methods are poorly suited to the needs of optimization problems arising from data science. In this work, we address this gap by developing a generic methodology that economically uses objective function evaluations in a problem-driven manner to prevent catastrophic divergence and avoid an explosion in objective evaluations between accepted iterates. Our methodology accommodates procedures built around particular step size selection or search direction strategies, and we develop a novel step size selection methodology that is well suited to data science applications. We show that a procedure resulting from our methodology is highly competitive with standard optimization methods on CUTEst test problems. We then show that a procedure resulting from our methodology compares favorably with standard optimization methods on optimization problems arising in our target data science applications. Thus, we provide a novel gradient methodology that is better suited to optimization problems arising in data science.
Comment: 52 pages, 14 figures, 7 tables, 14 algorithms
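To make the trade-off concrete, the sketch below shows a generic gradient method that evaluates the objective only occasionally, rolling back and shrinking the stepsize if the objective has increased since the last check. This is only an illustration of economical objective-function use; the check frequency, halving rule, and function names are illustrative assumptions, not the paper's methodology.

```python
import numpy as np

# Generic illustration (not the paper's method) of using occasional, economical
# objective evaluations to guard a gradient method against divergence: f is
# checked only every `check_every` steps, and the stepsize is halved (with a
# rollback to the last checkpoint) if the objective increased since that check.
def safeguarded_gradient_descent(f, grad, x0, alpha=1.0, check_every=10,
                                 max_iter=1000, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    x_ckpt, f_ckpt = x.copy(), f(x)   # checkpoint costs one objective evaluation
    for k in range(1, max_iter + 1):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        x = x - alpha * g
        if k % check_every == 0:
            f_x = f(x)                # one f evaluation per check_every steps
            if f_x > f_ckpt:
                x, alpha = x_ckpt.copy(), alpha / 2.0   # roll back, shrink step
            else:
                x_ckpt, f_ckpt = x.copy(), f_x          # accept new checkpoint
    return x

# Usage on a simple quadratic
Q = np.diag([1.0, 50.0])
x_min = safeguarded_gradient_descent(lambda x: 0.5 * x @ Q @ x,
                                     lambda x: Q @ x, np.array([1.0, 1.0]))
```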