29 research outputs found
New convergence results for the scaled gradient projection method
The aim of this paper is to deepen the convergence analysis of the scaled
gradient projection (SGP) method, proposed by Bonettini et al. in a recent
paper for constrained smooth optimization. The main feature of SGP is the
presence of a variable scaling matrix multiplying the gradient, which may
change at each iteration. In the last few years, an extensive numerical
experimentation showed that SGP equipped with a suitable choice of the scaling
matrix is a very effective tool for solving large scale variational problems
arising in image and signal processing. Despite the very reliable numerical
results observed, only a weak, though very general, convergence theorem has been
provided, establishing that any limit point of the sequence generated by SGP is
stationary. Here, under the sole assumptions that the objective function is
convex and that a solution exists, we prove that the sequence generated by SGP
converges to a minimum point, if the scaling matrices sequence satisfies a
simple and implementable condition. Moreover, assuming that the gradient of the
objective function is Lipschitz continuous, we are also able to prove the
O(1/k) convergence rate with respect to the objective function values. Finally,
we present the results of numerical experiments on some relevant image
restoration problems, showing that the proposed scaling matrix selection rule
also performs well from a computational point of view.
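The basic iteration described above can be sketched as follows. This is a minimal illustration, not the paper's method: the function names are ours, the scaling is taken to be diagonal, and the steplength is fixed, whereas SGP combines the scaling with adaptive steplengths and a line search.

```python
import numpy as np

def sgp_sketch(grad, x0, scaling, steplength=0.5, n_iter=200):
    """Minimal scaled gradient projection sketch:
    x_{k+1} = P_{x>=0}(x_k - alpha * D_k * grad f(x_k)),
    where D_k is a variable (here diagonal) scaling matrix."""
    x = x0.copy()
    for _ in range(n_iter):
        d = scaling(x)                                     # diagonal of the scaling matrix D_k
        x = np.maximum(x - steplength * d * grad(x), 0.0)  # projection onto the nonnegative orthant
    return x

# toy problem: min 0.5*||x - b||^2 subject to x >= 0, whose solution is max(b, 0)
b = np.array([1.0, -2.0, 3.0])
x_sgp = sgp_sketch(lambda x: x - b, np.ones(3), lambda x: np.ones_like(x))
```

The point of the sketch is only the structure of the iteration: scale the gradient, step, project back onto the feasible set.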
Limited Memory Steepest Descent Methods for Nonlinear Optimization
This dissertation concerns the development of limited memory steepest descent (LMSD) methods for solving unconstrained nonlinear optimization problems. In particular, we focus on the class of LMSD methods recently proposed by Fletcher, which he has shown to be competitive with well-known quasi-Newton methods such as L-BFGS. However, in the design of such methods, much work remains to be done. First of all, Fletcher only showed a convergence result for LMSD methods when minimizing strongly convex quadratics, but no convergence rate result. In addition, his method mainly focused on minimizing strongly convex quadratics and general convex objectives, while for nonconvex objectives open questions remain about how to effectively deal with nonpositive curvature. Furthermore, Fletcher's method relies on having access to exact gradients, which can be a limitation when computing exact gradients is too expensive. The focus of this dissertation is the design and analysis of algorithms intended to address these issues.

In the first part of the new results in this dissertation, a convergence rate result for an LMSD method is proved. For context, we note that a basic LMSD method is an extension of the Barzilai-Borwein "two-point stepsize" strategy for steepest descent methods for solving unconstrained optimization problems. It is known that the Barzilai-Borwein strategy yields a method with an R-linear rate of convergence when it is employed to minimize a strongly convex quadratic. Our contribution is to extend this analysis to LMSD, also for strongly convex quadratics. In particular, it is shown that, under reasonable assumptions, the method is R-linearly convergent for any choice of the history length parameter. The results of numerical experiments are also provided to illustrate behaviors of the method that are revealed through the theoretical analysis.

The second part proposes an LMSD method for solving unconstrained nonconvex optimization problems.
As a steepest descent method, the step computation in each iteration only requires the evaluation of a gradient of the objective function and the calculation of a scalar stepsize. When employed to solve certain convex problems, our method reduces to a variant of the LMSD method proposed by Fletcher, which means that, when the history length parameter is set to one, it reduces to a steepest descent method inspired by that proposed by Barzilai and Borwein. However, our method is novel in that we propose new algorithmic features for cases when nonpositive curvature is encountered. That is, our method is particularly suited for solving nonconvex problems. With a nonmonotone line search, we ensure global convergence for a variant of our method. We also illustrate with numerical experiments that our approach often yields superior performance when employed to solve nonconvex problems.

In the third part, we propose a limited memory stochastic gradient (LMSG) method for solving optimization problems arising in machine learning. As a start, we focus on problems that are strongly convex. When the dataset is so large that the computation of full gradients is too expensive, our method computes stepsizes and iterates based on (mini-batch) stochastic gradients. Although a best-tuned fixed or diminishing stepsize is most widely used in stochastic gradient (SG) methods, it can be inefficient in practice. Our method adopts a cubic model and always guarantees a positive, meaningful stepsize, even when nonpositive curvature is encountered (which can happen when using stochastic gradients, even when the problem is convex). Our approach is based on the LMSD method with cubic regularization proposed in the second part of this dissertation. With a projection of stepsizes, we ensure convergence to a neighborhood of the optimal solution when the interval is fixed and convergence to the optimal solution when the interval is diminishing.
We also illustrate with numerical experiments that our approach can outperform an SG method with a fixed stepsize.
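The Barzilai-Borwein "two-point stepsize" strategy that LMSD extends can be sketched on a strongly convex quadratic as follows. This is a minimal illustration with names of our choosing, not the dissertation's method; the positivity guard and stopping test are our own simple safeguards.

```python
import numpy as np

def bb_steepest_descent(A, b, x0, n_iter=200):
    """Steepest descent with the Barzilai-Borwein 'two-point' stepsize (BB1)
    on the strongly convex quadratic f(x) = 0.5*x'Ax - b'x."""
    x = x0.copy()
    g = A @ x - b
    alpha = 1.0 / np.linalg.norm(g)    # first step: no history available yet
    for _ in range(n_iter):
        x_new = x - alpha * g
        g_new = A @ x_new - b
        s, y = x_new - x, g_new - g    # displacement and gradient change
        if s @ y > 0:                  # for a quadratic, s'y = s'As > 0 away from the solution
            alpha = (s @ s) / (s @ y)  # BB1 stepsize: inverse Rayleigh quotient of A
        x, g = x_new, g_new
        if np.linalg.norm(g) < 1e-12:  # stop once the gradient is annihilated
            break
    return x

A = np.diag([1.0, 2.0, 10.0])
b = np.array([1.0, 1.0, 1.0])
x_bb = bb_steepest_descent(A, b, np.zeros(3))
```

When the history length parameter equals one, the LMSD methods discussed above reduce to a method of this flavor.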
A new steplength selection for scaled gradient methods with application to image deblurring
Gradient methods are frequently used in large scale image deblurring problems
since they avoid the onerous computation of the Hessian matrix of the objective
function. Second order information is typically sought by a clever choice of
the steplength parameter defining the descent direction, as in the case of the
well-known Barzilai and Borwein rules. In a recent paper, a strategy for the
steplength selection approximating the inverse of some eigenvalues of the
Hessian matrix has been proposed for gradient methods applied to unconstrained
minimization problems. In the quadratic case, this approach is based on a
Lanczos process applied every m iterations to the matrix of the m most recent
back gradients, but the idea can be extended to a general objective function. In
this paper we extend this rule to the case of scaled gradient projection
methods applied to non-negatively constrained minimization problems, and we
test the effectiveness of the proposed strategy in image deblurring problems in
both the presence and the absence of an explicit edge-preserving regularization
term.
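The Ritz-value idea behind this steplength rule can be illustrated as follows. For brevity we form the restricted Hessian directly, whereas the rule described above recovers the same quantities from the stored gradients alone via the Lanczos process; all names here are ours.

```python
import numpy as np

def ritz_steplengths(A, grads):
    """Sketch: steplengths are the reciprocals of the Ritz values of the
    Hessian A restricted to the span of the m most recent gradients."""
    Q, _ = np.linalg.qr(np.column_stack(grads))  # orthonormal basis of the gradient span
    ritz = np.linalg.eigvalsh(Q.T @ A @ Q)       # Ritz values approximate eigenvalues of A
    return 1.0 / ritz                            # their inverses serve as steplengths

# toy quadratic Hessian and two stored gradients
A = np.diag([1.0, 4.0, 9.0])
steps = ritz_steplengths(A, [np.array([1.0, 1.0, 1.0]),
                             np.array([1.0, -1.0, 0.5])])
```

By the interlacing property, each Ritz value lies inside the spectrum of A, so each steplength approximates the inverse of some Hessian eigenvalue, which is exactly the second-order information the rule seeks.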
A two-phase gradient method for quadratic programming problems with a single linear constraint and bounds on the variables
We propose a gradient-based method for quadratic programming problems with a
single linear constraint and bounds on the variables. Inspired by the GPCG
algorithm for bound-constrained convex quadratic programming [J.J. Moré and
G. Toraldo, SIAM J. Optim. 1, 1991], our approach alternates between two phases
until convergence: an identification phase, which performs gradient projection
iterations until either a candidate active set is identified or no reasonable
progress is made, and an unconstrained minimization phase, which reduces the
objective function in a suitable space defined by the identification phase, by
applying either the conjugate gradient method or a recently proposed spectral
gradient method. However, the algorithm differs from GPCG not only because it
deals with a more general class of problems, but mainly in the way it stops
the minimization phase. This is based on a comparison between a measure of
optimality in the reduced space and a measure of bindingness of the variables
that are on the bounds, defined by extending the concept of proportioning,
which was proposed by some authors for box-constrained problems. If the
objective function is bounded, the algorithm converges to a stationary point
thanks to a suitable application of the gradient projection method in the
identification phase. For strictly convex problems, the algorithm converges to
the optimal solution in a finite number of steps even in case of degeneracy.
Extensive numerical experiments show the effectiveness of the proposed
approach. Comment: 30 pages, 17 figures.
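A box-only sketch of the two-phase alternation follows. The single linear constraint, the proportioning-based stopping test, and the spectral gradient option of the paper are all omitted; the reduced problem is solved directly instead of by conjugate gradients, and the names are ours.

```python
import numpy as np

def two_phase_box_qp(A, b, lo, hi, x0, n_outer=20):
    """Sketch of a GPCG-style two-phase method for
    min 0.5*x'Ax - b'x subject to lo <= x <= hi."""
    x = np.clip(x0, lo, hi)
    L = np.linalg.eigvalsh(A).max()         # Lipschitz constant of the gradient
    for _ in range(n_outer):
        # identification phase: a few gradient projection steps
        for _ in range(5):
            x = np.clip(x - (A @ x - b) / L, lo, hi)
        free = (x > lo) & (x < hi)          # candidate free (inactive) variables
        if free.any():
            # minimization phase: minimize over the free variables
            # (solved exactly here; conjugate gradients in practice)
            x[free] = np.linalg.solve(A[np.ix_(free, free)],
                                      b[free] - A[np.ix_(free, ~free)] @ x[~free])
            x = np.clip(x, lo, hi)          # safeguard back into the box
    return x

A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([3.0, -2.0])
x_box = two_phase_box_qp(A, b, lo=np.zeros(2), hi=np.ones(2), x0=np.full(2, 0.5))
```

On this toy problem the identification phase drives both variables onto their bounds, which the KKT conditions confirm is optimal, so the minimization phase has nothing left to do.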
On the convergence of a linesearch based proximal-gradient method for nonconvex optimization
We consider a variable metric linesearch based proximal gradient method for
the minimization of the sum of a smooth, possibly nonconvex function plus a
convex, possibly nonsmooth term. We prove convergence of this iterative
algorithm to a critical point if the objective function satisfies the
Kurdyka-Lojasiewicz property at each point of its domain, under the assumption
that a limit point exists. The proposed method is applied to a wide collection
of image processing problems, and our numerical tests show that our algorithm
proves to be flexible, robust and competitive when compared to recently
proposed approaches able to address the optimization problems arising in the
considered applications.
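The core step of such a scheme can be sketched on a convex, Euclidean-metric special case with an l1 nonsmooth term. This is a simplification of the method above (no variable metric, no nonconvexity), and the names and backtracking constants are our own assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1, the convex nonsmooth term here."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad_backtracking(grad_f, f, x0, lam, n_iter=100):
    """Linesearch-based proximal gradient sketch for min f(x) + lam*||x||_1:
    halve the steplength until a sufficient-decrease condition holds."""
    x = x0.copy()
    for _ in range(n_iter):
        gx, alpha = grad_f(x), 1.0
        while True:
            z = soft_threshold(x - alpha * gx, alpha * lam)
            dx = z - x
            # accept when the quadratic upper-bound model of f majorizes f at z
            if f(z) <= f(x) + gx @ dx + (dx @ dx) / (2 * alpha) + 1e-12:
                break
            alpha *= 0.5
        x = z
    return x

# toy problem: min 0.5*||x - b||^2 + lam*||x||_1,
# whose closed-form solution is soft_threshold(b, lam)
b = np.array([3.0, -0.2, 1.0])
x_pg = prox_grad_backtracking(lambda x: x - b,
                              lambda x: 0.5 * (x - b) @ (x - b),
                              np.zeros(3), lam=0.5)
```

The backtracking loop plays the role of the linesearch: it never needs the Lipschitz constant of the smooth part in advance.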
Steplength selection in gradient projection methods for box-constrained quadratic programs
The role of steplength selection strategies in gradient methods has been widely investigated in the last decades. Starting from the work of Barzilai and Borwein (1988), many efficient steplength rules have been designed, which contributed to making gradient approaches an effective tool for the large-scale optimization problems arising in important real-world applications. Most of these steplength rules have been conceived for unconstrained optimization, with the aim of exploiting some second-order information to achieve a fast annihilation of the gradient of the objective function. However, these rules are also successfully used within gradient projection methods for constrained optimization, though, to our knowledge, a detailed analysis of the effects of the constraints on the steplength selection is still not available. In this work we investigate how the presence of box constraints affects the spectral properties of the Barzilai–Borwein rules in quadratic programming problems. The proposed analysis suggests the introduction of new steplength selection strategies specifically designed to take into account the active constraints at each iteration. The results of a set of numerical experiments show the effectiveness of the new rules with respect to other state-of-the-art steplength selections, and their potential usefulness also in the case of box-constrained non-quadratic optimization problems.
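The interaction this abstract analyzes, a Barzilai-Borwein steplength used inside a gradient projection scheme, can be illustrated on a toy box-constrained quadratic. The safeguards below are our own simplifications, not the new rules proposed in the paper; note how, once variables hit their bounds, the pairs (s, y) only carry spectral information about the submatrix of A on the free variables.

```python
import numpy as np

def box_bb_projection(A, b, lo, hi, x0, n_iter=100):
    """Gradient projection with a BB1 steplength for
    min 0.5*x'Ax - b'x subject to lo <= x <= hi."""
    x = np.clip(x0, lo, hi)
    g = A @ x - b
    alpha = 1.0 / np.linalg.eigvalsh(A).max()  # safe first steplength
    for _ in range(n_iter):
        x_new = np.clip(x - alpha * g, lo, hi)
        g_new = A @ x_new - b
        s, y = x_new - x, g_new - g
        if s @ y > 0:                  # keep the steplength positive
            alpha = (s @ s) / (s @ y)  # BB1 computed from the projected iterates
        x, g = x_new, g_new
        if np.allclose(s, 0):          # the projected step made no progress
            break
    return x

A = np.diag([1.0, 10.0])
b = np.array([5.0, 5.0])
x_bb_box = box_bb_projection(A, b, lo=np.zeros(2), hi=np.ones(2), x0=np.zeros(2))
```

In the run above, once the first variable is fixed at its upper bound, s and y become multiples of the first coordinate axis, so the BB1 quotient reflects only the corresponding 1x1 block of A; this is precisely the effect of the active constraints on the spectral properties of the rule.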
On Quasi-Newton Forward--Backward Splitting: Proximal Calculus and Convergence
We introduce a framework for quasi-Newton forward--backward splitting
algorithms (proximal quasi-Newton methods) with a metric induced by diagonal
± rank-r symmetric positive definite matrices. This special type of
metric allows for a highly efficient evaluation of the proximal mapping. The
key to this efficiency is a general proximal calculus in the new metric. By
using duality, formulas are derived that relate the proximal mapping in a
rank-r modified metric to the original metric. We also describe efficient
implementations of the proximity calculation for a large class of functions;
the implementations exploit the piecewise linear nature of the dual problem.
Then, we apply these results to acceleration of composite convex minimization
problems, which leads to elegant quasi-Newton methods for which we prove
convergence. The algorithm is tested on several numerical examples and compared
to a comprehensive list of alternatives in the literature. Our quasi-Newton
splitting algorithm with the prescribed metric compares favorably against the
state of the art. The algorithm has extensive applications including signal
processing, sparse recovery, machine learning and classification, to name a few.
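The easy building block of the proximal calculus sketched above is the purely diagonal case, where the proximal mapping separates coordinate-wise. The rank-modified metrics of the paper require the dual-based formulas it develops; the sketch below (our names) shows only the diagonal case for the l1 norm.

```python
import numpy as np

def prox_l1_diag_metric(x, lam, d):
    """Proximal mapping of lam*||.||_1 in the metric induced by D = diag(d),
    i.e. argmin_z lam*||z||_1 + 0.5*(z - x)' D (z - x).
    The problem separates, so each entry is soft-thresholded with its own
    threshold lam / d_i: a larger metric weight means a smaller shrinkage."""
    t = lam / d
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

x = np.array([2.0, -0.3, 1.0])
z_prox = prox_l1_diag_metric(x, lam=0.5, d=np.array([1.0, 1.0, 5.0]))
```

This coordinate-wise structure is what makes the proximal mapping in such metrics highly efficient to evaluate.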
Ritz-like values in steplength selections for stochastic gradient methods
Steplength selection is a crucial issue for the effectiveness of stochastic gradient methods for the large-scale optimization problems arising in machine learning. In a recent paper, Bollapragada et al. (SIAM J Optim 28(4):3312–3343, 2018) propose to include an adaptive subsampling strategy in a stochastic gradient scheme, with the aim of ensuring that the stochastic gradient directions are descent directions in expectation. In this approach, theoretical convergence properties are preserved under the assumption that, at each iteration, the positive steplength satisfies a suitable bound depending on the inverse of the Lipschitz constant of the gradient of the objective function. In this paper, we propose to tailor to the stochastic gradient scheme the steplength selection adopted in the full-gradient method known as the limited memory steepest descent method. This strategy, based on the Ritz-like values of a suitable matrix, makes it possible to obtain a local estimate of the inverse of the Lipschitz parameter without introducing line search techniques, while a possible increase in the size of the subsample used to compute the stochastic gradient allows one to control the variance of this direction. Extensive numerical experiments highlight that the new rule makes the tuning of the parameters less expensive than the trial procedure required for the efficient selection of a constant steplength in standard and mini-batch stochastic gradient methods.
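One structural ingredient of such schemes, projecting the proposed steplength onto a safeguard interval before applying it, can be sketched as follows. The diminishing proposal below is a toy stand-in for the Ritz-like rule of the paper, and all names are ours.

```python
import numpy as np

def sg_with_projected_steps(grad_batch, x0, lo=1e-3, hi=0.5, n_iter=300):
    """Stochastic gradient sketch: whatever rule proposes the steplength,
    it is projected onto the interval [lo, hi] before the step is taken."""
    x = x0.copy()
    for k in range(n_iter):
        proposed = 1.0 / (k + 1)            # stand-in for the Ritz-like proposal
        alpha = min(max(proposed, lo), hi)  # projection of the steplength
        x = x - alpha * grad_batch(x)       # (mini-batch) stochastic gradient step
    return x

# toy strongly convex problem with noisy gradients of 0.5*||x - b||^2
rng = np.random.default_rng(0)
b = np.array([1.0, 2.0])
x_sg = sg_with_projected_steps(lambda x: x - b + 0.01 * rng.standard_normal(2),
                               np.zeros(2))
```

With a fixed safeguard interval one can only expect convergence to a neighborhood of the solution, which matches the behavior the stepsize-projection arguments in this line of work formalize.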
Limited-memory scaled gradient projection methods for real-time image deconvolution in microscopy
Gradient projection methods have given rise to effective tools for image
deconvolution in several relevant areas, such as microscopy, medical imaging
and astronomy. Due to the large scale of the optimization problems arising
in today's imaging applications and to the growing demand for real-time
reconstructions, an interesting challenge consists in designing new
acceleration techniques for gradient schemes that preserve the simplicity
and low computational cost of each iteration. In this work we propose an
acceleration strategy for a state-of-the-art scaled gradient projection
method for image deconvolution in microscopy. The acceleration
idea is derived by adapting a step-length selection rule, recently
introduced for limited-memory steepest descent methods in unconstrained
optimization, to the special constrained optimization framework arising in
image reconstruction. We describe how important issues related to the
generalization of the step-length rule to the imaging optimization problem
have been addressed, and we evaluate the improvements due to the acceleration
strategy by numerical experiments on large-scale image deconvolution problems.