14 research outputs found

    Limited-memory BFGS Systems with Diagonal Updates

    In this paper, we investigate a formula to solve systems of the form (B + \sigma I)x = y, where B is a limited-memory BFGS quasi-Newton matrix and \sigma is a positive constant. These types of systems arise naturally in large-scale optimization, such as in trust-region methods and doubly-augmented Lagrangian methods. We show that, provided a simple condition holds on B_0 and \sigma, the system (B + \sigma I)x = y can be solved via a recursion formula that requires only vector inner products. This formula has complexity M^2 n, where M is the number of L-BFGS updates and n >> M is the dimension of x.
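As an illustration of the kind of system described in this abstract, here is a minimal sketch that solves (B + \sigma I)x = y using the compact representation of an L-BFGS matrix together with the Sherman-Morrison-Woodbury identity. This is not the recursion formula proposed in the paper; it is a swapped-in, well-known alternative. The function name `solve_shifted_lbfgs`, the choice B_0 = gamma * I, and the column layout of `S` and `Y` are assumptions made for the example.

```python
import numpy as np

def solve_shifted_lbfgs(S, Y, gamma, sigma, y):
    """Solve (B + sigma*I) x = y, where B is the L-BFGS matrix built from
    B_0 = gamma*I and the pairs stored as columns of S and Y (oldest first).

    Assumes the curvature condition s_i^T y_i > 0 holds for every pair.
    """
    SY = S.T @ Y
    L = np.tril(SY, k=-1)            # strictly lower triangular part of S^T Y
    D = np.diag(np.diag(SY))         # diagonal part of S^T Y

    # Compact representation: B = gamma*I - Psi @ inv(M) @ Psi.T
    Psi = np.hstack([gamma * S, Y])
    M = np.block([[gamma * (S.T @ S), L],
                  [L.T, -D]])

    # Sherman-Morrison-Woodbury with tau = gamma + sigma:
    # inv(B + sigma*I) = (I + Psi @ inv(tau*M - Psi.T @ Psi) @ Psi.T) / tau
    tau = gamma + sigma
    K = tau * M - Psi.T @ Psi        # small 2M-by-2M system
    w = np.linalg.solve(K, Psi.T @ y)
    return (y + Psi @ w) / tau
```

Forming `Psi.T @ Psi` costs on the order of M^2 n operations, and the only dense solve involves a 2M-by-2M matrix, so the n-by-n matrix B is never formed.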

    Relative effectiveness of the trust-region algorithm with precise second-order derivatives

    Trust-region methods that use the precise Hessian matrix have some drawbacks: computing the elements of the second-order derivative matrix is time-consuming, and the generally indefinite Hessian matrix causes numerical and methodological difficulties. Their applicability depends on how well their substitutes, for example the Levenberg-Marquardt method, perform. The Levenberg-Marquardt method often performs well in least-squares problems. This procedure dynamically mixes the steepest-descent and Gauss-Newton methods. Generally one hopes that the more analytical properties of the problem's cost function an optimization procedure utilizes, the faster and more effective the search method that can be constructed. This is definitely the case when first derivatives are used together with function values (instead of function values alone). For the second derivative of the cost function the situation is less clear. In many cases, even if a second-order model is used within the search procedure, the Hessian matrix is only approximated; it is not calculated precisely, even when it could be calculated analytically, because of its time cost and the large amount of memory needed. In this paper I investigate whether the precise Hessian matrix is worth determining, that is, whether one gains more from the increased effectiveness of the search method than one loses from the increased time cost of dealing with the precise Hessian matrix. This is done by comparing the Levenberg-Marquardt method with a trust-region method that uses the precise Hessian matrix.
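The statement that the Levenberg-Marquardt method mixes steepest descent and Gauss-Newton can be made concrete with the textbook damped normal equations. The sketch below is a generic illustration and is not taken from the paper; the function name `lm_step` and the symbols `J` (Jacobian of the residuals), `r` (residual vector), and `lam` (damping parameter) are assumptions.

```python
import numpy as np

def lm_step(J, r, lam):
    """One Levenberg-Marquardt step for minimizing 0.5 * ||r(x)||^2.

    Solves (J^T J + lam * I) p = -J^T r.
    lam -> 0 recovers the Gauss-Newton step; a large lam gives a short
    step along the steepest-descent direction -J^T r.
    """
    n = J.shape[1]
    g = J.T @ r                      # gradient of 0.5 * ||r||^2
    return np.linalg.solve(J.T @ J + lam * np.eye(n), -g)
```

In practice the damping parameter is adapted between iterations, decreasing after a successful step and increasing after a rejected one, which is what makes the mixing dynamic.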

    On Solving L-SR1 Trust-Region Subproblems

    In this article, we consider solvers for large-scale trust-region subproblems when the quadratic model is defined by a limited-memory symmetric rank-one (L-SR1) quasi-Newton matrix. We propose a solver that exploits the compact representation of L-SR1 matrices. Our approach makes use of both an orthonormal basis for the eigenspace of the L-SR1 matrix and the Sherman-Morrison-Woodbury formula to compute global solutions to trust-region subproblems. To compute the optimal Lagrange multiplier for the trust-region constraint, we use Newton's method with a judicious initial guess that does not require safeguarding. A crucial property of this solver is that it is able to compute high-accuracy solutions even in the so-called hard case. Additionally, the optimal solution is determined directly by formula, not iteratively. Numerical experiments demonstrate the effectiveness of this solver.

    On solving trust-region and other regularised subproblems in optimization

    The solution of trust-region and regularisation subproblems which arise in unconstrained optimization is considered. Building on the pioneering work of Gay, Moré and Sorensen, methods which obtain the solution of a sequence of parametrized linear systems by factorization are used. Enhancements using high-order polynomial approximation and inverse iteration ensure that the resulting method is both globally and asymptotically at least superlinearly convergent in all cases, including in the notorious hard case. Numerical experiments validate the effectiveness of our approach. The resulting software is available as packages TRS and RQS as part of the GALAHAD optimization library, and is especially designed for large-scale problems.
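For orientation, the trust-region subproblem and its standard global optimality conditions can be written as below. The symbols g (gradient), B (model Hessian), \Delta (trust-region radius), and \lambda (Lagrange multiplier) are notation assumed here and do not appear in the abstract.

```latex
% Trust-region subproblem
\min_{p \in \mathbb{R}^n} \; g^T p + \tfrac{1}{2}\, p^T B p
\quad \text{subject to} \quad \|p\|_2 \le \Delta

% p^* is a global solution if and only if some \lambda^* \ge 0 satisfies
(B + \lambda^* I)\, p^* = -g, \qquad
\lambda^* \bigl(\Delta - \|p^*\|_2\bigr) = 0, \qquad
B + \lambda^* I \succeq 0
```

The parametrized linear systems mentioned in the abstract are of the form (B + \lambda I)p = -g for a sequence of values of \lambda. Roughly speaking, the hard case arises when g is orthogonal to the eigenspace of the smallest eigenvalue of B, so that the equation \|p(\lambda)\|_2 = \Delta may have no root in the admissible range of \lambda.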

    Updating the regularization parameter in the adaptive cubic regularization algorithm

    The adaptive cubic regularization method (Cartis et al. in Math. Program. Ser. A 127(2):245–295, 2011; Math. Program. Ser. A. 130(2):295–319, 2011) has recently been proposed for solving unconstrained minimization problems. At each iteration of this method, the objective function is replaced by a cubic approximation which comprises an adaptive regularization parameter whose role is related to the local Lipschitz constant of the objective’s Hessian. We present new updating strategies for this parameter based on interpolation techniques, which improve the overall numerical performance of the algorithm. Numerical experiments on large nonlinear least-squares problems are provided.
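The cubic approximation referred to in this abstract is usually written in the following form. The notation (x_k for the current iterate, g_k for the gradient, B_k for the Hessian or an approximation to it, \sigma_k for the regularization parameter) is assumed here rather than taken from the abstract.

```latex
m_k(s) = f(x_k) + g_k^T s + \tfrac{1}{2}\, s^T B_k s + \tfrac{\sigma_k}{3}\, \|s\|_2^3
```

A larger \sigma_k penalizes long steps more heavily, so it plays a role loosely comparable to the reciprocal of a trust-region radius; the updating strategies in the paper concern how \sigma_k is changed from one iteration to the next.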

    Optimization Algorithms for Machine Learning Designed for Parallel and Distributed Environments

    This thesis proposes several optimization methods that utilize parallel algorithms for large-scale machine learning problems. The overall theme is network-based machine learning algorithms; in particular, we consider two machine learning models: graphical models and neural networks. Graphical models are methods categorized under unsupervised machine learning, aiming at recovering conditional dependencies among random variables from observed samples of a multivariable distribution. Neural networks, on the other hand, are methods that learn an implicit approximation to underlying true nonlinear functions based on sample data and utilize that information to generalize to validation data. The goal of finding the best methods relies on an optimization problem tasked with training such models. Improvements in current methods of solving the optimization problem for graphical models are obtained by parallelization and the use of a new update and a new step-size selection rule in the coordinate descent algorithms designed for large-scale problems. For training deep neural networks, we consider second-order optimization algorithms within trust-region-like optimization frameworks. Deep networks are represented using large-scale vectors of weights and are trained on very large datasets. Hence, obtaining second-order information is very expensive for these networks. In this thesis, we undertake an extensive exploration of algorithms that use a small number of curvature evaluations and are hence faster than other existing methods.