14,878 research outputs found

    A second derivative SQP method: local convergence

    Get PDF
    In [19], we gave global convergence results for a second-derivative SQP method for minimizing the exact ℓ1-merit function for a fixed value of the penalty parameter. To establish this result, we used the properties of the so-called Cauchy step, which was itself computed from the so-called predictor step. In addition, we allowed for the computation of a variety of (optional) SQP steps that were intended to improve the efficiency of the algorithm. \ud \ud Although we established global convergence of the algorithm, we did not discuss certain aspects that are critical when developing software capable of solving general optimization problems. In particular, we must have strategies for updating the penalty parameter and better techniques for defining the positive-definite matrix Bk used in computing the predictor step. In this paper we address both of these issues. We consider two techniques for defining the positive-definite matrix Bk—a simple diagonal approximation and a more sophisticated limited-memory BFGS update. We also analyze a strategy for updating the penalty paramter based on approximately minimizing the ℓ1-penalty function over a sequence of increasing values of the penalty parameter.\ud \ud Algorithms based on exact penalty functions have certain desirable properties. To be practical, however, these algorithms must be guaranteed to avoid the so-called Maratos effect. We show that a nonmonotone varient of our algorithm avoids this phenomenon and, therefore, results in asymptotically superlinear local convergence; this is verified by preliminary numerical results on the Hock and Shittkowski test set

    Global rates of convergence for nonconvex optimization on manifolds

    Full text link
    We consider the minimization of a cost function ff on a manifold MM using Riemannian gradient descent and Riemannian trust regions (RTR). We focus on satisfying necessary optimality conditions within a tolerance ε\varepsilon. Specifically, we show that, under Lipschitz-type assumptions on the pullbacks of ff to the tangent spaces of MM, both of these algorithms produce points with Riemannian gradient smaller than ε\varepsilon in O(1/ε2)O(1/\varepsilon^2) iterations. Furthermore, RTR returns a point where also the Riemannian Hessian's least eigenvalue is larger than ε-\varepsilon in O(1/ε3)O(1/\varepsilon^3) iterations. There are no assumptions on initialization. The rates match their (sharp) unconstrained counterparts as a function of the accuracy ε\varepsilon (up to constants) and hence are sharp in that sense. These are the first deterministic results for global rates of convergence to approximate first- and second-order Karush-Kuhn-Tucker points on manifolds. They apply in particular for optimization constrained to compact submanifolds of Rn\mathbb{R}^n, under simpler assumptions.Comment: 33 pages, IMA Journal of Numerical Analysis, 201

    Learning Model-Based Sparsity via Projected Gradient Descent

    Full text link
    Several convex formulation methods have been proposed previously for statistical estimation with structured sparsity as the prior. These methods often require a carefully tuned regularization parameter, often a cumbersome or heuristic exercise. Furthermore, the estimate that these methods produce might not belong to the desired sparsity model, albeit accurately approximating the true parameter. Therefore, greedy-type algorithms could often be more desirable in estimating structured-sparse parameters. So far, these greedy methods have mostly focused on linear statistical models. In this paper we study the projected gradient descent with non-convex structured-sparse parameter model as the constraint set. Should the cost function have a Stable Model-Restricted Hessian the algorithm produces an approximation for the desired minimizer. As an example we elaborate on application of the main results to estimation in Generalized Linear Model

    Bi-Objective Nonnegative Matrix Factorization: Linear Versus Kernel-Based Models

    Full text link
    Nonnegative matrix factorization (NMF) is a powerful class of feature extraction techniques that has been successfully applied in many fields, namely in signal and image processing. Current NMF techniques have been limited to a single-objective problem in either its linear or nonlinear kernel-based formulation. In this paper, we propose to revisit the NMF as a multi-objective problem, in particular a bi-objective one, where the objective functions defined in both input and feature spaces are taken into account. By taking the advantage of the sum-weighted method from the literature of multi-objective optimization, the proposed bi-objective NMF determines a set of nondominated, Pareto optimal, solutions instead of a single optimal decomposition. Moreover, the corresponding Pareto front is studied and approximated. Experimental results on unmixing real hyperspectral images confirm the efficiency of the proposed bi-objective NMF compared with the state-of-the-art methods
    corecore