A second derivative SQP method: local convergence
In [19], we gave global convergence results for a second-derivative SQP method for minimizing the exact ℓ1-merit function for a fixed value of the penalty parameter. To establish this result, we used the properties of the so-called Cauchy step, which was itself computed from the so-called predictor step. In addition, we allowed for the computation of a variety of (optional) SQP steps that were intended to improve the efficiency of the algorithm.

Although we established global convergence of the algorithm, we did not discuss certain aspects that are critical when developing software capable of solving general optimization problems. In particular, we must have strategies for updating the penalty parameter and better techniques for defining the positive-definite matrix Bk used in computing the predictor step. In this paper we address both of these issues. We consider two techniques for defining the positive-definite matrix Bk: a simple diagonal approximation and a more sophisticated limited-memory BFGS update. We also analyze a strategy for updating the penalty parameter based on approximately minimizing the ℓ1-penalty function over a sequence of increasing values of the penalty parameter.

Algorithms based on exact penalty functions have certain desirable properties. To be practical, however, these algorithms must be guaranteed to avoid the so-called Maratos effect. We show that a nonmonotone variant of our algorithm avoids this phenomenon and, therefore, results in asymptotically superlinear local convergence; this is verified by preliminary numerical results on the Hock and Schittkowski test set.
Global rates of convergence for nonconvex optimization on manifolds
We consider the minimization of a cost function f on a manifold M using Riemannian gradient descent and Riemannian trust regions (RTR). We focus on satisfying necessary optimality conditions within a tolerance ε. Specifically, we show that, under Lipschitz-type assumptions on the pullbacks of f to the tangent spaces of M, both of these algorithms produce points with Riemannian gradient smaller than ε in O(1/ε²) iterations. Furthermore, RTR returns a point where also the Riemannian Hessian's least eigenvalue is larger than −ε in O(1/ε³) iterations. There are no assumptions on initialization. The rates match their (sharp) unconstrained counterparts as a function of the accuracy ε (up to constants) and hence are sharp in that sense. These are the first deterministic results for global rates of convergence to approximate first- and second-order Karush-Kuhn-Tucker points on manifolds. They apply in particular for optimization constrained to compact submanifolds of ℝⁿ, under simpler assumptions.

Comment: 33 pages, IMA Journal of Numerical Analysis, 201
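As an illustration, a minimal sketch of Riemannian gradient descent on one such compact submanifold, the unit sphere in ℝⁿ; the step size, tolerance, and metric-projection retraction are illustrative choices, not the paper's setup:

```python
import numpy as np

def riemannian_gd_sphere(grad_f, x0, step=0.1, tol=1e-6, max_iter=1000):
    """Riemannian gradient descent on the unit sphere: project the
    Euclidean gradient onto the tangent space at x, take a step,
    then retract to the sphere by normalization."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(max_iter):
        g = grad_f(x)
        rgrad = g - (g @ x) * x          # tangent-space projection
        if np.linalg.norm(rgrad) < tol:  # approximate first-order point
            break
        x = x - step * rgrad
        x = x / np.linalg.norm(x)        # metric-projection retraction
    return x

# Example: leading eigenvector of A via minimizing -x^T A x on the sphere.
A = np.diag([3.0, 2.0, 1.0])
x = riemannian_gd_sphere(lambda x: -2.0 * (A @ x), np.ones(3))
print(x)  # approx +/- (1, 0, 0)
```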
Learning Model-Based Sparsity via Projected Gradient Descent
Several convex formulation methods have been proposed previously for
statistical estimation with structured sparsity as the prior. These methods
often require a carefully tuned regularization parameter, often a cumbersome or
heuristic exercise. Furthermore, the estimate that these methods produce might
not belong to the desired sparsity model, albeit accurately approximating the
true parameter. Therefore, greedy-type algorithms could often be more desirable
in estimating structured-sparse parameters. So far, these greedy methods have
mostly focused on linear statistical models. In this paper we study the
projected gradient descent with non-convex structured-sparse parameter model as
the constraint set. Should the cost function have a Stable Model-Restricted
Hessian the algorithm produces an approximation for the desired minimizer. As
an example we elaborate on application of the main results to estimation in
Generalized Linear Model
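To illustrate the scheme under the simplest sparsity model, plain s-sparsity rather than the paper's general structured models, here is a minimal sketch of projected gradient descent for a logistic-regression GLM, where the projection is hard thresholding onto the s-sparse set; all names and parameter values are assumptions:

```python
import numpy as np

def iht_logistic(X, y, s, step=0.5, n_iter=200):
    """Projected gradient descent for sparse logistic regression:
    a gradient step on the (averaged) negative log-likelihood,
    then projection onto the non-convex set of s-sparse vectors
    by keeping the s largest-magnitude entries."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
        w = w - step * X.T @ (p - y) / n   # gradient step
        keep = np.argsort(np.abs(w))[-s:]  # projection: keep top-s entries
        mask = np.zeros(d, dtype=bool)
        mask[keep] = True
        w[~mask] = 0.0
    return w

# Example: recover a 3-sparse parameter from logistic observations.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)
print(iht_logistic(X, y, s=3).round(2))
```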
Bi-Objective Nonnegative Matrix Factorization: Linear Versus Kernel-Based Models
Nonnegative matrix factorization (NMF) is a powerful class of feature extraction techniques that has been successfully applied in many fields, notably in signal and image processing. Current NMF techniques have been limited to a single-objective problem in either its linear or its nonlinear kernel-based formulation. In this paper, we propose to revisit NMF as a multi-objective problem, in particular a bi-objective one, where objective functions defined in both the input and the feature space are taken into account. By taking advantage of the weighted-sum method from the multi-objective optimization literature, the proposed bi-objective NMF determines a set of nondominated, Pareto-optimal solutions instead of a single optimal decomposition. Moreover, the corresponding Pareto front is studied and approximated. Experimental results on unmixing real hyperspectral images confirm the efficiency of the proposed bi-objective NMF compared with state-of-the-art methods.
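To make the scalarization step concrete, a generic sketch of the weighted-sum approach: sweep the weight over [0, 1], solve each scalarized single-objective problem, and keep only the nondominated solutions. The objectives and solver below are toy placeholders, standing in for the paper's linear (input-space) and kernel-based (feature-space) NMF fits:

```python
import numpy as np
from scipy.optimize import minimize

def weighted_sum_pareto(J1, J2, solve, weights):
    """Weighted-sum scalarization: for each weight alpha in [0, 1],
    minimize alpha*J1 + (1 - alpha)*J2 with a single-objective solver,
    then filter out the dominated candidates."""
    candidates = []
    for alpha in weights:
        x = solve(lambda v: alpha * J1(v) + (1.0 - alpha) * J2(v))
        candidates.append((x, J1(x), J2(x)))
    return [c for c in candidates
            if not any(d[1] <= c[1] and d[2] <= c[2] and
                       (d[1] < c[1] or d[2] < c[2]) for d in candidates)]

# Toy bi-objective problem: trade off distance to two targets under
# nonnegativity constraints (stand-ins for the two NMF objectives).
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
J1 = lambda v: np.sum((v - a) ** 2)
J2 = lambda v: np.sum((v - b) ** 2)
solve = lambda J: minimize(J, x0=np.ones(2), bounds=[(0, None)] * 2).x
for x, f1, f2 in weighted_sum_pareto(J1, J2, solve, np.linspace(0.05, 0.95, 7)):
    print(np.round(x, 2), round(f1, 3), round(f2, 3))
```

Each weight yields one point of the (approximate) Pareto front; sweeping the weight traces out the trade-off curve between the two reconstruction errors.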