
    A Framework of Constraint Preserving Update Schemes for Optimization on Stiefel Manifold

    This paper considers optimization problems on the Stiefel manifold $X^{\mathsf{T}}X=I_p$, where $X\in\mathbb{R}^{n\times p}$ is the variable and $I_p$ is the $p$-by-$p$ identity matrix. A framework of constraint-preserving update schemes is proposed by decomposing each feasible point into the range space of $X$ and the null space of $X^{\mathsf{T}}$. While this general framework can unify many existing schemes, a new update scheme with low complexity cost is also discovered. We then study a feasible Barzilai-Borwein-like method under the new update scheme. The global convergence of the method is established with an adaptive nonmonotone line search. Numerical tests on the nearest low-rank correlation matrix problem, Kohn-Sham total energy minimization and a specific problem from statistics demonstrate the efficiency of the new method. In particular, the new method performs remarkably well for the nearest low-rank correlation matrix problem in terms of speed and solution quality, and is considerably competitive with the widely used SCF iteration for Kohn-Sham total energy minimization. (Comment: 29 pages, 1 figure)
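    The core operation in any feasible method of this type is an update that keeps the iterate on the manifold. As a hedged illustration, here is a generic projected-gradient step with a thin-QR retraction, not the paper's specific low-complexity scheme:

```python
import numpy as np

def qr_retraction_step(X, grad, tau):
    """One feasibility-preserving gradient step on the Stiefel manifold
    {X : X^T X = I_p}: project the Euclidean gradient onto the tangent
    space at X, step, then retract with a thin QR factorization.
    Generic sketch; the paper derives a cheaper update scheme."""
    # Riemannian gradient: remove the symmetric part of X^T grad.
    G = grad - X @ (X.T @ grad + grad.T @ X) / 2
    # Step along -G, then retract back onto the manifold.
    Q, R = np.linalg.qr(X - tau * G)
    # Fix column signs so the retraction is uniquely defined.
    d = np.sign(np.diag(R))
    d[d == 0] = 1.0
    return Q * d
```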

    Proximal Gradient Method with Extrapolation and Line Search for a Class of Nonconvex and Nonsmooth Problems

    In this paper, we consider a class of possibly nonconvex, nonsmooth and non-Lipschitz optimization problems arising in many contemporary applications such as machine learning, variable selection and image processing. To solve this class of problems, we propose a proximal gradient method with extrapolation and line search (PGels). This method is developed based on a special potential function and successfully incorporates both extrapolation and non-monotone line search, which are two simple and efficient accelerating techniques for the proximal gradient method. Thanks to the line search, this method allows more flexibility in choosing the extrapolation parameters and updates them adaptively at each iteration if a certain line search criterion is not satisfied. Moreover, with proper choices of parameters, our PGels reduces to many existing algorithms. We also show that, under some mild conditions, our line search criterion is well defined and any cluster point of the sequence generated by PGels is a stationary point of our problem. In addition, by assuming a Kurdyka-Łojasiewicz exponent condition on the objective, we further analyze the local convergence rate of two special cases of PGels, including the widely used non-monotone proximal gradient method as one case. Finally, we conduct numerical experiments on the $\ell_1$ regularized logistic regression problem and the $\ell_{1\text{-}2}$ regularized least squares problem. Our numerical results illustrate the efficiency of PGels and show the potential advantage of combining the two accelerating techniques. (Comment: This version addresses some typos in the previous version and adds more comparisons)
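    Since the abstract centers on the two accelerating techniques, here is a minimal sketch of a proximal gradient iteration with a fixed extrapolation weight for an $\ell_1$ regularized least squares problem; PGels additionally adapts the weight through its nonmonotone line search on a potential function, which is omitted here:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad_extrapolated(A, b, lam, n_iter=200, beta=0.3):
    """Proximal gradient with extrapolation for
    min_x 0.5 * ||A x - b||^2 + lam * ||x||_1.
    Illustrative only: the line-search safeguard of PGels is omitted."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = x_prev = np.zeros(A.shape[1])
    for _ in range(n_iter):
        y = x + beta * (x - x_prev)        # extrapolation step
        g = A.T @ (A @ y - b)              # gradient of the smooth part at y
        x_prev, x = x, soft_threshold(y - g / L, lam / L)
    return x
```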

    Spherical Principal Component Analysis

    Principal Component Analysis (PCA) is one of the most important methods for handling high-dimensional data. However, most studies on PCA aim to minimize the loss after projection, which is usually measured by Euclidean distance, even though in some fields angle distance is known to be more important and critical for analysis. In this paper, we propose a method that unifies Euclidean distance and angle distance by adding constraints on the factors. However, due to the nonconvexity of the objective and constraints, the optimal solution is not easy to obtain. We propose an alternating linearized minimization method to solve the problem with a provable convergence rate and guarantee. Experiments on synthetic data and real-world datasets validate the effectiveness of our method and demonstrate its advantages over state-of-the-art clustering methods.
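    To make the Euclidean-versus-angle distinction concrete, a crude baseline (not the paper's constrained-factor method) is to normalize samples onto the unit sphere before extracting principal directions, so that reconstruction error reflects angular rather than Euclidean discrepancy:

```python
import numpy as np

def normalized_pca_baseline(X, k):
    """Baseline sketch: map each sample to the unit sphere, then take
    the top-k principal directions of the normalized, centered data.
    The paper instead adds constraints on the factors and solves the
    resulting nonconvex problem by alternating linearized minimization."""
    Xs = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm rows
    Xc = Xs - Xs.mean(axis=0)                           # center
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k]                                       # k principal axes
```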

    Beyond Alternating Updates for Matrix Factorization with Inertial Bregman Proximal Gradient Algorithms

    Matrix factorization is a popular non-convex optimization problem, for which alternating minimization schemes are mostly used. They usually suffer from the major drawback that the solution is biased towards one of the optimization variables. A remedy is non-alternating schemes. However, due to the lack of Lipschitz continuity of the gradient in matrix factorization problems, convergence cannot be guaranteed. A recently developed approach relies on the concept of Bregman distances, which generalize the standard Euclidean distance. We exploit this theory by proposing a novel Bregman distance for matrix factorization problems which, at the same time, allows for simple, closed-form update steps. Consequently, for non-alternating schemes such as the recently introduced Bregman Proximal Gradient (BPG) method and its inertial variant Convex-Concave Inertial BPG (CoCaIn BPG), convergence of the whole sequence to a stationary point is proved for matrix factorization. In several experiments, we observe superior performance of our non-alternating schemes in terms of speed and objective value at the limit point. (Comment: Accepted at NeurIPS 2019. Paper url: http://papers.nips.cc/paper/8679-beyond-alternating-updates-for-matrix-factorization-with-inertial-bregman-proximal-gradient-algorithm)
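    To illustrate what "non-alternating" means here, consider $\min_{U,V} \|A - UV^{\mathsf{T}}\|_F^2$: both factors move simultaneously from the same residual, instead of fixing one factor and minimizing over the other. The sketch below uses plain Euclidean gradient steps; BPG and CoCaIn BPG follow the same pattern but replace the Euclidean geometry with the paper's Bregman distance, whose kernel still yields closed-form steps:

```python
import numpy as np

def joint_factor_step(A, U, V, tau):
    """One non-alternating step for min ||A - U V^T||_F^2: update both
    factors at once from the shared residual. Euclidean sketch only;
    the BPG step uses a Bregman proximal mapping in place of this
    plain gradient step."""
    R = U @ V.T - A                  # shared residual
    grad_U, grad_V = R @ V, R.T @ U  # joint gradient blocks
    return U - tau * grad_U, V - tau * grad_V
```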

    HAMSI: A Parallel Incremental Optimization Algorithm Using Quadratic Approximations for Solving Partially Separable Problems

    We propose HAMSI (Hessian Approximated Multiple Subsets Iteration), a provably convergent, second-order incremental algorithm for solving large-scale partially separable optimization problems. The algorithm is based on a local quadratic approximation and hence allows incorporating curvature information to speed up convergence. HAMSI is inherently parallel and scales nicely with the number of processors. Combined with techniques for effectively utilizing modern parallel computer architectures, we illustrate that the proposed method converges more rapidly than parallel stochastic gradient descent when both methods are used to solve large-scale matrix factorization problems. This performance gain comes only at the expense of using memory that scales linearly with the total size of the optimization variables. We conclude that HAMSI may be considered a viable alternative in many large-scale problems where first-order methods based on variants of stochastic gradient descent are applicable. (Comment: The software is available at https://github.com/spartensor/hamsi-m)
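    A skeleton of one incremental sweep, assuming a hypothetical `local_model(x, i)` interface that returns the gradient and a Hessian approximation of the i-th subset's partial objective; the actual HAMSI scheme processes subsets in parallel with shared curvature information:

```python
import numpy as np

def incremental_quadratic_sweep(local_model, x, n_subsets):
    """One pass of an incremental second-order scheme in the spirit
    of HAMSI: visit data subsets one at a time and minimize a local
    quadratic model built from that subset's gradient g and Hessian
    approximation H. `local_model` is a hypothetical stand-in for
    the paper's partially separable setup."""
    for i in range(n_subsets):
        g, H = local_model(x, i)
        # Newton-type step on the subset's quadratic model.
        x = x - np.linalg.solve(H, g)
    return x
```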

    Parallelizable Algorithms for Optimization Problems with Orthogonality Constraints

    Constructing a parallel approach for solving optimization problems with orthogonality constraints is usually regarded as extremely difficult, due to the low scalability of the orthonormalization procedure. However, the demand for such approaches is particularly strong in application areas such as materials computation. In this paper, we propose a proximal linearized augmented Lagrangian algorithm (PLAM) for solving optimization problems with orthogonality constraints. Unlike classical augmented Lagrangian methods, in our algorithm the primal variables are updated by minimizing a proximal linearized approximation of the augmented Lagrangian function, while the dual variables are updated by a closed-form expression which holds at any first-order stationary point. The orthonormalization procedure is invoked only once, at the last step of the algorithm, if high-precision feasibility is needed. Consequently, the main parts of the proposed algorithm can be parallelized naturally. We establish global subsequence convergence, worst-case complexity and a local convergence rate for PLAM under some mild assumptions. To reduce sensitivity to the penalty parameter, we put forward a modification of PLAM, called parallelizable column-wise block minimization of PLAM (PCAL). Serial numerical experiments illustrate that the novel updating rule for the Lagrangian multipliers significantly accelerates the convergence of PLAM and makes it comparable with existing feasible solvers for optimization problems with orthogonality constraints, and that the performance of PCAL does not rely heavily on the choice of the penalty parameter. Numerical experiments in a parallel environment demonstrate that PCAL attains good performance and high scalability in solving discretized Kohn-Sham total energy minimization problems.
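    A hedged sketch of a PLAM-style iteration for $\min f(X)$ subject to $X^{\mathsf{T}}X = I_p$: the symmetric closed-form multiplier below mirrors the abstract's description, but the step sizes, scalings and the exact augmented Lagrangian gradient are illustrative assumptions rather than the paper's formulas:

```python
import numpy as np

def plam_like_iterations(grad_f, X, beta=1.0, tau=0.1, n_iter=100):
    """Sketch of PLAM-style iterations for min f(X) s.t. X^T X = I:
    an orthonormalization-free gradient step on a proximal linearized
    augmented Lagrangian for the primal variable X, and a closed-form
    symmetric expression for the multiplier. Constants are illustrative."""
    p = X.shape[1]
    for _ in range(n_iter):
        G = grad_f(X)
        Lam = (X.T @ G + G.T @ X) / 2                  # closed-form multiplier
        # Gradient of the augmented Lagrangian with Lam held fixed.
        D = G - X @ Lam + beta * X @ (X.T @ X - np.eye(p))
        X = X - tau * D                                # no orthonormalization
    # Orthonormalize once at the end if high-precision feasibility is needed.
    Q, _ = np.linalg.qr(X)
    return Q
```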

    Learning Mixtures of Separable Dictionaries for Tensor Data: Analysis and Algorithms

    This work addresses the problem of learning sparse representations of tensor data using structured dictionary learning. It proposes learning a mixture of separable dictionaries to better capture the structure of tensor data by generalizing the separable dictionary learning model. Two different approaches for learning a mixture of separable dictionaries are explored, and sufficient conditions for local identifiability of the underlying dictionary are derived in each case. Moreover, computational algorithms are developed to solve the problem of learning a mixture of separable dictionaries in both batch and online settings. Numerical experiments show the usefulness of the proposed model and the efficacy of the developed algorithms. (Comment: 18 pages, 4 figures, 3 tables; published in IEEE Trans. Signal Processing)
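    The forward model can be stated in a few lines: a matrix-valued signal is approximated as a sum of Kronecker-structured terms, each given by a pair of small mode dictionaries and a sparse coefficient matrix. A minimal sketch of the reconstruction (the learning algorithms are the paper's contribution and are not shown):

```python
import numpy as np

def mixture_reconstruction(As, Bs, Xs):
    """Reconstruct a matrix-valued signal from a mixture of separable
    dictionaries: Y ~ sum_k A_k X_k B_k^T. Each term corresponds to a
    Kronecker-structured dictionary (B_k kron A_k) applied to vec(X_k),
    with sparse coefficient matrices X_k."""
    return sum(A @ X @ B.T for A, X, B in zip(As, Xs, Bs))
```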

    A Penalty-free Infeasible Approach for a Class of Nonsmooth Optimization Problems over the Stiefel Manifold

    Transforming optimization problems with orthogonality constraints into exact penalty models with convex compact constraints yields efficient infeasible approaches. For smooth and $\ell_{2,1}$-norm regularized cases, these infeasible approaches adopt simple, orthonormalization-free updating schemes and show high efficiency on test examples. However, to avoid orthonormalization while enforcing feasibility of the final solution, these infeasible approaches introduce a quadratic penalty term, and an inappropriate penalty parameter can lead to numerical inefficiency. Inspired by penalty-free approaches for smooth optimization problems, we propose a proximal first-order algorithm for a class of optimization problems with orthogonality constraints and a nonsmooth regularization term. The resulting algorithm, named the sequential linearized proximal gradient method (SLPG), alternately takes tangential steps and normal steps to improve optimality and feasibility, respectively. In SLPG, the orthonormalization process is invoked only once, at the last step, if high precision in feasibility is needed, so the main iterations of SLPG are orthonormalization-free. Moreover, neither the tangential steps nor the normal steps involve a penalty parameter; SLPG is therefore penalty-free and avoids the inefficiency caused by an inappropriate penalty parameter. We analyze the global convergence properties of SLPG when the tangential steps are computed inexactly. By inexactly computing tangential steps, SLPG has a closed-form updating scheme for smooth and $\ell_{2,1}$-norm regularized cases, which makes its tangential steps cheap. Numerical experiments illustrate the advantages of SLPG compared with existing first-order methods.
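    The tangential/normal decomposition can be illustrated with a toy iteration: a gradient step restricted to the tangent space of $X^{\mathsf{T}}X = I_p$ (improving optimality), followed by a first-order correction back toward the manifold (improving feasibility). Both the correction formula and the omission of the nonsmooth term are assumptions of this sketch, not the paper's actual steps:

```python
import numpy as np

def tangential_then_normal_step(X, grad, tau):
    """Toy SLPG-flavored iteration: a tangential step along the
    projected negative gradient, then a normal step that reduces the
    feasibility violation of X^T X = I without orthonormalization
    (one Newton-Schulz correction X <- X (3 I - X^T X) / 2)."""
    p = X.shape[1]
    # Tangential step: project the gradient onto the tangent space.
    G = grad - X @ (X.T @ grad + grad.T @ X) / 2
    X = X - tau * G
    # Normal step: first-order pull-back toward the manifold.
    return X @ (3 * np.eye(p) - X.T @ X) / 2
```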

    FBstab: A Stabilized Semismooth Quadratic Programming Algorithm with Applications in Model Predictive Control

    This paper introduces the proximally stabilized Fischer-Burmeister method (FBstab), a new algorithm for convex quadratic programming that synergistically combines the proximal point algorithm with a primal-dual semismooth Newton-type method. FBstab is numerically robust, easy to warmstart, handles degenerate primal-dual solutions, detects infeasibility/unboundedness, and requires only that the Hessian matrix be positive semidefinite. We outline the algorithm, provide convergence and convergence rate proofs, report numerical results from model predictive control benchmarks, and include experimental results. We show that FBstab is competitive with, and often superior to, state-of-the-art methods, has attractive scaling properties, and is especially promising for model predictive control applications.
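    The semismooth reformulation at the heart of such methods rests on the Fischer-Burmeister function, which turns complementarity conditions into equations. A minimal sketch of that building block (the proximal stabilization and Newton machinery of FBstab are not shown):

```python
import numpy as np

def fischer_burmeister(a, b):
    """Fischer-Burmeister function phi(a, b) = a + b - sqrt(a^2 + b^2).
    phi(a, b) = 0 holds iff a >= 0, b >= 0 and a * b = 0, so the KKT
    complementarity conditions of a QP can be recast as a semismooth
    nonlinear system amenable to Newton-type methods."""
    return a + b - np.sqrt(a * a + b * b)
```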