A Framework of Constraint Preserving Update Schemes for Optimization on Stiefel Manifold
This paper considers optimization problems on the Stiefel manifold
$X^{\top}X = I_p$, where $X \in \mathbb{R}^{n \times p}$ is the variable and
$I_p$ is the $p$-by-$p$ identity matrix. A framework of constraint preserving
update schemes is proposed by decomposing each feasible point into the range
space of $X$ and the null space of $X^{\top}$. While this general
framework can unify many existing schemes, a new update scheme with low
complexity cost is also discovered. Then we study a feasible
Barzilai-Borwein-like method under the new update scheme. The global
convergence of the method is established with an adaptive nonmonotone line
search. The numerical tests on the nearest low-rank correlation matrix problem,
the Kohn-Sham total energy minimization and a specific problem from statistics
demonstrate the efficiency of the new method. In particular, the new method
performs remarkably well for the nearest low-rank correlation matrix problem in
terms of speed and solution quality and is considerably competitive with the
widely used SCF iteration for the Kohn-Sham total energy minimization.
Comment: 29 pages, 1 figure
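As a rough illustration of the constraint-preserving idea, the sketch below runs a feasible Barzilai-Borwein gradient iteration on the Stiefel manifold using a QR retraction. It is a generic scheme under assumed step-size rules, not the paper's low-complexity update or its adaptive nonmonotone line search.

```python
import numpy as np

def qr_retraction(Y):
    """Map a full-rank n-by-p matrix back onto the Stiefel manifold
    {X : X^T X = I_p} via a thin QR factorization."""
    Q, R = np.linalg.qr(Y)
    # Fix QR's sign ambiguity so the retraction is well defined.
    return Q * np.sign(np.sign(np.diag(R)) + 0.5)

def feasible_bb(f_grad, X, iters=200, tau=1e-3):
    """Feasible gradient iteration on the Stiefel manifold with a
    Barzilai-Borwein step size; the constraint holds at every iterate."""
    G = f_grad(X)
    for _ in range(iters):
        D = G - 0.5 * X @ (X.T @ G + G.T @ X)  # tangent-space projection
        X_new = qr_retraction(X - tau * D)     # constraint-preserving update
        G_new = f_grad(X_new)
        S, Yd = X_new - X, G_new - G
        tau = abs(np.sum(S * S) / (np.sum(S * Yd) + 1e-16))  # BB step size
        X, G = X_new, G_new
    return X

# Demo: top-3 eigenvectors of a symmetric A via min -tr(X^T A X).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50)); A = A + A.T
X = feasible_bb(lambda X: -2 * A @ X, qr_retraction(rng.standard_normal((50, 3))))
print(np.linalg.norm(X.T @ X - np.eye(3)))  # ~1e-15: feasibility preserved
```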
Proximal Gradient Method with Extrapolation and Line Search for a Class of Nonconvex and Nonsmooth Problems
In this paper, we consider a class of possibly nonconvex, nonsmooth and
non-Lipschitz optimization problems arising in many contemporary applications
such as machine learning, variable selection and image processing. To solve
this class of problems, we propose a proximal gradient method with
extrapolation and line search (PGels). This method is developed based on a
special potential function and successfully incorporates both extrapolation and
non-monotone line search, which are two simple and efficient accelerating
techniques for the proximal gradient method. Thanks to the line search, this
method allows more flexibility in choosing the extrapolation parameters and
updates them adaptively at each iteration if a certain line search criterion is
not satisfied. Moreover, with proper choices of parameters, our PGels reduces
to many existing algorithms. We also show that, under some mild conditions, our
line search criterion is well defined and any cluster point of the sequence
generated by PGels is a stationary point of our problem. In addition, by
assuming the Kurdyka-Łojasiewicz exponent of the objective in our problem,
we further analyze the local convergence rate of two special cases of PGels,
including the widely used non-monotone proximal gradient method as one case.
Finally, we conduct some numerical experiments for solving the $\ell_1$
regularized logistic regression problem and the $\ell_{1-2}$ regularized
least squares problem. Our numerical results illustrate the efficiency of PGels
and show the potential advantage of combining the two accelerating techniques.
Comment: This version addresses some typos in the previous version and adds more comparisons
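The following sketch shows how extrapolation and a line-search safeguard can be combined in a proximal gradient method, here on the $\ell_1$-regularized least squares problem. The sufficient-decrease test and the adaptive extrapolation weight below are simplified assumptions; PGels's actual criterion is built on a special potential function.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def pg_extrapolation_ls(A, b, lam, iters=200):
    """Proximal gradient with extrapolation and a backtracking safeguard for
    min 0.5*||Ax - b||^2 + lam*||x||_1 (a sketch, not the PGels criterion)."""
    x = x_prev = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    F = lambda z: 0.5 * np.sum((A @ z - b) ** 2) + lam * np.sum(np.abs(z))
    beta = 0.5                           # initial extrapolation weight
    for _ in range(iters):
        y = x + beta * (x - x_prev)      # extrapolated point
        x_new = soft_threshold(y - A.T @ (A @ y - b) / L, lam / L)
        # Safeguard: shrink the extrapolation weight and retry whenever the
        # objective fails to decrease sufficiently.
        while F(x_new) > F(x) - 1e-10 and beta > 1e-8:
            beta *= 0.5
            y = x + beta * (x - x_prev)
            x_new = soft_threshold(y - A.T @ (A @ y - b) / L, lam / L)
        x_prev, x = x, x_new
        beta = min(0.9, beta * 1.2)      # adaptively grow the weight back
    return x

rng = np.random.default_rng(1)
A, xs = rng.standard_normal((80, 200)), np.zeros(200)
xs[:5] = 3.0
b = A @ xs + 0.01 * rng.standard_normal(80)
print(np.nonzero(pg_extrapolation_ls(A, b, lam=1.0))[0][:10])  # sparse support
```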
Spherical Principal Component Analysis
Principal Component Analysis (PCA) is one of the most important methods for
handling high-dimensional data. However, most studies on PCA aim to minimize
the loss after projection, which is usually measured by Euclidean distance,
even though in some fields angle distance is known to be more important and
critical for analysis. In this paper, we propose a method that adds
constraints on factors to unify the Euclidean distance and angle distance.
However, due to the nonconvexity of the objective and constraints, an optimal
solution is not easy to obtain. We propose an alternating linearized
minimization method to solve the problem with a provable convergence rate and
guarantee. Experiments on synthetic data and real-world datasets have
validated the effectiveness of our method and demonstrated its advantages
over state-of-the-art clustering methods.
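A minimal sketch of the underlying idea, assuming unit-sphere constraints on the components and a plain alternating linearized update with a normalization (projection) step; the paper's actual algorithm and its convergence-certified update rule are more refined.

```python
import numpy as np

def spherical_pca_sketch(X, r, iters=50):
    """Alternating linearized updates for a PCA variant whose components are
    constrained to the unit sphere (a generic illustration)."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    U = rng.standard_normal((n, r))                 # factor scores
    V = rng.standard_normal((d, r))
    V /= np.linalg.norm(V, axis=0, keepdims=True)   # unit-norm components
    for _ in range(iters):
        U = X @ V @ np.linalg.pinv(V.T @ V)         # least-squares update of U
        V = X.T @ U                                 # linearized update of V ...
        V /= np.linalg.norm(V, axis=0, keepdims=True)  # ... projected to sphere
    return U, V

X = np.random.default_rng(2).standard_normal((100, 20))
U, V = spherical_pca_sketch(X, r=3)
print(np.linalg.norm(V, axis=0))  # each component has unit norm
```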
Beyond Alternating Updates for Matrix Factorization with Inertial Bregman Proximal Gradient Algorithms
Matrix factorization is a popular non-convex optimization problem, for which
alternating minimization schemes are mostly used. These usually suffer from
the major drawback that the solution is biased towards one of the
optimization variables. Non-alternating schemes offer a remedy, but due to a lack of
Lipschitz continuity of the gradient in matrix factorization problems,
convergence cannot be guaranteed. A recently developed approach relies on the
concept of Bregman distances, which generalizes the standard Euclidean
distance. We exploit this theory by proposing a novel Bregman distance for
matrix factorization problems which, at the same time, allows for simple,
closed-form update steps. As a result, for non-alternating schemes such as
the recently introduced Bregman Proximal Gradient (BPG) method and its
inertial variant Convex-Concave Inertial BPG (CoCaIn BPG), convergence of the
whole sequence to a stationary point is proven for matrix factorization. In
several experiments, we observe a superior performance of our non-alternating
schemes in terms of speed and objective value at the limit point.
Comment: Accepted at NeurIPS 2019. Paper url:
http://papers.nips.cc/paper/8679-beyond-alternating-updates-for-matrix-factorization-with-inertial-bregman-proximal-gradient-algorithm
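The sketch below illustrates a joint (non-alternating) Bregman proximal gradient step for two-factor matrix factorization. The quartic reference kernel and the constants a = 3, b = ||A||_F follow a known sufficient condition for this objective but are treated as assumptions here; the paper constructs its own Bregman distance and adds inertia in CoCaIn BPG.

```python
import numpy as np

def bpg_mf(A, r, iters=500):
    """Joint BPG updates for min 0.5*||A - U V^T||_F^2 with reference kernel
    h(z) = a/4*||z||^4 + b/2*||z||^2, z = (U, V); the BPG step then reduces
    to solving one scalar cubic equation."""
    a, b = 3.0, np.linalg.norm(A)   # kernel constants (assumed, see lead-in)
    rng = np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((A.shape[0], r))
    V = 0.1 * rng.standard_normal((A.shape[1], r))
    for _ in range(iters):
        R = U @ V.T - A                         # residual
        gU, gV = R @ V, R.T @ U                 # Euclidean gradients of f
        sq = np.sum(U * U) + np.sum(V * V)      # ||z||^2
        # p = grad h(z) - grad f(z), where grad h(z) = (a*||z||^2 + b) * z.
        pU = (a * sq + b) * U - gU
        pV = (a * sq + b) * V - gV
        pn2 = np.sum(pU * pU) + np.sum(pV * pV)
        # Solve grad h(z_new) = p with z_new = c*p: a*pn2*c^3 + b*c - 1 = 0.
        roots = np.roots([a * pn2, 0.0, b, -1.0])
        c = float(min(roots, key=lambda z: abs(z.imag)).real)  # unique real root
        U, V = c * pU, c * pV
    return U, V

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 5)) @ rng.standard_normal((5, 40))  # rank-5 target
U, V = bpg_mf(A, r=5)
print(np.linalg.norm(A - U @ V.T) / np.linalg.norm(A))  # relative error
```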
HAMSI: A Parallel Incremental Optimization Algorithm Using Quadratic Approximations for Solving Partially Separable Problems
We propose HAMSI (Hessian Approximated Multiple Subsets Iteration), which is
a provably convergent, second order incremental algorithm for solving
large-scale partially separable optimization problems. The algorithm is based
on a local quadratic approximation, and hence, allows incorporating curvature
information to speed-up the convergence. HAMSI is inherently parallel and it
scales nicely with the number of processors. Combined with techniques for
effectively utilizing modern parallel computer architectures, we illustrate
that the proposed method converges more rapidly than a parallel stochastic
gradient descent when both methods are used to solve large-scale matrix
factorization problems. This performance gain comes only at the expense of
using memory that scales linearly with the total size of the optimization
variables. We conclude that HAMSI may be considered a viable alternative for
many large-scale problems where first-order methods based on variants of
stochastic gradient descent are applicable.
Comment: The software is available at https://github.com/spartensor/hamsi-m
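A minimal sketch of the incremental second-order idea on a hypothetical least-squares problem split into row subsets: each inner step minimizes a damped local quadratic model built from one subset's gradient and curvature. HAMSI's parallel scheduling and its particular Hessian approximation are omitted.

```python
import numpy as np

def incremental_quadratic(local_model, x, subsets, epochs=30, lam=1.0):
    """Sweep over subsets of a partially separable objective f(x) = sum_i f_i(x),
    taking a damped Newton step on each subset's local quadratic model."""
    for _ in range(epochs):
        for S in subsets:
            g, H = local_model(x, S)   # subset gradient and curvature
            x = x - np.linalg.solve(H + lam * np.eye(len(x)), g)
    return x

# Hypothetical demo: consistent least squares split into 10 row blocks.
rng = np.random.default_rng(2)
A = rng.standard_normal((100, 10))
x_true = rng.standard_normal(10)
b = A @ x_true

def local_model(x, S):
    r = A[S] @ x - b[S]
    return A[S].T @ r, A[S].T @ A[S]   # gradient and Gauss-Newton Hessian

subsets = np.array_split(np.arange(100), 10)
x = incremental_quadratic(local_model, np.zeros(10), subsets)
print(np.linalg.norm(x - x_true))      # near 0 after a few epochs
```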
Parallelizable Algorithms for Optimization Problems with Orthogonality Constraints
Constructing a parallel approach for solving optimization problems with
orthogonality constraints is usually regarded as an extremely difficult task,
due to the low scalability of the orthonormalization procedure. However, the
demand for such approaches is particularly strong in application areas such as
materials computation. In this paper, we propose a proximal linearized
augmented Lagrangian algorithm (PLAM) for solving optimization problems with
orthogonality constraints. Unlike the classical augmented Lagrangian methods,
in our algorithm, the primal variables are updated by minimizing a proximal
linearized approximation of the augmented Lagrangian function, while the
dual variables are updated by a closed-form expression which holds at any
first-order stationary point. The orthonormalization procedure is only invoked
once at the last step of the above-mentioned algorithm if high-precision
feasibility is needed. Consequently, the main parts of the proposed algorithm
can be parallelized naturally. We establish global subsequence convergence,
worst-case complexity and local convergence rate for PLAM under some mild
assumptions. To reduce sensitivity to the penalty parameter, we put forward
a modification of PLAM, called parallelizable column-wise block minimization
of PLAM (PCAL). Numerical experiments in serial illustrate that the novel
updating rule for the Lagrangian multipliers significantly accelerates the
convergence of PLAM and makes it comparable with existing feasible solvers
for optimization problems with orthogonality constraints, and that the
performance of PCAL does not rely heavily on the choice of the penalty
parameter. Numerical experiments in a parallel environment demonstrate that
PCAL attains good performance and high scalability in solving discretized
Kohn-Sham total energy minimization problems.
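The sketch below illustrates the orthonormalization-free flavor of this approach: a gradient step on an augmented-Lagrangian-type function for the primal variable, a closed-form symmetric multiplier update that holds at first-order stationary points, and QR invoked only once at the end. The step size, penalty parameter, and the exact proximal linearization are illustrative assumptions.

```python
import numpy as np

def plam_sketch(f_grad, X, beta=30.0, eta=1e-3, iters=2000):
    """Orthonormalization-free iteration in the spirit of PLAM (constant
    factors absorbed into the step size eta)."""
    p = X.shape[1]
    for _ in range(iters):
        G = f_grad(X)
        Lam = 0.5 * (X.T @ G + G.T @ X)   # closed-form symmetric dual update
        # Gradient step on an augmented-Lagrangian-type function in X.
        X = X - eta * (G - X @ Lam + beta * X @ (X.T @ X - np.eye(p)))
    return X

rng = np.random.default_rng(3)
A = rng.standard_normal((40, 40)); A = A + A.T
X0 = np.linalg.qr(rng.standard_normal((40, 4)))[0]
X = plam_sketch(lambda X: -2 * A @ X, X0)
print(np.linalg.norm(X.T @ X - np.eye(4)))  # typically small without any QR
# A single orthonormalization at the end recovers machine-precision feasibility:
X = np.linalg.qr(X)[0]
```

Because the loop contains only matrix products and no orthonormalization, each primal update parallelizes naturally, which is the property the abstract emphasizes.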
Learning Mixtures of Separable Dictionaries for Tensor Data: Analysis and Algorithms
This work addresses the problem of learning sparse representations of tensor
data using structured dictionary learning. It proposes learning a mixture of
separable dictionaries to better capture the structure of tensor data by
generalizing the separable dictionary learning model. Two different approaches
for learning a mixture of separable dictionaries are explored, and sufficient
conditions for local identifiability of the underlying dictionary are derived
in each case. Moreover, computational algorithms are developed to solve the
problem of learning a mixture of separable dictionaries in both batch and
online settings. Numerical experiments are used to show the usefulness of the proposed
model and the efficacy of the developed algorithms.
Comment: 18 pages, 4 figures, 3 tables; Published in IEEE Trans. Signal Processing
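To see why separable structure pays off, the snippet below builds a mixture of separable dictionaries D = sum_k kron(B_k, A_k) and applies it to a sparse code without ever forming D, via the identity kron(B, A) vec(S) = vec(A S B^T). The sizes and the number of mixture terms are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(7)
K, (m1, p1), (m2, p2) = 3, (8, 12), (6, 10)
A_k = [rng.standard_normal((m1, p1)) for _ in range(K)]
B_k = [rng.standard_normal((m2, p2)) for _ in range(K)]
# Dense mixture dictionary, (m1*m2) x (p1*p2); cheap to store as factors only.
D = sum(np.kron(B, A) for A, B in zip(A_k, B_k))

# Sparse code for one vectorized m1-by-m2 data matrix.
S = np.zeros((p1, p2))
S.flat[rng.choice(p1 * p2, 5, replace=False)] = 1.0
s = S.ravel(order="F")                  # column-major vec(S)

# Apply D without forming it, using kron(B, A) vec(S) = vec(A S B^T).
x_fast = sum(A @ S @ B.T for A, B in zip(A_k, B_k)).ravel(order="F")
print(np.allclose(D @ s, x_fast))       # True
```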
A Penalty-free Infeasible Approach for a Class of Nonsmooth Optimization Problems over the Stiefel Manifold
Transforming the original problem into an exact penalty model with convex
compact constraints yields efficient infeasible approaches for optimization
problems with orthogonality constraints. For smooth and $\ell_{2,1}$-norm
regularized cases, these infeasible approaches adopt simple and
orthonormalization-free updating schemes and show high efficiency on test
examples. However,
to avoid orthonormalization while enforcing the feasibility of the final
solution, these infeasible approaches introduce a quadratic penalty term, where
an inappropriate penalty parameter can lead to numerical inefficiency. Inspired
by penalty-free approaches for smooth optimization problems, we propose a
proximal first-order algorithm for a class of optimization problems with
orthogonality constraints and a nonsmooth regularization term. The resulting
algorithm, named the sequential linearized proximal gradient method (SLPG),
alternately takes tangential steps and normal steps to improve optimality
and feasibility, respectively. In SLPG, the orthonormalization process is
invoked only once at the last step if high precision in feasibility is needed,
showing that the main iterations of SLPG are orthonormalization-free.
Moreover, neither the tangential steps nor the normal steps involve the
penalty parameter; hence SLPG is penalty-free and avoids the inefficiency
caused by an inappropriate penalty parameter. We analyze the global
convergence properties of SLPG when the tangential steps are computed
inexactly. With inexactly computed tangential steps, SLPG admits a
closed-form updating scheme for smooth cases and $\ell_{2,1}$-norm
regularized cases, which makes its tangential steps cheap. Numerical
experiments illustrate the advantages of SLPG when compared with existing
first-order methods.
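A minimal sketch of the alternating tangential/normal-step pattern for the smooth case: the tangential step moves along the projected gradient, and the normal step is a first-order (Newton-Schulz) correction toward feasibility, so no orthonormalization is needed inside the loop. SLPG's proximal handling of the nonsmooth term and its inexact tangential steps are omitted; step sizes and the normal-step formula are assumptions.

```python
import numpy as np

def tangential_normal_sketch(f_grad, X, eta=1e-2, iters=500):
    """Alternate a tangential step (optimality) with a normal step
    (feasibility) on the Stiefel manifold, with no QR inside the loop."""
    p = X.shape[1]
    for _ in range(iters):
        G = f_grad(X)
        D = G - 0.5 * X @ (X.T @ G + G.T @ X)        # tangential direction
        X = X - eta * D                              # tangential step
        X = X @ (1.5 * np.eye(p) - 0.5 * (X.T @ X))  # normal step (Newton-Schulz)
    return X  # one final QR would give machine-precision feasibility if needed

rng = np.random.default_rng(4)
A = rng.standard_normal((30, 30)); A = A + A.T
X0 = np.linalg.qr(rng.standard_normal((30, 3)))[0]
X = tangential_normal_sketch(lambda X: 2 * A @ X, X0)  # min tr(X^T A X)
print(np.linalg.norm(X.T @ X - np.eye(3)))             # near-feasible, no QR used
```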
FBstab: A Stabilized Semismooth Quadratic Programming Algorithm with Applications in Model Predictive Control
This paper introduces the proximally stabilized Fischer-Burmeister method
(FBstab), a new algorithm for convex quadratic programming that synergistically
combines the proximal point algorithm with a primal-dual semismooth Newton-type
method. FBstab is numerically robust, easy to warmstart, handles degenerate
primal-dual solutions, detects infeasibility/unboundedness and requires only
that the Hessian matrix be positive semidefinite. We outline the algorithm,
provide convergence and convergence rate proofs, report some numerical results
from model predictive control benchmarks, and also include experimental
results. We show that FBstab is competitive with, and often superior to,
state-of-the-art methods, has attractive scaling properties, and is especially
promising for model predictive control applications.
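The core ingredient can be shown compactly: the Fischer-Burmeister function reformulates complementarity conditions as a semismooth root-finding problem, to which Newton-type steps apply. The tiny nonnegativity-constrained QP below is an illustrative assumption; FBstab's proximal stabilization, warmstarting, and infeasibility detection are omitted.

```python
import numpy as np

def fb(a, b):
    """Fischer-Burmeister NCP function: fb(a,b)=0  <=>  a>=0, b>=0, a*b=0."""
    return a + b - np.sqrt(a * a + b * b)

# Semismooth Newton on the FB reformulation of the KKT conditions of
# min 0.5 x^T H x + f^T x  s.t.  x >= 0  (hypothetical 2-variable QP).
H = np.array([[3.0, 1.0], [1.0, 2.0]])
f = np.array([-4.0, -1.0])
x, lam = np.ones(2), np.ones(2)          # primal iterate and multipliers
for _ in range(30):
    r1 = H @ x + f - lam                 # stationarity residual
    r2 = fb(x, lam)                      # complementarity residual
    d = np.sqrt(x**2 + lam**2) + 1e-12
    Da, Db = np.diag(1 - x / d), np.diag(1 - lam / d)  # FB generalized Jacobian
    K = np.block([[H, -np.eye(2)], [Da, Db]])
    step = np.linalg.solve(K, -np.concatenate([r1, r2]))
    x, lam = x + step[:2], lam + step[2:]
print(x, lam)  # converges to x = [4/3, 0], lam = [0, 1/3] on this QP
```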