Proximal Gradient Method for Nonsmooth Optimization over the Stiefel Manifold
We consider optimization problems over the Stiefel manifold whose objective
function is the summation of a smooth function and a nonsmooth function.
Existing methods for solving problems of this kind can be classified into three
classes. Algorithms in the first class rely on information of the subgradients
of the objective function and thus tend to converge slowly in practice.
Algorithms in the second class are proximal point algorithms, which involve
subproblems that can be as difficult as the original problem. Algorithms in the
third class are based on operator-splitting techniques, but they usually lack
rigorous convergence guarantees. In this paper, we propose a retraction-based
proximal gradient method for solving this class of problems. We prove that the
proposed method globally converges to a stationary point. Iteration complexity
for obtaining an $\epsilon$-stationary solution is also analyzed. Numerical
results on solving sparse PCA and compressed modes problems are reported to
demonstrate the advantages of the proposed method.
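To illustrate the "retraction-based" part of the abstract above, here is a minimal sketch of a tangent-space projection followed by a QR retraction on the Stiefel manifold. It is not the authors' algorithm: the nonsmooth proximal subproblem is omitted, and the function names, the Euclidean-metric projection, the step size 0.1, and the random data are all assumptions chosen for illustration.

```python
import numpy as np

def tangent_projection(X, G):
    """Project G onto the tangent space of the Stiefel manifold at X
    (Euclidean metric): G - X * sym(X^T G)."""
    XtG = X.T @ G
    return G - X @ ((XtG + XtG.T) / 2.0)

def qr_retraction(X, V):
    """Map the point X + V back onto the manifold using a thin QR factorization."""
    Q, R = np.linalg.qr(X + V)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs  # flip columns so the implied R has a nonnegative diagonal

# Hypothetical data: a random point on St(6, 2) and a random tangent direction.
rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((6, 2)))
V = tangent_projection(X, rng.standard_normal((6, 2)))
X_next = qr_retraction(X, 0.1 * V)
```

The retraction keeps every iterate feasible, so orthonormality of the columns never has to be enforced by a penalty term.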
TMAC: A Toolbox of Modern Async-Parallel, Coordinate, Splitting, and Stochastic Methods
TMAC is a toolbox written in C++11 that implements algorithms based on a set
of modern methods for large-scale optimization. It covers a variety of
optimization problems, which can be both smooth and nonsmooth, convex and
nonconvex, as well as constrained and unconstrained. The algorithms implemented
in TMAC, such as the coordinate update method and operator splitting method,
are scalable as they decompose a problem into simple subproblems. These
algorithms can run in a multi-threaded fashion, either synchronously or
asynchronously, to take advantage of all the cores available. TMAC
architecture mimics how a scientist writes down an optimization algorithm.
Therefore, it is easy for one to obtain a new algorithm by making simple
modifications such as adding a new operator and adding a new splitting, while
maintaining the multicore parallelism and other features. The package is
available at https://github.com/uclaopt/TMAC
Unifying abstract inexact convergence theorems and block coordinate variable metric iPiano
An abstract convergence theorem for a class of generalized descent methods
that explicitly models relative errors is proved. The convergence theorem
generalizes and unifies several recent abstract convergence theorems. It is
applicable to possibly non-smooth and non-convex lower semi-continuous
functions that satisfy the Kurdyka-Lojasiewicz (KL) inequality, which
covers a broad class of problems. Most of the recent algorithms that
explicitly prove convergence using the KL inequality can be cast into the abstract
framework in this paper and, therefore, the generated sequence converges to a
stationary point of the objective function. Additional flexibility compared to
related approaches is gained by a descent property that is formulated with
respect to a function that is allowed to change along the iterations, a generic
distance measure, and an explicit/implicit relative error condition with
respect to finite linear combinations of distance terms. As an application of
the gained flexibility, the convergence of a block coordinate variable metric
version of iPiano (an inertial forward-backward splitting algorithm) is
proved, which performs favorably on an inpainting problem with a
Mumford-Shah-like regularization from image processing.
A successive difference-of-convex approximation method for a class of nonconvex nonsmooth optimization problems
We consider a class of nonconvex nonsmooth optimization problems whose
objective is the sum of a smooth function and a finite number of nonnegative
proper closed possibly nonsmooth functions (whose proximal mappings are easy to
compute), some of which are further composed with linear maps. Problems of this
kind arise naturally in various applications when different regularizers
are introduced for inducing simultaneous structures in the solutions. Solving
these problems, however, can be challenging because of the coupled nonsmooth
functions: the corresponding proximal mapping can be hard to compute so that
standard first-order methods such as the proximal gradient algorithm cannot be
applied efficiently. In this paper, we propose a successive
difference-of-convex approximation method for solving problems of this kind. In
this algorithm, we approximate the nonsmooth functions by their Moreau
envelopes in each iteration. Making use of the simple observation that Moreau
envelopes of nonnegative proper closed functions are continuous
difference-of-convex functions, we can then approximately minimize the
approximation function by first-order methods with suitable majorization
techniques. These first-order methods can be implemented efficiently thanks to
the fact that the proximal mapping of each nonsmooth function is easy to
compute. Under suitable assumptions, we prove that the sequence generated by
our method is bounded and any accumulation point is a stationary point of the
objective. We also discuss how our method can be applied to concrete
applications such as nonconvex fused regularized optimization problems and
simultaneously structured matrix optimization problems, and illustrate the
performance numerically for these two specific applications.
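The observation about Moreau envelopes in the abstract above can be checked numerically on a simple case. The sketch below assumes the nonsmooth function is the l1 norm (the abstract does not single out this choice) and uses hypothetical function names. It computes the envelope from the proximal mapping and also returns the two convex pieces h1 and h2 of the standard decomposition envelope = h1 - h2, where h1(x) = ||x||^2 / (2*lam) and h2 is a supremum of functions affine in x, evaluated here at the proximal point.

```python
import numpy as np

def prox_l1(x, lam):
    """Proximal mapping of lam * ||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_envelope_l1(x, lam):
    """Moreau envelope of the l1 norm: min_y ||y||_1 + ||x - y||^2 / (2*lam)."""
    p = prox_l1(x, lam)
    return np.sum(np.abs(p)) + np.sum((x - p) ** 2) / (2.0 * lam)

def dc_parts_l1(x, lam):
    """Return (h1, h2) such that the envelope equals h1 - h2, both convex in x."""
    p = prox_l1(x, lam)
    h1 = np.sum(x ** 2) / (2.0 * lam)
    # h2(x) = sup_y { <x, y>/lam - ||y||^2/(2*lam) - ||y||_1 }, attained at y = p.
    h2 = np.sum(x * p) / lam - np.sum(p ** 2) / (2.0 * lam) - np.sum(np.abs(p))
    return h1, h2

x = np.array([2.0, 0.5, -3.0])
env = moreau_envelope_l1(x, 1.0)   # 4.125 for this input
h1, h2 = dc_parts_l1(x, 1.0)       # 6.625 and 2.5, whose difference is 4.125
```

Only the proximal mapping of the individual function is needed, which is the property the abstract relies on.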
Splitting methods with variable metric for KL functions
We study the convergence of general abstract descent methods applied to a
lower semicontinuous nonconvex function f that satisfies the
Kurdyka-Lojasiewicz inequality in a Hilbert space. We prove that any precompact
sequence converges to a critical point of f and obtain new convergence rates
both for the values and the iterates. The analysis covers alternating versions
of the forward-backward method with variable metric and relative errors. As an
example, a nonsmooth and nonconvex version of the Levenberg-Marquardt algorithm
is detailed.
Optimization of Inf-Convolution Regularized Nonconvex Composite Problems
In this work, we consider nonconvex composite problems that involve
inf-convolution with a Legendre function, which gives rise to an anisotropic
generalization of the proximal mapping and Moreau-envelope. In a convex setting
such problems can be solved via alternating minimization of a splitting
formulation, where the consensus constraint is penalized with a Legendre
function. In contrast, for nonconvex models it is in general unclear that this
approach yields stationary points to the infimal convolution problem. To this
end we analytically investigate local regularity properties of the
Moreau-envelope function under prox-regularity, which allows us to establish
the equivalence between stationary points of the splitting model and the
original inf-convolution model. We apply our theory to characterize stationary
points of the penalty objective, which is minimized by the elastic averaging
SGD (EASGD) method for distributed training. Numerically, we demonstrate the
practical relevance of the proposed approach on the important task of
distributed training of deep neural networks.
Comment: Accepted as a Conference Paper to International Conference on
Artificial Intelligence and Statistics (AISTATS) 2019, Nah
Proximal Gradient Method with Extrapolation and Line Search for a Class of Nonconvex and Nonsmooth Problems
In this paper, we consider a class of possibly nonconvex, nonsmooth and
non-Lipschitz optimization problems arising in many contemporary applications
such as machine learning, variable selection and image processing. To solve
this class of problems, we propose a proximal gradient method with
extrapolation and line search (PGels). This method is developed based on a
special potential function and successfully incorporates both extrapolation and
non-monotone line search, which are two simple and efficient accelerating
techniques for the proximal gradient method. Thanks to the line search, this
method allows more flexibility in choosing the extrapolation parameters and
updates them adaptively at each iteration if a certain line search criterion is
not satisfied. Moreover, with proper choices of parameters, our PGels reduces
to many existing algorithms. We also show that, under some mild conditions, our
line search criterion is well defined and any cluster point of the sequence
generated by PGels is a stationary point of our problem. In addition, by
assuming the Kurdyka-Łojasiewicz exponent of the objective in our problem,
we further analyze the local convergence rate of two special cases of PGels,
including the widely used non-monotone proximal gradient method as one case.
Finally, we conduct some numerical experiments for solving the
regularized logistic regression problem and the regularized
least squares problem. Our numerical results illustrate the efficiency of PGels
and show the potential advantage of combining two accelerating techniques.
Comment: This version addresses some typos in the previous version and adds more
comparison
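As a rough illustration of the extrapolation idea in the abstract above, the following sketch runs a proximal gradient iteration with a fixed extrapolation weight on an l1-regularized least squares problem. It is not PGels: there is no line search and no adaptive update of the extrapolation parameter, and the problem choice, the function names, the weight beta = 0.5, and the step size 1/L are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal mapping of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_grad_extrapolation(A, b, lam, beta=0.5, iters=200):
    """Approximately minimize 0.5*||Ax - b||^2 + lam*||x||_1
    using a fixed extrapolation weight beta and step size 1/L."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth gradient
    x_prev = x = np.zeros(A.shape[1])
    for _ in range(iters):
        y = x + beta * (x - x_prev)      # extrapolated point
        grad = A.T @ (A @ y - b)         # gradient of the smooth part at y
        x_prev, x = x, soft_threshold(y - grad / L, lam / L)
    return x

# With A = I the minimizer is soft_threshold(b, lam), which gives a simple check.
b = np.array([3.0, -0.2, 1.0])
x_hat = prox_grad_extrapolation(np.eye(3), b, lam=0.5)
```

Setting beta = 0 recovers the plain proximal gradient method, which is one sense in which such schemes can reduce to existing algorithms.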
Gradient-Free Multi-Agent Nonconvex Nonsmooth Optimization
In this paper, we consider the problem of minimizing the sum of nonconvex and
possibly nonsmooth functions over a connected multi-agent network, where the
agents have partial knowledge about the global cost function and can only
access the zeroth-order information (i.e., the functional values) of their
local cost functions. We propose and analyze a distributed primal-dual
gradient-free algorithm for this challenging problem. We show that by
appropriately choosing the parameters, the proposed algorithm converges to the
set of first order stationary solutions with a provable global sublinear
convergence rate. Numerical experiments demonstrate the effectiveness of our
proposed method for optimizing nonconvex and nonsmooth problems over a network.
Comment: Long version of CDC paper
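To make "zeroth-order information" in the abstract above concrete, here is a minimal sketch of a gradient estimate built only from function values, using random Gaussian directions. The estimator form, the smoothing radius mu, the sample count, and the function name are assumptions for illustration; the abstract does not state which estimator the paper uses, and the distributed primal-dual part is not shown.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, samples=20000, seed=0):
    """Estimate the gradient of f at x from function values only,
    by averaging forward differences along random Gaussian directions."""
    rng = np.random.default_rng(seed)
    fx = f(x)
    g = np.zeros_like(x, dtype=float)
    for _ in range(samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - fx) / mu * u
    return g / samples

# Hypothetical local cost: a linear function whose true gradient is c.
c = np.array([1.0, 2.0, 3.0])
estimate = zo_gradient(lambda x: c @ x, np.zeros(3))
```

The estimate is noisy, and its accuracy improves only slowly with the number of sampled directions.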
An Optimization Framework with Flexible Inexact Inner Iterations for Nonconvex and Nonsmooth Programming
In recent years, numerous vision and learning tasks have been (re)formulated
as nonconvex and nonsmooth programs (NNPs). Although some algorithms have
been proposed for particular problems, designing fast and flexible optimization
schemes with theoretical guarantees is a challenging task for general NNPs. It
has been observed that performing inexact inner iterations often benefits
particular applications case by case, but their convergence behavior is still
unclear. Motivated by these practical experiences, this paper designs a novel
algorithmic framework, named inexact proximal alternating direction method
(IPAD) for solving general NNPs. We demonstrate that any numerical algorithm
can be incorporated into IPAD for solving subproblems and the convergence of
the resulting hybrid schemes can be consistently guaranteed by a series of
simple error conditions. Beyond the guarantee in theory, numerical experiments
on both synthesized and real-world data further demonstrate the superiority and
flexibility of our IPAD framework for practical use.
The Asynchronous PALM Algorithm for Nonsmooth Nonconvex Problems
We introduce the Asynchronous PALM algorithm, a new extension of the Proximal
Alternating Linearized Minimization (PALM) algorithm for solving nonsmooth,
nonconvex optimization problems. Like the PALM algorithm, each step of the
Asynchronous PALM algorithm updates a single block of coordinates; but unlike
the PALM algorithm, the Asynchronous PALM algorithm eliminates the need for
sequential updates that occur one after the other. Instead, our new algorithm
allows each of the coordinate blocks to be updated asynchronously and in any
order, which means that any number of computing cores can compute updates in
parallel without synchronizing their computations. In practice, this
asynchronization strategy often leads to speedups that increase linearly with
the number of computing cores.
We introduce two variants of the Asynchronous PALM algorithm, one stochastic
and one deterministic. In the stochastic and deterministic cases, we
show that cluster points of the algorithm are stationary points. In the
deterministic case, we show that the algorithm converges globally whenever the
Kurdyka-Łojasiewicz property holds for a function closely related to the
objective function, and we derive its convergence rate in a common special
case. Finally, we provide a concrete case in which our assumptions hold.
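For context on the block updates described in the abstract above, here is a minimal sketch of the ordinary, sequential PALM iteration on a toy two-block problem. It is not the asynchronous algorithm: the blocks are updated one after the other in a single thread. The toy objective lam*||x||_1 + indicator(y in [0,1]^n) + 0.5*||x + y - a||^2, the function names, and the step factor gamma = 1.1 are assumptions for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal mapping of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def objective(x, y, a, lam):
    """lam*||x||_1 + 0.5*||x + y - a||^2, with y assumed to lie in [0, 1]^n."""
    return lam * np.sum(np.abs(x)) + 0.5 * np.sum((x + y - a) ** 2)

def palm(a, lam, gamma=1.1, iters=50):
    """Sequential PALM: one linearized proximal step per block, in turn."""
    x = np.zeros_like(a)
    y = np.zeros_like(a)
    c = d = gamma * 1.0   # partial gradients of the coupling term are 1-Lipschitz
    history = [objective(x, y, a, lam)]
    for _ in range(iters):
        # Block x: gradient step on the coupling term, then prox of lam*||.||_1.
        x = soft_threshold(x - (x + y - a) / c, lam / c)
        # Block y: gradient step using the new x, then projection onto the box.
        y = np.clip(y - (x + y - a) / d, 0.0, 1.0)
        history.append(objective(x, y, a, lam))
    return x, y, history

a = np.array([2.0, -1.0, 0.3])
x_out, y_out, history = palm(a, lam=0.5)
```

In this sequential form each block update waits for the previous one, which is the dependency the asynchronous variant is designed to remove.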