Fast Stochastic Methods for Nonsmooth Nonconvex Optimization
We analyze stochastic algorithms for optimizing nonconvex, nonsmooth
finite-sum problems, where the nonconvex part is smooth and the nonsmooth part
is convex. Surprisingly, unlike the smooth case, our knowledge of this
fundamental problem is very limited. For example, it is not known whether the
proximal stochastic gradient method with a constant minibatch size converges to a
stationary point. To tackle this issue, we develop fast stochastic algorithms
that provably converge to a stationary point for constant minibatches.
Furthermore, using a variant of these algorithms, we show provably faster
convergence than batch proximal gradient descent. Finally, we prove global
linear convergence rate for an interesting subclass of nonsmooth nonconvex
functions that subsumes several recent works. This paper builds upon our
recent series of papers on fast stochastic methods for smooth nonconvex
optimization [22, 23], with a novel analysis for nonconvex and nonsmooth
functions.
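To make the setting concrete, the following is a minimal sketch of one proximal stochastic gradient step for this problem class (a smooth, possibly nonconvex finite sum plus a convex nonsmooth term). The l1 regularizer, step size, and minibatch size are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, a typical convex nonsmooth term."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_sgd_step(x, grad_fi, batch, step, lam):
    """One proximal stochastic gradient step for
        min_x  (1/n) * sum_i f_i(x) + lam * ||x||_1,
    where each f_i is smooth (possibly nonconvex) and the l1 term is convex.
    grad_fi(i, x) returns the gradient of the i-th smooth component at x;
    `batch` is a constant-size minibatch of component indices.
    """
    g = np.mean([grad_fi(i, x) for i in batch], axis=0)  # minibatch gradient of the smooth part
    return soft_threshold(x - step * g, step * lam)       # gradient step followed by the prox

# Illustrative driver with a simple least-squares smooth part (the paper's analysis
# covers nonconvex f_i; this driver only makes the sketch runnable).
rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 20)), rng.normal(size=100)
grad_fi = lambda i, x: (A[i] @ x - b[i]) * A[i]
x = np.zeros(20)
for _ in range(200):
    x = prox_sgd_step(x, grad_fi, rng.integers(0, 100, size=10), step=0.05, lam=0.1)
```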
Algorithms for solving optimization problems arising from deep neural net models: nonsmooth problems
Machine learning models incorporating multiple layered learning networks have
proven effective for various classification problems. The
resulting optimization problem to solve for the optimal vector minimizing the
empirical risk is, however, highly nonconvex. This alone presents a challenge
to application and development of appropriate optimization algorithms for
solving the problem. In addition, there are a number of interesting
problems for which the objective function is nonsmooth and nonseparable. In
this paper, we summarize the primary challenges involved, the state of the art,
and present some numerical results on an interesting and representative class
of problems.
Mini-Batch Stochastic ADMMs for Nonconvex Nonsmooth Optimization
With the rapid rise of complex data, nonconvex models such as nonconvex
loss functions and nonconvex regularizers are widely used in machine learning and
pattern recognition. In this paper, we propose a class of mini-batch stochastic
ADMMs (alternating direction method of multipliers) for solving large-scale
nonconvex nonsmooth problems. We prove that, given an appropriate mini-batch
size, the mini-batch stochastic ADMM without a variance reduction (VR) technique
is convergent and reaches a convergence rate of $O(1/T)$ to obtain a stationary
point of the nonconvex optimization, where $T$ denotes the number of
iterations. Moreover, we extend the mini-batch stochastic gradient method to
both the nonconvex SVRG-ADMM and SAGA-ADMM proposed in our initial manuscript
\cite{huang2016stochastic}, and prove that these mini-batch stochastic ADMMs also
reach the convergence rate of $O(1/T)$ without any condition on the mini-batch
size. In particular, we provide a specific parameter selection for the step size
of the stochastic gradients and the penalty parameter of the augmented
Lagrangian function. Finally, extensive experimental results on both simulated
and real-world data demonstrate the effectiveness of the proposed algorithms.
Comment: We have fixed some errors in the proofs. arXiv admin note: text overlap with arXiv:1610.0275
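As a rough illustration of how a linearized mini-batch stochastic ADMM iteration can look, here is a sketch for the model problem min_x f(x) + lam*||z||_1 subject to Ax - z = 0; the constraint form, step sizes, and absence of variance reduction are assumptions made for the illustration and do not reproduce the paper's exact updates.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def stochastic_admm_step(x, z, u, grad_f_batch, A, rho, eta, lam):
    """One linearized mini-batch stochastic ADMM step (a sketch, not the paper's
    exact algorithm) for  min_x f(x) + lam * ||z||_1  s.t.  A x - z = 0.
    grad_f_batch(x) returns a mini-batch stochastic gradient of the smooth,
    possibly nonconvex loss f; rho is the augmented-Lagrangian penalty parameter
    and eta is the step size of the linearized x-update.
    """
    # x-update: linearize f and take a gradient step on the augmented Lagrangian
    #   f(x) + (rho/2) * ||A x - z + u||^2   (scaled dual variable u).
    g = grad_f_batch(x) + rho * A.T @ (A @ x - z + u)
    x = x - eta * g
    # z-update: exact prox of the l1 regularizer.
    z = soft_threshold(A @ x + u, lam / rho)
    # Dual update (scaled form).
    u = u + A @ x - z
    return x, z, u
```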
The Sound of APALM Clapping: Faster Nonsmooth Nonconvex Optimization with Stochastic Asynchronous PALM
We introduce the Stochastic Asynchronous Proximal Alternating Linearized
Minimization (SAPALM) method, a block coordinate stochastic proximal-gradient
method for solving nonconvex, nonsmooth optimization problems. SAPALM is the
first asynchronous parallel optimization method that provably converges on a
large class of nonconvex, nonsmooth problems. We prove that SAPALM matches the
best known rates of convergence --- among synchronous or asynchronous methods
--- on this problem class. We provide upper bounds on the number of workers for
which we can expect to see a linear speedup, which match the best bounds known
for less complex problems, and show that in practice SAPALM achieves this
linear speedup. We demonstrate state-of-the-art performance on several matrix
factorization problems.
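The block-coordinate structure can be illustrated with a synchronous sweep of a PALM-style stochastic proximal-gradient update; the asynchronous worker scheduling and delay analysis that distinguish SAPALM are deliberately omitted, and all names below are assumptions for the sketch.

```python
import numpy as np

def palm_style_sweep(blocks, stoch_grad_block, prox_block, steps):
    """One synchronous sweep of a block-coordinate stochastic proximal-gradient
    method in the spirit of (SA)PALM.
    blocks           : list of numpy arrays, one per coordinate block
    stoch_grad_block : (j, blocks) -> stochastic gradient of the smooth coupling
                       term with respect to block j
    prox_block       : (j, v, t) -> prox of block j's nonsmooth regularizer
    steps            : per-block step sizes
    """
    for j in range(len(blocks)):
        g = stoch_grad_block(j, blocks)
        blocks[j] = prox_block(j, blocks[j] - steps[j] * g, steps[j])
    return blocks
```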
Convergence Analysis of Proximal Gradient with Momentum for Nonconvex Optimization
In many modern machine learning applications, structures of underlying
mathematical models often yield nonconvex optimization problems. Due to the
intractability of nonconvexity, there is a rising need to develop efficient
methods for solving general nonconvex problems with certain performance
guarantee. In this work, we investigate the accelerated proximal gradient
method for nonconvex programming (APGnc). The method compares a usual
proximal gradient step with a linear extrapolation step and accepts the one
that yields the lower function value, so as to achieve a monotonic decrease. Specifically,
under a general nonsmooth and nonconvex setting, we provide a rigorous argument
to show that the limit points of the sequence generated by APGnc are critical
points of the objective function. Then, by exploiting the
Kurdyka-Łojasiewicz (KL) property for a broad class of functions, we
establish the linear and sub-linear convergence rates of the function value
sequence generated by APGnc. We further propose a stochastic variance reduced
APGnc (SVRG-APGnc), and establish its linear convergence under a special case
of the KL property. We also extend the analysis to the inexact version of
these methods and develop an adaptive momentum strategy that improves the
numerical performance.
Comment: Accepted at ICML 2017, 9 pages, 4 figures.
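A compact sketch of the "take both candidate steps, keep the one with the lower objective value" mechanism described above might look as follows; the fixed momentum weight and single step size are simplifications and not the paper's exact scheme.

```python
def apg_nc_step(x, x_prev, F, grad_f, prox_h, step, beta):
    """One iteration in the spirit of the monotone accelerated proximal gradient
    scheme sketched above, for F(x) = f(x) + h(x) with f smooth (possibly
    nonconvex) and h nonsmooth.
    F      : full objective, used only to compare the two candidates
    grad_f : gradient of the smooth part f
    prox_h : (v, t) -> proximal mapping of h with parameter t
    """
    # Candidate 1: plain proximal gradient step from x.
    y_pg = prox_h(x - step * grad_f(x), step)
    # Candidate 2: proximal gradient step from the extrapolated (momentum) point.
    v = x + beta * (x - x_prev)
    y_ex = prox_h(v - step * grad_f(v), step)
    # Accept whichever candidate has the lower objective value (monotone decrease).
    y = y_ex if F(y_ex) <= F(y_pg) else y_pg
    return y, x  # new iterate and new "previous" iterate
```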
Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity
The use of convex regularizers allows for easy optimization, though they
often produce biased estimation and inferior prediction performance. Recently,
nonconvex regularizers have attracted a lot of attention and outperformed
convex ones. However, the resultant optimization problem is much harder. In
this paper, for a large class of nonconvex regularizers, we propose to move the
nonconvexity from the regularizer to the loss. The nonconvex regularizer is
then transformed to a familiar convex regularizer, while the resultant loss
function can still be guaranteed to be smooth. Learning with the convexified
regularizer can be performed by existing efficient algorithms originally
designed for convex regularizers (such as the proximal algorithm, Frank-Wolfe
algorithm, alternating direction method of multipliers and stochastic gradient
descent). Extensions are made to the cases where the convexified regularizer does not have
a closed-form proximal step, and where the loss function is nonconvex and nonsmooth.
Extensive experiments on a variety of machine learning application scenarios
show that optimizing the transformed problem is much faster than running the
state-of-the-art on the original problem.
Comment: Journal version of a previous conference paper that appeared at ICML-2016 with the same title.
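In LaTeX notation, the redistribution idea can be illustrated for a separable regularizer built from a concave penalty; the separable form and the symbol $\kappa'_0$ (the slope of $\kappa$ at zero) are assumptions made for this illustration.

```latex
% Moving nonconvexity from the regularizer to the loss for a separable penalty
% r(x) = \sum_j \kappa(|x_j|) with \kappa concave; \kappa'_0 denotes the
% (assumed finite) slope of \kappa at zero.
\[
  \min_x \; f(x) + \sum_j \kappa(|x_j|)
  \;=\;
  \min_x \; \underbrace{\Big[\, f(x) + \sum_j \big(\kappa(|x_j|) - \kappa'_0 |x_j|\big) \Big]}_{\text{new loss, smooth under suitable assumptions on } \kappa}
  \;+\; \underbrace{\kappa'_0 \,\lVert x\rVert_1}_{\text{convex regularizer}}
\]
```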
Zeroth-Order Stochastic Alternating Direction Method of Multipliers for Nonconvex Nonsmooth Optimization
Alternating direction method of multipliers (ADMM) is a popular optimization
tool for the composite and constrained problems in machine learning. However,
in many machine learning problems such as black-box attacks and bandit
feedback, ADMM could fail because the explicit gradients of these problems are
difficult or infeasible to obtain. Zeroth-order (gradient-free) methods can
effectively solve these problems because only the objective function values are
required in the optimization. Although a few zeroth-order ADMM methods have
recently been proposed, they build on the convexity of the objective function.
Clearly, these existing zeroth-order methods are limited in many applications.
Thus, in this paper, we propose a class of fast zeroth-order stochastic ADMM
methods (i.e., ZO-SVRG-ADMM and ZO-SAGA-ADMM) for solving nonconvex problems
with multiple nonsmooth penalties, based on the coordinate smoothing gradient
estimator. Moreover, we prove that both the ZO-SVRG-ADMM and ZO-SAGA-ADMM have
a convergence rate of $O(1/T)$, where $T$ denotes the number of iterations. In
particular, our methods not only reach the best known convergence rate of $O(1/T)$ for
the nonconvex optimization, but also are able to effectively solve many complex
machine learning problems with multiple regularized penalties and constraints.
Finally, we conduct the experiments of black-box binary classification and
structured adversarial attack on black-box deep neural network to validate the
efficiency of our algorithms.
Comment: To appear in IJCAI 2019. Supplementary materials are added.
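A generic coordinate-wise smoothing gradient estimator, in the spirit of the estimator referenced above, queries only function values; the two-sided difference and the smoothing parameter below are illustrative choices rather than the paper's exact construction.

```python
import numpy as np

def coord_smoothed_gradient(f, x, mu=1e-4):
    """Coordinate-wise (finite-difference) zeroth-order gradient estimate of f at x,
    using only function evaluations; mu is the smoothing parameter."""
    d = x.size
    g = np.zeros(d)
    for j in range(d):
        e = np.zeros(d)
        e[j] = 1.0
        g[j] = (f(x + mu * e) - f(x - mu * e)) / (2.0 * mu)
    return g
```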
Primal-Dual Optimization Algorithms over Riemannian Manifolds: an Iteration Complexity Analysis
In this paper we study nonconvex and nonsmooth multi-block optimization over
Riemannian manifolds with coupled linear constraints. Such optimization
problems naturally arise from machine learning, statistical learning,
compressive sensing, image processing, and tensor PCA, among others. We develop
an ADMM-like primal-dual approach based on decoupled solvable subroutines such
as linearized proximal mappings. First, we introduce the optimality conditions
for the aforementioned optimization models. Then, the notion of
$\epsilon$-stationary solutions is introduced as a result. The main part of the
paper is to show that the proposed algorithms enjoy an iteration complexity of
$O(1/\epsilon^2)$ to reach an $\epsilon$-stationary solution. For prohibitively
large-size tensor or machine learning models, we present a sampling-based
stochastic algorithm with the same iteration complexity bound in expectation.
In case the subproblems are not analytically solvable, a feasible curvilinear
line-search variant of the algorithm based on retraction operators is proposed.
Finally, we show specifically how the algorithms can be implemented to solve a
variety of practical problems such as the NP-hard maximum bisection problem,
the regularized sparse tensor principal component analysis and the
community detection problem. Our preliminary numerical results show the great
potential of the proposed methods.
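Schematically, the problem class studied here can be written as follows; the notation is assumed for illustration and is not the paper's exact formulation.

```latex
% Multi-block, manifold-constrained problem class (schematic): smooth coupling f,
% possibly nonsmooth terms r_i, coupled linear constraints, and blocks restricted
% to Riemannian manifolds M_i.
\begin{align*}
  \min_{x_1,\dots,x_N}\ & f(x_1,\dots,x_N) + \sum_{i=1}^{N} r_i(x_i) \\
  \text{s.t.}\ & \sum_{i=1}^{N} A_i x_i = b, \qquad x_i \in \mathcal{M}_i,\quad i = 1,\dots,N,
\end{align*}
% where, roughly, an \epsilon-stationary solution is a point at which suitably
% defined primal and dual residuals are all at most \epsilon.
```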
A Proximal Zeroth-Order Algorithm for Nonconvex Nonsmooth Problems
In this paper, we focus on solving an important class of nonconvex
optimization problems that includes, for example, signal
processing over networked multi-agent systems and distributed learning over
networks. Motivated by many applications in which the local objective function
is the sum of a smooth but possibly nonconvex part and a nonsmooth but convex
part, subject to a linear equality constraint, this paper proposes a proximal
zeroth-order primal dual algorithm (PZO-PDA) that accounts for the information
structure of the problem. The algorithm utilizes only zeroth-order
information (i.e., function values) of the smooth functions, yet it achieves the
flexibility needed for applications in which only noisy information about the
objective function is accessible and classical methods cannot be applied. We
prove convergence and a rate of convergence for PZO-PDA. Numerical experiments
are provided to validate the theoretical results.
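To connect this with the coordinate-wise estimator sketched earlier, here is a rough sketch of a proximal zeroth-order primal-dual step for a linearly constrained problem of this type; the update order, step sizes, and augmented-Lagrangian form are illustrative assumptions and not the paper's exact algorithm.

```python
import numpy as np

def pzo_pda_style_step(x, y, f, prox_h, A, b, rho, eta, zo_grad):
    """A sketch of one proximal zeroth-order primal-dual step for
        min_x f(x) + h(x)   s.t.   A x = b,
    where the smooth, possibly nonconvex f is accessed only through function
    values (via zo_grad, e.g. a coordinate-wise smoothing estimator) and h is
    convex and nonsmooth with an available prox."""
    # Primal step: zeroth-order gradient estimate of f plus the gradient of the
    # augmented-Lagrangian terms, followed by the prox of h.
    g = zo_grad(f, x) + A.T @ y + rho * A.T @ (A @ x - b)
    x = prox_h(x - eta * g, eta)
    # Dual ascent on the multiplier of the equality constraint.
    y = y + rho * (A @ x - b)
    return x, y
```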
Generalized Uniformly Optimal Methods for Nonlinear Programming
In this paper, we present a generic framework to extend existing uniformly
optimal convex programming algorithms to solve more general nonlinear, possibly
nonconvex, optimization problems. The basic idea is to incorporate a local
search step (gradient descent or Quasi-Newton iteration) into these uniformly
optimal convex programming methods, and then enforce a monotone decreasing
property of the function values computed along the trajectory. Algorithms of
these types will then achieve the best known complexity for nonconvex problems,
and the optimal complexity for convex ones without requiring any problem
parameters. As a consequence, we can have a unified treatment for a general
class of nonlinear programming problems regardless of their convexity and
smoothness level. In particular, we show that the accelerated gradient and
level methods, both originally designed for solving convex optimization
problems only, can be used for solving both convex and nonconvex problems
uniformly. In a similar vein, we show that some well-studied techniques for
nonlinear programming, e.g., Quasi-Newton iteration, can be embedded into
optimal convex optimization algorithms to possibly further enhance their
numerical performance. Our theoretical and algorithmic developments are
complemented by some promising numerical results obtained for solving a few
important nonconvex and nonlinear data analysis problems in the literature.
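A hedged sketch of the "accelerate, then safeguard with a gradient-descent local search and keep the monotone candidate" pattern described above; the momentum weight 2/(k+1), the single step size, and the comparison rule are simplifications rather than the paper's exact scheme.

```python
def monotone_ag_step(x, x_ag, k, grad, F, step):
    """One simplified step of an accelerated gradient method with a monotone
    safeguard: build an accelerated candidate, build a plain gradient-descent
    candidate, and keep whichever gives the lower objective value.
    x : the "search" sequence; x_ag : the aggregated (output) sequence
    grad : gradient oracle of the smooth objective; F : objective value oracle
    """
    alpha = 2.0 / (k + 1)                     # illustrative momentum weight
    x_md = (1.0 - alpha) * x_ag + alpha * x   # extrapolated (middle) point
    g_md = grad(x_md)
    x_new = x - step * g_md                   # update of the search sequence
    x_ag_acc = x_md - step * g_md             # accelerated candidate
    x_ag_gd = x_ag - step * grad(x_ag)        # local-search (gradient descent) candidate
    x_ag_new = x_ag_acc if F(x_ag_acc) <= F(x_ag_gd) else x_ag_gd
    return x_new, x_ag_new
```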