Faster Stochastic Alternating Direction Method of Multipliers for Nonconvex Optimization
In this paper, we propose a faster stochastic alternating direction method of
multipliers (ADMM) for nonconvex optimization by using a new stochastic
path-integrated differential estimator (SPIDER), called SPIDER-ADMM.
Moreover, we prove that SPIDER-ADMM achieves a record-breaking incremental
first-order oracle (IFO) complexity of $\mathcal{O}(n + n^{1/2}\epsilon^{-1})$
for finding an $\epsilon$-approximate stationary point, which improves the
deterministic ADMM by a factor of $\mathcal{O}(n^{1/2})$, where $n$ denotes the
sample size. As one of the major contributions of this paper, we provide a new
theoretical analysis framework for nonconvex stochastic ADMM methods that
yields the optimal IFO complexity. Based on this new analysis framework, we
study the previously unresolved optimal IFO complexity of the existing nonconvex
SVRG-ADMM and SAGA-ADMM methods, and prove that they have the optimal IFO
complexity of $\mathcal{O}(n + n^{2/3}\epsilon^{-1})$. Thus, SPIDER-ADMM improves the
existing stochastic ADMM methods by a factor of $\mathcal{O}(n^{1/6})$.
Moreover, we extend SPIDER-ADMM to the online setting and propose a faster
online SPIDER-ADMM. Our theoretical analysis shows that the online SPIDER-ADMM
has an IFO complexity of $\mathcal{O}(\epsilon^{-3/2})$, which
improves the existing best results by a factor of $\mathcal{O}(\epsilon^{-1/2})$.
Finally, experimental results on benchmark datasets validate that the proposed
algorithms have a faster convergence rate than the existing ADMM algorithms for
nonconvex optimization.
Comment: Published in ICML 2019, 43 pages. arXiv admin note: text overlap with
arXiv:1907.1346
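As a rough illustration of the variance-reduction mechanism behind SPIDER-type estimators (a minimal sketch, not the authors' algorithm; the toy least-squares objective, the function names, and the parameter values below are assumptions for illustration):

    import numpy as np

    # Toy finite-sum problem: f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2
    rng = np.random.default_rng(0)
    n, d = 1000, 20
    A, b = rng.normal(size=(n, d)), rng.normal(size=n)

    def grad(x, idx):
        # Mini-batch gradient of the toy least-squares loss over the samples in idx
        Ai, bi = A[idx], b[idx]
        return Ai.T @ (Ai @ x - bi) / len(idx)

    def spider_estimates(x_hist, q=32, batch=16):
        # Recursive SPIDER estimates along an iterate sequence: a full gradient
        # every q steps, otherwise v_t = g_B(x_t) - g_B(x_{t-1}) + v_{t-1}
        v, out = None, []
        for t, x in enumerate(x_hist):
            if t % q == 0:
                v = grad(x, np.arange(n))
            else:
                idx = rng.choice(n, size=batch, replace=False)
                v = grad(x, idx) - grad(x_hist[t - 1], idx) + v
            out.append(v)
        return out

    # Error of the recursive estimate versus the exact gradient at the last iterate
    xs = [rng.normal(size=d) * 0.1 for _ in range(5)]
    print(np.linalg.norm(spider_estimates(xs)[-1] - grad(xs[-1], np.arange(n))))

In SPIDER-ADMM, an estimator of this recursive form would replace the exact gradient inside the ADMM primal update, which is where the improved IFO complexity comes from.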
Stochastic Variance-Reduced ADMM
The alternating direction method of multipliers (ADMM) is a powerful
optimization solver in machine learning. Recently, stochastic ADMM has been
integrated with variance reduction methods for stochastic gradient, leading to
SAG-ADMM and SDCA-ADMM that have fast convergence rates and low iteration
complexities. However, their space requirements can still be high. In this
paper, we propose an integration of ADMM with the method of stochastic variance
reduced gradient (SVRG). Unlike another recent integration attempt called
SCAS-ADMM, the proposed algorithm retains the fast convergence benefits of
SAG-ADMM and SDCA-ADMM, but is more advantageous in that its storage
requirement is very low, even independent of the sample size $n$. We also
extend the proposed method to nonconvex problems, and obtain a convergence
rate of $\mathcal{O}(1/T)$. Experimental results demonstrate that it is as fast as
SAG-ADMM and SDCA-ADMM, much faster than SCAS-ADMM, and can be used on much
bigger data sets.
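The low storage requirement comes from the SVRG-style control-variate estimator, which only needs a snapshot point and its full gradient rather than a table of per-sample gradients. A hedged sketch of that estimator on a toy least-squares problem (names and sizes are assumptions, not the paper's code):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 20
    A, b = rng.normal(size=(n, d)), rng.normal(size=n)

    def grad(x, idx):
        Ai, bi = A[idx], b[idx]
        return Ai.T @ (Ai @ x - bi) / len(idx)

    def svrg_gradient(x, x_snap, full_grad_snap, batch=16):
        # Control-variate estimator: unbiased, and its variance shrinks as x
        # approaches the snapshot, so only O(d) extra memory is needed.
        idx = rng.choice(n, size=batch, replace=False)
        return grad(x, idx) - grad(x_snap, idx) + full_grad_snap

    x_snap = rng.normal(size=d)
    full_grad_snap = grad(x_snap, np.arange(n))
    x = x_snap + 0.01 * rng.normal(size=d)
    print(svrg_gradient(x, x_snap, full_grad_snap)[:3])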
Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity
Convex regularizers allow for easy optimization, though they often produce
biased estimates and inferior prediction performance. Recently,
nonconvex regularizers have attracted a lot of attention and outperformed
convex ones. However, the resultant optimization problem is much harder. In
this paper, for a large class of nonconvex regularizers, we propose to move the
nonconvexity from the regularizer to the loss. The nonconvex regularizer is
then transformed to a familiar convex regularizer, while the resultant loss
function can still be guaranteed to be smooth. Learning with the convexified
regularizer can be performed by existing efficient algorithms originally
designed for convex regularizers (such as the proximal algorithm, Frank-Wolfe
algorithm, alternating direction method of multipliers and stochastic gradient
descent). Extensions are made when the convexified regularizer does not have a
closed-form proximal step, and when the loss function is nonconvex and nonsmooth.
Extensive experiments on a variety of machine learning application scenarios
show that optimizing the transformed problem is much faster than running
state-of-the-art solvers on the original problem.
Comment: Journal version of previous conference paper appeared at ICML-2016
with same title
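The redistribution idea can be illustrated with a small identity (a hedged sketch under the assumption of a separable penalty $\kappa$ that is concave on $[0,\infty)$ with $\kappa(0)=0$ and finite slope $\kappa_0 = \kappa'(0^+)$; this is an illustrative formulation, not necessarily the paper's exact one):
\[
\min_x\; f(x) + \lambda\sum_i \kappa(|x_i|)
\;=\; \min_x\; \Big[\, f(x) + \lambda\sum_i\big(\kappa(|x_i|) - \kappa_0\,|x_i|\big) \Big] \;+\; \lambda\,\kappa_0\,\|x\|_1 ,
\]
where the bracketed term is a smooth (possibly nonconvex) loss, because the kink of $\kappa(|x_i|)$ at zero is cancelled by $-\kappa_0|x_i|$, and $\lambda\,\kappa_0\,\|x\|_1$ is a familiar convex $\ell_1$ regularizer whose proximal step is soft-thresholding. Any standard solver for $\ell_1$-regularized problems can then be applied to the transformed objective.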
Mini-Batch Stochastic ADMMs for Nonconvex Nonsmooth Optimization
With the rapid growth of complex data, nonconvex models such as nonconvex
loss functions and nonconvex regularizers are widely used in machine learning and
pattern recognition. In this paper, we propose a class of mini-batch stochastic
ADMMs (alternating direction method of multipliers) for solving large-scale
nonconvex nonsmooth problems. We prove that, given an appropriate mini-batch
size, the mini-batch stochastic ADMM without a variance reduction (VR) technique
is convergent and reaches a convergence rate of $\mathcal{O}(1/T)$ for obtaining a
stationary point of the nonconvex optimization, where $T$ denotes the number of
iterations. Moreover, we extend the mini-batch stochastic gradient method to
both the nonconvex SVRG-ADMM and SAGA-ADMM proposed in our initial manuscript
\cite{huang2016stochastic}, and prove that these mini-batch stochastic ADMMs also
reach the convergence rate of $\mathcal{O}(1/T)$ without any condition on the mini-batch
size. In particular, we provide a specific parameter selection for the step size
of the stochastic gradients and the penalty parameter of the augmented
Lagrangian function. Finally, extensive experimental results on both simulated
and real-world data demonstrate the effectiveness of the proposed algorithms.
Comment: We have fixed some errors in the proofs. arXiv admin note: text
overlap with arXiv:1610.0275
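For intuition, one mini-batch (linearized) stochastic ADMM iteration on a lasso-style toy problem might look as follows; the consensus splitting, step sizes, and variable names are assumptions for illustration, not the paper's exact scheme:

    import numpy as np

    # Toy problem in consensus form:
    #   min_x (1/n) * sum_i 0.5*(a_i^T x - b_i)^2 + lam*||z||_1   s.t.  x - z = 0
    rng = np.random.default_rng(0)
    n, d, lam = 1000, 20, 0.1
    A, b = rng.normal(size=(n, d)), rng.normal(size=n)

    def stoch_grad(x, batch=32):
        idx = rng.choice(n, size=batch, replace=False)
        Ai, bi = A[idx], b[idx]
        return Ai.T @ (Ai @ x - bi) / batch

    def soft_threshold(v, tau):
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    def minibatch_stochastic_admm(iters=500, rho=1.0, eta=0.1):
        x = z = u = np.zeros(d)
        for _ in range(iters):
            g = stoch_grad(x)
            # Linearized x-update: mini-batch gradient model of the loss plus the
            # augmented-Lagrangian quadratic, solved in closed form.
            x = (rho * (z - u) + x / eta - g) / (rho + 1.0 / eta)
            z = soft_threshold(x + u, lam / rho)   # proximal step for the l1 term
            u = u + x - z                          # (scaled) dual update
        return z

    # Number of nonzero coordinates in the resulting sparse estimate
    print(np.count_nonzero(minibatch_stochastic_admm()))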
Zeroth-Order Stochastic Alternating Direction Method of Multipliers for Nonconvex Nonsmooth Optimization
Alternating direction method of multipliers (ADMM) is a popular optimization
tool for composite and constrained problems in machine learning. However,
in many machine learning problems, such as black-box attacks and bandit
feedback, ADMM could fail because explicit gradients of these problems are
difficult or infeasible to obtain. Zeroth-order (gradient-free) methods can
effectively solve these problems because they require only objective function
values during optimization. Although a few zeroth-order ADMM methods have
recently been proposed, they rely on the convexity of the objective function,
which limits their applicability. Thus, in this paper, we propose a class of
fast zeroth-order stochastic ADMM methods (i.e., ZO-SVRG-ADMM and ZO-SAGA-ADMM)
for solving nonconvex problems with multiple nonsmooth penalties, based on the
coordinate smoothing gradient estimator. Moreover, we prove that both
ZO-SVRG-ADMM and ZO-SAGA-ADMM have a convergence rate of $\mathcal{O}(1/T)$, where $T$
denotes the number of iterations. In particular, our methods not only reach the
best-known convergence rate of $\mathcal{O}(1/T)$ for nonconvex optimization, but also
effectively solve many complex machine learning problems with multiple
regularized penalties and constraints. Finally, we conduct experiments on
black-box binary classification and structured adversarial attacks on black-box
deep neural networks to validate the efficiency of our algorithms.
Comment: To Appear in IJCAI 2019. Supplementary materials are added
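A coordinate-wise smoothing gradient estimator of the kind named in the abstract can be sketched in a few lines (a hedged illustration with an assumed black-box quadratic objective and smoothing radius, not the authors' implementation):

    import numpy as np

    def zo_coordinate_gradient(f, x, mu=1e-4):
        # Coordinate-wise finite differences: 2*d function evaluations,
        # no explicit gradient of f is ever required.
        d = x.size
        g = np.zeros(d)
        for j in range(d):
            e = np.zeros(d)
            e[j] = 1.0
            g[j] = (f(x + mu * e) - f(x - mu * e)) / (2.0 * mu)
        return g

    # Example: black-box quadratic; the estimate should be close to 2*(x - 1)
    f = lambda x: np.sum((x - 1.0) ** 2)
    print(zo_coordinate_gradient(f, np.zeros(3)))   # approximately [-2, -2, -2]

Such an estimate would then stand in for the stochastic gradient inside the SVRG- or SAGA-style ADMM updates.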
Zeroth Order Nonconvex Multi-Agent Optimization over Networks
In this paper, we consider distributed optimization problems over a
multi-agent network, where each agent can only partially evaluate the objective
function, and it is allowed to exchange messages with its immediate neighbors.
Unlike existing works on distributed optimization, our focus is on optimizing a
class of non-convex problems under the challenging setting where each agent can
only access zeroth-order information (i.e.,
the functional values) of its local functions. For different types of network
topologies such as undirected connected networks or star networks, we develop
efficient distributed algorithms and rigorously analyze their convergence and
rate of convergence (to the set of stationary solutions). Numerical results are
provided to demonstrate the efficiency of the proposed algorithms.
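To make the setting concrete, the sketch below combines a standard randomized two-point gradient estimate with a consensus averaging step over a small ring of agents; the mixing matrix, step size, and toy local costs are assumptions for illustration, not the algorithms developed in the paper:

    import numpy as np

    rng = np.random.default_rng(0)

    def zo_two_point(f, x, mu=1e-3):
        # Randomized two-point gradient estimate from function values only
        u = rng.normal(size=x.size)
        return (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u

    def decentralized_zo_step(xs, fs, W, step=0.05):
        # One synchronous round: each agent mixes with neighbors (weights W)
        # and then takes a zeroth-order gradient step on its local cost.
        mixed = W @ xs
        grads = np.array([zo_two_point(f, x) for f, x in zip(fs, mixed)])
        return mixed - step * grads

    # Toy example: 3 agents, local costs f_i(x) = ||x - c_i||^2
    cs = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    fs = [lambda x, c=c: np.sum((x - c) ** 2) for c in cs]
    W = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]])
    xs = rng.normal(size=(3, 2))
    for _ in range(200):
        xs = decentralized_zo_step(xs, fs, W)
    print(xs.mean(axis=0))   # roughly the minimizer of the average cost, about [1/3, 1/3]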
Practical Algorithms for Learning Near-Isometric Linear Embeddings
We propose two practical non-convex approaches for learning near-isometric,
linear embeddings of finite sets of data points. Given a set of training points
, we consider the secant set that consists of all
pairwise difference vectors of the training points, normalized to lie on the unit
sphere. The problem can be formulated as finding a symmetric and positive
semi-definite matrix that preserves the norms of all vectors in the secant set
up to a prescribed distortion parameter. Motivated by
non-negative matrix factorization, we reformulate our problem as a Frobenius
norm minimization problem, which we solve with the Alternating Direction Method
of Multipliers (ADMM), yielding an algorithm we call FroMax. Another method
solves for a projection matrix by minimizing the restricted isometry property
(RIP) constant directly over the set of symmetric, positive semi-definite
matrices. Applying ADMM and a Moreau decomposition on a proximal mapping, we
develop another algorithm, NILE-Pro, for dimensionality reduction.
FroMax is shown to converge faster for smaller distortion parameters, while
NILE-Pro converges faster for larger ones. Both non-convex approaches are then
empirically demonstrated to be more computationally efficient than prior convex
approaches for a number of applications in machine learning and signal
processing.
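The secant set and the distortion it controls can be computed directly, as in the hedged sketch below; the helper names, the random projection used as a baseline, and the data sizes are assumptions, and this is not the FroMax or NILE-Pro algorithm itself:

    import numpy as np

    def normalized_secants(X):
        # All pairwise differences of the rows of X, scaled to unit norm
        diffs = X[:, None, :] - X[None, :, :]
        iu = np.triu_indices(len(X), k=1)
        secants = diffs[iu]
        return secants / np.linalg.norm(secants, axis=1, keepdims=True)

    def max_distortion(P, S):
        # Worst-case deviation of ||P s||^2 from 1 over the unit-norm secants
        return np.max(np.abs(np.sum((S @ P.T) ** 2, axis=1) - 1.0))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))              # 50 points in R^10
    S = normalized_secants(X)
    P = rng.normal(size=(6, 10)) / np.sqrt(6)  # random 6-dim projection as a baseline
    print(max_distortion(P, S))                # the quantity a learned embedding tries to shrink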
Residual Expansion Algorithm: Fast and Effective Optimization for Nonconvex Least Squares Problems
We propose the residual expansion (RE) algorithm: a global (or near-global)
optimization method for nonconvex least squares problems. Unlike most existing
nonconvex optimization techniques, the RE algorithm is not based on either
stochastic or multi-point searches; therefore, it can achieve fast global
optimization. Moreover, the RE algorithm is easy to implement and successful in
high-dimensional optimization. The RE algorithm exhibits excellent empirical
performance on k-means clustering, point-set registration, optimized
product quantization, and blind image deblurring.
Comment: Accepted to CVPR 2017
Survey: Sixty Years of Douglas--Rachford
The Douglas--Rachford method is a splitting method frequently employed for
finding zeroes of sums of maximally monotone operators. When the operators in
question are normal cone operators, the iterated process may be used to solve
feasibility problems of the form: find a point in the intersection of a finite
family of sets. The success of the method in the context of closed, convex,
nonempty sets is well-known and understood from a theoretical standpoint. However, its
performance in the nonconvex context is less understood yet surprisingly
impressive. This was particularly compelling to Jonathan M. Borwein who,
intrigued by Elser, Rankenburg, and Thibault's success in applying the method
for solving Sudoku Puzzles, began an investigation of his own. We survey the
current body of literature on the subject, and we summarize its history. We
especially commemorate Professor Borwein's celebrated contributions to the
area.
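For readers new to the method, the classical two-set feasibility instance can be sketched as below; the specific sets (a unit ball and a horizontal line) and parameter values are an assumed textbook example, not taken from the survey:

    import numpy as np

    # Toy two-set feasibility problem in R^2: A = closed unit ball, B = the line {y = 0.5}
    def proj_ball(x):
        nrm = np.linalg.norm(x)
        return x if nrm <= 1.0 else x / nrm

    def proj_line(x):
        return np.array([x[0], 0.5])

    def douglas_rachford(x, proj_A, proj_B, iters=200):
        # Standard Douglas--Rachford iteration x <- x + P_B(2 P_A(x) - x) - P_A(x);
        # in the convex case the "shadow" sequence P_A(x) approaches a point of A ∩ B.
        for _ in range(iters):
            pa = proj_A(x)
            x = x + proj_B(2.0 * pa - x) - pa
        return proj_A(x)

    print(douglas_rachford(np.array([3.0, -2.0]), proj_ball, proj_line))
    # a point with second coordinate 0.5 and norm at most 1

In the nonconvex setting surveyed here (for instance, when one of the sets is discrete, as in Sudoku-type constraints), the same iteration is applied verbatim, even though the convex convergence theory no longer covers it.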
Gradient-Free Multi-Agent Nonconvex Nonsmooth Optimization
In this paper, we consider the problem of minimizing the sum of nonconvex and
possibly nonsmooth functions over a connected multi-agent network, where the
agents have partial knowledge about the global cost function and can only
access the zeroth-order information (i.e., the functional values) of their
local cost functions. We propose and analyze a distributed primal-dual
gradient-free algorithm for this challenging problem. We show that by
appropriately choosing the parameters, the proposed algorithm converges to the
set of first-order stationary solutions with a provable global sublinear
convergence rate. Numerical experiments demonstrate the effectiveness of our
proposed method for optimizing nonconvex and nonsmooth problems over a network.
Comment: Long version of CDC paper