
    Faster Stochastic Alternating Direction Method of Multipliers for Nonconvex Optimization

    Full text link
    In this paper, we propose a faster stochastic alternating direction method of multipliers (ADMM) for nonconvex optimization by using a new stochastic path-integrated differential estimator (SPIDER), called SPIDER-ADMM. Moreover, we prove that SPIDER-ADMM achieves a record-breaking incremental first-order oracle (IFO) complexity of $\mathcal{O}(n+n^{1/2}\epsilon^{-1})$ for finding an $\epsilon$-approximate stationary point, which improves on the deterministic ADMM by a factor of $\mathcal{O}(n^{1/2})$, where $n$ denotes the sample size. As one of the major contributions of this paper, we provide a new theoretical analysis framework for nonconvex stochastic ADMM methods that yields the optimal IFO complexity. Based on this new analysis framework, we study the previously unresolved optimal IFO complexity of the existing nonconvex SVRG-ADMM and SAGA-ADMM methods, and prove that they have the optimal IFO complexity of $\mathcal{O}(n+n^{2/3}\epsilon^{-1})$. Thus, SPIDER-ADMM improves the existing stochastic ADMM methods by a factor of $\mathcal{O}(n^{1/6})$. Moreover, we extend SPIDER-ADMM to the online setting and propose a faster online SPIDER-ADMM. Our theoretical analysis shows that the online SPIDER-ADMM has an IFO complexity of $\mathcal{O}(\epsilon^{-3/2})$, which improves the existing best results by a factor of $\mathcal{O}(\epsilon^{-1/2})$. Finally, experimental results on benchmark datasets validate that the proposed algorithms have a faster convergence rate than the existing ADMM algorithms for nonconvex optimization.
    Comment: Published in ICML 2019, 43 pages. arXiv admin note: text overlap with arXiv:1907.1346
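
    The core ingredient above is the SPIDER estimator, which tracks the full gradient by recursively correcting a running estimate with mini-batch gradient differences and refreshing it periodically. Below is a minimal sketch of that estimator applied to a plain finite-sum least-squares problem with an ordinary gradient step (not the ADMM updates of the paper); the step size, batch size, and refresh period are illustrative assumptions.

```python
import numpy as np

def spider_gd(A, b, x0, step=0.1, epochs=10, q=20, batch=16, rng=None):
    """Minimize (1/2n)*||A x - b||^2 with SPIDER gradient estimates."""
    rng = rng or np.random.default_rng(0)
    n = A.shape[0]
    x = x0.copy()
    v = A.T @ (A @ x - b) / n                    # initial full-gradient estimate
    for t in range(1, epochs * q + 1):
        x_prev, x = x, x - step * v              # gradient step with the current estimate
        if t % q == 0:
            v = A.T @ (A @ x - b) / n            # periodic full-gradient refresh
        else:
            idx = rng.choice(n, batch, replace=False)
            Ai, bi = A[idx], b[idx]
            # path-integrated correction: adjust the running estimate by the
            # mini-batch gradient difference between consecutive iterates
            v += (Ai.T @ (Ai @ x - bi) - Ai.T @ (Ai @ x_prev - bi)) / batch
    return x

# usage: recover a linear model from noisy observations
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 20))
x_true = rng.standard_normal(20)
b = A @ x_true + 0.01 * rng.standard_normal(500)
print(np.linalg.norm(spider_gd(A, b, np.zeros(20)) - x_true))   # small residual error
```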

    Stochastic Variance-Reduced ADMM

    Full text link
    The alternating direction method of multipliers (ADMM) is a powerful optimization solver in machine learning. Recently, stochastic ADMM has been integrated with variance reduction methods for stochastic gradients, leading to SAG-ADMM and SDCA-ADMM, which have fast convergence rates and low iteration complexities. However, their space requirements can still be high. In this paper, we propose an integration of ADMM with the stochastic variance reduced gradient (SVRG) method. Unlike another recent integration attempt called SCAS-ADMM, the proposed algorithm retains the fast convergence benefits of SAG-ADMM and SDCA-ADMM, but is more advantageous in that its storage requirement is very low, even independent of the sample size $n$. We also extend the proposed method to nonconvex problems, and obtain a convergence rate of $O(1/T)$. Experimental results demonstrate that it is as fast as SAG-ADMM and SDCA-ADMM, much faster than SCAS-ADMM, and can be used on much bigger data sets.
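
    As a rough illustration of how a variance-reduced stochastic gradient can be combined with ADMM, the sketch below plugs SVRG-style gradient estimates into a linearized ADMM loop for a lasso-type split problem. The linearized x-update, parameter values, and epoch structure are assumptions for illustration, not the exact updates analyzed in the paper.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def svrg_admm_lasso(D, y, lam=0.1, rho=1.0, step=0.05, epochs=20, batch=8, rng=None):
    """Variance-reduced stochastic ADMM for min (1/2n)||Dx-y||^2 + lam*||z||_1, s.t. x = z."""
    rng = rng or np.random.default_rng(0)
    n, d = D.shape
    x, z, u = np.zeros(d), np.zeros(d), np.zeros(d)
    for _ in range(epochs):
        x_snap = x.copy()
        full_grad = D.T @ (D @ x_snap - y) / n        # snapshot (full) gradient
        for _ in range(n // batch):
            idx = rng.choice(n, batch, replace=False)
            Di, yi = D[idx], y[idx]
            # SVRG-style variance-reduced gradient of the smooth loss
            g = (Di.T @ (Di @ x - yi) - Di.T @ (Di @ x_snap - yi)) / batch + full_grad
            x = x - step * (g + rho * (x - z + u))    # linearized x-update
            z = soft_threshold(x + u, lam / rho)      # exact z-update (prox of the l1 term)
            u = u + x - z                             # dual update
    return z

# usage: sparse recovery from noisy linear measurements
rng = np.random.default_rng(2)
D = rng.standard_normal((200, 50))
x_true = np.zeros(50)
x_true[:5] = 1.0
y = D @ x_true + 0.01 * rng.standard_normal(200)
print(svrg_admm_lasso(D, y, lam=0.05)[:8].round(2))   # first five entries close to 1, rest close to 0
```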

    Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity

    Full text link
    The use of convex regularizers allows for easy optimization, though they often produce biased estimation and inferior prediction performance. Recently, nonconvex regularizers have attracted a lot of attention and have outperformed convex ones. However, the resultant optimization problem is much harder. In this paper, for a large class of nonconvex regularizers, we propose to move the nonconvexity from the regularizer to the loss. The nonconvex regularizer is then transformed into a familiar convex regularizer, while the resultant loss function can still be guaranteed to be smooth. Learning with the convexified regularizer can be performed by existing efficient algorithms originally designed for convex regularizers (such as the proximal algorithm, the Frank-Wolfe algorithm, the alternating direction method of multipliers, and stochastic gradient descent). Extensions are made to the cases where the convexified regularizer does not have a closed-form proximal step, and where the loss function is nonconvex and nonsmooth. Extensive experiments on a variety of machine learning application scenarios show that optimizing the transformed problem is much faster than running the state-of-the-art on the original problem.
    Comment: Journal version of a previous conference paper that appeared at ICML-2016 with the same title.
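
    To make the transformation concrete, the sketch below applies the idea to the log-sum penalty $r(x)=\sum_i \lambda\log(1+|x_i|/\theta)$: the penalty is split into a convex $\ell_1$ term with weight $\lambda/\theta$ plus a smooth remainder that is folded into the loss, after which standard proximal gradient with soft-thresholding applies. The least-squares loss and all parameter values are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad_logsum(A, b, lam=0.1, theta=0.5, iters=300):
    """Proximal gradient for min (1/2n)||Ax-b||^2 + sum_i lam*log(1 + |x_i|/theta)."""
    n, d = A.shape
    kappa0 = lam / theta                                   # weight of the convex l1 part
    L = np.linalg.norm(A, 2) ** 2 / n + lam / theta ** 2   # curvature bound of the smooth part
    step = 1.0 / L
    x = np.zeros(d)
    for _ in range(iters):
        grad_loss = A.T @ (A @ x - b) / n
        # gradient of the smooth remainder lam*log(1+|x|/theta) - kappa0*|x|
        grad_rem = lam * np.sign(x) / (theta + np.abs(x)) - kappa0 * np.sign(x)
        x = soft_threshold(x - step * (grad_loss + grad_rem), step * kappa0)
    return x

# usage: sparse recovery with a nonconvex (log-sum) penalty
rng = np.random.default_rng(3)
A = rng.standard_normal((150, 40))
x_true = np.zeros(40)
x_true[:4] = 2.0
b = A @ x_true + 0.01 * rng.standard_normal(150)
print(prox_grad_logsum(A, b)[:6].round(2))   # first four entries close to 2, rest close to 0
```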

    Mini-Batch Stochastic ADMMs for Nonconvex Nonsmooth Optimization

    Full text link
    With the rapid growth of complex data, nonconvex models such as nonconvex loss functions and nonconvex regularizers are widely used in machine learning and pattern recognition. In this paper, we propose a class of mini-batch stochastic ADMMs (alternating direction methods of multipliers) for solving large-scale nonconvex nonsmooth problems. We prove that, given an appropriate mini-batch size, the mini-batch stochastic ADMM without a variance reduction (VR) technique is convergent and reaches a convergence rate of $O(1/T)$ for obtaining a stationary point of the nonconvex optimization, where $T$ denotes the number of iterations. Moreover, we extend the mini-batch stochastic gradient method to both the nonconvex SVRG-ADMM and SAGA-ADMM proposed in our initial manuscript \cite{huang2016stochastic}, and prove that these mini-batch stochastic ADMMs also reach the convergence rate of $O(1/T)$ without any condition on the mini-batch size. In particular, we provide a specific parameter selection for the step size $\eta$ of the stochastic gradients and the penalty parameter $\rho$ of the augmented Lagrangian function. Finally, extensive experimental results on both simulated and real-world data demonstrate the effectiveness of the proposed algorithms.
    Comment: We have fixed some errors in the proofs. arXiv admin note: text overlap with arXiv:1610.0275
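
    The sketch below shows the shape of one mini-batch stochastic ADMM iteration for a split problem $\min_x f(x) + g(z)$ s.t. $x = z$: a plain mini-batch gradient (no variance reduction) drives a linearized x-update, followed by a proximal z-update and a dual update. The helper names, step size, and penalty parameter are illustrative assumptions, not the specific parameter choices analyzed in the paper.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def minibatch_admm_step(x, z, u, minibatch_grad, prox_g, step, rho):
    """One linearized mini-batch stochastic ADMM iteration (no variance reduction)."""
    g = minibatch_grad(x)                        # plain mini-batch gradient of the smooth loss
    x = x - step * (g + rho * (x - z + u))       # linearized x-update of the augmented Lagrangian
    z = prox_g(x + u, 1.0 / rho)                 # proximal z-update for the nonsmooth term
    u = u + x - z                                # dual update
    return x, z, u

# usage: l1-regularized least squares, min (1/2n)||Dx - y||^2 + lam*||z||_1 s.t. x = z
rng = np.random.default_rng(4)
D = rng.standard_normal((300, 30))
y = D @ (np.arange(30) < 3).astype(float) + 0.01 * rng.standard_normal(300)
lam, rho, step, batch = 0.05, 1.0, 0.05, 16
x, z, u = np.zeros(30), np.zeros(30), np.zeros(30)
for _ in range(2000):
    idx = rng.choice(300, batch, replace=False)
    mb_grad = lambda w, S=idx: D[S].T @ (D[S] @ w - y[S]) / batch
    x, z, u = minibatch_admm_step(x, z, u, mb_grad, lambda v, t: soft_threshold(v, lam * t), step, rho)
print(z[:5].round(2))   # first three entries close to 1, the rest close to 0
```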

    Zeroth-Order Stochastic Alternating Direction Method of Multipliers for Nonconvex Nonsmooth Optimization

    Full text link
    The alternating direction method of multipliers (ADMM) is a popular optimization tool for composite and constrained problems in machine learning. However, in many machine learning problems such as black-box attacks and bandit feedback, ADMM can fail because the explicit gradients of these problems are difficult or infeasible to obtain. Zeroth-order (gradient-free) methods can effectively solve these problems because only the objective function values are required during optimization. Although a few zeroth-order ADMM methods exist, they rely on the convexity of the objective function, which limits them in many applications. Thus, in this paper, we propose a class of fast zeroth-order stochastic ADMM methods (i.e., ZO-SVRG-ADMM and ZO-SAGA-ADMM) for solving nonconvex problems with multiple nonsmooth penalties, based on the coordinate smoothing gradient estimator. Moreover, we prove that both ZO-SVRG-ADMM and ZO-SAGA-ADMM have a convergence rate of $O(1/T)$, where $T$ denotes the number of iterations. In particular, our methods not only reach the best convergence rate of $O(1/T)$ for nonconvex optimization, but are also able to effectively solve many complex machine learning problems with multiple regularized penalties and constraints. Finally, we conduct experiments on black-box binary classification and structured adversarial attacks on black-box deep neural networks to validate the efficiency of our algorithms.
    Comment: To appear in IJCAI 2019. Supplementary materials are added.
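
    The coordinate smoothing gradient estimator mentioned above can be illustrated in a few lines: each partial derivative is approximated from two function evaluations along the corresponding coordinate. The smoothing radius and the test function below are illustrative assumptions.

```python
import numpy as np

def zo_coordinate_gradient(f, x, mu=1e-4):
    """Coordinate-wise smoothing estimate of grad f(x), using 2*d function values."""
    d = x.size
    grad = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = 1.0
        grad[i] = (f(x + mu * e) - f(x - mu * e)) / (2.0 * mu)
    return grad

# usage on a simple quadratic, whose true gradient is 2*x
f = lambda x: float(np.sum(x ** 2))
print(zo_coordinate_gradient(f, np.array([1.0, -2.0, 3.0])))   # approximately [ 2. -4.  6.]
```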

    Zeroth Order Nonconvex Multi-Agent Optimization over Networks

    Full text link
    In this paper, we consider distributed optimization problems over a multi-agent network, where each agent can only partially evaluate the objective function and is allowed to exchange messages with its immediate neighbors. In contrast to existing works on distributed optimization, our focus is on optimizing a class of nonconvex problems, under the challenging setting where each agent can only access the zeroth-order information (i.e., the functional values) of its local functions. For different types of network topologies, such as undirected connected networks or star networks, we develop efficient distributed algorithms and rigorously analyze their convergence and rate of convergence (to the set of stationary solutions). Numerical results are provided to demonstrate the efficiency of the proposed algorithms.
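
    A minimal sketch of the zeroth-order, networked setting is given below: each agent estimates its local gradient from pairs of function values along random directions, takes a local step, and mixes its iterate with its neighbors through a doubly stochastic matrix. The local objectives, mixing matrix, number of sampled directions, and step sizes are illustrative assumptions, not the algorithms developed in the paper.

```python
import numpy as np

def two_point_estimator(f, x, mu, rng, k=10):
    """Average k two-point random-direction estimates of grad f(x)."""
    g = np.zeros_like(x)
    for _ in range(k):
        u = rng.standard_normal(x.size)
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / k

def decentralized_zo(local_fs, W, d, step=0.05, mu=1e-3, iters=500, rng=None):
    rng = rng or np.random.default_rng(0)
    m = len(local_fs)
    X = np.zeros((m, d))                           # one local iterate per agent (rows)
    for _ in range(iters):
        G = np.stack([two_point_estimator(f, X[i], mu, rng) for i, f in enumerate(local_fs)])
        X = W @ X - step * G                       # average with neighbors, then step locally
    return X.mean(axis=0)

# usage: 4 agents on a ring, each holding a quadratic centered at a different point
centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
           np.array([-1.0, 0.0]), np.array([0.0, -1.0])]
local_fs = [lambda x, c=c: float(np.sum((x - c) ** 2)) for c in centers]
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])             # doubly stochastic mixing matrix
print(decentralized_zo(local_fs, W, d=2))          # close to the centers' average, (0, 0)
```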

    Practical Algorithms for Learning Near-Isometric Linear Embeddings

    Full text link
    We propose two practical non-convex approaches for learning near-isometric, linear embeddings of finite sets of data points. Given a set of training points $\mathcal{X}$, we consider the secant set $S(\mathcal{X})$ that consists of all pairwise difference vectors of $\mathcal{X}$, normalized to lie on the unit sphere. The problem can be formulated as finding a symmetric and positive semi-definite matrix $\boldsymbol{\Psi}$ that preserves the norms of all the vectors in $S(\mathcal{X})$ up to a distortion parameter $\delta$. Motivated by non-negative matrix factorization, we reformulate our problem as a Frobenius norm minimization problem, which we solve with the alternating direction method of multipliers (ADMM), yielding an algorithm called FroMax. A second method solves for a projection matrix $\boldsymbol{\Psi}$ by minimizing the restricted isometry property (RIP) directly over the set of symmetric, positive semi-definite matrices. Applying ADMM and a Moreau decomposition on a proximal mapping, we develop another algorithm, NILE-Pro, for dimensionality reduction. FroMax is shown to converge faster for smaller $\delta$, while NILE-Pro converges faster for larger $\delta$. Both non-convex approaches are then empirically demonstrated to be more computationally efficient than prior convex approaches for a number of applications in machine learning and signal processing.
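
    The secant set $S(\mathcal{X})$ used by both approaches is straightforward to construct explicitly; the sketch below builds it from a random training set (an illustrative assumption) as the unit-normalized pairwise differences.

```python
import numpy as np

def secant_set(X):
    """Rows of X are data points; returns all unit-normalized pairwise differences."""
    n = X.shape[0]
    S = np.array([X[i] - X[j] for i in range(n) for j in range(i + 1, n)])
    return S / np.linalg.norm(S, axis=1, keepdims=True)

# usage: 10 random points in R^5 give n*(n-1)/2 = 45 unit-norm secants
X = np.random.default_rng(0).standard_normal((10, 5))
S = secant_set(X)
print(S.shape)                                           # (45, 5)
print(np.allclose(np.linalg.norm(S, axis=1), 1.0))       # True
```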

    Residual Expansion Algorithm: Fast and Effective Optimization for Nonconvex Least Squares Problems

    Full text link
    We propose the residual expansion (RE) algorithm: a global (or near-global) optimization method for nonconvex least squares problems. Unlike most existing nonconvex optimization techniques, the RE algorithm is based on neither stochastic nor multi-point searches; therefore, it can achieve fast global optimization. Moreover, the RE algorithm is easy to implement and successful in high-dimensional optimization. The RE algorithm exhibits excellent empirical performance on k-means clustering, point-set registration, optimized product quantization, and blind image deblurring.
    Comment: Accepted to CVPR201

    Survey: Sixty Years of Douglas--Rachford

    Full text link
    The Douglas--Rachford method is a splitting method frequently employed for finding zeroes of sums of maximally monotone operators. When the operators in question are normal cone operators, the iterated process may be used to solve feasibility problems of the form: find $x \in \bigcap_{k=1}^N S_k$. The success of the method in the context of closed, convex, nonempty sets $S_1,\dots,S_N$ is well known and understood from a theoretical standpoint. However, its performance in the nonconvex context is less understood, yet surprisingly impressive. This was particularly compelling to Jonathan M. Borwein who, intrigued by Elser, Rankenburg, and Thibault's success in applying the method to solving Sudoku puzzles, began an investigation of his own. We survey the current body of literature on the subject, and we summarize its history. We especially commemorate Professor Borwein's celebrated contributions to the area.
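
    For a two-set feasibility problem, the Douglas--Rachford iteration can be written as $x_{k+1} = x_k - P_A(x_k) + P_B(2P_A(x_k) - x_k)$, where $P_A$ and $P_B$ are the projections onto the two sets. The sketch below runs this iteration for a closed ball and a line in the plane; the specific sets and the starting point are illustrative assumptions.

```python
import numpy as np

def project_ball(x, center, radius):
    v = x - center
    nv = np.linalg.norm(v)
    return x if nv <= radius else center + radius * v / nv

def project_line(x, point, direction):
    d = direction / np.linalg.norm(direction)
    return point + np.dot(x - point, d) * d

# feasibility problem: find a point in A (the unit ball) and B (the line y = 0.5)
center, radius = np.array([0.0, 0.0]), 1.0
point, direction = np.array([0.0, 0.5]), np.array([1.0, 0.0])

x = np.array([3.0, 3.0])                                        # arbitrary starting point
for _ in range(100):
    pa = project_ball(x, center, radius)
    x = x - pa + project_line(2.0 * pa - x, point, direction)   # Douglas-Rachford step
print(project_ball(x, center, radius))                          # (approximately) a point in both A and B
```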

    Gradient-Free Multi-Agent Nonconvex Nonsmooth Optimization

    Full text link
    In this paper, we consider the problem of minimizing the sum of nonconvex and possibly nonsmooth functions over a connected multi-agent network, where the agents have partial knowledge about the global cost function and can only access the zeroth-order information (i.e., the functional values) of their local cost functions. We propose and analyze a distributed primal-dual gradient-free algorithm for this challenging problem. We show that, by appropriately choosing the parameters, the proposed algorithm converges to the set of first-order stationary solutions with a provable global sublinear convergence rate. Numerical experiments demonstrate the effectiveness of our proposed method for optimizing nonconvex and nonsmooth problems over a network.
    Comment: Long version of the CDC paper.
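
    The sketch below illustrates one plausible primal-dual, gradient-free structure for consensus minimization of a sum of local functions: agents keep local copies coupled by edge-wise dual variables, replace the unavailable local gradients with finite-difference (zeroth-order) estimates, and alternate primal descent with dual ascent on the consensus violation. All problem data and parameters are illustrative assumptions, not the algorithm analyzed in the paper.

```python
import numpy as np

def fd_grad(f, x, mu=1e-5):
    """Central finite-difference gradient estimate (zeroth-order information only)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = 1.0
        g[i] = (f(x + mu * e) - f(x - mu * e)) / (2.0 * mu)
    return g

def primal_dual_zo(local_fs, edges, d, step=0.05, rho=1.0, iters=1000):
    m = len(local_fs)
    X = np.zeros((m, d))                           # local copies, one row per agent
    lam = {e: np.zeros(d) for e in edges}          # one dual vector per edge constraint x_i = x_j
    for _ in range(iters):
        G = np.zeros((m, d))
        for (i, j), l in lam.items():              # dual and penalty forces from each edge
            G[i] += l + rho * (X[i] - X[j])
            G[j] -= l + rho * (X[i] - X[j])
        for i, f in enumerate(local_fs):
            G[i] += fd_grad(f, X[i])               # gradient-free local gradient estimate
        X = X - step * G                           # primal descent
        for (i, j) in edges:
            lam[(i, j)] += rho * (X[i] - X[j])     # dual ascent on the consensus violation
    return X

# usage: 3 agents on a path graph, each holding a quadratic with a different minimizer
centers = [2.0, 0.0, -2.0]
local_fs = [lambda x, c=c: float(np.sum((x - c) ** 2)) for c in centers]
print(primal_dual_zo(local_fs, edges=[(0, 1), (1, 2)], d=1).round(3))   # all rows near 0.0
```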