5 research outputs found

    Nonconvex Zeroth-Order Stochastic ADMM Methods with Lower Function Query Complexity

    Zeroth-order methods are powerful optimization tools for solving many machine learning problems because they only need function values (not gradients) in the optimization. Recently, although many zeroth-order methods have been developed, these approaches still have two main drawbacks: 1) high function query complexity; 2) not being well suited for solving problems with complex penalties and constraints. To address these challenging drawbacks, in this paper, we propose a class of faster zeroth-order stochastic alternating direction method of multipliers (ADMM) methods (ZO-SPIDER-ADMM) to solve nonconvex finite-sum problems with multiple nonsmooth penalties. Moreover, we prove that the ZO-SPIDER-ADMM methods can achieve a lower function query complexity of $O(nd + dn^{1/2}\epsilon^{-1})$ for finding an $\epsilon$-stationary point, which improves the existing best nonconvex zeroth-order ADMM methods by a factor of $O(d^{1/3}n^{1/6})$, where $n$ and $d$ denote the sample size and the dimension of the data, respectively. At the same time, we propose a class of faster zeroth-order online ADMM methods (ZOO-ADMM+) to solve nonconvex online problems with multiple nonsmooth penalties. We also prove that the proposed ZOO-ADMM+ methods can achieve a lower function query complexity of $O(d\epsilon^{-3/2})$, which improves the existing best result by a factor of $O(\epsilon^{-1/2})$. Extensive experimental results on structured adversarial attacks on black-box deep neural networks demonstrate the efficiency of our new algorithms. Comment: 34 pages
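
    For context, the core primitive such zeroth-order ADMM methods build on is estimating a gradient from function values alone. Below is a minimal sketch of a randomized two-point zeroth-order gradient estimator; it is not the authors' ZO-SPIDER-ADMM estimator, and the names `mu` and `num_dirs` are illustrative assumptions.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, num_dirs=10, rng=None):
    """Estimate grad f(x) from function values only by averaging one-sided
    finite differences along random Gaussian directions."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(num_dirs):
        u = rng.standard_normal(d)                # random probing direction
        g += (f(x + mu * u) - f(x)) / mu * u      # one-sided difference estimate
    return g / num_dirs

if __name__ == "__main__":
    # Toy check: the gradient of 0.5 * x^T A x at x0 = [1, 1, 1] is A @ x0 = [1, 2, 3].
    A = np.diag([1.0, 2.0, 3.0])
    f = lambda x: 0.5 * x @ A @ x
    print(zo_gradient(f, np.ones(3), num_dirs=500))   # roughly [1, 2, 3]
```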

    Accelerated Stochastic Gradient-free and Projection-free Methods

    In this paper, we propose a class of accelerated stochastic gradient-free and projection-free (a.k.a., zeroth-order Frank-Wolfe) methods to solve constrained stochastic and finite-sum nonconvex optimization. Specifically, we propose an accelerated stochastic zeroth-order Frank-Wolfe (Acc-SZOFW) method based on the variance reduction technique of SPIDER/SpiderBoost and a novel momentum acceleration technique. Moreover, under some mild conditions, we prove that Acc-SZOFW has a function query complexity of $O(d\sqrt{n}\epsilon^{-2})$ for finding an $\epsilon$-stationary point in the finite-sum problem, which improves the existing best result by a factor of $O(\sqrt{n}\epsilon^{-2})$, and has a function query complexity of $O(d\epsilon^{-3})$ in the stochastic problem, which improves the existing best result by a factor of $O(\epsilon^{-1})$. To relax the large batches required in Acc-SZOFW, we further propose a novel accelerated stochastic zeroth-order Frank-Wolfe method (Acc-SZOFW*) based on a new variance reduction technique, STORM, which still reaches the function query complexity of $O(d\epsilon^{-3})$ in the stochastic problem without relying on any large batches. In particular, we present an accelerated framework for Frank-Wolfe methods based on the proposed momentum acceleration technique. Extensive experimental results on black-box adversarial attacks and robust black-box classification demonstrate the efficiency of our algorithms. Comment: Accepted to ICML 2020, 34 pages
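
    As a rough illustration of the projection-free (Frank-Wolfe) template that Acc-SZOFW accelerates, the sketch below combines a two-point zeroth-order gradient estimate with a linear minimization oracle over an assumed $\ell_1$-ball constraint; it omits the SPIDER/SpiderBoost variance reduction and the momentum acceleration, and names such as `lmo_l1_ball` and `radius` are illustrative.

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle over the l1 ball: the minimizer of <grad, s>
    subject to ||s||_1 <= radius puts all mass on the largest |grad| coordinate."""
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def zo_frank_wolfe(f, x0, radius=1.0, mu=1e-3, steps=200, rng=None):
    """Projection-free loop: two-point ZO gradient estimate, then a Frank-Wolfe
    step toward the LMO vertex. x0 must lie inside the l1 ball."""
    rng = np.random.default_rng() if rng is None else rng
    x, d = x0.astype(float).copy(), x0.shape[0]
    for t in range(steps):
        u = rng.standard_normal(d)
        g = (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u   # two-point ZO estimate
        s = lmo_l1_ball(g, radius)                           # cheap linear oracle, no projection
        gamma = 2.0 / (t + 2)                                # classic Frank-Wolfe step size
        x = (1 - gamma) * x + gamma * s                      # convex combination stays feasible
    return x

if __name__ == "__main__":
    f = lambda x: np.sum((x - 0.3) ** 2)          # toy smooth objective
    x_hat = zo_frank_wolfe(f, np.zeros(4))
    print(x_hat, np.abs(x_hat).sum())             # l1 norm stays <= 1
```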

    Zeroth-Order Algorithms for Stochastic Distributed Nonconvex Optimization

    In this paper, we consider a stochastic distributed nonconvex optimization problem with the cost function being distributed over $n$ agents having access only to zeroth-order (ZO) information of the cost. This problem has various machine learning applications. As a solution, we propose two distributed ZO algorithms, in which at each iteration each agent samples the local stochastic ZO oracle at two points with an adaptive smoothing parameter. We show that the proposed algorithms achieve the linear speedup convergence rate $\mathcal{O}(\sqrt{p/(nT)})$ for smooth cost functions and an $\mathcal{O}(p/(nT))$ convergence rate when the global cost function additionally satisfies the Polyak--Lojasiewicz (P--L) condition, where $p$ and $T$ are the dimension of the decision variable and the total number of iterations, respectively. To the best of our knowledge, this is the first linear speedup result for distributed ZO algorithms, which enables systematic processing performance improvements by adding more agents. We also show that the proposed algorithms converge linearly when considering deterministic centralized optimization problems under the P--L condition. We demonstrate through numerical experiments the efficiency of our algorithms on generating adversarial examples from deep neural networks in comparison with baseline and recently proposed centralized and distributed ZO algorithms.
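
    A minimal sketch of one synchronous round of this kind of distributed two-point ZO scheme is shown below, assuming a fixed smoothing parameter `mu` and a given doubly stochastic mixing matrix `W` (the paper's algorithms use an adaptive smoothing parameter; all names here are illustrative, not the authors' implementation).

```python
import numpy as np

def distributed_zo_step(fs, xs, mu, lr, W, rng):
    """One synchronous round: each agent i builds a two-point ZO estimate of its
    local cost fs[i] at its own iterate xs[i], then all agents mix their iterates
    with a doubly stochastic weight matrix W and take a local step."""
    n, d = xs.shape
    grads = np.zeros_like(xs)
    for i in range(n):
        u = rng.standard_normal(d)
        grads[i] = (fs[i](xs[i] + mu * u) - fs[i](xs[i] - mu * u)) / (2 * mu) * u
    return W @ xs - lr * grads

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    targets = [np.zeros(2), np.ones(2), -np.ones(2)]
    fs = [lambda x, c=c: np.sum((x - c) ** 2) for c in targets]   # local quadratic costs
    xs = rng.standard_normal((3, 2))
    W = np.full((3, 3), 1.0 / 3.0)                                # complete-graph averaging
    for _ in range(300):
        xs = distributed_zo_step(fs, xs, mu=1e-3, lr=0.02, W=W, rng=rng)
    print(xs.mean(axis=0))   # drifts toward the global minimizer near [0, 0]
```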

    Faster Stochastic Quasi-Newton Methods

    Stochastic optimization methods have become a class of popular optimization tools in machine learning. In particular, stochastic gradient descent (SGD) has been widely used for machine learning problems such as training neural networks due to its low per-iteration computational complexity. In fact, Newton and quasi-Newton methods, which leverage second-order information, are able to achieve a better solution than first-order methods. Thus, stochastic quasi-Newton (SQN) methods have been developed to achieve a better solution more efficiently than stochastic first-order methods by utilizing approximate second-order information. However, the existing SQN methods still do not reach the best known stochastic first-order oracle (SFO) complexity. To fill this gap, we propose a novel faster stochastic quasi-Newton method (SpiderSQN) based on the variance reduction technique of SPIDER. We prove that our SpiderSQN method reaches the best known SFO complexity of $\mathcal{O}(n + n^{1/2}\epsilon^{-2})$ in the finite-sum setting to obtain an $\epsilon$-first-order stationary point. To further improve its practical performance, we incorporate SpiderSQN with different momentum schemes. Moreover, the proposed algorithms are generalized to the online setting, and the corresponding SFO complexity of $\mathcal{O}(\epsilon^{-3})$ is developed, which also matches the existing best result. Extensive experiments on benchmark datasets demonstrate that our new algorithms outperform state-of-the-art approaches for nonconvex optimization. Comment: 11 pages, accepted for publication by TNNLS. arXiv admin note: text overlap with arXiv:1902.02715 by other authors
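
    For intuition, the sketch below shows a plain SPIDER-style variance-reduced loop, written with first-order gradients for brevity; SpiderSQN additionally preconditions the gradient estimate with an approximate inverse Hessian, which is omitted here. Names such as `grad_i`, `q`, and `batch` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def spider_sgd(grad_i, x0, n, lr=0.05, q=10, batch=8, steps=300, rng=None):
    """SPIDER-style loop: refresh the full gradient every q iterations, otherwise
    update the estimate v with a mini-batch of gradient differences taken at the
    current and previous iterates."""
    rng = np.random.default_rng() if rng is None else rng
    x, x_prev, v = x0.astype(float).copy(), x0.astype(float).copy(), None
    for t in range(steps):
        if t % q == 0:
            v = np.mean([grad_i(i, x) for i in range(n)], axis=0)        # full refresh
        else:
            idx = rng.integers(0, n, size=batch)
            v = v + np.mean([grad_i(i, x) - grad_i(i, x_prev) for i in idx], axis=0)
        x_prev, x = x, x - lr * v
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 100, 5
    A, b = rng.standard_normal((n, d)), rng.standard_normal(n)
    grad_i = lambda i, x: (A[i] @ x - b[i]) * A[i]        # least-squares component gradients
    x_hat = spider_sgd(grad_i, np.zeros(d), n, rng=rng)
    print(np.linalg.norm(A.T @ (A @ x_hat - b)) / n)      # full gradient norm, should be small
```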

    Accelerated Zeroth-Order and First-Order Momentum Methods from Mini to Minimax Optimization

    In this paper, we propose a class of accelerated zeroth-order and first-order momentum methods for both nonconvex mini-optimization and minimax optimization. Specifically, we propose a new accelerated zeroth-order momentum (Acc-ZOM) method to solve stochastic mini-optimization problems. We prove that the Acc-ZOM method achieves a lower query complexity of $\tilde{O}(d^{3/4}\epsilon^{-3})$ for finding an $\epsilon$-stationary point, which improves the best known result by a factor of $O(d^{1/4})$, where $d$ denotes the parameter dimension. In particular, Acc-ZOM does not require the large batches required by existing zeroth-order stochastic algorithms. At the same time, we propose an accelerated zeroth-order momentum descent ascent (Acc-ZOMDA) method for black-box minimax optimization. We prove that the Acc-ZOMDA method reaches a lower query complexity of $\tilde{O}((d_1+d_2)^{9/10}\kappa_y^{3}\epsilon^{-3})$ for finding an $\epsilon$-stationary point, which improves the best known result by a factor of $O((d_1+d_2)^{1/10})$, where $d_1$ and $d_2$ denote the dimensions of the optimization parameters and $\kappa_y$ is the condition number. Moreover, we propose an accelerated first-order momentum descent ascent (Acc-MDA) method for solving white-box minimax problems, and prove that it achieves a lower gradient complexity of $\tilde{O}(\kappa_y^{(3-\nu/2)}\epsilon^{-3})$ with $\nu > 0$ for finding an $\epsilon$-stationary point, which improves the best known result by a factor of $O(\kappa_y^{\nu/2})$. Extensive experimental results on black-box adversarial attacks on deep neural networks (DNNs) and poisoning attacks demonstrate the efficiency of our algorithms. Comment: 66 pages. In this version, we change the Lyapunov functions for our Acc-ZOMDA and Acc-MDA methods in the convergence analysis. Then our Acc-ZOMDA method obtains a lower query complexity and our Acc-MDA method achieves a lower gradient complexity.
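
    As a rough illustration of the momentum-based variance reduction behind this family of accelerated methods, the sketch below implements a STORM-style estimator with first-order stochastic gradients (the zeroth-order variants would replace the sampled gradient with a two-point function-value estimate). The names `grad_sample` and `a` are illustrative; this is not the authors' Acc-ZOM implementation.

```python
import numpy as np

def storm_momentum(grad_sample, x0, lr=0.05, a=0.1, steps=200, rng=None):
    """STORM-style momentum estimator: v_t = g(x_t; xi_t) + (1 - a) * (v_{t-1} -
    g(x_{t-1}; xi_t)), where both gradients in the correction share the same
    sample xi_t. grad_sample(x, xi) returns a stochastic gradient at x."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0.astype(float).copy()
    xi = rng.standard_normal(x.shape)
    v = grad_sample(x, xi)                       # plain stochastic gradient to start
    for _ in range(steps):
        x_prev, x = x, x - lr * v                # descend along the current estimate
        xi = rng.standard_normal(x.shape)        # fresh sample shared by both terms
        v = grad_sample(x, xi) + (1 - a) * (v - grad_sample(x_prev, xi))
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad_sample = lambda x, xi: x + 0.1 * xi     # noisy gradient of 0.5 * ||x||^2
    x_hat = storm_momentum(grad_sample, np.ones(10), rng=rng)
    print(np.linalg.norm(x_hat))                 # much smaller than the initial norm sqrt(10)
```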