Fast Stochastic Methods for Nonsmooth Nonconvex Optimization
We analyze stochastic algorithms for optimizing nonconvex, nonsmooth
finite-sum problems, where the nonconvex part is smooth and the nonsmooth part
is convex. Surprisingly, unlike the smooth case, our knowledge of this
fundamental problem is very limited. For example, it is not known whether the
proximal stochastic gradient method with a constant minibatch size converges to a
stationary point. To tackle this issue, we develop fast stochastic algorithms
that provably converge to a stationary point for constant minibatches.
Furthermore, using a variant of these algorithms, we show provably faster
convergence than batch proximal gradient descent. Finally, we prove global
linear convergence rate for an interesting subclass of nonsmooth nonconvex
functions that subsumes several recent works. This paper builds upon our
recent series of papers on fast stochastic methods for smooth nonconvex
optimization [22, 23], with a novel analysis for nonconvex and nonsmooth
functions.
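To make the setting concrete, the following is a minimal sketch of one proximal stochastic gradient step for this problem class (a smooth, possibly nonconvex finite sum plus a convex nonsmooth term). The l1 regularizer, step size, and minibatch size are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, a typical convex nonsmooth term."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_sgd_step(x, grad_fi, batch, step, lam):
    """One proximal stochastic gradient step for
        min_x  (1/n) * sum_i f_i(x) + lam * ||x||_1,
    where each f_i is smooth (possibly nonconvex) and the l1 term is convex.
    grad_fi(i, x) returns the gradient of the i-th smooth component at x;
    `batch` is a constant-size minibatch of component indices.
    """
    g = np.mean([grad_fi(i, x) for i in batch], axis=0)  # minibatch gradient of the smooth part
    return soft_threshold(x - step * g, step * lam)       # gradient step followed by the prox

# Illustrative driver with a simple least-squares smooth part (the paper's analysis
# covers nonconvex f_i; this driver only makes the sketch runnable).
rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 20)), rng.normal(size=100)
grad_fi = lambda i, x: (A[i] @ x - b[i]) * A[i]
x = np.zeros(20)
for _ in range(200):
    x = prox_sgd_step(x, grad_fi, rng.integers(0, 100, size=10), step=0.05, lam=0.1)
```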
Algorithms for solving optimization problems arising from deep neural net models: nonsmooth problems
Machine learning models incorporating multiple layered learning networks have
proven effective for various classification problems. The
resulting optimization problem to solve for the optimal vector minimizing the
empirical risk is, however, highly nonconvex. This alone presents a challenge
to application and development of appropriate optimization algorithms for
solving the problem. In addition, there are a number of interesting
problems for which the objective function is nonsmooth and nonseparable. In
this paper, we summarize the primary challenges involved, the state of the art,
and present some numerical results on an interesting and representative class
of problems.
Mini-Batch Stochastic ADMMs for Nonconvex Nonsmooth Optimization
With the rapid rise of complex data, nonconvex models such as nonconvex
loss functions and nonconvex regularizers are widely used in machine learning and
pattern recognition. In this paper, we propose a class of mini-batch stochastic
ADMMs (alternating direction method of multipliers) for solving large-scale
nonconvex nonsmooth problems. We prove that, given an appropriate mini-batch
size, the mini-batch stochastic ADMM without a variance reduction (VR) technique
is convergent and reaches a convergence rate of $O(1/T)$ to obtain a stationary
point of the nonconvex optimization, where $T$ denotes the number of
iterations. Moreover, we extend the mini-batch stochastic gradient method to
both the nonconvex SVRG-ADMM and SAGA-ADMM proposed in our initial manuscript
\cite{huang2016stochastic}, and prove that these mini-batch stochastic ADMMs also
reach the convergence rate of $O(1/T)$ without any condition on the mini-batch
size. In particular, we provide a specific parameter selection for the step size
of the stochastic gradients and the penalty parameter of the augmented
Lagrangian function. Finally, extensive experimental results on both simulated
and real-world data demonstrate the effectiveness of the proposed algorithms.
Comment: We have fixed some errors in the proofs. arXiv admin note: text overlap with arXiv:1610.0275
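As a rough illustration of how a linearized mini-batch stochastic ADMM iteration can look, here is a sketch for the model problem min_x f(x) + lam*||z||_1 subject to Ax - z = 0; the constraint form, step sizes, and absence of variance reduction are assumptions made for the illustration and do not reproduce the paper's exact updates.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def stochastic_admm_step(x, z, u, grad_f_batch, A, rho, eta, lam):
    """One linearized mini-batch stochastic ADMM step (a sketch, not the paper's
    exact algorithm) for  min_x f(x) + lam * ||z||_1  s.t.  A x - z = 0.
    grad_f_batch(x) returns a mini-batch stochastic gradient of the smooth,
    possibly nonconvex loss f; rho is the augmented-Lagrangian penalty parameter
    and eta is the step size of the linearized x-update.
    """
    # x-update: linearize f and take a gradient step on the augmented Lagrangian
    #   f(x) + (rho/2) * ||A x - z + u||^2   (scaled dual variable u).
    g = grad_f_batch(x) + rho * A.T @ (A @ x - z + u)
    x = x - eta * g
    # z-update: exact prox of the l1 regularizer.
    z = soft_threshold(A @ x + u, lam / rho)
    # Dual update (scaled form).
    u = u + A @ x - z
    return x, z, u
```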
The Sound of APALM Clapping: Faster Nonsmooth Nonconvex Optimization with Stochastic Asynchronous PALM
We introduce the Stochastic Asynchronous Proximal Alternating Linearized
Minimization (SAPALM) method, a block coordinate stochastic proximal-gradient
method for solving nonconvex, nonsmooth optimization problems. SAPALM is the
first asynchronous parallel optimization method that provably converges on a
large class of nonconvex, nonsmooth problems. We prove that SAPALM matches the
best known rates of convergence --- among synchronous or asynchronous methods
--- on this problem class. We provide upper bounds on the number of workers for
which we can expect to see a linear speedup, which match the best bounds known
for less complex problems, and show that in practice SAPALM achieves this
linear speedup. We demonstrate state-of-the-art performance on several matrix
factorization problems.
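The block-coordinate structure can be illustrated with a synchronous sweep of a PALM-style stochastic proximal-gradient update; the asynchronous worker scheduling and delay analysis that distinguish SAPALM are deliberately omitted, and all names below are assumptions for the sketch.

```python
import numpy as np

def palm_style_sweep(blocks, stoch_grad_block, prox_block, steps):
    """One synchronous sweep of a block-coordinate stochastic proximal-gradient
    method in the spirit of (SA)PALM.
    blocks           : list of numpy arrays, one per coordinate block
    stoch_grad_block : (j, blocks) -> stochastic gradient of the smooth coupling
                       term with respect to block j
    prox_block       : (j, v, t) -> prox of block j's nonsmooth regularizer
    steps            : per-block step sizes
    """
    for j in range(len(blocks)):
        g = stoch_grad_block(j, blocks)
        blocks[j] = prox_block(j, blocks[j] - steps[j] * g, steps[j])
    return blocks
```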
Convergence Analysis of Proximal Gradient with Momentum for Nonconvex Optimization
In many modern machine learning applications, structures of underlying
mathematical models often yield nonconvex optimization problems. Due to the
intractability of nonconvexity, there is a rising need to develop efficient
methods for solving general nonconvex problems with certain performance
guarantee. In this work, we investigate the accelerated proximal gradient
method for nonconvex programming (APGnc). The method compares a usual
proximal gradient step with a linear extrapolation step and accepts the one
that yields the lower function value, so as to achieve a monotonic decrease. Specifically,
under a general nonsmooth and nonconvex setting, we provide a rigorous argument
to show that the limit points of the sequence generated by APGnc are critical
points of the objective function. Then, by exploiting the
Kurdyka-Łojasiewicz (KL) property for a broad class of functions, we
establish the linear and sub-linear convergence rates of the function value
sequence generated by APGnc. We further propose a stochastic variance reduced
APGnc (SVRG-APGnc), and establish its linear convergence under a special case
of the KL property. We also extend the analysis to the inexact version of
these methods and develop an adaptive momentum strategy that improves the
numerical performance.
Comment: Accepted at ICML 2017, 9 pages, 4 figures.
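A compact sketch of the "take both candidate steps, keep the one with the lower objective value" mechanism described above might look as follows; the fixed momentum weight and single step size are simplifications and not the paper's exact scheme.

```python
def apg_nc_step(x, x_prev, F, grad_f, prox_h, step, beta):
    """One iteration in the spirit of the monotone accelerated proximal gradient
    scheme sketched above, for F(x) = f(x) + h(x) with f smooth (possibly
    nonconvex) and h nonsmooth.
    F      : full objective, used only to compare the two candidates
    grad_f : gradient of the smooth part f
    prox_h : (v, t) -> proximal mapping of h with parameter t
    """
    # Candidate 1: plain proximal gradient step from x.
    y_pg = prox_h(x - step * grad_f(x), step)
    # Candidate 2: proximal gradient step from the extrapolated (momentum) point.
    v = x + beta * (x - x_prev)
    y_ex = prox_h(v - step * grad_f(v), step)
    # Accept whichever candidate has the lower objective value (monotone decrease).
    y = y_ex if F(y_ex) <= F(y_pg) else y_pg
    return y, x  # new iterate and new "previous" iterate
```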
Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity
The use of convex regularizers allows for easy optimization, though they
often produce biased estimation and inferior prediction performance. Recently,
nonconvex regularizers have attracted a lot of attention and outperformed
convex ones. However, the resultant optimization problem is much harder. In
this paper, for a large class of nonconvex regularizers, we propose to move the
nonconvexity from the regularizer to the loss. The nonconvex regularizer is
then transformed to a familiar convex regularizer, while the resultant loss
function can still be guaranteed to be smooth. Learning with the convexified
regularizer can be performed by existing efficient algorithms originally
designed for convex regularizers (such as the proximal algorithm, Frank-Wolfe
algorithm, alternating direction method of multipliers and stochastic gradient
descent). Extensions are made to the cases where the convexified regularizer does not have
a closed-form proximal step, and where the loss function is nonconvex and nonsmooth.
Extensive experiments on a variety of machine learning application scenarios
show that optimizing the transformed problem is much faster than running the
state-of-the-art on the original problem.
Comment: Journal version of a previous conference paper that appeared at ICML-2016 with the same title.
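In LaTeX notation, the redistribution idea can be illustrated for a separable regularizer built from a concave penalty; the separable form and the symbol $\kappa'_0$ (the slope of $\kappa$ at zero) are assumptions made for this illustration.

```latex
% Moving nonconvexity from the regularizer to the loss for a separable penalty
% r(x) = \sum_j \kappa(|x_j|) with \kappa concave; \kappa'_0 denotes the
% (assumed finite) slope of \kappa at zero.
\[
  \min_x \; f(x) + \sum_j \kappa(|x_j|)
  \;=\;
  \min_x \; \underbrace{\Big[\, f(x) + \sum_j \big(\kappa(|x_j|) - \kappa'_0 |x_j|\big) \Big]}_{\text{new loss, smooth under suitable assumptions on } \kappa}
  \;+\; \underbrace{\kappa'_0 \,\lVert x\rVert_1}_{\text{convex regularizer}}
\]
```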
Zeroth-Order Stochastic Alternating Direction Method of Multipliers for Nonconvex Nonsmooth Optimization
Alternating direction method of multipliers (ADMM) is a popular optimization
tool for the composite and constrained problems in machine learning. However,
in many machine learning problems such as black-box attacks and bandit
feedback, ADMM could fail because the explicit gradients of these problems are
difficult or infeasible to obtain. Zeroth-order (gradient-free) methods can
effectively solve these problems because only the objective function values are
required in the optimization. Although a few zeroth-order ADMM methods have
recently been proposed, they build on the convexity of the objective function.
Clearly, these existing zeroth-order methods are limited in many applications.
Thus, in this paper, we propose a class of fast zeroth-order stochastic ADMM
methods (i.e., ZO-SVRG-ADMM and ZO-SAGA-ADMM) for solving nonconvex problems
with multiple nonsmooth penalties, based on the coordinate smoothing gradient
estimator. Moreover, we prove that both the ZO-SVRG-ADMM and ZO-SAGA-ADMM have
a convergence rate of $O(1/T)$, where $T$ denotes the number of iterations. In
particular, our methods not only reach the best known convergence rate of $O(1/T)$ for
the nonconvex optimization, but also are able to effectively solve many complex
machine learning problems with multiple regularized penalties and constraints.
Finally, we conduct the experiments of black-box binary classification and
structured adversarial attack on black-box deep neural network to validate the
efficiency of our algorithms.
Comment: To appear in IJCAI 2019. Supplementary materials are added.
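A generic coordinate-wise smoothing gradient estimator, in the spirit of the estimator referenced above, queries only function values; the two-sided difference and the smoothing parameter below are illustrative choices rather than the paper's exact construction.

```python
import numpy as np

def coord_smoothed_gradient(f, x, mu=1e-4):
    """Coordinate-wise (finite-difference) zeroth-order gradient estimate of f at x,
    using only function evaluations; mu is the smoothing parameter."""
    d = x.size
    g = np.zeros(d)
    for j in range(d):
        e = np.zeros(d)
        e[j] = 1.0
        g[j] = (f(x + mu * e) - f(x - mu * e)) / (2.0 * mu)
    return g
```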
Primal-Dual Optimization Algorithms over Riemannian Manifolds: an Iteration Complexity Analysis
In this paper we study nonconvex and nonsmooth multi-block optimization over
Riemannian manifolds with coupled linear constraints. Such optimization
problems naturally arise from machine learning, statistical learning,
compressive sensing, image processing, and tensor PCA, among others. We develop
an ADMM-like primal-dual approach based on decoupled solvable subroutines such
as linearized proximal mappings. First, we introduce the optimality conditions
for the aforementioned optimization models. Then, the notion of
$\epsilon$-stationary solutions is introduced as a result. The main part of the
paper is to show that the proposed algorithms enjoy an iteration complexity of
$O(1/\epsilon^2)$ to reach an $\epsilon$-stationary solution. For prohibitively
large-size tensor or machine learning models, we present a sampling-based
stochastic algorithm with the same iteration complexity bound in expectation.
In case the subproblems are not analytically solvable, a feasible curvilinear
line-search variant of the algorithm based on retraction operators is proposed.
Finally, we show specifically how the algorithms can be implemented to solve a
variety of practical problems such as the NP-hard maximum bisection problem,
the regularized sparse tensor principal component analysis and the
community detection problem. Our preliminary numerical results show the great
potential of the proposed methods.
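Schematically, the problem class studied here can be written as follows; the notation is assumed for illustration and is not the paper's exact formulation.

```latex
% Multi-block, manifold-constrained problem class (schematic): smooth coupling f,
% possibly nonsmooth terms r_i, coupled linear constraints, and blocks restricted
% to Riemannian manifolds M_i.
\begin{align*}
  \min_{x_1,\dots,x_N}\ & f(x_1,\dots,x_N) + \sum_{i=1}^{N} r_i(x_i) \\
  \text{s.t.}\ & \sum_{i=1}^{N} A_i x_i = b, \qquad x_i \in \mathcal{M}_i,\quad i = 1,\dots,N,
\end{align*}
% where, roughly, an \epsilon-stationary solution is a point at which suitably
% defined primal and dual residuals are all at most \epsilon.
```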
A Proximal Zeroth-Order Algorithm for Nonconvex Nonsmooth Problems
In this paper, we focus on solving an important class of nonconvex
optimization problems that includes, for example, signal
processing over networked multi-agent systems and distributed learning over
networks. Motivated by many applications in which the local objective function
is the sum of a smooth but possibly nonconvex part and a nonsmooth but convex
part, subject to a linear equality constraint, this paper proposes a proximal
zeroth-order primal dual algorithm (PZO-PDA) that accounts for the information
structure of the problem. The algorithm utilizes only zeroth-order
information (i.e., function values) of the smooth functions, yet it achieves the
flexibility needed for applications in which only noisy information about the
objective function is accessible and classical methods cannot be applied. We
prove convergence and a rate of convergence for PZO-PDA. Numerical experiments
are provided to validate the theoretical results.
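To connect this with the coordinate-wise estimator sketched earlier, here is a rough sketch of a proximal zeroth-order primal-dual step for a linearly constrained problem of this type; the update order, step sizes, and augmented-Lagrangian form are illustrative assumptions and not the paper's exact algorithm.

```python
import numpy as np

def pzo_pda_style_step(x, y, f, prox_h, A, b, rho, eta, zo_grad):
    """A sketch of one proximal zeroth-order primal-dual step for
        min_x f(x) + h(x)   s.t.   A x = b,
    where the smooth, possibly nonconvex f is accessed only through function
    values (via zo_grad, e.g. a coordinate-wise smoothing estimator) and h is
    convex and nonsmooth with an available prox."""
    # Primal step: zeroth-order gradient estimate of f plus the gradient of the
    # augmented-Lagrangian terms, followed by the prox of h.
    g = zo_grad(f, x) + A.T @ y + rho * A.T @ (A @ x - b)
    x = prox_h(x - eta * g, eta)
    # Dual ascent on the multiplier of the equality constraint.
    y = y + rho * (A @ x - b)
    return x, y
```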
Generalized Uniformly Optimal Methods for Nonlinear Programming
In this paper, we present a generic framework to extend existing uniformly
optimal convex programming algorithms to solve more general nonlinear, possibly
nonconvex, optimization problems. The basic idea is to incorporate a local
search step (gradient descent or Quasi-Newton iteration) into these uniformly
optimal convex programming methods, and then enforce a monotone decreasing
property of the function values computed along the trajectory. Algorithms of
these types will then achieve the best known complexity for nonconvex problems,
and the optimal complexity for convex ones without requiring any problem
parameters. As a consequence, we can have a unified treatment for a general
class of nonlinear programming problems regardless of their convexity and
smoothness level. In particular, we show that the accelerated gradient and
level methods, both originally designed for solving convex optimization
problems only, can be used for solving both convex and nonconvex problems
uniformly. In a similar vein, we show that some well-studied techniques for
nonlinear programming, e.g., Quasi-Newton iteration, can be embedded into
optimal convex optimization algorithms to possibly further enhance their
numerical performance. Our theoretical and algorithmic developments are
complemented by some promising numerical results obtained for solving a few
important nonconvex and nonlinear data analysis problems in the literature.
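A hedged sketch of the "accelerate, then safeguard with a gradient-descent local search and keep the monotone candidate" pattern described above; the momentum weight 2/(k+1), the single step size, and the comparison rule are simplifications rather than the paper's exact scheme.

```python
def monotone_ag_step(x, x_ag, k, grad, F, step):
    """One simplified step of an accelerated gradient method with a monotone
    safeguard: build an accelerated candidate, build a plain gradient-descent
    candidate, and keep whichever gives the lower objective value.
    x : the "search" sequence; x_ag : the aggregated (output) sequence
    grad : gradient oracle of the smooth objective; F : objective value oracle
    """
    alpha = 2.0 / (k + 1)                     # illustrative momentum weight
    x_md = (1.0 - alpha) * x_ag + alpha * x   # extrapolated (middle) point
    g_md = grad(x_md)
    x_new = x - step * g_md                   # update of the search sequence
    x_ag_acc = x_md - step * g_md             # accelerated candidate
    x_ag_gd = x_ag - step * grad(x_ag)        # local-search (gradient descent) candidate
    x_ag_new = x_ag_acc if F(x_ag_acc) <= F(x_ag_gd) else x_ag_gd
    return x_new, x_ag_new
```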