Fast Stochastic Algorithms for Low-rank and Nonsmooth Matrix Problems
Composite convex optimization problems which include both a nonsmooth term
and a low-rank promoting term have important applications in machine learning
and signal processing, such as when one wishes to recover an unknown matrix
that is simultaneously low-rank and sparse. However, such problems are highly
challenging to solve at large scale: the low-rank promoting term prohibits
efficient implementations of proximal methods for composite optimization and
even of simple subgradient methods. On the other hand, methods tailored for
low-rank optimization, such as conditional gradient-type methods, which are
often applied to a smooth approximation of the nonsmooth objective, are slow,
since their runtime scales both with the large Lipschitz parameter of the
smoothed gradient and with the target accuracy. In this paper we develop
efficient algorithms for \textit{stochastic} optimization of a strongly-convex
objective which includes both a nonsmooth term and a low-rank promoting term.
In particular, to the best of our knowledge, we present the first algorithm
that enjoys all of the following critical properties for large-scale problems:
i) (nearly) optimal sample complexity, ii) each iteration requires only a
single \textit{low-rank} SVD computation, and iii) the overall number of
thin-SVD computations scales far better with the target accuracy than in
previous methods. We also give an algorithm for
the closely related finite-sum setting. At the heart of our results lies a
novel combination of a variance-reduction technique and the use of a
\textit{weak-proximal oracle}, which is key to obtaining all three of the above
properties simultaneously.
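As background for why conditional-gradient-type methods pair naturally with low-rank structure, here is an illustrative sketch (not the paper's algorithm) of a Frank-Wolfe step over a nuclear-norm ball: the linear minimization oracle reduces to a single top singular pair, i.e., a rank-1 thin SVD, rather than a full SVD. The objective, radius tau, and step-size rule below are my own toy choices.

```python
import numpy as np
from scipy.sparse.linalg import svds

def frank_wolfe_step(X, grad, tau, t):
    """One Frank-Wolfe step for min f(X) subject to ||X||_* <= tau.
    The linear minimization oracle over the nuclear-norm ball only needs the
    top singular pair of the gradient (a rank-1 thin SVD)."""
    u, _, vt = svds(-grad, k=1)                # top singular pair of -grad
    S = tau * np.outer(u[:, 0], vt[0, :])      # minimizing vertex of the ball
    gamma = 2.0 / (t + 2.0)                    # standard Frank-Wolfe step size
    return (1.0 - gamma) * X + gamma * S

# Toy usage: matrix recovery min 0.5*||X - M||_F^2 with ||X||_* <= tau.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 40))   # low-rank target
X = np.zeros_like(M)
for t in range(100):
    X = frank_wolfe_step(X, X - M, tau=np.linalg.norm(M, "nuc"), t=t)
```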
Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity
The use of convex regularizers allows for easy optimization, but they often
produce biased estimates and inferior prediction performance. Recently,
nonconvex regularizers have attracted a lot of attention and have outperformed
convex ones. However, the resulting optimization problem is much harder. In
this paper, for a large class of nonconvex regularizers, we propose to move the
nonconvexity from the regularizer to the loss. The nonconvex regularizer is
then transformed to a familiar convex regularizer, while the resultant loss
function can still be guaranteed to be smooth. Learning with the convexified
regularizer can be performed by existing efficient algorithms originally
designed for convex regularizers (such as the proximal algorithm, Frank-Wolfe
algorithm, alternating direction method of multipliers and stochastic gradient
descent). Extensions are made when the convexified regularizer does not have
closed-form proximal step, and when the loss function is nonconvex, nonsmooth.
Extensive experiments on a variety of machine learning application scenarios
show that optimizing the transformed problem is much faster than running the
state-of-the-art on the original problem.
Comment: Journal version of a previous conference paper that appeared at
ICML-2016 with the same title.
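To make the "redistribution" idea concrete, here is a minimal sketch under my own assumptions (log-sum penalty as the nonconvex regularizer, squared loss; not code from the paper). The penalty is split into a convex l1 part plus a smooth correction; folding the correction into the loss leaves a convex regularizer, so ordinary proximal gradient with soft-thresholding applies to the transformed problem.

```python
import numpy as np

# Log-sum penalty (LSP), one member of the nonconvex family:
#   r(x) = sum_i lam*log(1 + |x_i|/theta)
#        = (lam/theta)*||x||_1 + g(x),   with g smooth (and nonconvex).
lam, theta = 0.1, 1.0

def grad_g(x):
    # g(x) = sum_i lam*(log(1 + |x_i|/theta) - |x_i|/theta); differentiable at 0
    return -lam * x / (theta * (theta + np.abs(x)))

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_gradient_transformed(A, b, iters=500):
    """Proximal gradient on the transformed problem
       min_x 0.5*||Ax - b||^2 + g(x) + (lam/theta)*||x||_1  (illustrative sketch)."""
    x = np.zeros(A.shape[1])
    eta = 1.0 / (np.linalg.norm(A, 2) ** 2 + lam / theta ** 2)   # rough step size
    for _ in range(iters):
        grad_smooth = A.T @ (A @ x - b) + grad_g(x)   # smooth, now nonconvex, loss part
        x = soft_threshold(x - eta * grad_smooth, eta * lam / theta)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0
x_hat = prox_gradient_transformed(A, A @ x_true + 0.01 * rng.standard_normal(40))
```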
Randomized Smoothing SVRG for Large-scale Nonsmooth Convex Optimization
In this paper, we consider the problem of minimizing the average of a large
number of nonsmooth and convex functions. Such problems often arise in machine
learning as empirical risk minimization, but they are computationally very
challenging. We develop and analyze a new algorithm that achieves a robust
linear convergence rate, and both its time complexity and gradient complexity
are superior to those of state-of-the-art nonsmooth algorithms and
subgradient-based schemes. Moreover, our algorithm requires neither extra
error-bound conditions on the objective function nor the common
strong-convexity condition. We show that our algorithm has wide applications in
optimization and machine learning problems, and demonstrate experimentally that
it performs well on a large-scale ranking problem.
Comment: 10 pages, 12 figures. arXiv admin note: text overlap with
arXiv:1103.4296 and arXiv:1403.4699 by other authors.
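For intuition only, here is a sketch of the standard Gaussian randomized-smoothing gradient estimator that this line of work builds on, plugged into an SVRG-style loop; it is my simplified illustration (toy absolute-deviation losses, fixed parameters), not the paper's exact algorithm or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_grad(f, x, mu=1e-2, samples=10):
    """Gaussian-smoothing estimator of the gradient of f_mu(x) = E_u[f(x + mu*u)],
    u ~ N(0, I); only function values of the nonsmooth f are needed."""
    fx, g = f(x), np.zeros_like(x)
    for _ in range(samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - fx) / mu * u
    return g / samples

def svrg_with_smoothing(fs, x0, eta=0.02, mu=1e-2, epochs=20, inner=100):
    """SVRG-style variance-reduced loop in which every component gradient is
    replaced by the smoothed estimator above (a sketch)."""
    n, x, snap = len(fs), x0.copy(), x0.copy()
    for _ in range(epochs):
        full = sum(smoothed_grad(f, snap, mu) for f in fs) / n
        for _ in range(inner):
            i = rng.integers(n)
            x = x - eta * (smoothed_grad(fs[i], x, mu)
                           - smoothed_grad(fs[i], snap, mu) + full)
        snap = x.copy()
    return x

# Toy usage: average of nonsmooth losses |a_i . x - b_i|.
A = rng.standard_normal((50, 10))
b = A @ np.ones(10)
fs = [lambda x, a=a, y=y: abs(a @ x - y) for a, y in zip(A, b)]
x_hat = svrg_with_smoothing(fs, np.zeros(10))
```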
Reshaped Wirtinger Flow and Incremental Algorithm for Solving Quadratic System of Equations
We study the phase retrieval problem, which solves a quadratic system of
equations, i.e., recovers an unknown vector from the magnitudes of its linear
measurements. We develop a gradient-like algorithm (referred to as RWF,
representing reshaped Wirtinger flow) by minimizing a nonconvex nonsmooth loss
function. In comparison with the existing nonconvex Wirtinger flow (WF)
algorithm \cite{candes2015phase}, although the loss function becomes nonsmooth,
it involves only the second power of the variable and hence reduces the
complexity. We
show that for random Gaussian measurements, RWF enjoys geometric convergence to
a globally optimal point as long as the number of measurements is on the order
of the dimension of the unknown signal. This improves the
sample complexity of WF, and achieves the same sample complexity as truncated
Wirtinger flow (TWF) \cite{chen2015solving}, but without truncation in the
gradient loop. Furthermore, RWF is computationally cheaper than WF and runs
faster
numerically than both WF and TWF. We further develop the incremental
(stochastic) reshaped Wirtinger flow (IRWF) and show that IRWF converges
linearly to the true signal. We further establish a performance guarantee for
an existing Kaczmarz method for the phase retrieval problem, based on its
connection to IRWF. We also empirically demonstrate that IRWF outperforms the
existing ITWF algorithm (a stochastic version of TWF) as well as other batch
algorithms.
Comment: Part of this draft is accepted to NIPS 201
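For illustration, here is a compact sketch of a reshaped-WF-style gradient iteration on synthetic real-valued Gaussian data; the random initialization, fixed step size, and problem sizes are my own simplifications (the paper's guarantees rely on a proper initialization and a specific parameter regime).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 600
x_true = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = np.abs(A @ x_true)                        # magnitude-only measurements

def rwf_like_step(z, step=0.5):
    """Gradient-like step on the loss (1/2m) * sum_i (|a_i . z| - y_i)^2.
    The loss uses the magnitudes themselves (not their squares), which is the
    lower-order structure that keeps the per-iteration cost small."""
    Az = A @ z
    return z - (step / m) * (A.T @ (Az - y * np.sign(Az)))

# Crude initialization: random direction, norm calibrated from E|a.x| = sqrt(2/pi)*||x||.
z = rng.standard_normal(n)
z *= np.sqrt(np.pi / 2) * np.mean(y) / np.linalg.norm(z)
for _ in range(500):
    z = rwf_like_step(z)

# Phase retrieval recovers x only up to a global sign.
err = min(np.linalg.norm(z - x_true), np.linalg.norm(z + x_true)) / np.linalg.norm(x_true)
```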
The Sound of APALM Clapping: Faster Nonsmooth Nonconvex Optimization with Stochastic Asynchronous PALM
We introduce the Stochastic Asynchronous Proximal Alternating Linearized
Minimization (SAPALM) method, a block coordinate stochastic proximal-gradient
method for solving nonconvex, nonsmooth optimization problems. SAPALM is the
first asynchronous parallel optimization method that provably converges on a
large class of nonconvex, nonsmooth problems. We prove that SAPALM matches the
best known rates of convergence --- among synchronous or asynchronous methods
--- on this problem class. We provide upper bounds on the number of workers for
which we can expect to see a linear speedup, which match the best bounds known
for less complex problems, and show that in practice SAPALM achieves this
linear speedup. We demonstrate state-of-the-art performance on several matrix
factorization problems.
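As a point of reference for the deterministic, synchronous baseline, here is a small PALM-style sweep on a toy sparse nonnegative matrix factorization (my example); SAPALM itself additionally injects stochastic gradient noise and performs the block updates asynchronously, which this sketch does not attempt to reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)
M = np.abs(rng.standard_normal((60, 40)))     # toy data matrix
r, lam = 5, 0.1
X = np.abs(rng.standard_normal((60, r)))
Y = np.abs(rng.standard_normal((r, 40)))

def soft_threshold(V, t):
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def palm_sweep(X, Y):
    """One synchronous PALM sweep for
       min 0.5*||M - X Y||_F^2 + lam*||Y||_1 + indicator(X >= 0):
    each block takes a proximally regularized gradient step whose step size
    comes from the block-wise Lipschitz constant."""
    cx = np.linalg.norm(Y @ Y.T, 2) + 1e-8
    X = np.maximum(X - ((X @ Y - M) @ Y.T) / cx, 0.0)            # prox of the indicator
    cy = np.linalg.norm(X.T @ X, 2) + 1e-8
    Y = soft_threshold(Y - (X.T @ (X @ Y - M)) / cy, lam / cy)   # prox of lam*||.||_1
    return X, Y

for _ in range(200):
    X, Y = palm_sweep(X, Y)
```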
Primal-Dual Optimization Algorithms over Riemannian Manifolds: an Iteration Complexity Analysis
In this paper we study nonconvex and nonsmooth multi-block optimization over
Riemannian manifolds with coupled linear constraints. Such optimization
problems naturally arise from machine learning, statistical learning,
compressive sensing, image processing, and tensor PCA, among others. We develop
an ADMM-like primal-dual approach based on decoupled solvable subroutines such
as linearized proximal mappings. First, we introduce the optimality conditions
for the aforementioned optimization models, and the notion of
$\epsilon$-stationary solutions is then introduced accordingly. The main part
of the paper shows that the proposed algorithms enjoy an iteration complexity
bound for reaching an $\epsilon$-stationary solution. For prohibitively
large tensor or machine learning models, we present a sampling-based
stochastic algorithm with the same iteration complexity bound in expectation.
In case the subproblems are not analytically solvable, a feasible curvilinear
line-search variant of the algorithm based on retraction operators is proposed.
Finally, we show specifically how the algorithms can be implemented to solve a
variety of practical problems, such as the NP-hard maximum bisection problem,
regularized sparse tensor principal component analysis, and the community
detection problem. Our preliminary numerical results show the great potential
of the proposed methods.
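To fix ideas, here is a generic Euclidean sketch of the kind of ADMM-like primal-dual structure the abstract describes: two blocks coupled by a linear constraint, each updated by a linearized (gradient-type) proximal step on the augmented Lagrangian, followed by a textbook dual update. The manifold constraints, retractions, and the paper's specific step-size rules are not reproduced here; the functions and parameters below are my own toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, p = 20, 30, 10
A = rng.standard_normal((p, n1))
B = rng.standard_normal((p, n2))
b = rng.standard_normal(p)

def grad_f(x):
    return x            # gradient of f(x) = 0.5*||x||^2 (toy smooth block)

def grad_g(y):
    return y - 1.0      # gradient of g(y) = 0.5*||y - 1||^2 (toy smooth block)

def admm_like_sweep(x, y, lam, rho=1.0, eta=0.005):
    """One sweep for min f(x) + g(y) s.t. Ax + By = b: linearized proximal
    (single gradient) updates of each block on the augmented Lagrangian,
    followed by a dual update on the constraint residual."""
    r = A @ x + B @ y - b
    x = x - eta * (grad_f(x) + A.T @ (lam + rho * r))
    r = A @ x + B @ y - b
    y = y - eta * (grad_g(y) + B.T @ (lam + rho * r))
    lam = lam + rho * (A @ x + B @ y - b)
    return x, y, lam

x, y, lam = np.zeros(n1), np.zeros(n2), np.zeros(p)
for _ in range(2000):
    x, y, lam = admm_like_sweep(x, y, lam)
```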
Stochastic Variance-Reduced ADMM
The alternating direction method of multipliers (ADMM) is a powerful
optimization solver in machine learning. Recently, stochastic ADMM has been
integrated with variance reduction methods for the stochastic gradient, leading
to SAG-ADMM and SDCA-ADMM, which have fast convergence rates and low iteration
complexities. However, their space requirements can still be high. In this
paper, we propose an integration of ADMM with the method of stochastic variance
reduced gradient (SVRG). Unlike another recent integration attempt called
SCAS-ADMM, the proposed algorithm retains the fast convergence benefits of
SAG-ADMM and SDCA-ADMM, but is more advantageous in that its storage
requirement is very low and even independent of the sample size. We also
extend the proposed method to nonconvex problems and establish its convergence
rate in that setting. Experimental results demonstrate that it is as fast as
SAG-ADMM and SDCA-ADMM, much faster than SCAS-ADMM, and can be used on much
bigger data sets
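The storage advantage over SAG-ADMM and SDCA-ADMM comes from the SVRG control-variate estimator, which only needs a snapshot iterate and its full gradient rather than a table of per-sample gradients. Here is the estimator in isolation inside a plain SVRG loop on a toy least-squares problem (a sketch; in SVRG-ADMM it would replace the exact gradient inside the linearized x-update of ADMM).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 20))
b = A @ rng.standard_normal(20)

def grad_i(x, i):
    """Gradient of the i-th component loss 0.5*(a_i . x - b_i)^2."""
    return (A[i] @ x - b[i]) * A[i]

def svrg_grad(x, i, snap, full_grad):
    """SVRG control variate: unbiased for the full gradient at x, yet it stores
    only the snapshot and its full gradient (memory independent of the sample size)."""
    return grad_i(x, i) - grad_i(snap, i) + full_grad

x = np.zeros(20)
for _ in range(30):                                  # outer epochs
    snap = x.copy()
    full_grad = A.T @ (A @ snap - b) / len(b)
    for _ in range(200):                             # inner stochastic steps
        i = rng.integers(len(b))
        x = x - 0.005 * svrg_grad(x, i, snap, full_grad)
```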
Mini-Batch Stochastic ADMMs for Nonconvex Nonsmooth Optimization
With the rapid rise of complex data, nonconvex models, such as nonconvex loss
functions and nonconvex regularizers, are widely used in machine learning and
pattern recognition. In this paper, we propose a class of mini-batch stochastic
ADMMs (alternating direction method of multipliers) for solving large-scale
nonconvex nonsmooth problems. We prove that, given an appropriate mini-batch
size, the mini-batch stochastic ADMM without a variance reduction (VR)
technique is convergent and reaches a convergence rate of $O(1/T)$ for
obtaining a stationary point of the nonconvex problem, where $T$ denotes the
number of iterations. Moreover, we extend the mini-batch stochastic gradient
method to both the nonconvex SVRG-ADMM and SAGA-ADMM proposed in our initial
manuscript \cite{huang2016stochastic}, and prove that these mini-batch
stochastic ADMMs also reach the convergence rate of $O(1/T)$ without any
condition on the mini-batch size. In particular, we provide a specific
parameter selection for the step size of the stochastic gradients and the
penalty parameter of the augmented
Lagrangian function. Finally, extensive experimental results on both simulated
and real-world data demonstrate the effectiveness of the proposed algorithms.
Comment: We have fixed some errors in the proofs. arXiv admin note: text
overlap with arXiv:1610.0275
FasTer: Fast Tensor Completion with Nonconvex Regularization
The low-rank tensor completion problem aims to recover a tensor from limited
observations and has many real-world applications. Because it is easy to
optimize, the convex overlapping nuclear norm has been widely used for tensor
completion. However, it over-penalizes the top singular values and leads to
biased estimates. In this paper, we propose to use nonconvex regularizers,
which penalize large singular values less, instead of the convex one for
tensor completion. However, as the new regularizers are nonconvex and overlap
with each other, existing algorithms are either too slow or suffer from huge
memory costs. To address these issues, we develop an efficient and scalable
algorithm, which is based on the proximal average (PA) algorithm, for
real-world problems. Compared with direct use of the PA algorithm, the
proposed algorithm runs orders of magnitude faster and needs orders of
magnitude less space. We further speed up the proposed algorithm with an
acceleration technique, and show that convergence to critical points is still
guaranteed. Experimental comparisons of the proposed approach are made with
various other tensor completion approaches. Empirical results show that the
proposed algorithm is very fast and produces much better recovery performance.
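For readers unfamiliar with the proximal average (PA) building block: when the regularizer is an average of terms whose individual proximal steps are cheap but whose sum has no tractable proximal step, PA replaces the prox of the average by the average of the individual proxes. Below is a toy sketch with two simple penalties (my example; the paper applies the idea to overlapping nonconvex tensor regularizers and adds the scalability and acceleration machinery).

```python
import numpy as np

def prox_l1(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)   # prox of t*||.||_1

def prox_sq(v, t):
    return v / (1.0 + t)                                  # prox of t*0.5*||.||^2

def proximal_average_step(x, grad, eta, proxes):
    """One PA-style step for min f(x) + (1/K)*sum_k r_k(x): a gradient step on f,
    then the average of the individual proximal maps in place of the (hard)
    prox of the averaged regularizer."""
    v = x - eta * grad
    return sum(p(v, eta) for p in proxes) / len(proxes)

# Toy usage: least squares with the average of an l1 and a squared penalty.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 50))
b = rng.standard_normal(30)
x, eta = np.zeros(50), 1.0 / np.linalg.norm(A, 2) ** 2
for _ in range(300):
    x = proximal_average_step(x, A.T @ (A @ x - b), eta, [prox_l1, prox_sq])
```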
Proximal Gradient Method for Nonsmooth Optimization over the Stiefel Manifold
We consider optimization problems over the Stiefel manifold whose objective
function is the sum of a smooth function and a nonsmooth function. Existing
methods for solving this kind of problem can be classified into three
classes. Algorithms in the first class rely on information of the subgradients
of the objective function and thus tend to converge slowly in practice.
Algorithms in the second class are proximal point algorithms, which involve
subproblems that can be as difficult as the original problem. Algorithms in the
third class are based on operator-splitting techniques, but they usually lack
rigorous convergence guarantees. In this paper, we propose a retraction-based
proximal gradient method for solving this class of problems. We prove that the
proposed method globally converges to a stationary point. The iteration
complexity for obtaining an $\epsilon$-stationary solution is also analyzed.
Numerical results on solving sparse PCA and compressed modes problems are
reported to demonstrate the advantages of the proposed method.
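To make the setting concrete, here is a heuristic sketch in the spirit of a retraction-based iteration for sparse PCA on the Stiefel manifold; it is my simplification (a tangent-space gradient step, an entrywise soft-thresholding step for the l1 term, then a QR retraction) and not the tangent-space proximal subproblem that the paper actually solves.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
S = A.T @ A / 200                    # sample covariance
k, lam, eta = 5, 0.1, 0.1

def qr_retraction(Y):
    """Map an ambient point back onto the Stiefel manifold {X : X^T X = I}."""
    Q, R = np.linalg.qr(Y)
    return Q * np.where(np.diag(R) < 0, -1.0, 1.0)

def tangent_project(X, G):
    """Project an ambient gradient onto the tangent space at X."""
    XtG = X.T @ G
    return G - X @ (XtG + XtG.T) / 2.0

def soft_threshold(V, t):
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

# Sparse PCA:  min_X  -0.5*tr(X^T S X) + lam*||X||_1   s.t.  X^T X = I.
X = qr_retraction(rng.standard_normal((50, k)))
for _ in range(300):
    rgrad = tangent_project(X, -S @ X)    # Riemannian gradient of the smooth part
    X = qr_retraction(soft_threshold(X - eta * rgrad, eta * lam))
```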