Fast Stochastic Algorithms for Low-rank and Nonsmooth Matrix Problems
Composite convex optimization problems which include both a nonsmooth term
and a low-rank promoting term have important applications in machine learning
and signal processing, such as when one wishes to recover an unknown matrix
that is simultaneously low-rank and sparse. However, such problems are highly
challenging to solve at large scale: the low-rank promoting term prohibits
efficient implementations of proximal methods for composite optimization and
even of simple subgradient methods. On the other hand, methods tailored for
low-rank optimization, such as conditional gradient-type methods, which are
often applied to a smooth approximation of the nonsmooth objective, are slow,
since their runtime scales both with the large Lipschitz parameter of the
smoothed gradient and with the target accuracy. In this paper we develop
efficient algorithms for \textit{stochastic} optimization of a strongly-convex
objective which includes both a nonsmooth term and a low-rank promoting term.
In particular, to the best of our knowledge, we present the first algorithm
that enjoys all of the following critical properties for large-scale problems:
i) (nearly) optimal sample complexity, ii) each iteration requires only a
single \textit{low-rank} SVD computation, and iii) the overall number of
thin-SVD computations scales far better with the target accuracy than in
previous methods. We also give an algorithm for
the closely related finite-sum setting. At the heart of our results lies a
novel combination of a variance-reduction technique and the use of a
\textit{weak-proximal oracle}, which is key to obtaining all three of the above
properties simultaneously.
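As background for why conditional-gradient-type methods pair naturally with low-rank structure, here is an illustrative sketch (not the paper's algorithm) of a Frank-Wolfe step over a nuclear-norm ball: the linear minimization oracle reduces to a single top singular pair, i.e., a rank-1 thin SVD, rather than a full SVD. The objective, radius tau, and step-size rule below are my own toy choices.

```python
import numpy as np
from scipy.sparse.linalg import svds

def frank_wolfe_step(X, grad, tau, t):
    """One Frank-Wolfe step for min f(X) subject to ||X||_* <= tau.
    The linear minimization oracle over the nuclear-norm ball only needs the
    top singular pair of the gradient (a rank-1 thin SVD)."""
    u, _, vt = svds(-grad, k=1)                # top singular pair of -grad
    S = tau * np.outer(u[:, 0], vt[0, :])      # minimizing vertex of the ball
    gamma = 2.0 / (t + 2.0)                    # standard Frank-Wolfe step size
    return (1.0 - gamma) * X + gamma * S

# Toy usage: matrix recovery min 0.5*||X - M||_F^2 with ||X||_* <= tau.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 40))   # low-rank target
X = np.zeros_like(M)
for t in range(100):
    X = frank_wolfe_step(X, X - M, tau=np.linalg.norm(M, "nuc"), t=t)
```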
Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity
The use of convex regularizers allows for easy optimization, but they often
produce biased estimates and inferior prediction performance. Recently,
nonconvex regularizers have attracted a lot of attention and have outperformed
convex ones. However, the resulting optimization problem is much harder. In
this paper, for a large class of nonconvex regularizers, we propose to move the
nonconvexity from the regularizer to the loss. The nonconvex regularizer is
then transformed to a familiar convex regularizer, while the resultant loss
function can still be guaranteed to be smooth. Learning with the convexified
regularizer can be performed by existing efficient algorithms originally
designed for convex regularizers (such as the proximal algorithm, Frank-Wolfe
algorithm, alternating direction method of multipliers and stochastic gradient
descent). Extensions are made when the convexified regularizer does not have
closed-form proximal step, and when the loss function is nonconvex, nonsmooth.
Extensive experiments on a variety of machine learning application scenarios
show that optimizing the transformed problem is much faster than running the
state-of-the-art on the original problem.
Comment: Journal version of a previous conference paper that appeared at
ICML-2016 with the same title.
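To make the "redistribution" idea concrete, here is a minimal sketch under my own assumptions (log-sum penalty as the nonconvex regularizer, squared loss; not code from the paper). The penalty is split into a convex l1 part plus a smooth correction; folding the correction into the loss leaves a convex regularizer, so ordinary proximal gradient with soft-thresholding applies to the transformed problem.

```python
import numpy as np

# Log-sum penalty (LSP), one member of the nonconvex family:
#   r(x) = sum_i lam*log(1 + |x_i|/theta)
#        = (lam/theta)*||x||_1 + g(x),   with g smooth (and nonconvex).
lam, theta = 0.1, 1.0

def grad_g(x):
    # g(x) = sum_i lam*(log(1 + |x_i|/theta) - |x_i|/theta); differentiable at 0
    return -lam * x / (theta * (theta + np.abs(x)))

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_gradient_transformed(A, b, iters=500):
    """Proximal gradient on the transformed problem
       min_x 0.5*||Ax - b||^2 + g(x) + (lam/theta)*||x||_1  (illustrative sketch)."""
    x = np.zeros(A.shape[1])
    eta = 1.0 / (np.linalg.norm(A, 2) ** 2 + lam / theta ** 2)   # rough step size
    for _ in range(iters):
        grad_smooth = A.T @ (A @ x - b) + grad_g(x)   # smooth, now nonconvex, loss part
        x = soft_threshold(x - eta * grad_smooth, eta * lam / theta)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0
x_hat = prox_gradient_transformed(A, A @ x_true + 0.01 * rng.standard_normal(40))
```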
Randomized Smoothing SVRG for Large-scale Nonsmooth Convex Optimization
In this paper, we consider the problem of minimizing the average of a large
number of nonsmooth and convex functions. Such problems often arise in machine
learning as empirical risk minimization, but they are computationally very
challenging. We develop and analyze a new algorithm that achieves a robust
linear convergence rate, and both its time complexity and gradient complexity
are superior to those of state-of-the-art nonsmooth algorithms and
subgradient-based schemes. Moreover, our algorithm requires neither extra
error-bound conditions on the objective function nor the common
strong-convexity condition. We show that our algorithm has wide applications in
optimization and machine learning problems, and demonstrate experimentally that
it performs well on a large-scale ranking problem.
Comment: 10 pages, 12 figures. arXiv admin note: text overlap with
arXiv:1103.4296 and arXiv:1403.4699 by other authors.
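For intuition only, here is a sketch of the standard Gaussian randomized-smoothing gradient estimator that this line of work builds on, plugged into an SVRG-style loop; it is my simplified illustration (toy absolute-deviation losses, fixed parameters), not the paper's exact algorithm or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_grad(f, x, mu=1e-2, samples=10):
    """Gaussian-smoothing estimator of the gradient of f_mu(x) = E_u[f(x + mu*u)],
    u ~ N(0, I); only function values of the nonsmooth f are needed."""
    fx, g = f(x), np.zeros_like(x)
    for _ in range(samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - fx) / mu * u
    return g / samples

def svrg_with_smoothing(fs, x0, eta=0.02, mu=1e-2, epochs=20, inner=100):
    """SVRG-style variance-reduced loop in which every component gradient is
    replaced by the smoothed estimator above (a sketch)."""
    n, x, snap = len(fs), x0.copy(), x0.copy()
    for _ in range(epochs):
        full = sum(smoothed_grad(f, snap, mu) for f in fs) / n
        for _ in range(inner):
            i = rng.integers(n)
            x = x - eta * (smoothed_grad(fs[i], x, mu)
                           - smoothed_grad(fs[i], snap, mu) + full)
        snap = x.copy()
    return x

# Toy usage: average of nonsmooth losses |a_i . x - b_i|.
A = rng.standard_normal((50, 10))
b = A @ np.ones(10)
fs = [lambda x, a=a, y=y: abs(a @ x - y) for a, y in zip(A, b)]
x_hat = svrg_with_smoothing(fs, np.zeros(10))
```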
Reshaped Wirtinger Flow and Incremental Algorithm for Solving Quadratic System of Equations
We study the phase retrieval problem, which solves a quadratic system of
equations, i.e., recovers an unknown vector from the magnitudes of its linear
measurements. We develop a gradient-like algorithm (referred to as RWF,
representing reshaped Wirtinger flow) by minimizing a nonconvex nonsmooth loss
function. In comparison with the existing nonconvex Wirtinger flow (WF)
algorithm \cite{candes2015phase}, although the loss function becomes nonsmooth,
it involves only the second power of the variable and hence reduces the
complexity. We
show that for random Gaussian measurements, RWF enjoys geometric convergence to
a globally optimal point as long as the number of measurements is on the order
of the dimension of the unknown signal. This improves the
sample complexity of WF, and achieves the same sample complexity as truncated
Wirtinger flow (TWF) \cite{chen2015solving}, but without truncation in the
gradient loop. Furthermore, RWF is computationally cheaper than WF and runs
faster
numerically than both WF and TWF. We further develop the incremental
(stochastic) reshaped Wirtinger flow (IRWF) and show that IRWF converges
linearly to the true signal. We further establish a performance guarantee for
an existing Kaczmarz method for the phase retrieval problem, based on its
connection to IRWF. We also empirically demonstrate that IRWF outperforms the
existing ITWF algorithm (a stochastic version of TWF) as well as other batch
algorithms.
Comment: Part of this draft is accepted to NIPS 201
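For illustration, here is a compact sketch of a reshaped-WF-style gradient iteration on synthetic real-valued Gaussian data; the random initialization, fixed step size, and problem sizes are my own simplifications (the paper's guarantees rely on a proper initialization and a specific parameter regime).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 600
x_true = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = np.abs(A @ x_true)                        # magnitude-only measurements

def rwf_like_step(z, step=0.5):
    """Gradient-like step on the loss (1/2m) * sum_i (|a_i . z| - y_i)^2.
    The loss uses the magnitudes themselves (not their squares), which is the
    lower-order structure that keeps the per-iteration cost small."""
    Az = A @ z
    return z - (step / m) * (A.T @ (Az - y * np.sign(Az)))

# Crude initialization: random direction, norm calibrated from E|a.x| = sqrt(2/pi)*||x||.
z = rng.standard_normal(n)
z *= np.sqrt(np.pi / 2) * np.mean(y) / np.linalg.norm(z)
for _ in range(500):
    z = rwf_like_step(z)

# Phase retrieval recovers x only up to a global sign.
err = min(np.linalg.norm(z - x_true), np.linalg.norm(z + x_true)) / np.linalg.norm(x_true)
```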
The Sound of APALM Clapping: Faster Nonsmooth Nonconvex Optimization with Stochastic Asynchronous PALM
We introduce the Stochastic Asynchronous Proximal Alternating Linearized
Minimization (SAPALM) method, a block coordinate stochastic proximal-gradient
method for solving nonconvex, nonsmooth optimization problems. SAPALM is the
first asynchronous parallel optimization method that provably converges on a
large class of nonconvex, nonsmooth problems. We prove that SAPALM matches the
best known rates of convergence --- among synchronous or asynchronous methods
--- on this problem class. We provide upper bounds on the number of workers for
which we can expect to see a linear speedup, which match the best bounds known
for less complex problems, and show that in practice SAPALM achieves this
linear speedup. We demonstrate state-of-the-art performance on several matrix
factorization problems.
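As a point of reference for the deterministic, synchronous baseline, here is a small PALM-style sweep on a toy sparse nonnegative matrix factorization (my example); SAPALM itself additionally injects stochastic gradient noise and performs the block updates asynchronously, which this sketch does not attempt to reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)
M = np.abs(rng.standard_normal((60, 40)))     # toy data matrix
r, lam = 5, 0.1
X = np.abs(rng.standard_normal((60, r)))
Y = np.abs(rng.standard_normal((r, 40)))

def soft_threshold(V, t):
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def palm_sweep(X, Y):
    """One synchronous PALM sweep for
       min 0.5*||M - X Y||_F^2 + lam*||Y||_1 + indicator(X >= 0):
    each block takes a proximally regularized gradient step whose step size
    comes from the block-wise Lipschitz constant."""
    cx = np.linalg.norm(Y @ Y.T, 2) + 1e-8
    X = np.maximum(X - ((X @ Y - M) @ Y.T) / cx, 0.0)            # prox of the indicator
    cy = np.linalg.norm(X.T @ X, 2) + 1e-8
    Y = soft_threshold(Y - (X.T @ (X @ Y - M)) / cy, lam / cy)   # prox of lam*||.||_1
    return X, Y

for _ in range(200):
    X, Y = palm_sweep(X, Y)
```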
Primal-Dual Optimization Algorithms over Riemannian Manifolds: an Iteration Complexity Analysis
In this paper we study nonconvex and nonsmooth multi-block optimization over
Riemannian manifolds with coupled linear constraints. Such optimization
problems naturally arise from machine learning, statistical learning,
compressive sensing, image processing, and tensor PCA, among others. We develop
an ADMM-like primal-dual approach based on decoupled solvable subroutines such
as linearized proximal mappings. First, we introduce the optimality conditions
for the aforementioned optimization models, and the notion of
$\epsilon$-stationary solutions is then introduced accordingly. The main part
of the paper shows that the proposed algorithms enjoy an iteration complexity
bound for reaching an $\epsilon$-stationary solution. For prohibitively
large tensor or machine learning models, we present a sampling-based
stochastic algorithm with the same iteration complexity bound in expectation.
In case the subproblems are not analytically solvable, a feasible curvilinear
line-search variant of the algorithm based on retraction operators is proposed.
Finally, we show specifically how the algorithms can be implemented to solve a
variety of practical problems, such as the NP-hard maximum bisection problem,
regularized sparse tensor principal component analysis, and the community
detection problem. Our preliminary numerical results show the great potential
of the proposed methods.
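To fix ideas, here is a generic Euclidean sketch of the kind of ADMM-like primal-dual structure the abstract describes: two blocks coupled by a linear constraint, each updated by a linearized (gradient-type) proximal step on the augmented Lagrangian, followed by a textbook dual update. The manifold constraints, retractions, and the paper's specific step-size rules are not reproduced here; the functions and parameters below are my own toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, p = 20, 30, 10
A = rng.standard_normal((p, n1))
B = rng.standard_normal((p, n2))
b = rng.standard_normal(p)

def grad_f(x):
    return x            # gradient of f(x) = 0.5*||x||^2 (toy smooth block)

def grad_g(y):
    return y - 1.0      # gradient of g(y) = 0.5*||y - 1||^2 (toy smooth block)

def admm_like_sweep(x, y, lam, rho=1.0, eta=0.005):
    """One sweep for min f(x) + g(y) s.t. Ax + By = b: linearized proximal
    (single gradient) updates of each block on the augmented Lagrangian,
    followed by a dual update on the constraint residual."""
    r = A @ x + B @ y - b
    x = x - eta * (grad_f(x) + A.T @ (lam + rho * r))
    r = A @ x + B @ y - b
    y = y - eta * (grad_g(y) + B.T @ (lam + rho * r))
    lam = lam + rho * (A @ x + B @ y - b)
    return x, y, lam

x, y, lam = np.zeros(n1), np.zeros(n2), np.zeros(p)
for _ in range(2000):
    x, y, lam = admm_like_sweep(x, y, lam)
```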
Stochastic Variance-Reduced ADMM
The alternating direction method of multipliers (ADMM) is a powerful
optimization solver in machine learning. Recently, stochastic ADMM has been
integrated with variance reduction methods for the stochastic gradient, leading
to SAG-ADMM and SDCA-ADMM, which have fast convergence rates and low iteration
complexities. However, their space requirements can still be high. In this
paper, we propose an integration of ADMM with the method of stochastic variance
reduced gradient (SVRG). Unlike another recent integration attempt called
SCAS-ADMM, the proposed algorithm retains the fast convergence benefits of
SAG-ADMM and SDCA-ADMM, but is more advantageous in that its storage
requirement is very low and even independent of the sample size. We also
extend the proposed method to nonconvex problems and establish its convergence
rate in that setting. Experimental results demonstrate that it is as fast as
SAG-ADMM and SDCA-ADMM, much faster than SCAS-ADMM, and can be used on much
bigger data sets
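The storage advantage over SAG-ADMM and SDCA-ADMM comes from the SVRG control-variate estimator, which only needs a snapshot iterate and its full gradient rather than a table of per-sample gradients. Here is the estimator in isolation inside a plain SVRG loop on a toy least-squares problem (a sketch; in SVRG-ADMM it would replace the exact gradient inside the linearized x-update of ADMM).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 20))
b = A @ rng.standard_normal(20)

def grad_i(x, i):
    """Gradient of the i-th component loss 0.5*(a_i . x - b_i)^2."""
    return (A[i] @ x - b[i]) * A[i]

def svrg_grad(x, i, snap, full_grad):
    """SVRG control variate: unbiased for the full gradient at x, yet it stores
    only the snapshot and its full gradient (memory independent of the sample size)."""
    return grad_i(x, i) - grad_i(snap, i) + full_grad

x = np.zeros(20)
for _ in range(30):                                  # outer epochs
    snap = x.copy()
    full_grad = A.T @ (A @ snap - b) / len(b)
    for _ in range(200):                             # inner stochastic steps
        i = rng.integers(len(b))
        x = x - 0.005 * svrg_grad(x, i, snap, full_grad)
```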
Mini-Batch Stochastic ADMMs for Nonconvex Nonsmooth Optimization
With the rapid rise of complex data, nonconvex models, such as nonconvex loss
functions and nonconvex regularizers, are widely used in machine learning and
pattern recognition. In this paper, we propose a class of mini-batch stochastic
ADMMs (alternating direction method of multipliers) for solving large-scale
nonconvex nonsmooth problems. We prove that, given an appropriate mini-batch
size, the mini-batch stochastic ADMM without a variance reduction (VR)
technique is convergent and reaches a convergence rate of $O(1/T)$ for
obtaining a stationary point of the nonconvex problem, where $T$ denotes the
number of iterations. Moreover, we extend the mini-batch stochastic gradient
method to both the nonconvex SVRG-ADMM and SAGA-ADMM proposed in our initial
manuscript \cite{huang2016stochastic}, and prove that these mini-batch
stochastic ADMMs also reach the convergence rate of $O(1/T)$ without any
condition on the mini-batch size. In particular, we provide a specific
parameter selection for the step size of the stochastic gradients and the
penalty parameter of the augmented
Lagrangian function. Finally, extensive experimental results on both simulated
and real-world data demonstrate the effectiveness of the proposed algorithms.
Comment: We have fixed some errors in the proofs. arXiv admin note: text
overlap with arXiv:1610.0275
FasTer: Fast Tensor Completion with Nonconvex Regularization
The low-rank tensor completion problem aims to recover a tensor from limited
observations and has many real-world applications. Because it is easy to
optimize, the convex overlapping nuclear norm has been widely used for tensor
completion. However, it over-penalizes the top singular values and leads to
biased estimates. In this paper, we propose to use nonconvex regularizers,
which penalize large singular values less, instead of the convex one for
tensor completion. However, as the new regularizers are nonconvex and overlap
with each other, existing algorithms are either too slow or suffer from huge
memory costs. To address these issues, we develop an efficient and scalable
algorithm, which is based on the proximal average (PA) algorithm, for
real-world problems. Compared with direct use of the PA algorithm, the
proposed algorithm runs orders of magnitude faster and needs orders of
magnitude less space. We further speed up the proposed algorithm with an
acceleration technique, and show that convergence to critical points is still
guaranteed. Experimental comparisons of the proposed approach are made with
various other tensor completion approaches. Empirical results show that the
proposed algorithm is very fast and produces much better recovery performance.
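For readers unfamiliar with the proximal average (PA) building block: when the regularizer is an average of terms whose individual proximal steps are cheap but whose sum has no tractable proximal step, PA replaces the prox of the average by the average of the individual proxes. Below is a toy sketch with two simple penalties (my example; the paper applies the idea to overlapping nonconvex tensor regularizers and adds the scalability and acceleration machinery).

```python
import numpy as np

def prox_l1(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)   # prox of t*||.||_1

def prox_sq(v, t):
    return v / (1.0 + t)                                  # prox of t*0.5*||.||^2

def proximal_average_step(x, grad, eta, proxes):
    """One PA-style step for min f(x) + (1/K)*sum_k r_k(x): a gradient step on f,
    then the average of the individual proximal maps in place of the (hard)
    prox of the averaged regularizer."""
    v = x - eta * grad
    return sum(p(v, eta) for p in proxes) / len(proxes)

# Toy usage: least squares with the average of an l1 and a squared penalty.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 50))
b = rng.standard_normal(30)
x, eta = np.zeros(50), 1.0 / np.linalg.norm(A, 2) ** 2
for _ in range(300):
    x = proximal_average_step(x, A.T @ (A @ x - b), eta, [prox_l1, prox_sq])
```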
Proximal Gradient Method for Nonsmooth Optimization over the Stiefel Manifold
We consider optimization problems over the Stiefel manifold whose objective
function is the sum of a smooth function and a nonsmooth function. Existing
methods for solving this kind of problem can be classified into three
classes. Algorithms in the first class rely on information of the subgradients
of the objective function and thus tend to converge slowly in practice.
Algorithms in the second class are proximal point algorithms, which involve
subproblems that can be as difficult as the original problem. Algorithms in the
third class are based on operator-splitting techniques, but they usually lack
rigorous convergence guarantees. In this paper, we propose a retraction-based
proximal gradient method for solving this class of problems. We prove that the
proposed method globally converges to a stationary point. The iteration
complexity for obtaining an $\epsilon$-stationary solution is also analyzed.
Numerical results on solving sparse PCA and compressed modes problems are
reported to demonstrate the advantages of the proposed method.
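To make the setting concrete, here is a heuristic sketch in the spirit of a retraction-based iteration for sparse PCA on the Stiefel manifold; it is my simplification (a tangent-space gradient step, an entrywise soft-thresholding step for the l1 term, then a QR retraction) and not the tangent-space proximal subproblem that the paper actually solves.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
S = A.T @ A / 200                    # sample covariance
k, lam, eta = 5, 0.1, 0.1

def qr_retraction(Y):
    """Map an ambient point back onto the Stiefel manifold {X : X^T X = I}."""
    Q, R = np.linalg.qr(Y)
    return Q * np.where(np.diag(R) < 0, -1.0, 1.0)

def tangent_project(X, G):
    """Project an ambient gradient onto the tangent space at X."""
    XtG = X.T @ G
    return G - X @ (XtG + XtG.T) / 2.0

def soft_threshold(V, t):
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

# Sparse PCA:  min_X  -0.5*tr(X^T S X) + lam*||X||_1   s.t.  X^T X = I.
X = qr_retraction(rng.standard_normal((50, k)))
for _ in range(300):
    rgrad = tangent_project(X, -S @ X)    # Riemannian gradient of the smooth part
    X = qr_retraction(soft_threshold(X - eta * rgrad, eta * lam))
```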