Non-asymptotic Analysis of Stochastic Methods for Non-Smooth Non-Convex Regularized Problems
Stochastic Proximal Gradient (SPG) methods have been widely used for solving
optimization problems with a simple (possibly non-smooth) regularizer in
machine learning and statistics. However, to the best of our knowledge no
non-asymptotic convergence analysis of SPG exists for non-convex optimization
with a non-smooth and non-convex regularizer. All existing non-asymptotic
analyses of SPG for solving non-smooth non-convex problems require the
non-smooth regularizer to be a convex function, and hence are not applicable to
a non-smooth non-convex regularized problem. This work initiates the analysis
to bridge this gap and opens the door to non-asymptotic convergence analysis of
non-smooth non-convex regularized problems. We analyze several variants of
mini-batch SPG methods for minimizing a non-convex objective that consists of a
smooth non-convex loss and a non-smooth non-convex regularizer. Our
contributions are two-fold: (i) we show that they enjoy the same complexities
as their counterparts for solving convex regularized non-convex problems in
terms of finding an approximate stationary point; (ii) we develop more practical variants using a dynamic mini-batch size instead of a fixed mini-batch size, without requiring the target accuracy level of the solution to be known in advance. The significance
of our results is that they improve upon the state-of-the-art results for solving non-smooth non-convex regularized problems. We also empirically demonstrate the effectiveness of the considered SPG methods in comparison with other peer stochastic methods.
Comment: Accepted to NeurIPS 2019
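The core update analyzed here is easy to state. Below is a minimal sketch of mini-batch SPG, assuming an $\ell_0$ regularizer $\lambda\|x\|_0$ (non-smooth and non-convex) whose proximal mapping is coordinate-wise hard thresholding; the least-squares loss, step size, and batch size are illustrative choices, not the paper's exact setup.

```python
import numpy as np

def prox_l0(v, t):
    """Proximal mapping of t * ||x||_0: keep v_i only when v_i^2 / 2 > t."""
    out = v.copy()
    out[v ** 2 <= 2.0 * t] = 0.0
    return out

def minibatch_spg(A, b, lam=0.1, eta=0.01, batch=64, iters=1000, seed=0):
    """Mini-batch SPG for (1/2n)||Ax - b||^2 + lam * ||x||_0."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        idx = rng.choice(n, size=batch, replace=False)
        g = A[idx].T @ (A[idx] @ x - b[idx]) / batch   # mini-batch gradient of the smooth loss
        x = prox_l0(x - eta * g, eta * lam)            # non-convex proximal step
    return x
```

The dynamic mini-batch variants discussed above would grow `batch` across iterations rather than fixing it, which is what removes the dependence on a pre-specified target accuracy.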
Simple Stochastic Gradient Methods for Non-Smooth Non-Convex Regularized Optimization
Our work focuses on stochastic gradient methods for optimizing a smooth
non-convex loss function with a non-smooth non-convex regularizer. Research on
this class of problems is quite limited, and until recently no non-asymptotic
convergence results have been reported. We present two simple stochastic
gradient algorithms, for finite-sum and general stochastic optimization
problems, which have superior convergence complexities compared to the current
state-of-the-art. We also compare our algorithms' performance in practice for
empirical risk minimization.
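The paper's two algorithms are not reproduced here, but for the finite-sum case the family they compete in can be illustrated with an SVRG-style proximal gradient sketch; the $\ell_0$ prox, the step size, and the epoch length are assumptions for illustration.

```python
import numpy as np

def prox_l0(v, t):
    # prox of t * ||x||_0: coordinate-wise hard thresholding
    out = v.copy()
    out[v ** 2 <= 2.0 * t] = 0.0
    return out

def svrg_prox(grad_i, x0, n, lam=0.1, eta=0.01, epochs=20, seed=0):
    """SVRG-style proximal gradient for (1/n) sum_i f_i(x) + lam * ||x||_0.
    grad_i(x, i) is a (hypothetical) callback returning the gradient of f_i at x."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(epochs):
        x_ref = x.copy()
        full = np.mean([grad_i(x_ref, i) for i in range(n)], axis=0)  # snapshot gradient
        for _ in range(n):
            i = int(rng.integers(n))
            g = grad_i(x, i) - grad_i(x_ref, i) + full   # variance-reduced estimate
            x = prox_l0(x - eta * g, eta * lam)
    return x
```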
Convergence of Stochastic Proximal Gradient Algorithm
We prove novel convergence results for a stochastic proximal gradient
algorithm suitable for solving a large class of convex optimization problems,
where a convex objective function is given by the sum of a smooth and a
possibly non-smooth component. We consider convergence of the iterates and derive non-asymptotic bounds in expectation in the strongly convex case, as well as almost sure convergence results under weaker assumptions. Our approach allows us to avoid averaging and to weaken boundedness assumptions which are often considered in theoretical studies and might not be satisfied in practice.
Comment: 24 pages
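A minimal sketch of the algorithm in this convex composite setting, using an $\ell_1$ regularizer (soft-thresholding prox) and $1/t$ step sizes as in the strongly convex regime; note that the last iterate is returned, with no averaging. The problem data and constants are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal mapping of t * ||x||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def stochastic_prox_grad(A, b, lam=0.1, eta0=1.0, iters=10000, seed=0):
    """Stochastic proximal gradient for (1/2n)||Ax - b||^2 + lam * ||x||_1."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for t in range(1, iters + 1):
        i = int(rng.integers(n))
        g = (A[i] @ x - b[i]) * A[i]                      # unbiased stochastic gradient
        x = soft_threshold(x - (eta0 / t) * g, (eta0 / t) * lam)
    return x                                              # last iterate, no averaging
```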
Dual Iterative Hard Thresholding: From Non-convex Sparse Minimization to Non-smooth Concave Maximization
Iterative Hard Thresholding (IHT) is a class of projected gradient descent
methods for optimizing sparsity-constrained minimization models, with the best
known efficiency and scalability in practice. As far as we know, the existing
IHT-style methods are designed for sparse minimization in primal form. It
remains open to explore duality theory and algorithms in such a non-convex and
NP-hard problem setting. In this paper, we bridge this gap by establishing a
duality theory for sparsity-constrained minimization with $\ell_2$-regularized
loss function and proposing an IHT-style algorithm for dual maximization. Our
sparse duality theory provides a set of sufficient and necessary conditions
under which the original NP-hard/non-convex problem can be equivalently solved
in a dual formulation. The proposed dual IHT algorithm is a super-gradient
method for maximizing the non-smooth dual objective. An interesting finding is
that the sparse recovery performance of dual IHT is invariant to the Restricted
Isometry Property (RIP), which is required by virtually all the existing primal
IHT algorithms without sparsity relaxation. Moreover, a stochastic variant of
dual IHT is proposed for large-scale stochastic optimization. Numerical results
demonstrate the superiority of dual IHT algorithms to the state-of-the-art
primal IHT-style algorithms in model estimation accuracy and computational
efficiency.
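The dual algorithm itself is not reconstructed here, but the primal IHT iteration it departs from is simple to sketch: a gradient step followed by projection onto the sparsity constraint $\|x\|_0 \leq k$ (keep the $k$ largest-magnitude entries). The gradient callback and constants are illustrative.

```python
import numpy as np

def hard_threshold_top_k(v, k):
    """Euclidean projection onto {x : ||x||_0 <= k}: keep the k largest |v_i|."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def iht(grad_fn, d, k, eta=0.01, iters=500):
    """Primal IHT: x_{t+1} = H_k(x_t - eta * grad F(x_t)).
    grad_fn is a (hypothetical) callback for the gradient of the loss F."""
    x = np.zeros(d)
    for _ in range(iters):
        x = hard_threshold_top_k(x - eta * grad_fn(x), k)
    return x
```

The paper's dual IHT instead performs super-gradient ascent on the non-smooth dual objective, which is the step whose recovery performance is invariant to RIP.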
A Variable Sample-size Stochastic Quasi-Newton Method for Smooth and Nonsmooth Stochastic Convex Optimization
Classical theory for quasi-Newton schemes has focused on smooth deterministic
unconstrained optimization while recent forays into stochastic convex
optimization have largely resided in smooth, unconstrained, and strongly convex
regimes. Naturally, there is a compelling need to address nonsmoothness, the
lack of strong convexity, and the presence of constraints. Accordingly, this
paper presents a quasi-Newton framework that can process merely convex and
possibly nonsmooth (but smoothable) stochastic convex problems. We propose a
framework that combines iterative smoothing and regularization with a
variance-reduced scheme reliant on using increasing sample-sizes of gradients.
We make the following contributions. (i) We develop a regularized and smoothed
variable sample-size BFGS update (rsL-BFGS) that generates a sequence of
Hessian approximations and can accommodate nonsmooth convex objectives by
utilizing iterative regularization and smoothing. (ii) In strongly convex
regimes with state-dependent noise, the proposed variable sample-size
stochastic quasi-Newton scheme admits a non-asymptotic linear rate of
convergence while the oracle complexity of computing an $\epsilon$-solution is $\mathcal{O}(\kappa^{m+1}/\epsilon)$, where $\kappa$ is the condition number and $m \geq 1$. In nonsmooth (but smoothable) regimes, using Moreau smoothing retains the linear convergence rate, while using more general smoothing leads to a deterioration of the rate to a sublinear one for the resulting smoothed VS-SQN scheme; (iii) In merely convex but smooth settings, the regularized VS-SQN scheme rVS-SQN displays a rate of $\mathcal{O}(1/k^{1-\bar{\epsilon}})$ for arbitrary $\bar{\epsilon} > 0$. When the smoothness requirements are weakened, the rate for the regularized and smoothed VS-SQN scheme worsens to $\mathcal{O}(k^{-1/3})$. Such statements allow for a state-dependent noise assumption under a quadratic growth property.
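Two ingredients of the framework can be sketched compactly: Moreau smoothing of a nonsmooth term (here $\lambda\|x\|_1$, whose smoothed gradient is Huber-like) and geometrically increasing sample sizes inside a quasi-Newton loop. A plain full-memory BFGS update stands in for the paper's rsL-BFGS; the callback `sample_grad`, the schedules, and all constants are assumptions for illustration.

```python
import numpy as np

def grad_moreau_l1(x, lam, mu):
    """Gradient of the Moreau envelope of lam*||x||_1 with parameter mu:
    (x - prox_{mu*lam}(x)) / mu, a Huber-type gradient."""
    p = np.sign(x) * np.maximum(np.abs(x) - mu * lam, 0.0)
    return (x - p) / mu

def vs_sqn(sample_grad, d, lam=0.1, mu=0.1, eta=0.1, rho=1.3, iters=50, seed=0):
    """Variable sample-size quasi-Newton sketch: BFGS steps on a smoothed
    objective with sample size N_k growing geometrically. sample_grad(x, N, rng)
    is a (hypothetical) callback averaging N sampled gradients of the smooth loss."""
    rng = np.random.default_rng(seed)
    x, H = np.zeros(d), np.eye(d)
    g = sample_grad(x, 1, rng) + grad_moreau_l1(x, lam, mu)
    for k in range(iters):
        x_new = x - eta * (H @ g)
        N = int(np.ceil(rho ** k))                       # increasing sample size
        g_new = sample_grad(x_new, N, rng) + grad_moreau_l1(x_new, lam, mu)
        s, y = x_new - x, g_new - g
        if s @ y > 1e-12:                                # standard curvature safeguard
            r = 1.0 / (s @ y)
            V = np.eye(d) - r * np.outer(s, y)
            H = V @ H @ V.T + r * np.outer(s, s)         # BFGS inverse-Hessian update
        x, g = x_new, g_new
    return x
```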
Graphical Convergence of Subgradients in Nonconvex Optimization and Learning
We investigate the stochastic optimization problem of minimizing population
risk, where the loss defining the risk is assumed to be weakly convex.
Compositions of Lipschitz convex functions with smooth maps are the primary
examples of such losses. We analyze the estimation quality of such nonsmooth
and nonconvex problems by their sample average approximations. Our main results
establish dimension-dependent rates on subgradient estimation in full
generality and dimension-independent rates when the loss is a generalized
linear model. As an application of the developed techniques, we analyze the
nonsmooth landscape of a robust nonlinear regression problem.
Comment: 36 pages
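To make the object of study concrete: for the robust nonlinear regression example, the sample average approximation is $f_m(x) = \frac{1}{m}\sum_i |\langle a_i, x\rangle^2 - b_i|$, a weakly convex composition of $|\cdot|$ with a smooth map, and one of its subgradients is computed below; the rates above quantify how such subgradients track those of the population risk. Data shapes are illustrative.

```python
import numpy as np

def saa_subgrad(x, A, b):
    """One subgradient of f_m(x) = (1/m) * sum_i |<a_i, x>^2 - b_i|, a weakly
    convex composition h(c(x)) with h = |.| and c_i(x) = <a_i, x>^2 - b_i."""
    Ax = A @ x
    r = Ax ** 2 - b                                   # residuals of the nonlinear model
    return A.T @ (np.sign(r) * 2.0 * Ax) / len(b)     # chain rule through c
```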
Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence
Difference of convex (DC) functions cover a broad family of non-convex and
possibly non-smooth and non-differentiable functions, and have wide
applications in machine learning and statistics. Although deterministic
algorithms for DC functions have been extensively studied, stochastic
optimization that is more suitable for learning with big data remains
under-explored. In this paper, we propose new stochastic optimization
algorithms and study their first-order convergence theories for solving a broad
family of DC functions. We improve the existing algorithms and theories of
stochastic optimization for DC functions from both practical and theoretical
perspectives. On the practical side, our algorithm is more user-friendly
without requiring a large mini-batch size and more efficient by saving
unnecessary computations. On the theoretical side, our convergence analysis
does not necessarily require the involved functions to be smooth with Lipschitz
continuous gradient. Instead, the convergence rate of the proposed stochastic
algorithm is automatically adaptive to the Hölder continuity of the
gradient of one component function. Moreover, we extend the proposed stochastic
algorithms for DC functions to solve problems with a general non-convex
non-differentiable regularizer, which does not necessarily have a DC
decomposition but enjoys an efficient proximal mapping. To the best of our knowledge, this is the first work that gives a non-asymptotic convergence guarantee for solving non-convex optimization whose objective has a general non-convex non-differentiable regularizer.
Comment: In the revised version, we present some improved complexity results for non-smooth and non-convex regularizers and for functions with known Hölder continuity parameter by a simple change of an algorithmic parameter
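The template behind such methods is short: at each step, linearize the concave part $-h$ using a subgradient of $h$ and take a stochastic gradient step on the resulting convex majorant. A minimal sketch, with `sgrad_g` and `subgrad_h` as hypothetical callbacks and a fixed step size for simplicity:

```python
import numpy as np

def stochastic_dca(sgrad_g, subgrad_h, x0, eta=0.01, iters=1000, seed=0):
    """Stochastic DC sketch for min_x g(x) - h(x), with g and h convex:
    at x_t, replace h by its linearization h(x_t) + <v_t, x - x_t>, with v_t in
    the subdifferential of h, then step on the convex majorant g(x) - <v_t, x>."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(iters):
        v = subgrad_h(x)                       # subgradient of the concave part
        g = sgrad_g(x, rng)                    # unbiased stochastic gradient of g
        x = x - eta * (g - v)
    return x
```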
NEON+: Accelerated Gradient Methods for Extracting Negative Curvature for Non-Convex Optimization
Accelerated gradient (AG) methods are breakthroughs in convex optimization,
improving the convergence rate of the gradient descent method for optimization
with smooth functions. However, the analysis of AG methods for non-convex
optimization is still limited. It remains an open question whether AG methods
from convex optimization can accelerate the convergence of the gradient descent
method for finding a local minimum of non-convex optimization problems. This
paper provides an affirmative answer to this question. In particular, we
analyze two renowned variants of AG methods (namely Polyak's Heavy Ball method
and Nesterov's Accelerated Gradient method) for extracting the negative
curvature from random noise, which is central to escaping from saddle points.
By leveraging the proposed AG methods for extracting the negative curvature, we
present a new AG algorithm with double loops for non-convex optimization (in contrast to a single-loop AG algorithm proposed in a recent manuscript [AGNON], which directly analyzed Nesterov's AG method for non-convex optimization and appeared online on November 29, 2017; we emphasize that our work is independent, inspired by our earlier work [NEON17] and based on a different novel analysis), which converges to a second-order stationary point $x$ such that $\|\nabla f(x)\| \leq \epsilon$ and $\nabla^2 f(x) \succeq -\sqrt{\epsilon}\, I$ with $\widetilde{O}(1/\epsilon^{1.75})$ iteration complexity, improving that of the gradient descent method by a factor of $\epsilon^{-1/4}$ and matching the best iteration complexity of second-order Hessian-free methods for non-convex optimization.
Comment: The main result is merged into our manuscript "First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time" (arXiv:1711.01944)
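The negative-curvature extraction step can be sketched with Hessian-vector products formed from gradient differences: starting from a small random perturbation $u$, a heavy-ball recursion amplifies components along eigenvectors with negative eigenvalues, and norm growth certifies a negative-curvature direction. The constants and growth threshold below are illustrative, not the paper's calibrated values.

```python
import numpy as np

def extract_negative_curvature(grad, x, d, eta=0.01, beta=0.9, r=1e-3,
                               iters=200, seed=0):
    """Heavy-ball iteration on u -> grad(x + u) - grad(x) ~ H u, where H is the
    Hessian at x; modes with negative eigenvalues grow geometrically."""
    rng = np.random.default_rng(seed)
    u = r * rng.standard_normal(d)              # random noise initialization
    u_prev, g0 = u.copy(), grad(x)
    for _ in range(iters):
        hvp = grad(x + u) - g0                  # approximate Hessian-vector product
        u, u_prev = u - eta * hvp + beta * (u - u_prev), u
        if np.linalg.norm(u) > 1.0:             # norm growth certifies negative curvature
            return u / np.linalg.norm(u)
    return None                                 # no sufficiently negative curvature found
```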
An Iterative Regularized Incremental Projected Subgradient Method for a Class of Bilevel Optimization Problems
We study a class of bilevel convex optimization problems where the goal is to
find the minimizer of an objective function in the upper level, among the set
of all optimal solutions of an optimization problem in the lower level. A wide
range of problems in convex optimization can be formulated using this class. An
important example is the case where an optimization problem is ill-posed. In
this paper, our interest lies in addressing the bilevel problems, where the
lower level objective is given as a finite sum of separate nondifferentiable
convex component functions. This is the case in a variety of applications in
distributed optimization, such as large-scale data processing in machine
learning and neural networks. To the best of our knowledge, this class of
bilevel problems, with a finite sum in the lower level, has not been addressed
before. Motivated by this gap, we develop an iterative regularized incremental
subgradient method, where the agents update their iterates in a cyclic manner
using a regularized subgradient. Under a suitable choice of the regularization
parameter sequence, we establish the convergence of the proposed algorithm and derive a rate of $\mathcal{O}(1/k^{0.5-\epsilon})$ in terms of the lower level objective function for an arbitrarily small $\epsilon > 0$. We present the performance of the algorithm on a binary text classification problem.
Comment: 8 pages, 1 figure
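A minimal sketch of the iterative regularized incremental update: one cyclic pass over the $n$ lower-level components per outer iteration, with each step combining a component subgradient and the upper-level subgradient weighted by a vanishing regularization parameter. The decay exponents below are illustrative placeholders; the paper derives the admissible schedules.

```python
import numpy as np

def ir_incremental_subgrad(subgrad_h, subgrad_f, x0, n, outer=1000):
    """Bilevel sketch: minimize f(x) over argmin_x sum_i h_i(x).
    subgrad_h(x, i) and subgrad_f(x) are (hypothetical) subgradient callbacks."""
    x = x0.copy()
    for k in range(1, outer + 1):
        gamma = 1.0 / k ** 0.75          # diminishing step size
        lam = 1.0 / k ** 0.25            # vanishing regularization weight
        for i in range(n):               # cyclic incremental pass (agent i updates)
            x = x - gamma * (subgrad_h(x, i) + lam * subgrad_f(x))
    return x
```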
Stochastic Proximal Methods for Non-Smooth Non-Convex Constrained Sparse Optimization
This paper focuses on stochastic proximal gradient methods for optimizing a
smooth non-convex loss function with a non-smooth non-convex regularizer and
convex constraints. To the best of our knowledge, we present the first non-asymptotic convergence results for this class of problems. We present two
simple stochastic proximal gradient algorithms, for general stochastic and
finite-sum optimization problems, which have the same or superior convergence
complexities compared to the current best results for the unconstrained problem
setting. In a numerical experiment we compare our algorithms with the current
state-of-the-art deterministic algorithm and find our algorithms to exhibit
superior convergence.
Comment: arXiv admin note: text overlap with arXiv:1901.0836
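When the convex constraint is simple, the proximal step of the combined non-convex regularizer and constraint can still be exact. The sketch below assumes a box constraint $\|x\|_\infty \leq R$ together with $\lambda\|x\|_0$: the prox decouples per coordinate into comparing the zero candidate against the clipped candidate. The loss callback and constants are illustrative.

```python
import numpy as np

def prox_l0_box(v, t, R):
    """Exact prox of t*||x||_0 + indicator(|x_i| <= R), coordinate-wise:
    keep clip(v_i) if 0.5*(clip(v_i)-v_i)^2 + t < 0.5*v_i^2, else set 0."""
    c = np.clip(v, -R, R)
    keep = 0.5 * (c - v) ** 2 + t < 0.5 * v ** 2
    return np.where(keep, c, 0.0)

def constrained_spg(grad_fn, d, n, lam=0.1, R=1.0, eta=0.01, batch=64,
                    iters=1000, seed=0):
    """Stochastic proximal gradient under a non-convex regularizer and box
    constraints. grad_fn(x, idx) is a (hypothetical) mini-batch gradient callback."""
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    for _ in range(iters):
        idx = rng.choice(n, size=batch, replace=False)
        x = prox_l0_box(x - eta * grad_fn(x, idx), eta * lam, R)
    return x
```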