Conditional gradient type methods for composite nonlinear and stochastic optimization
In this paper, we present a conditional gradient type (CGT) method for
solving a class of composite optimization problems where the objective function
consists of a (weakly) smooth term and a (strongly) convex regularization term.
While including a strongly convex term in the subproblems of the classical
conditional gradient (CG) method improves its rate of convergence, it does not
increase the per-iteration cost to that of general proximal type algorithms. More
specifically, we present a unified analysis for the CGT method in the sense
that it achieves the best-known rate of convergence when the weakly smooth term
is nonconvex and possesses (nearly) optimal complexity if it turns out to be
convex. While implementation of the CGT method requires explicitly estimating
problem parameters like the level of smoothness of the first term in the
objective function, we also present a few variants of this method that relax
the need for such estimation. Unlike general proximal type parameter-free methods, these
variants of the CGT method do not require any additional effort for computing
(sub)gradients of the objective function and/or solving extra subproblems at
each iteration. We then generalize these methods to the stochastic setting and
present a few new complexity results. To the best of our knowledge, this is the
first time that such complexity results are presented for solving stochastic
weakly smooth nonconvex and (strongly) convex optimization problems.
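The composite structure above can be sketched concretely. A minimal illustration, assuming a least-squares loss, a squared-ℓ2 regularizer, and the probability simplex as the feasible set (all hypothetical choices for illustration, not the paper's setting): because the subproblem includes the strongly convex term, it reduces here to a Euclidean projection onto the simplex, so the per-iteration cost stays low.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def cgt(A, b, mu, iters=2000):
    """Generalized conditional gradient for
    min over the simplex of f(x) + h(x) = 0.5*||Ax-b||^2 + (mu/2)*||x||^2.
    The subproblem argmin_y <grad f(x), y> + h(y) over the simplex equals
    the projection of -grad f(x)/mu onto the simplex, so each iteration
    costs one matrix-vector product plus a sort."""
    n = A.shape[1]
    x = np.ones(n) / n
    for k in range(iters):
        grad = A.T @ (A @ x - b)                # gradient of the smooth term
        y = project_simplex(-grad / mu)         # subproblem with strongly convex h
        gamma = 2.0 / (k + 2.0)                 # classical open-loop step size
        x = (1 - gamma) * x + gamma * y
    return x
```

The strongly convex term makes the subproblem a projection rather than a vertex search, which is the cost/accuracy trade-off the abstract describes.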
Faster Projection-free Convex Optimization over the Spectrahedron
Minimizing a convex function over the spectrahedron, i.e., the set of all
positive semidefinite matrices with unit trace, is an important optimization
task with many applications in optimization, machine learning, and signal
processing. It is also notoriously difficult to solve at large scale, since
standard techniques require expensive matrix decompositions. An alternative is
the conditional gradient (CG) method (aka the Frank-Wolfe algorithm), which has regained much
interest in recent years, mostly due to its application to this specific
setting. The key benefit of the CG method is that it avoids expensive matrix
decompositions altogether, and simply requires a single eigenvector
computation per iteration, which is much more efficient. On the downside, the
CG method, in general, converges at an inferior rate: the error for
minimizing a β-smooth function after t iterations scales like O(β/t).
This convergence rate does not improve even if the function is also
strongly convex.
In this work we present a modification of the CG method tailored for convex
optimization over the spectrahedron. The per-iteration complexity of the method
is essentially identical to that of the standard CG method: only a single
eigenvector computation is required. For minimizing an α-strongly convex
and β-smooth function, the expected approximation error of the method
after t iterations decays strictly faster than the standard O(1/t) rate, with
constants that depend on the optimal solution. To the best of our knowledge,
this is the first result that attains provably faster convergence rates for a
CG variant for optimization over the spectrahedron. We also present encouraging
preliminary empirical results.
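The eigenvector-based oracle described above can be sketched as follows. This is a plain Frank-Wolfe loop on a toy quadratic over the spectrahedron, not the paper's faster variant, and a dense eigendecomposition stands in for the single extreme-eigenvector (Lanczos) computation that would be used at scale.

```python
import numpy as np

def fw_spectrahedron(M, iters=300):
    """Frank-Wolfe for min_X 0.5*||X - M||_F^2 over
    {X symmetric PSD, trace(X) = 1}. The linear minimization oracle
    argmin_{X in spectrahedron} <G, X> is the rank-one matrix v v^T,
    where v is the eigenvector of G for its smallest eigenvalue."""
    n = M.shape[0]
    X = np.eye(n) / n                        # feasible starting point
    for k in range(iters):
        G = X - M                            # gradient of 0.5*||X - M||_F^2
        w, V = np.linalg.eigh(G)             # eigenvalues in ascending order
        v = V[:, 0]                          # eigenvector of smallest eigenvalue
        S = np.outer(v, v)                   # extreme point of the spectrahedron
        gamma = 2.0 / (k + 2.0)
        X = (1 - gamma) * X + gamma * S      # convex combination stays feasible
    return X
```

Each iterate is a convex combination of trace-one PSD matrices, so feasibility is maintained without any projection or full decomposition of the iterate.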
Hybrid Conditional Gradient - Smoothing Algorithms with Applications to Sparse and Low Rank Regularization
We study a hybrid conditional gradient - smoothing algorithm (HCGS) for
solving composite convex optimization problems which contain several terms over
a bounded set. Examples of these include regularization problems with several
norms as penalties and a norm constraint. HCGS extends conditional gradient
methods to cases with multiple nonsmooth terms, in which standard conditional
gradient methods may be difficult to apply. The HCGS algorithm borrows
techniques from smoothing proximal methods and requires first-order
computations (subgradients and proximity operations). Unlike proximal methods,
HCGS benefits from the advantages of conditional gradient methods, which render
it more efficient on certain large-scale optimization problems. We demonstrate
these advantages with simulations on two matrix optimization problems:
regularization of matrices with combined ℓ1 and trace norm penalties; and
a convex relaxation of sparse PCA.
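One standard smoothing technique of the kind HCGS borrows is Nesterov smoothing, which replaces a nonsmooth term by a smooth surrogate with Lipschitz gradient. A minimal sketch for the ℓ1 norm (an illustrative choice, not necessarily the paper's instantiation): each |x_i| becomes a Huber term whose gradient is a clipped linear map.

```python
import numpy as np

def smoothed_l1(x, mu):
    """Nesterov smoothing of ||x||_1: each |x_i| is replaced by
    max_{|u| <= 1} (u * x_i - mu * u**2 / 2), i.e. a Huber function.
    The result is within mu/2 of |x_i| per coordinate and has a
    (1/mu)-Lipschitz gradient, so it can be handled by a smooth
    first-order (e.g. conditional gradient) step."""
    a = np.abs(x)
    val = np.where(a <= mu, a ** 2 / (2 * mu), a - mu / 2).sum()
    grad = np.clip(x / mu, -1.0, 1.0)        # derivative of the Huber term
    return val, grad
```

Shrinking mu tightens the approximation at the cost of a larger gradient Lipschitz constant, which is the usual smoothing trade-off.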
First-order convex feasibility algorithms for iterative image reconstruction in X-ray CT
Iterative image reconstruction (IIR) algorithms in Computed Tomography (CT)
are based on algorithms for solving a particular optimization problem. Design
of the IIR algorithm, therefore, is aided by knowledge of the solution to the
optimization problem on which it is based. Oftentimes, however, it is
impractical to obtain an accurate solution to the optimization problem of interest, which
complicates design of IIR algorithms. This issue is particularly acute for CT
with a limited angular-range scan, which leads to poorly conditioned system
matrices and difficult-to-solve optimization problems. In this article, we
develop IIR algorithms which solve a certain type of optimization problem called convex
feasibility. The convex feasibility approach can provide alternatives to
unconstrained optimization approaches and at the same time allow for efficient
algorithms for their solution -- thereby facilitating the IIR algorithm design
process. An accelerated version of the Chambolle-Pock (CP) algorithm is adapted
to various convex feasibility problems of potential interest to IIR in CT. One
of the proposed problems is seen to be equivalent to least-squares
minimization, and two other problems provide alternatives to penalized
least-squares minimization.
Comment: Revised version to appear March 2013 in Medical Physics. Version 1
has an error in line 5 of the pseudocodes in Figs. 2 and 9 (now 12). This has
been corrected in Version
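As a minimal illustration of the convex feasibility formulation (not the accelerated Chambolle-Pock solver the article develops), alternating projections finds a point in the intersection of two simple convex sets; the box and hyperplane below are toy stand-ins for image-domain and data-fidelity constraints.

```python
import numpy as np

def pocs(n=5, iters=500):
    """Projection onto convex sets (POCS) for a toy feasibility problem:
    find x in the box [0, 1]^n that also lies on the hyperplane
    sum(x) = 1. Each set has a closed-form projection, and alternating
    the two projections converges to a point in the intersection."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal(n)
    for _ in range(iters):
        x = np.clip(x, 0.0, 1.0)             # project onto the box
        x = x + (1.0 - x.sum()) / n          # project onto the hyperplane
    return x
```

A feasibility formulation like this sidesteps choosing a specific objective, which is part of the design flexibility the article attributes to the approach.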
Stochastic Conditional Gradient++
In this paper, we consider the general non-oblivious stochastic optimization
where the underlying stochasticity may change during the optimization procedure
and depends on the point at which the function is evaluated. We develop
Stochastic Frank-Wolfe++ (SFW++), an efficient variant of the
conditional gradient method for minimizing a smooth non-convex function subject
to a convex body constraint. We show that SFW++ converges to an
ε-first-order stationary point and bound the number of stochastic
gradients this requires. Once further structure is present, SFW++'s theoretical
guarantees, in terms of the convergence rate and quality of its solution,
improve. In particular, for minimizing a convex function, SFW++
achieves an ε-approximate optimum at a cost in stochastic gradients
that is known to be optimal in terms of
stochastic gradient evaluations. Similarly, for maximizing a monotone
continuous DR-submodular function, a slightly different form of SFW++,
called Stochastic Continuous Greedy++ (SCG++), achieves a tight
(1 − 1/e)-approximate solution. Through an information-theoretic argument,
we also prove that SCG++'s convergence rate is optimal. Finally, for
maximizing a non-monotone continuous DR-submodular function, we can achieve
a (1/e)-approximate solution with a bounded number of stochastic
gradients. We should highlight that our results and our novel variance
reduction technique extend trivially to the standard and easier oblivious
stochastic optimization setting, in both the (non-)convex and the continuous
submodular cases.
Generalized Conditional Gradient for Sparse Estimation
Structured sparsity is an important modeling tool that expands the
applicability of convex formulations for data analysis, however it also creates
significant challenges for efficient algorithm design. In this paper we
investigate the generalized conditional gradient (GCG) algorithm for solving
structured sparse optimization problems---demonstrating that, with some
enhancements, it can provide a more efficient alternative to current
state-of-the-art approaches. After providing a comprehensive overview of the convergence
properties of GCG, we develop efficient methods for evaluating polar operators,
a subroutine that is required in each GCG iteration. In particular, we show how
the polar operator can be efficiently evaluated in two important scenarios:
dictionary learning and structured sparse estimation. A further improvement is
achieved by interleaving GCG with fixed-rank local subspace optimization. A
series of experiments on matrix completion, multi-class classification,
multi-view dictionary learning and overlapping group lasso shows that the
proposed method can significantly reduce the training cost of current
alternatives.
Comment: 67 pages, 20 figures
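The polar operator subroutine mentioned above can be illustrated on the simplest case. For the ℓ1-ball, the polar operator reduces to a coordinate-wise maximum, which is what keeps each GCG iteration cheap (a textbook example, not the structured operators developed in the paper).

```python
import numpy as np

def polar_l1(g):
    """Polar operator for the l1-ball: argmax_{||y||_1 <= 1} <g, y>.
    The maximum of a linear function over the l1-ball is attained at a
    signed standard basis vector, placed at the largest |g_i|."""
    i = int(np.argmax(np.abs(g)))
    y = np.zeros_like(g, dtype=float)
    y[i] = np.sign(g[i]) if g[i] != 0 else 1.0
    return y
```

Because the output is always 1-sparse, repeatedly adding polar directions builds a sparse iterate, which is why GCG pairs naturally with structured sparse estimation.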
An Inexact Newton-like conditional gradient method for constrained nonlinear systems
In this paper, we propose an inexact Newton-like conditional gradient method
for solving constrained systems of nonlinear equations. The local convergence
of the new method as well as results on its rate are established by using a
general majorant condition. Two applications of this condition are provided:
one is for functions whose derivative satisfies a Hölder-like condition, and
the other is for functions that satisfy a Smale condition, which covers a
substantial class of analytic functions. Some preliminary numerical
experiments illustrating the applicability of the proposed method to medium
and large problems are also presented.
A unified variance-reduced accelerated gradient method for convex optimization
We propose a novel randomized incremental gradient algorithm, namely,
VAriance-Reduced Accelerated Gradient (Varag), for finite-sum optimization.
Equipped with a unified step-size policy that adjusts itself to the value of
the condition number, Varag exhibits the unified optimal rates of convergence
for solving smooth convex finite-sum problems directly regardless of their
strong convexity. Moreover, Varag is the first accelerated randomized
incremental gradient method that benefits from the strong convexity of the
data-fidelity term to achieve the optimal linear convergence. It also
establishes an optimal linear rate of convergence for solving a wide class of
problems only satisfying a certain error bound condition rather than strong
convexity. Varag can also be extended to solve stochastic finite-sum problems.
Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)
Model Function Based Conditional Gradient Method with Armijo-like Line Search
The Conditional Gradient Method is generalized to a class of non-smooth
non-convex optimization problems with many applications in machine learning.
The proposed algorithm iterates by minimizing so-called model functions over
the constraint set. Complemented with an Armijo line search procedure, we prove
that subsequences converge to a stationary point. The abstract framework of
model functions provides great flexibility for the design of concrete
algorithms. As special cases, for example, we develop an algorithm for additive
composite problems and an algorithm for non-linear composite problems which
leads to a Gauss--Newton-type algorithm. Both instances are novel in non-smooth
non-convex optimization and come with numerous applications in machine
learning. Moreover, we obtain a hybrid version of Conditional Gradient and
Proximal Minimization schemes for free, which combines advantages of both. Our
algorithm is shown to perform favorably on a sparse non-linear robust
regression problem and we discuss the flexibility of the proposed framework in
several matrix factorization formulations.
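A generic conditional gradient step with Armijo backtracking can be sketched as follows; this simplified version tests sufficient decrease of f itself, rather than the model-function decrease criterion the paper analyzes.

```python
import numpy as np

def armijo_cg_step(f, grad_f, lmo, x, delta=1e-4, beta=0.5):
    """One conditional gradient step with Armijo backtracking.
    s = lmo(g) minimizes <g, y> over the constraint set, so d = s - x
    is a descent direction whenever x is not stationary; the step size
    is halved until the sufficient-decrease condition holds."""
    g = grad_f(x)
    s = lmo(g)
    d = s - x
    gamma = 1.0
    while f(x + gamma * d) > f(x) + delta * gamma * g.dot(d):
        gamma *= beta
        if gamma < 1e-12:                    # safeguard near stationarity
            break
    return x + gamma * d
```

The backtracking loop removes the need to know the smoothness constant, mirroring the role of the line search in the abstract's framework.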
A Linearly Convergent Conditional Gradient Algorithm with Applications to Online and Stochastic Optimization
Linear optimization is often algorithmically simpler than non-linear
convex optimization. Linear optimization over matroid polytopes, matching
polytopes and path polytopes are examples of problems for which we have simple
and efficient combinatorial algorithms, but whose non-linear convex counterpart
is harder and admits significantly less efficient algorithms. This motivates
the computational model of convex optimization, including the offline, online
and stochastic settings, using a linear optimization oracle. In this
computational model we give several new results that improve over the previous
state-of-the-art. Our main result is a novel conditional gradient algorithm for
smooth and strongly convex optimization over polyhedral sets that performs only
a single linear optimization step over the domain on each iteration and enjoys
a linear convergence rate. This gives an exponential improvement in convergence
rate over previous results.
Based on this new conditional gradient algorithm we give the first algorithms
for online convex optimization over polyhedral sets that perform only a single
linear optimization step over the domain while having optimal regret
guarantees, answering an open question of Kalai and Vempala, and Hazan and
Kale. Our online algorithms also imply conditional gradient algorithms for
non-smooth and stochastic convex optimization with the same convergence rates
as projected (sub)gradient methods.
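For contrast, the standard conditional gradient method that this work improves upon already uses only a single linear optimization step per iteration; over the simplex that oracle is just an argmin over coordinates. A toy least-squares instance of the classical method (nothing beyond the standard algorithm is assumed here):

```python
import numpy as np

def frank_wolfe_simplex(A, b, iters=5000):
    """Standard conditional gradient for min 0.5*||Ax - b||^2 over the
    probability simplex. The linear optimization oracle picks the
    simplex vertex with the smallest gradient entry, so each iteration
    costs one matrix-vector product plus an argmin."""
    n = A.shape[1]
    x = np.ones(n) / n
    for k in range(iters):
        grad = A.T @ (A @ x - b)
        i = int(np.argmin(grad))             # LMO: best simplex vertex
        gamma = 2.0 / (k + 2.0)
        x = (1 - gamma) * x                  # move toward the vertex e_i
        x[i] += gamma
    return x
```

The classical rate of this loop is O(1/t) even under strong convexity; the abstract's contribution is a polyhedral variant that keeps the single-oracle cost while achieving a linear rate.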