14,578 research outputs found

    Conditional gradient type methods for composite nonlinear and stochastic optimization

    Full text link
    In this paper, we present a conditional gradient type (CGT) method for solving a class of composite optimization problems where the objective function consists of a (weakly) smooth term and a (strongly) convex regularization term. While including a strongly convex term in the subproblems of the classical conditional gradient (CG) method improves its rate of convergence, it does not cost per iteration as much as general proximal type algorithms. More specifically, we present a unified analysis for the CGT method in the sense that it achieves the best-known rate of convergence when the weakly smooth term is nonconvex and possesses (nearly) optimal complexity if it turns out to be convex. While implementation of the CGT method requires explicitly estimating problem parameters like the level of smoothness of the first term in the objective function, we also present a few variants of this method which relax such estimation. Unlike general proximal type parameter free methods, these variants of the CGT method do not require any additional effort for computing (sub)gradients of the objective function and/or solving extra subproblems at each iteration. We then generalize these methods under stochastic setting and present a few new complexity results. To the best of our knowledge, this is the first time that such complexity results are presented for solving stochastic weakly smooth nonconvex and (strongly) convex optimization problems

    Faster Projection-free Convex Optimization over the Spectrahedron

    Full text link
    Minimizing a convex function over the spectrahedron, i.e., the set of all positive semidefinite matrices with unit trace, is an important optimization task with many applications in optimization, machine learning, and signal processing. It is also notoriously difficult to solve in large-scale since standard techniques require expensive matrix decompositions. An alternative, is the conditional gradient method (aka Frank-Wolfe algorithm) that regained much interest in recent years, mostly due to its application to this specific setting. The key benefit of the CG method is that it avoids expensive matrix decompositions all together, and simply requires a single eigenvector computation per iteration, which is much more efficient. On the downside, the CG method, in general, converges with an inferior rate. The error for minimizing a β\beta-smooth function after tt iterations scales like β/t\beta/t. This convergence rate does not improve even if the function is also strongly convex. In this work we present a modification of the CG method tailored for convex optimization over the spectrahedron. The per-iteration complexity of the method is essentially identical to that of the standard CG method: only a single eigenvecor computation is required. For minimizing an α\alpha-strongly convex and β\beta-smooth function, the expected approximation error of the method after tt iterations is: O(min{βt,(βrank(X)α1/4t)4/3,(βαλmin(X)t)2}),O\left({\min\{\frac{\beta{}}{t} ,\left({\frac{\beta\sqrt{\textrm{rank}(\textbf{X}^*)}}{\alpha^{1/4}t}}\right)^{4/3}, \left({\frac{\beta}{\sqrt{\alpha}\lambda_{\min}(\textbf{X}^*)t}}\right)^{2}\}}\right) , where X\textbf{X}^* is the optimal solution. To the best of our knowledge, this is the first result that attains provably faster convergence rates for a CG variant for optimization over the spectrahedron. We also present encouraging preliminary empirical results

    Hybrid Conditional Gradient - Smoothing Algorithms with Applications to Sparse and Low Rank Regularization

    Full text link
    We study a hybrid conditional gradient - smoothing algorithm (HCGS) for solving composite convex optimization problems which contain several terms over a bounded set. Examples of these include regularization problems with several norms as penalties and a norm constraint. HCGS extends conditional gradient methods to cases with multiple nonsmooth terms, in which standard conditional gradient methods may be difficult to apply. The HCGS algorithm borrows techniques from smoothing proximal methods and requires first-order computations (subgradients and proximity operations). Unlike proximal methods, HCGS benefits from the advantages of conditional gradient methods, which render it more efficient on certain large scale optimization problems. We demonstrate these advantages with simulations on two matrix optimization problems: regularization of matrices with combined 1\ell_1 and trace norm penalties; and a convex relaxation of sparse PCA

    First-order convex feasibility algorithms for iterative image reconstruction in X-ray CT

    Full text link
    Iterative image reconstruction (IIR) algorithms in Computed Tomography (CT) are based on algorithms for solving a particular optimization problem. Design of the IIR algorithm, therefore, is aided by knowledge of the solution to the optimization problem on which it is based. Often times, however, it is impractical to achieve accurate solution to the optimization of interest, which complicates design of IIR algorithms. This issue is particularly acute for CT with a limited angular-range scan, which leads to poorly conditioned system matrices and difficult to solve optimization problems. In this article, we develop IIR algorithms which solve a certain type of optimization called convex feasibility. The convex feasibility approach can provide alternatives to unconstrained optimization approaches and at the same time allow for efficient algorithms for their solution -- thereby facilitating the IIR algorithm design process. An accelerated version of the Chambolle-Pock (CP) algorithm is adapted to various convex feasibility problems of potential interest to IIR in CT. One of the proposed problems is seen to be equivalent to least-squares minimization, and two other problems provide alternatives to penalized, least-squares minimization.Comment: Revised version to appear March 2013 in Medical Physics. Version 1 has an error in line 5 of pseudocodes in Figs. 2 and 9 (now 12). This has been corrected in Version

    Stochastic Conditional Gradient++

    Full text link
    In this paper, we consider the general non-oblivious stochastic optimization where the underlying stochasticity may change during the optimization procedure and depends on the point at which the function is evaluated. We develop Stochastic Frank-Wolfe++ (SFW++\text{SFW}{++} ), an efficient variant of the conditional gradient method for minimizing a smooth non-convex function subject to a convex body constraint. We show that SFW++\text{SFW}{++} converges to an ϵ\epsilon-first order stationary point by using O(1/ϵ3)O(1/\epsilon^3) stochastic gradients. Once further structures are present, SFW++\text{SFW}{++}'s theoretical guarantees, in terms of the convergence rate and quality of its solution, improve. In particular, for minimizing a convex function, SFW++\text{SFW}{++} achieves an ϵ\epsilon-approximate optimum while using O(1/ϵ2)O(1/\epsilon^2) stochastic gradients. It is known that this rate is optimal in terms of stochastic gradient evaluations. Similarly, for maximizing a monotone continuous DR-submodular function, a slightly different form of SFW++\text{SFW}{++} , called Stochastic Continuous Greedy++ (SCG++\text{SCG}{++} ), achieves a tight [(11/e)OPTϵ][(1-1/e)\text{OPT} -\epsilon] solution while using O(1/ϵ2)O(1/\epsilon^2) stochastic gradients. Through an information theoretic argument, we also prove that SCG++\text{SCG}{++} 's convergence rate is optimal. Finally, for maximizing a non-monotone continuous DR-submodular function, we can achieve a [(1/e)OPTϵ][(1/e)\text{OPT} -\epsilon] solution by using O(1/ϵ2)O(1/\epsilon^2) stochastic gradients. We should highlight that our results and our novel variance reduction technique trivially extend to the standard and easier oblivious stochastic optimization settings for (non-)covex and continuous submodular settings

    Generalized Conditional Gradient for Sparse Estimation

    Full text link
    Structured sparsity is an important modeling tool that expands the applicability of convex formulations for data analysis, however it also creates significant challenges for efficient algorithm design. In this paper we investigate the generalized conditional gradient (GCG) algorithm for solving structured sparse optimization problems---demonstrating that, with some enhancements, it can provide a more efficient alternative to current state of the art approaches. After providing a comprehensive overview of the convergence properties of GCG, we develop efficient methods for evaluating polar operators, a subroutine that is required in each GCG iteration. In particular, we show how the polar operator can be efficiently evaluated in two important scenarios: dictionary learning and structured sparse estimation. A further improvement is achieved by interleaving GCG with fixed-rank local subspace optimization. A series of experiments on matrix completion, multi-class classification, multi-view dictionary learning and overlapping group lasso shows that the proposed method can significantly reduce the training cost of current alternatives.Comment: 67 pages, 20 figure

    An Inexact Newton-like conditional gradient method for constrained nonlinear systems

    Full text link
    In this paper, we propose an inexact Newton-like conditional gradient method for solving constrained systems of nonlinear equations. The local convergence of the new method as well as results on its rate are established by using a general majorant condition. Two applications of such condition are provided: one is for functions whose the derivative satisfies Holder-like condition and the other is for functions that satisfies a Smale condition, which includes a substantial class of analytic functions. Some preliminaries numerical experiments illustrating the applicability of the proposed method for medium and large problems are also presented

    A unified variance-reduced accelerated gradient method for convex optimization

    Full text link
    We propose a novel randomized incremental gradient algorithm, namely, VAriance-Reduced Accelerated Gradient (Varag), for finite-sum optimization. Equipped with a unified step-size policy that adjusts itself to the value of the condition number, Varag exhibits the unified optimal rates of convergence for solving smooth convex finite-sum problems directly regardless of their strong convexity. Moreover, Varag is the first accelerated randomized incremental gradient method that benefits from the strong convexity of the data-fidelity term to achieve the optimal linear convergence. It also establishes an optimal linear rate of convergence for solving a wide class of problems only satisfying a certain error bound condition rather than strong convexity. Varag can also be extended to solve stochastic finite-sum problems.Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019

    Model Function Based Conditional Gradient Method with Armijo-like Line Search

    Full text link
    The Conditional Gradient Method is generalized to a class of non-smooth non-convex optimization problems with many applications in machine learning. The proposed algorithm iterates by minimizing so-called model functions over the constraint set. Complemented with an Amijo line search procedure, we prove that subsequences converge to a stationary point. The abstract framework of model functions provides great flexibility for the design of concrete algorithms. As special cases, for example, we develop an algorithm for additive composite problems and an algorithm for non-linear composite problems which leads to a Gauss--Newton-type algorithm. Both instances are novel in non-smooth non-convex optimization and come with numerous applications in machine learning. Moreover, we obtain a hybrid version of Conditional Gradient and Proximal Minimization schemes for free, which combines advantages of both. Our algorithm is shown to perform favorably on a sparse non-linear robust regression problem and we discuss the flexibility of the proposed framework in several matrix factorization formulations

    A Linearly Convergent Conditional Gradient Algorithm with Applications to Online and Stochastic Optimization

    Full text link
    Linear optimization is many times algorithmically simpler than non-linear convex optimization. Linear optimization over matroid polytopes, matching polytopes and path polytopes are example of problems for which we have simple and efficient combinatorial algorithms, but whose non-linear convex counterpart is harder and admits significantly less efficient algorithms. This motivates the computational model of convex optimization, including the offline, online and stochastic settings, using a linear optimization oracle. In this computational model we give several new results that improve over the previous state-of-the-art. Our main result is a novel conditional gradient algorithm for smooth and strongly convex optimization over polyhedral sets that performs only a single linear optimization step over the domain on each iteration and enjoys a linear convergence rate. This gives an exponential improvement in convergence rate over previous results. Based on this new conditional gradient algorithm we give the first algorithms for online convex optimization over polyhedral sets that perform only a single linear optimization step over the domain while having optimal regret guarantees, answering an open question of Kalai and Vempala, and Hazan and Kale. Our online algorithms also imply conditional gradient algorithms for non-smooth and stochastic convex optimization with the same convergence rates as projected (sub)gradient methods
    corecore