3,865 research outputs found

    A Projection-Free Algorithm for Solving Support Vector Machine Models

    In this thesis, our goal is to solve the dual of the support vector machine (SVM) problem, which is an example of a smooth convex optimization problem over a polytope. To this end, we apply the conditional gradient (CG) method, providing an explicit solution to the linear programming (LP) subproblem. We also describe the conditional gradient sliding (CGS) method, which can be considered an improvement of CG in terms of the number of gradient evaluations. Even though CGS attains better complexity bounds than CG, it is not a practical method because it requires knowledge of the Lipschitz constant and of the number of iterations. As an improvement of CGS, we design a new method, conditional gradient sliding with line search (CGS-ls), that resolves these issues. CGS-ls requires $O(1/\sqrt{\epsilon})$ gradient evaluations and $O(1/\epsilon)$ linear optimization calls, matching the optimal complexity bounds of the CGS method. We also compare the performance of our method with the CG and CGS methods numerically, applying them to the dual SVM problem for binary classification of two subsets of the MNIST handwritten digits dataset.
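As a rough illustration of the CG step described in the abstract above, the following Python sketch runs conditional gradient with exact line search on a simplex-constrained quadratic (finding the minimum-norm point in the convex hull of a few points). The objective, the name `frank_wolfe_simplex`, and the stopping tolerance are assumptions chosen for illustration; this is not the thesis's SVM dual formulation or its CGS-ls method. It shows why the LP subproblem needs no solver on a simplex: its minimizer is the vertex with the smallest gradient entry.

```python
def frank_wolfe_simplex(points, iters=200, tol=1e-12):
    """Minimize f(a) = 0.5 * ||sum_i a[i] * points[i]||^2 over the probability simplex.

    Illustrative objective only (an assumption, not the thesis's exact SVM dual).
    """
    n, dim = len(points), len(points[0])
    a = [1.0 / n] * n  # start at the barycenter of the simplex
    for _ in range(iters):
        # w = sum_i a[i] * p_i, and the i-th gradient entry is <p_i, w>.
        w = [sum(a[i] * points[i][k] for i in range(n)) for k in range(dim)]
        grad = [sum(p[k] * w[k] for k in range(dim)) for p in points]
        # Explicit LP solution: min over the simplex of <grad, s> is the
        # vertex e_j with j = argmin_i grad[i].
        j = min(range(n), key=grad.__getitem__)
        diff = [points[j][k] - w[k] for k in range(dim)]   # image of e_j - a
        gap = -sum(w[k] * diff[k] for k in range(dim))     # Frank-Wolfe gap
        denom = sum(v * v for v in diff)
        if gap <= tol or denom == 0.0:
            break
        # Exact line search for a quadratic, clipped to the segment [0, 1].
        gamma = min(1.0, gap / denom)
        a = [(1.0 - gamma) * ai for ai in a]
        a[j] += gamma
    return a
```

Calling `frank_wolfe_simplex([(2.0, 0.0), (0.0, 1.0)])` returns weights that stay on the simplex throughout, because every iterate is a convex combination of the previous iterate and a vertex. No projection is ever computed.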

    Variance-Reduced and Projection-Free Stochastic Optimization

    The Frank-Wolfe optimization algorithm has recently regained popularity for machine learning applications due to its projection-free property and its ability to handle structured constraints. However, in the stochastic learning setting, it is still relatively understudied compared to the gradient descent counterpart. In this work, leveraging a recent variance reduction technique, we propose two stochastic Frank-Wolfe variants which substantially improve previous results in terms of the number of stochastic gradient evaluations needed to achieve $1-\epsilon$ accuracy. For example, we improve from $O(\frac{1}{\epsilon})$ to $O(\ln\frac{1}{\epsilon})$ if the objective function is smooth and strongly convex, and from $O(\frac{1}{\epsilon^2})$ to $O(\frac{1}{\epsilon^{1.5}})$ if the objective function is smooth and Lipschitz. The theoretical improvement is also observed in experiments on real-world datasets for a multiclass classification application.
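To make the idea of combining variance reduction with Frank-Wolfe concrete, here is a minimal Python sketch on a least-squares objective over the probability simplex. It uses an SVRG-style gradient estimate (mini-batch gradient, corrected by the same mini-batch gradient and the full gradient at a snapshot point). The objective, the fixed batch size, the step size `2 / (t + 2)`, and all function names are assumptions for illustration; the paper's actual variants and parameter schedules are not reproduced here.

```python
import random

def full_gradient(x, A, b):
    """Gradient of f(x) = (1/n) * sum_i 0.5 * (<A[i], x> - b[i])^2."""
    n, d = len(A), len(x)
    g = [0.0] * d
    for i in range(n):
        r = sum(A[i][k] * x[k] for k in range(d)) - b[i]
        for k in range(d):
            g[k] += A[i][k] * r / n
    return g

def vr_gradient(x, snapshot, full_grad, batch, A, b):
    """SVRG-style estimate: grad_B(x) - grad_B(snapshot) + full_grad(snapshot)."""
    d = len(x)
    est = list(full_grad)
    for i in batch:
        r_x = sum(A[i][k] * x[k] for k in range(d)) - b[i]
        r_s = sum(A[i][k] * snapshot[k] for k in range(d)) - b[i]
        for k in range(d):
            est[k] += A[i][k] * (r_x - r_s) / len(batch)
    return est

def svrf_simplex(A, b, epochs=5, inner=20, batch_size=8, seed=0):
    """Simplified variance-reduced stochastic Frank-Wolfe over the simplex."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [1.0 / d] * d
    t = 0
    for _ in range(epochs):
        snapshot = list(x)
        mu = full_gradient(snapshot, A, b)  # one full pass per epoch
        for _ in range(inner):
            batch = [rng.randrange(n) for _ in range(batch_size)]
            g = vr_gradient(x, snapshot, mu, batch, A, b)
            # Linear minimization over the simplex: pick the best vertex.
            j = min(range(d), key=g.__getitem__)
            gamma = 2.0 / (t + 2.0)  # assumed step-size schedule
            x = [(1.0 - gamma) * xi for xi in x]
            x[j] += gamma
            t += 1
    return x
```

The estimate is exact whenever the current iterate equals the snapshot, and its error shrinks as the two get closer. This is what lets the method use few stochastic gradients per step without a projection.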

    Semi-proximal Mirror-Prox for Nonsmooth Composite Minimization

    We propose a new first-order optimisation algorithm to solve high-dimensional non-smooth composite minimisation problems. Typical examples of such problems have an objective that decomposes into a non-smooth empirical risk part and a non-smooth regularisation penalty. The proposed algorithm, called Semi-Proximal Mirror-Prox, leverages the Fenchel-type representation of one part of the objective while handling the other part of the objective via linear minimization over the domain. The algorithm stands in contrast with more classical proximal gradient algorithms with smoothing, which require the computation of proximal operators at each iteration and can therefore be impractical for high-dimensional problems. We establish the theoretical convergence rate of Semi-Proximal Mirror-Prox, which exhibits the optimal complexity bound, i.e. $O(1/\epsilon^2)$, for the number of calls to the linear minimization oracle. We present promising experimental results showing the interest of the approach in comparison to competing methods.
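For readers unfamiliar with the term, a linear minimization oracle returns the point of the domain that minimizes a linear function. The Python sketch below shows one for the l1 ball, where the answer is a single signed, scaled coordinate vector. The choice of the l1 ball and the name `lmo_l1_ball` are assumptions for illustration; the abstract does not specify the domain, and this is not the Semi-Proximal Mirror-Prox algorithm itself.

```python
def lmo_l1_ball(grad, radius=1.0):
    """Return a minimizer of <grad, s> over {s : ||s||_1 <= radius}.

    The minimum is attained at a vertex of the ball: all mass goes on the
    coordinate with the largest |grad| entry, with the opposite sign.
    """
    j = max(range(len(grad)), key=lambda k: abs(grad[k]))
    s = [0.0] * len(grad)
    if grad[j] != 0.0:
        s[j] = -radius if grad[j] > 0 else radius
    return s
```

A call costs one pass over the gradient and returns a vector with at most one non-zero entry, which is the kind of per-iteration cost that makes oracle-based methods attractive in high dimensions.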

    Projected gradient descent for non-convex sparse spike estimation

    We propose a new algorithm for sparse spike estimation from Fourier measurements. Based on theoretical results on non-convex optimization techniques for off-the-grid sparse spike estimation, we present a projected gradient descent algorithm coupled with a spectral initialization procedure. Our algorithm makes it possible to estimate the positions of large numbers of Diracs in 2D from random Fourier measurements. Along with the algorithm, we present qualitative theoretical insights explaining its success. This opens a new direction for practical off-the-grid spike estimation with theoretical guarantees in imaging applications.
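The following Python sketch illustrates the general pattern of initialization followed by projected gradient descent on a much simpler problem than the one in the abstract: a single unit-amplitude spike on the interval [0, 1], observed through a few noiseless Fourier coefficients. The 1-D setting, the measurement model `y_k = exp(-2*pi*i*k*t0)`, the coarse grid search used as a stand-in for the paper's spectral initialization, and the name `estimate_spike` are all assumptions for illustration.

```python
import cmath
import math

def estimate_spike(y, freqs, grid=50, iters=200):
    """Estimate the position t0 of one spike from y_k = exp(-2*pi*i*k*t0)."""
    omegas = [2.0 * math.pi * k for k in freqs]

    # Stand-in initialization (NOT the paper's spectral method): take the
    # grid point that maximizes the back-projected signal magnitude.
    def score(t):
        return abs(sum(yk * cmath.exp(1j * w * t) for yk, w in zip(y, omegas)))

    t = max((j / grid for j in range(grid)), key=score)

    # Step size 1/L, where L bounds the second derivative of the objective
    # sum_k |exp(-i*w_k*t) - y_k|^2.
    step = 1.0 / sum(2.0 * w * w for w in omegas)
    for _ in range(iters):
        grad = 0.0
        for yk, w in zip(y, omegas):
            e = cmath.exp(-1j * w * t)
            # d/dt |e - y_k|^2 = 2 * Re(conj(e - y_k) * d(e)/dt)
            grad += 2.0 * ((e - yk).conjugate() * (-1j * w) * e).real
        # Gradient step, then projection onto the constraint set [0, 1].
        t = min(1.0, max(0.0, t - step * grad))
    return t
```

The objective is non-convex in the position, so the descent only succeeds because the initialization lands inside the basin of attraction of the true position. That dependence on a good starting point is why the two stages are coupled.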