54 research outputs found

    Variance-Reduced and Projection-Free Stochastic Optimization

    The Frank-Wolfe optimization algorithm has recently regained popularity for machine learning applications due to its projection-free property and its ability to handle structured constraints. However, in the stochastic learning setting, it is still relatively understudied compared to its gradient descent counterpart. In this work, leveraging a recent variance reduction technique, we propose two stochastic Frank-Wolfe variants which substantially improve previous results in terms of the number of stochastic gradient evaluations needed to achieve $1-\epsilon$ accuracy. For example, we improve from $O(\frac{1}{\epsilon})$ to $O(\ln\frac{1}{\epsilon})$ if the objective function is smooth and strongly convex, and from $O(\frac{1}{\epsilon^2})$ to $O(\frac{1}{\epsilon^{1.5}})$ if the objective function is smooth and Lipschitz. The theoretical improvement is also observed in experiments on real-world datasets for a multiclass classification application.
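    The building blocks here are the classical Frank-Wolfe update, which replaces projection with a call to a linear minimization oracle (LMO), and an SVRG-style variance-reduced gradient estimate. Below is a minimal, hypothetical sketch of how the two combine, not the authors' exact algorithm: it assumes a least-squares objective over the probability simplex, where the LMO is simply the vertex at the smallest gradient coordinate.

    ```python
    import numpy as np

    def lmo_simplex(grad):
        """Linear minimization oracle over the probability simplex:
        argmin_{s in simplex} <grad, s> is the vertex at the smallest
        gradient coordinate, so no projection is ever needed."""
        s = np.zeros_like(grad)
        s[np.argmin(grad)] = 1.0
        return s

    def svr_frank_wolfe(A, b, T=50, batch=16, seed=0):
        """Hypothetical sketch: variance-reduced stochastic Frank-Wolfe for
        f(x) = (1/2n) ||Ax - b||^2 over the simplex. Each outer epoch takes
        a full "snapshot" gradient; inner steps correct mini-batch gradients
        with the snapshot, SVRG-style."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        m = max(1, n // batch)                 # inner steps per epoch
        x = np.full(d, 1.0 / d)                # start at the barycenter
        for t in range(T):
            x_snap = x.copy()
            g_snap = A.T @ (A @ x_snap - b) / n        # full snapshot gradient
            for k in range(m):
                idx = rng.choice(n, size=batch, replace=False)
                Ai = A[idx]
                # variance-reduced estimate: minibatch gradient at x,
                # recentered by the same minibatch at the snapshot
                # plus the full snapshot gradient
                g = Ai.T @ (Ai @ (x - x_snap)) / batch + g_snap
                s = lmo_simplex(g)                     # one LMO call, no projection
                gamma = 2.0 / (t * m + k + 2)          # classical step size
                x = (1 - gamma) * x + gamma * s        # convex combo stays feasible
        return x
    ```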

    Stochastic Frank-Wolfe Methods for Nonconvex Optimization

    We study Frank-Wolfe methods for nonconvex stochastic and finite-sum optimization problems. Frank-Wolfe methods (in the convex case) have gained tremendous recent interest in machine learning and optimization communities due to their projection-free property and their ability to exploit structured constraints. However, our understanding of these algorithms in the nonconvex setting is fairly limited. In this paper, we propose nonconvex stochastic Frank-Wolfe methods and analyze their convergence properties. For objective functions that decompose into a finite sum, we leverage ideas from variance reduction techniques for convex optimization to obtain new variance-reduced nonconvex Frank-Wolfe methods that have provably faster convergence than the classical Frank-Wolfe method. Finally, we show that the faster convergence rates of our variance-reduced methods also translate into improved convergence rates for the stochastic setting.
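    In the nonconvex constrained setting, convergence is typically measured by the Frank-Wolfe gap $g(x) = \max_{s \in \mathcal{C}} \langle \nabla f(x), x - s \rangle$, which vanishes exactly at stationary points and costs a single LMO call to evaluate. A minimal sketch of this criterion (an illustration of the standard measure, not code from the paper):

    ```python
    import numpy as np

    def frank_wolfe_gap(grad, x, lmo):
        """Frank-Wolfe gap g(x) = max_{s in C} <grad f(x), x - s>.
        For nonconvex objectives, g(x) = 0 exactly at stationary points
        of the constrained problem, so it replaces the gradient norm as
        the convergence measure; one LMO call suffices to evaluate it."""
        s = lmo(grad)                 # s = argmin_{s in C} <grad, s>
        return float(grad @ (x - s))

    # illustrative usage on the probability simplex (hypothetical values)
    grad = np.array([0.3, -0.1, 0.7])
    x = np.array([0.2, 0.5, 0.3])
    lmo = lambda g: np.eye(len(g))[np.argmin(g)]
    print(frank_wolfe_gap(grad, x, lmo))   # > 0: x is not yet stationary
    ```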

    Riemannian Optimization via Frank-Wolfe Methods

    We study projection-free methods for constrained Riemannian optimization. In particular, we propose the Riemannian Frank-Wolfe (RFW) method. We analyze non-asymptotic convergence rates of RFW to an optimum for (geodesically) convex problems, and to a critical point for nonconvex objectives. We also present a practical setting under which RFW can attain a linear convergence rate. As a concrete example, we specialize RFW to the manifold of positive definite matrices and apply it to two tasks: (i) computing the matrix geometric mean (Riemannian centroid); and (ii) computing the Bures-Wasserstein barycenter. Both tasks involve geodesically convex interval constraints, for which we show that the Riemannian "linear oracle" required by RFW admits a closed-form solution; this result may be of independent interest. We further specialize RFW to the special orthogonal group and show that here too, the Riemannian "linear oracle" can be solved in closed form. Here, we describe an application to the synchronization of data matrices (Procrustes problem). We complement our theoretical results with an empirical comparison of RFW against state-of-the-art Riemannian optimization methods and observe that RFW performs competitively on the task of computing Riemannian centroids.
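    On the positive definite manifold, RFW's iterates move along geodesics of the affine-invariant metric, which have the well-known closed form $A \#_t B = A^{1/2}(A^{-1/2} B A^{-1/2})^t A^{1/2}$; $t = 1/2$ gives the matrix geometric mean, the Riemannian centroid of the pair. A minimal sketch of this geodesic follows (background illustration only; the paper's closed-form linear oracle is not reproduced here):

    ```python
    import numpy as np

    def _pd_power(M, p):
        """M**p for symmetric positive definite M, via eigendecomposition."""
        w, V = np.linalg.eigh(M)
        return (V * w**p) @ V.T

    def geodesic_pd(A, B, t=0.5):
        """Point A #_t B on the affine-invariant geodesic between positive
        definite A and B:  A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}.
        t = 0.5 is the matrix geometric mean, i.e. the Riemannian
        centroid of the pair {A, B}."""
        A_half = _pd_power(A, 0.5)
        A_half_inv = _pd_power(A, -0.5)
        inner = A_half_inv @ B @ A_half_inv
        inner = (inner + inner.T) / 2      # symmetrize against round-off
        return A_half @ _pd_power(inner, t) @ A_half

    # sanity check: the geometric mean is symmetric in its arguments
    A = np.array([[2.0, 0.5], [0.5, 1.0]])
    B = np.array([[1.0, -0.2], [-0.2, 3.0]])
    assert np.allclose(geodesic_pd(A, B), geodesic_pd(B, A), atol=1e-10)
    ```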