Variance-Reduced and Projection-Free Stochastic Optimization
The Frank-Wolfe optimization algorithm has recently regained popularity for
machine learning applications due to its projection-free property and its
ability to handle structured constraints. However, in the stochastic learning
setting, it is still relatively understudied compared to the gradient descent
counterpart. In this work, leveraging a recent variance reduction technique, we
propose two stochastic Frank-Wolfe variants which substantially improve
previous results in terms of the number of stochastic gradient evaluations
needed to achieve a target accuracy $\epsilon$. For example, we improve from
$O(1/\epsilon)$ to $O(\ln(1/\epsilon))$ if the objective function is smooth and
strongly convex, and from $O(1/\epsilon^2)$ to $O(1/\epsilon^{1.5})$ if the
objective function is smooth and Lipschitz. The theoretical improvement is also
observed in experiments on real-world datasets for a multiclass classification
application.
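A minimal sketch of the idea, not the paper's algorithms: a stochastic Frank-Wolfe loop that replaces plain stochastic gradients with SVRG-style variance-reduced estimates. The finite-sum least-squares objective, the simplex constraint, the snapshot schedule, and names such as svrf_sketch and lmo_simplex are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lmo_simplex(grad):
    # Linear minimization oracle over the probability simplex:
    # argmin_{s in simplex} <grad, s> is the vertex e_j with j = argmin_j grad_j.
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

def grad_i(A, b, x, i):
    # Gradient of the i-th summand f_i(x) = (a_i^T x - b_i)^2.
    return 2.0 * (A[i] @ x - b[i]) * A[i]

def svrf_sketch(A, b, iters=200, snapshot_every=20, batch=8, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.full(d, 1.0 / d)                    # start at the simplex barycenter
    for t in range(iters):
        if t % snapshot_every == 0:            # periodic full-gradient snapshot
            x_snap = x.copy()
            full_grad = 2.0 * A.T @ (A @ x_snap - b) / n
        # SVRG-style estimate: grad f_i(x) - grad f_i(x_snap) + grad f(x_snap),
        # averaged over a small minibatch of indices.
        g = full_grad.copy()
        for i in rng.integers(n, size=batch):
            g += (grad_i(A, b, x, i) - grad_i(A, b, x_snap, i)) / batch
        s = lmo_simplex(g)                     # projection-free: one LMO call
        gamma = 2.0 / (t + 2)                  # classical Frank-Wolfe step size
        x = (1 - gamma) * x + gamma * s        # convex combination stays feasible
    return x

# Illustrative usage on random data.
data_rng = np.random.default_rng(1)
A = data_rng.normal(size=(100, 5))
b = data_rng.normal(size=100)
print(svrf_sketch(A, b))
```

Note how feasibility is maintained without any projection: each update is a convex combination of the current iterate and a vertex returned by the linear minimization oracle.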
Stochastic Frank-Wolfe Methods for Nonconvex Optimization
We study Frank-Wolfe methods for nonconvex stochastic and finite-sum
optimization problems. Frank-Wolfe methods (in the convex case) have gained
tremendous recent interest in machine learning and optimization communities due
to their projection-free property and their ability to exploit structured
constraints. However, our understanding of these algorithms in the nonconvex
setting is fairly limited. In this paper, we propose nonconvex stochastic
Frank-Wolfe methods and analyze their convergence properties. For objective
functions that decompose into a finite-sum, we leverage ideas from variance
reduction techniques for convex optimization to obtain new variance reduced
nonconvex Frank-Wolfe methods that have provably faster convergence than the
classical Frank-Wolfe method. Finally, we show that the faster convergence
rates of our variance reduced methods also translate into improved convergence
rates for the stochastic setting.
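For nonconvex objectives, convergence in this line of work is typically measured by the Frank-Wolfe gap $g(x) = \max_{s \in C} \langle \nabla f(x), x - s \rangle$, which vanishes exactly at stationary points of $f$ over the constraint set. Below is a minimal sketch of this criterion, assuming a simplex constraint and an illustrative nonconvex objective; the function names are hypothetical.

```python
import numpy as np

def lmo_simplex(grad):
    # Linear minimization oracle over the probability simplex.
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

def fw_nonconvex(grad_fn, x0, tol=1e-4, max_iters=2000):
    # Plain Frank-Wolfe; the gap <grad, x - s> certifies approximate
    # stationarity and is the usual nonconvex convergence measure.
    x, gap = x0, np.inf
    for t in range(max_iters):
        g = grad_fn(x)
        s = lmo_simplex(g)
        gap = float(g @ (x - s))
        if gap <= tol:
            break
        gamma = 2.0 / (t + 2)
        x = (1 - gamma) * x + gamma * s
    return x, gap

# Illustrative nonconvex objective f(x) = sum_i (x_i^4 - x_i^2) over the simplex.
x, gap = fw_nonconvex(lambda x: 4.0 * x**3 - 2.0 * x, np.full(4, 0.25))
print(x, gap)
```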
Riemannian Optimization via Frank-Wolfe Methods
We study projection-free methods for constrained Riemannian optimization. In
particular, we propose the Riemannian Frank-Wolfe (RFW) method. We analyze
non-asymptotic convergence rates of RFW to an optimum for (geodesically) convex
problems, and to a critical point for nonconvex objectives. We also present a
practical setting under which RFW can attain a linear convergence rate. As a
concrete example, we specialize RFW to the manifold of positive definite
matrices and apply it to two tasks: (i) computing the matrix geometric mean
(Riemannian centroid); and (ii) computing the Bures-Wasserstein barycenter.
Both tasks involve geodesically convex interval constraints, for which we show
that the Riemannian "linear oracle" required by RFW admits a closed-form
solution; this result may be of independent interest. We further specialize RFW
to the special orthogonal group and show that here too, the Riemannian "linear
oracle" can be solved in closed form. Here, we describe an application to the
synchronization of data matrices (Procrustes problem). We complement our
theoretical results with an empirical comparison of RFW against
state-of-the-art Riemannian optimization methods and observe that RFW performs
competitively on the task of computing Riemannian centroids.
Comment: Under review. Largely revised version, including an extended
experimental section and an application to the special orthogonal group and
the Procrustes problem.
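A generic sketch of the Riemannian Frank-Wolfe template: the manifold operations and the Riemannian "linear oracle" are passed in as callables whose names (riem_grad, linear_oracle, exp_map, log_map) are hypothetical, not the paper's API. As a sanity check it is instantiated on Euclidean space, where exp_x(v) = x + v and log_x(y) = y - x, so RFW reduces to the classical Frank-Wolfe method; the paper's closed-form positive definite and special orthogonal oracles would slot in as the linear_oracle argument.

```python
import numpy as np

def rfw(x0, riem_grad, linear_oracle, exp_map, log_map, iters=200):
    # Riemannian Frank-Wolfe template:
    #   z_t     = argmin_{z in C} <grad f(x_t), log_{x_t}(z)>   (Riemannian LMO)
    #   x_{t+1} = exp_{x_t}(gamma_t * log_{x_t}(z_t))           (geodesic step)
    x = x0
    for t in range(iters):
        g = riem_grad(x)
        z = linear_oracle(g, x)            # direction-finding subproblem
        gamma = 2.0 / (t + 2)
        x = exp_map(x, gamma * log_map(x, z))
    return x

# Euclidean instantiation: minimize ||x - c||^2 over the probability simplex.
c = np.array([0.2, 0.5, 0.3])
riem_grad = lambda x: 2.0 * (x - c)        # Euclidean gradient
def linear_oracle(g, x):
    # Simplex LMO: <g, z - x> is minimized at a vertex of the simplex.
    z = np.zeros_like(g)
    z[np.argmin(g)] = 1.0
    return z
exp_map = lambda x, v: x + v               # Euclidean exponential map
log_map = lambda x, y: y - x               # Euclidean logarithm map

x_star = rfw(np.full(3, 1.0 / 3), riem_grad, linear_oracle, exp_map, log_map)
print(np.round(x_star, 3))                 # approaches c, which lies in the simplex
```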