5,964 research outputs found
Stochastic Optimization of PCA with Capped MSG
We study PCA as a stochastic optimization problem and propose a novel
stochastic approximation algorithm which we refer to as "Matrix Stochastic
Gradient" (MSG), as well as a practical variant, Capped MSG. We study the
method both theoretically and empirically.
Stochastic Subgradient Algorithms for Strongly Convex Optimization over Distributed Networks
We study diffusion and consensus based optimization of a sum of unknown
convex objective functions over distributed networks. The only access to these
functions is through stochastic gradient oracles, each of which is only
available at a different node, and a limited number of gradient oracle calls is
allowed at each node. In this framework, we introduce a convex optimization
algorithm based on the stochastic gradient descent (SGD) updates. Particularly,
we use a carefully designed time-dependent weighted averaging of the SGD
iterates, which yields a convergence rate of $O(N\sqrt{N}/T)$
after $T$ gradient updates for each node on
a network of $N$ nodes. We then show that after $T$ gradient oracle calls, the
average SGD iterate achieves a mean square deviation (MSD) of
$O(\sqrt{N}/T)$. This rate of convergence is optimal as it
matches the performance lower bound up to constant terms. Similar to the SGD
algorithm, the computational complexity of the proposed algorithm also scales
linearly with the dimensionality of the data. Furthermore, the communication
load of the proposed method is the same as the communication load of the SGD
algorithm. Thus, the proposed algorithm is highly efficient in terms of
complexity and communication load. We illustrate the merits of the algorithm
with respect to the state-of-the-art methods on benchmark real-life data sets and
widely studied network topologies.
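The weighted averaging of SGD iterates mentioned in this abstract can be illustrated with a minimal single-node sketch. It is not the paper's distributed algorithm: there is no diffusion or consensus step, and the step size 2 / (mu * (t + 1)) with iterate weights proportional to t is a commonly used choice assumed here, not taken from the abstract. The function names and the noisy quadratic test problem are hypothetical.

```python
import random


def weighted_average_sgd(grad_oracle, x0, steps, strong_convexity):
    """SGD whose output is a time-weighted average of the iterates.

    Assumed schedule (not from the abstract): step 2 / (mu * (t + 1))
    and weight t on the t-th iterate.
    """
    x = list(x0)
    avg = [0.0] * len(x)
    weight_sum = 0.0
    for t in range(1, steps + 1):
        g = grad_oracle(x)
        step = 2.0 / (strong_convexity * (t + 1))
        x = [xi - step * gi for xi, gi in zip(x, g)]
        # Running weighted mean: later iterates receive larger weights.
        weight_sum += t
        avg = [ai + (t / weight_sum) * (xi - ai) for ai, xi in zip(avg, x)]
    return avg


def make_noisy_quadratic_oracle(target, noise_std, rng):
    """Noisy gradient of f(x) = 0.5 * ||x - target||^2 (strong convexity 1)."""
    def oracle(x):
        return [xi - ti + rng.gauss(0.0, noise_std) for xi, ti in zip(x, target)]
    return oracle


rng = random.Random(0)
target = [1.0, -2.0, 0.5]
oracle = make_noisy_quadratic_oracle(target, noise_std=0.1, rng=rng)
estimate = weighted_average_sgd(oracle, [0.0, 0.0, 0.0], steps=5000,
                                strong_convexity=1.0)
squared_error = sum((e - t) ** 2 for e, t in zip(estimate, target))
```

Each call to the oracle stands in for one stochastic gradient oracle call at a node; extending the sketch to a network would require an additional step in which neighboring nodes combine their iterates.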
Block-Coordinate Frank-Wolfe Optimization for Structural SVMs
We propose a randomized block-coordinate variant of the classic Frank-Wolfe
algorithm for convex optimization with block-separable constraints. Despite its
lower iteration cost, we show that it achieves a similar convergence rate in
duality gap as the full Frank-Wolfe algorithm. We also show that, when applied
to the dual structural support vector machine (SVM) objective, this yields an
online algorithm that has the same low iteration complexity as primal
stochastic subgradient methods. However, unlike stochastic subgradient methods,
the block-coordinate Frank-Wolfe algorithm allows us to compute the optimal
step-size and yields a computable duality gap guarantee. Our experiments
indicate that this simple algorithm outperforms competing structural SVM
solvers. Comment: Appears in Proceedings of the 30th International Conference on
Machine Learning (ICML 2013). 9 pages main text + 22 pages appendix. Changes
from v3 to v4: 1) Re-organized appendix; improved & clarified duality gap
proofs; re-drew all plots; 2) Changed convention for Cf definition; 3) Added
weighted averaging experiments + convergence results; 4) Clarified main text
and relationship with appendix.
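The block-coordinate Frank-Wolfe idea described in this abstract (update one randomly chosen block per iteration, take the optimal step size by line search, and track a computable duality gap) can be sketched on a toy problem. This is not the structural SVM dual: the objective f(x) = 0.5 * ||mean_i(x_i) - target||^2 over a product of probability simplices is an assumption chosen because its line search has a closed form, and all names below are hypothetical.

```python
import random


def block_coordinate_frank_wolfe(target, num_blocks, iterations, rng):
    """Minimize 0.5 * ||mean_i(x_i) - target||^2 with each x_i in the simplex."""
    dim = len(target)
    # Start every block at the first simplex vertex.
    blocks = [[1.0] + [0.0] * (dim - 1) for _ in range(num_blocks)]
    mean = [sum(b[j] for b in blocks) / num_blocks for j in range(dim)]
    for _ in range(iterations):
        i = rng.randrange(num_blocks)  # pick one block uniformly at random
        residual = [m - t for m, t in zip(mean, target)]
        # The block gradient is residual / num_blocks, so the linear
        # minimizer over the simplex is the vertex with the smallest entry.
        j_star = min(range(dim), key=lambda j: residual[j])
        direction = [(1.0 if j == j_star else 0.0) - blocks[i][j]
                     for j in range(dim)]
        sq_norm = sum(d * d for d in direction)
        if sq_norm == 0.0:
            continue  # block already sits at the chosen vertex
        # Exact line search for this quadratic, clipped to [0, 1].
        inner = sum(r * d for r, d in zip(residual, direction))
        gamma = min(1.0, max(0.0, -num_blocks * inner / sq_norm))
        blocks[i] = [x + gamma * d for x, d in zip(blocks[i], direction)]
        mean = [m + gamma * d / num_blocks for m, d in zip(mean, direction)]
    return blocks, mean


def duality_gap(blocks, mean, target):
    """Frank-Wolfe gap: sum over blocks of <x_i - s_i, grad_i f(x)>."""
    residual = [m - t for m, t in zip(mean, target)]
    best = min(residual)
    per_block = (sum(x * r for x, r in zip(b, residual)) - best for b in blocks)
    return sum(per_block) / len(blocks)


rng = random.Random(0)
target = [0.2, 0.5, 0.3]
blocks, mean = block_coordinate_frank_wolfe(target, num_blocks=5,
                                            iterations=2000, rng=rng)
objective = 0.5 * sum((m - t) ** 2 for m, t in zip(mean, target))
gap = duality_gap(blocks, mean, target)
```

Because the objective is convex, the gap upper-bounds the suboptimality f(x) - f*, so it can serve as a stopping criterion without knowing the optimum. Here f* is 0 since the target lies in the simplex.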