Screening for Sparse Online Learning
Sparsity-promoting regularizers are widely used to impose low-complexity
structure (e.g., the $\ell_1$-norm for sparsity) on the regression coefficients of
supervised learning. In the realm of deterministic optimization, the sequence
of iterates generated by iterative algorithms (such as proximal gradient descent)
exhibits "finite activity identification": it identifies the low-complexity
structure after a finite number of iterations. However, most online algorithms
(such as proximal stochastic gradient descent) lack this property, owing
to their vanishing step sizes and non-vanishing variance. In this paper, by
combining online algorithms with a screening rule, we show how to eliminate
useless features from the iterates they generate, and thereby enforce finite
activity identification. One consequence is that when combined with any
convergent online algorithm, sparsity properties imposed by the regularizer can
be exploited for computational gains. Numerically, significant acceleration can
be obtained.
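
The abstract does not spell out the screening rule itself. As an illustration only, here is a minimal sketch in the spirit of the paper: proximal SGD for the lasso, interleaved with a periodic gap-safe screening test (in the style of Ndiaye et al.) that permanently discards features whose optimal weight is provably zero. The function names and the specific rule are assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_sgd_lasso_with_screening(X, y, lam, n_epochs=30, screen_every=200, seed=0):
    """Proximal SGD for  min_w 0.5*||Xw - y||^2 + lam*||w||_1,
    with a periodic gap-safe screening test that permanently
    discards features provably inactive at the optimum."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    active = np.ones(d, dtype=bool)          # features not yet screened out
    col_norms = np.linalg.norm(X, axis=0)
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            t += 1
            step = 1.0 / (1.0 + t)           # vanishing step size
            xi = X[i, active]
            # unbiased stochastic gradient of the smooth part
            g = n * (xi @ w[active] - y[i]) * xi
            w[active] = soft_threshold(w[active] - step * g, step * lam)
            if t % screen_every == 0 and active.any():
                # Gap-safe test: build a dual-feasible point from the
                # residual, bound the dual optimum within a ball of
                # radius sqrt(2*gap), and drop feature j whenever
                # |x_j^T theta| + radius * ||x_j|| < lam.
                rho = y - X[:, active] @ w[active]
                corr = X[:, active].T @ rho
                scale = min(1.0, lam / (np.max(np.abs(corr)) + 1e-12))
                theta = scale * rho
                primal = 0.5 * rho @ rho + lam * np.abs(w[active]).sum()
                dual = 0.5 * y @ y - 0.5 * (theta - y) @ (theta - y)
                radius = np.sqrt(2.0 * max(primal - dual, 0.0))
                keep = scale * np.abs(corr) + radius * col_norms[active] >= lam
                idx = np.flatnonzero(active)
                active[idx[~keep]] = False
                w[idx[~keep]] = 0.0
    return w, active
```

Once a feature is screened out it never re-enters, so later stochastic steps run on a smaller problem; this is the source of the computational gains the abstract refers to.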
Approximate Frank-Wolfe Algorithms over Graph-structured Support Sets
In this paper, we propose approximate Frank-Wolfe (FW) algorithms to solve
convex optimization problems over graph-structured support sets where the
\textit{linear minimization oracle} (LMO) cannot be efficiently obtained in
general. We first demonstrate that two popular approximation assumptions
(\textit{additive} and \textit{multiplicative} gap errors) are not valid for
our problem, in that no cheap gap-approximate LMO exists in general.
Instead, a new \textit{approximate dual maximization oracle} (DMO) is proposed,
which approximates the inner product rather than the gap. When the objective is
$L$-smooth, we prove that the standard FW method using a $\delta$-approximate
DMO converges as $\mathcal{O}(L/\delta t + (1-\delta))$ in general, and as
$\mathcal{O}(L/(\delta^2 t))$ over a
$\delta$-relaxation of the constraint set. Additionally, when the objective is
$\mu$-strongly convex and the solution is unique, a variant of FW converges to
$\mathcal{O}(L^2 \log(t)/(\mu\,\delta^6\, t^2))$ with the same per-iteration
complexity. Our empirical results suggest that even these improved bounds are
pessimistic, with significant improvement in recovering real-world images with
graph-structured sparsity.

Comment: 30 pages, 8 figures
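
The key object here is the $\delta$-approximate DMO: rather than solving the linear minimization exactly, it returns an atom whose inner product with the negative gradient is within a factor $\delta$ of optimal. The following toy sketch uses the $\ell_1$ ball (where the exact LMO is actually cheap) purely to make the oracle definition and the modified FW loop concrete; the paper's setting replaces this with an approximate oracle over graph-structured supports, and all names below are illustrative assumptions.

```python
import numpy as np

def approx_dmo(grad, radius, delta, rng):
    """delta-approximate dual maximization oracle over the l1 ball:
    returns an atom v with  <v, -grad> >= delta * max_u <u, -grad>.
    Picking any coordinate within a factor delta of the best one
    simulates the approximation error of an expensive oracle."""
    scores = np.abs(grad)
    j = int(rng.choice(np.flatnonzero(scores >= delta * scores.max())))
    v = np.zeros_like(grad)
    v[j] = -radius * np.sign(grad[j])
    return v

def approx_frank_wolfe(grad_f, x0, radius, delta=0.9, T=500, seed=0):
    """Standard Frank-Wolfe loop with the exact LMO replaced by a
    delta-approximate DMO and the usual open-loop step size."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for t in range(T):
        v = approx_dmo(grad_f(x), radius, delta, rng)
        gamma = 2.0 / (t + 2)
        x = (1.0 - gamma) * x + gamma * v
    return x

# toy usage: least squares toward a sparse target
rng = np.random.default_rng(1)
A = rng.normal(size=(40, 100))
w_true = np.zeros(100); w_true[:5] = 1.0
b = A @ w_true
x_hat = approx_frank_wolfe(lambda x: A.T @ (A @ x - b),
                           np.zeros(100), radius=5.0, delta=0.9)
```

Note the design point the abstract emphasizes: the oracle approximates the inner product $\langle v, -\nabla f(x)\rangle$ multiplicatively, not the FW gap, which is what allows a cheap oracle to exist for graph-structured support sets.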