2 research outputs found

    Screening for Sparse Online Learning

    Sparsity-promoting regularizers are widely used to impose low-complexity structure (e.g. the $\ell_1$-norm for sparsity) on the regression coefficients of supervised learning. In the realm of deterministic optimization, the sequences generated by iterative algorithms (such as proximal gradient descent) exhibit "finite activity identification": they identify the low-complexity structure in a finite number of iterations. However, most online algorithms (such as proximal stochastic gradient descent) do not have this property, owing to their vanishing step sizes and non-vanishing variance. In this paper, we show how combining an online algorithm with a screening rule eliminates useless features of its iterates and thereby enforces finite activity identification. One consequence is that, when combined with any convergent online algorithm, the sparsity imposed by the regularizer can be exploited for computational gains. Numerically, significant acceleration can be obtained.
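    As a rough illustration of the idea, the sketch below runs proximal stochastic gradient descent on a lasso problem and periodically applies a simplified screening test that permanently discards coordinates whose averaged gradient sits well inside the soft-threshold band. The test, the schedule, and all names are illustrative stand-ins under my own assumptions, not the paper's exact screening rule.

```python
# Illustrative sketch only: proximal SGD for the lasso with a crude screening
# step that discards coordinates whose averaged stochastic gradient stays well
# below the regularization level. This is NOT the paper's exact rule.
import numpy as np

def prox_sgd_with_screening(A, b, lam=0.1, n_iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    active = np.ones(d, dtype=bool)      # coordinates not yet screened out
    grad_avg = np.zeros(d)               # running average of stochastic gradients

    for t in range(1, n_iters + 1):
        step = 1.0 / np.sqrt(t)          # vanishing step size, as in online algorithms
        i = rng.integers(n)              # sample one data point
        g = (A[i] @ x - b[i]) * A[i]     # stochastic gradient of 0.5*(a_i^T x - b_i)^2
        grad_avg += (g - grad_avg) / t

        # proximal (soft-thresholding) step, restricted to active coordinates
        z = x[active] - step * g[active]
        x[active] = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

        # simplified screening: drop coordinates that are already zero and whose
        # averaged gradient is safely inside the soft-threshold band
        if t % 100 == 0:
            screened = active & (x == 0.0) & (np.abs(grad_avg) < 0.5 * lam)
            active &= ~screened
            x[~active] = 0.0             # enforce sparsity on screened coordinates
    return x, active

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 50))
    x_true = np.zeros(50); x_true[:5] = 1.0
    b = A @ x_true + 0.01 * rng.standard_normal(200)
    x_hat, active = prox_sgd_with_screening(A, b)
    print("active coordinates:", np.flatnonzero(active))
```

    Once a coordinate is screened out it never needs to be touched again, which is where the computational gain from the identified sparsity comes from.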

    Approximate Frank-Wolfe Algorithms over Graph-structured Support Sets

    In this paper, we propose approximate Frank-Wolfe (FW) algorithms to solve convex optimization problems over graph-structured support sets where the \textit{linear minimization oracle} (LMO) cannot be efficiently obtained in general. We first demonstrate that two popular approximation assumptions (\textit{additive} and \textit{multiplicative} gap errors) are not valid for our problem, in that no cheap gap-approximate LMO exists in general. Instead, we propose a new \textit{approximate dual maximization oracle} (DMO), which approximates the inner product rather than the gap. When the objective is $L$-smooth, we prove that the standard FW method using a $\delta$-approximate DMO converges as $\mathcal{O}(L/\delta t + (1-\delta)(\delta^{-1} + \delta^{-2}))$ in general, and as $\mathcal{O}(L/(\delta^2(t+2)))$ over a $\delta$-relaxation of the constraint set. Additionally, when the objective is $\mu$-strongly convex and the solution is unique, a variant of FW converges as $\mathcal{O}(L^2\log(t)/(\mu\delta^6 t^2))$ with the same per-iteration complexity. Our empirical results suggest that even these improved bounds are pessimistic, showing significant improvement in recovering real-world images with graph-structured sparsity.
    Comment: 30 pages, 8 figures
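    For intuition, the sketch below runs standard Frank-Wolfe over the $\ell_1$ ball, where an exact LMO is cheap, but swaps in a toy oracle that is deliberately inexact so as to mimic a $\delta$-approximate DMO (one that approximates the inner product with the negative gradient). The oracle, step size, and test problem are my own illustrative assumptions and do not reproduce the paper's graph-structured oracles or analysis.

```python
# Illustrative sketch only: Frank-Wolfe with an inexact oracle over the l1 ball.
import numpy as np

def approx_dmo_l1_ball(grad, radius=1.0, delta=0.8, rng=None):
    """Toy stand-in for a delta-approximate DMO: usually returns the exact LMO
    vertex of the l1 ball, occasionally a suboptimal one to mimic inexactness."""
    rng = rng or np.random.default_rng(0)
    best = int(np.argmax(np.abs(grad)))
    if rng.random() > delta:                 # occasionally pick a worse coordinate
        best = int(rng.integers(grad.size))
    s = np.zeros_like(grad)
    s[best] = -radius * np.sign(grad[best])  # vertex (roughly) maximizing <-grad, s>
    return s

def approx_frank_wolfe(grad_f, x0, n_iters=200, delta=0.8, radius=1.0):
    x = x0.copy()
    rng = np.random.default_rng(0)
    for t in range(n_iters):
        g = grad_f(x)
        s = approx_dmo_l1_ball(g, radius=radius, delta=delta, rng=rng)
        gamma = 2.0 / (t + 2.0)              # standard FW step size
        x = (1.0 - gamma) * x + gamma * s    # convex combination stays feasible
    return x

if __name__ == "__main__":
    # minimize 0.5 * ||x - y||^2 over the l1 ball of radius 1
    y = np.array([0.9, -0.2, 0.05, 0.0])
    grad_f = lambda x: x - y
    x_hat = approx_frank_wolfe(grad_f, np.zeros_like(y))
    print("approximate minimizer:", np.round(x_hat, 3))
```

    The per-iteration cost is a single oracle call plus a convex combination, which is why the paper's variants can keep the same per-iteration complexity while tolerating an inexact oracle.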