1,469 research outputs found

    Stochastic Dual Coordinate Ascent with Adaptive Probabilities

    Get PDF
    This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving the regularized empirical risk minimization problems. Our modification consists in allowing the method adaptively change the probability distribution over the dual variables throughout the iterative process. AdaSDCA achieves provably better complexity bound than SDCA with the best fixed probability distribution, known as importance sampling. However, it is of a theoretical character as it is expensive to implement. We also propose AdaSDCA+: a practical variant which in our experiments outperforms existing non-adaptive methods

    Faster Coordinate Descent via Adaptive Importance Sampling

    Get PDF
    Coordinate descent methods employ random partial updates of decision variables in order to solve huge-scale convex optimization problems. In this work, we introduce new adaptive rules for the random selection of their updates. By adaptive, we mean that our selection rules are based on the dual residual or the primal-dual gap estimates and can change at each iteration. We theoretically characterize the performance of our selection rules and demonstrate improvements over the state-of-the-art, and extend our theory and algorithms to general convex objectives. Numerical evidence with hinge-loss support vector machines and Lasso confirm that the practice follows the theory.Comment: appearing at AISTATS 201

    Coordinate Descent with Bandit Sampling

    Full text link
    Coordinate descent methods usually minimize a cost function by updating a random decision variable (corresponding to one coordinate) at a time. Ideally, we would update the decision variable that yields the largest decrease in the cost function. However, finding this coordinate would require checking all of them, which would effectively negate the improvement in computational tractability that coordinate descent is intended to afford. To address this, we propose a new adaptive method for selecting a coordinate. First, we find a lower bound on the amount the cost function decreases when a coordinate is updated. We then use a multi-armed bandit algorithm to learn which coordinates result in the largest lower bound by interleaving this learning with conventional coordinate descent updates except that the coordinate is selected proportionately to the expected decrease. We show that our approach improves the convergence of coordinate descent methods both theoretically and experimentally.Comment: appearing at NeurIPS 201

    Block-proximal methods with spatially adapted acceleration

    Full text link
    We study and develop (stochastic) primal--dual block-coordinate descent methods for convex problems based on the method due to Chambolle and Pock. Our methods have known convergence rates for the iterates and the ergodic gap: O(1/N2)O(1/N^2) if each block is strongly convex, O(1/N)O(1/N) if no convexity is present, and more generally a mixed rate O(1/N2)+O(1/N)O(1/N^2)+O(1/N) for strongly convex blocks, if only some blocks are strongly convex. Additional novelties of our methods include blockwise-adapted step lengths and acceleration, as well as the ability to update both the primal and dual variables randomly in blocks under a very light compatibility condition. In other words, these variants of our methods are doubly-stochastic. We test the proposed methods on various image processing problems, where we employ pixelwise-adapted acceleration
    • …
    corecore