Stochastic Dual Coordinate Ascent with Adaptive Probabilities
This paper introduces AdaSDCA: an adaptive variant of stochastic dual
coordinate ascent (SDCA) for solving regularized empirical risk
minimization problems. Our modification allows the method to
adaptively change the probability distribution over the dual variables
throughout the iterative process. AdaSDCA achieves a provably better complexity
bound than SDCA with the best fixed probability distribution, known as
importance sampling. However, it is mainly of theoretical interest, as it is
expensive to implement. We also propose AdaSDCA+: a practical variant which, in
our experiments, outperforms existing non-adaptive methods.
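As a rough illustration of the adaptive idea, the following Python sketch runs SDCA with adaptive probabilities on ridge regression, using the squared loss 0.5(a - y_i)^2, which is 1-smooth (so γ = 1). It is a minimal sketch, not the authors' code: the sampling rule p_i ∝ |κ_i|·sqrt(v_i + nλγ), with dual residue κ_i = α_i + φ_i'(x_i^T w) and v_i = ||x_i||^2, is our reading of the adaptive rule and should be treated as an assumption, as are the hypothetical function name `adasdca_ridge` and the choice of loss. Recomputing every κ_i costs a full pass over the data per iteration, which shows why the exact adaptive method is expensive to implement.

```python
import numpy as np

def adasdca_ridge(X, y, lam, iters=5000, seed=0):
    """SDCA with adaptive sampling for ridge regression (hypothetical helper).

    Primal: P(w) = (1/n) * sum_i 0.5*(x_i^T w - y_i)^2 + (lam/2)*||w||^2
    Dual variables alpha_i, with w = (1/(lam*n)) * sum_i alpha_i * x_i.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    alpha = np.zeros(n)
    w = np.zeros(d)
    v = np.einsum("ij,ij->i", X, X)    # v_i = ||x_i||^2
    gamma = 1.0                         # loss 0.5*(a - y)^2 is 1-smooth, so gamma = 1
    for _ in range(iters):
        # Dual residues kappa_i = alpha_i + phi_i'(x_i^T w): a full pass over X,
        # which is what makes the exact adaptive rule expensive.
        kappa = alpha + X @ w - y
        # Assumed adaptive rule: p_i proportional to |kappa_i| * sqrt(v_i + n*lam*gamma)
        scores = np.abs(kappa) * np.sqrt(v + n * lam * gamma)
        total = scores.sum()
        if total == 0.0:
            break                       # every residue is zero: alpha is optimal
        i = rng.choice(n, p=scores / total)
        # Exact maximization of the dual along coordinate i (closed form for squared loss).
        delta = -kappa[i] / (1.0 + v[i] / (lam * n))
        alpha[i] += delta
        w += delta * X[i] / (lam * n)   # keep w = (1/(lam*n)) * X^T alpha in sync
    return w, alpha
```

On small random data, the returned w can be checked against the closed-form ridge solution (X^T X/n + λI)^{-1} X^T y/n, and the dual variables against α_i = y_i - x_i^T w.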
Faster Coordinate Descent via Adaptive Importance Sampling
Coordinate descent methods employ random partial updates of decision
variables in order to solve huge-scale convex optimization problems. In this
work, we introduce new adaptive rules for the random selection of their
updates. By adaptive, we mean that our selection rules are based on the dual
residual or the primal-dual gap estimates and can change at each iteration. We
theoretically characterize the performance of our selection rules,
demonstrate improvements over the state of the art, and extend our theory and
algorithms to general convex objectives. Numerical evidence with hinge-loss
support vector machines and Lasso confirms that the practice follows the theory.
Comment: appearing at AISTATS 201
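A minimal sketch of one gap-based selection rule for the Lasso, assuming the objective 0.5||Ax - b||^2 + λ||x||_1: coordinate i is sampled with probability proportional to a coordinate-wise duality gap G_i, then updated by exact soft-thresholding. To keep G_i finite, we bound |x_i| ≤ B with B = ||b||^2/(2λ); this holds for every iterate because the objective never increases from its value at x = 0. This particular rule, the bound B, and the hypothetical name `gap_sampled_lasso_cd` are illustrative assumptions, not the paper's exact selection rules.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def gap_sampled_lasso_cd(A, b, lam, iters=5000, seed=0):
    """Coordinate descent for 0.5*||A x - b||^2 + lam*||x||_1 (hypothetical helper),
    sampling coordinate i with probability proportional to its duality gap G_i."""
    n, d = A.shape
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    r = -b.astype(float)                  # r = A x - b, the gradient of the smooth part at A x
    col_sq = np.einsum("ij,ij->j", A, A)  # ||a_i||^2 for each column
    B = 0.5 * (b @ b) / lam               # |x_i| <= P(0)/lam = B along the whole run
    for _ in range(iters):
        u = A.T @ r                       # u_i = a_i^T r, one full pass over A
        # G_i = g_i(x_i) + g_i^*(-u_i) + x_i*u_i, with g_i = lam*|.| restricted to |x_i| <= B
        gaps = lam * np.abs(x) + x * u + B * np.maximum(np.abs(u) - lam, 0.0)
        gaps = np.maximum(gaps, 0.0)      # clip rounding noise; G_i >= 0 in exact arithmetic
        total = gaps.sum()
        if total <= 1e-15:
            break                         # total duality gap is numerically zero
        i = rng.choice(d, p=gaps / total)
        # Exact minimization of the objective over coordinate i.
        x_new = soft_threshold(x[i] - u[i] / col_sq[i], lam / col_sq[i])
        r += (x_new - x[i]) * A[:, i]
        x[i] = x_new
    return x
```

Because the gaps sum to the full duality gap, the same quantity that drives sampling also serves as a stopping criterion.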
Coordinate Descent with Bandit Sampling
Coordinate descent methods usually minimize a cost function by updating a
random decision variable (corresponding to one coordinate) at a time. Ideally,
we would update the decision variable that yields the largest decrease in the
cost function. However, finding this coordinate would require checking all of
them, which would effectively negate the improvement in computational
tractability that coordinate descent is intended to afford. To address this, we
propose a new adaptive method for selecting a coordinate. First, we find a
lower bound on the amount the cost function decreases when a coordinate is
updated. We then use a multi-armed bandit algorithm to learn which coordinates
yield the largest lower bound, interleaving this learning with conventional
coordinate descent updates, except that the coordinate is selected in
proportion to its expected decrease. We show that our approach improves
the convergence of coordinate descent methods both theoretically and
experimentally.
Comment: appearing at NeurIPS 201
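The loop below is a simplified, hypothetical sketch of bandit-style coordinate selection on a quadratic f(x) = 0.5 x^T Q x - c^T x with Q symmetric positive definite. For a coordinate step of length 1/L_i with L_i = Q_ii, the decrease in f is at least g_i^2/(2L_i), where g_i is the i-th partial derivative (for a quadratic the bound is exact); that bound serves as the reward of arm i. The ε-greedy exploration, the "last observed reward" estimates, and the function name `bandit_cd_quadratic` are our assumptions, not the paper's algorithm.

```python
import numpy as np

def bandit_cd_quadratic(Q, c, iters=5000, eps=0.2, seed=0):
    """Coordinate descent on 0.5*x^T Q x - c^T x with bandit-style coordinate choice."""
    d = len(c)
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    L = np.diag(Q).copy()                 # coordinate-wise Lipschitz constants L_i = Q_ii
    grad = Q @ x - c
    reward = grad**2 / (2.0 * L)          # initial estimates from one full gradient
    for _ in range(iters):
        total = reward.sum()
        if rng.random() < eps or total == 0.0:
            i = int(rng.integers(d))      # explore: uniform coordinate
        else:
            i = int(rng.choice(d, p=reward / total))  # exploit: proportional to estimates
        g_i = Q[i] @ x - c[i]             # fresh partial derivative, O(d)
        reward[i] = g_i**2 / (2.0 * L[i]) # observed reward of arm i: guaranteed decrease
        x[i] -= g_i / L[i]                # coordinate step of length 1/L_i
    return x
```

Only the pulled arm's estimate is refreshed, so each iteration costs O(d) rather than a full gradient evaluation; the ε-uniform exploration keeps every coordinate's selection probability positive, and no step can increase f.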
Block-proximal methods with spatially adapted acceleration
We study and develop (stochastic) primal-dual block-coordinate descent
methods for convex problems based on the method due to Chambolle and Pock. Our
methods have known convergence rates for the iterates and the ergodic gap:
O(1/N^2) if each block is strongly convex, O(1/N) if no convexity is
present, and, more generally, a mixed rate O(1/N^2)+O(1/N) for strongly convex
blocks if only some blocks are strongly convex. Additional novelties of our
methods include blockwise-adapted step lengths and acceleration, as well as the
ability to update both the primal and dual variables randomly in blocks under a
very light compatibility condition. In other words, these variants of our
methods are doubly stochastic. We test the proposed methods on various image
processing problems, where we employ pixelwise-adapted acceleration.
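For orientation, here is a sketch of the underlying (non-block) accelerated Chambolle-Pock iteration on a 1D analogue of the image problems: total-variation denoising, min_x 0.5||x - f||^2 + α||Kx||_1, with K the forward-difference operator. It updates all variables at once and uses scalar step lengths, so it omits the blockwise and pixelwise adaptation that the paper contributes. The step rule θ = 1/sqrt(1 + 2γτ) is the standard accelerated variant for a γ-strongly convex G; the function names are hypothetical.

```python
import numpy as np

def grad1d(x):
    """Forward differences K x, with K the (n-1) x n difference matrix."""
    return x[1:] - x[:-1]

def grad1d_adjoint(y):
    """K^T y for the forward-difference operator above."""
    out = np.zeros(len(y) + 1)
    out[:-1] -= y
    out[1:] += y
    return out

def tv_denoise_cp_accel(f, alpha, iters=4000):
    """Accelerated Chambolle-Pock for min_x 0.5*||x - f||^2 + alpha*||K x||_1."""
    gamma = 1.0                        # G(x) = 0.5*||x - f||^2 is 1-strongly convex
    L2 = 4.0                           # ||K||^2 <= 4 for 1D forward differences
    tau = sigma = 1.0 / np.sqrt(L2)    # tau * sigma * L2 = 1
    x = f.astype(float).copy()
    x_bar = x.copy()
    y = np.zeros(len(f) - 1)
    for _ in range(iters):
        # Dual step: prox of sigma*F^* is projection onto the box [-alpha, alpha].
        y = np.clip(y + sigma * grad1d(x_bar), -alpha, alpha)
        # Primal step: prox of tau*G is (v + tau*f) / (1 + tau).
        x_old = x
        x = (x - tau * grad1d_adjoint(y) + tau * f) / (1.0 + tau)
        # Acceleration: shrink tau, grow sigma, keep tau*sigma constant.
        theta = 1.0 / np.sqrt(1.0 + 2.0 * gamma * tau)
        tau *= theta
        sigma /= theta
        x_bar = x + theta * (x - x_old)
    return x
```

For a step signal f = (0, ..., 0, 1, ..., 1) with m entries on each side and α < m/2, the exact minimizer moves the two plateaus to α/m and 1 - α/m, which gives a convenient check of the iteration.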
- …