Coordinate Descent with Bandit Sampling
Coordinate descent methods usually minimize a cost function by updating a
random decision variable (corresponding to one coordinate) at a time. Ideally,
we would update the decision variable that yields the largest decrease in the
cost function. However, finding this coordinate would require checking all of
them, which would effectively negate the improvement in computational
tractability that coordinate descent is intended to afford. To address this, we
propose a new adaptive method for selecting a coordinate. First, we find a
lower bound on the amount the cost function decreases when a coordinate is
updated. We then use a multi-armed bandit algorithm to learn which coordinates
yield the largest lower bound, interleaving this learning with conventional
coordinate descent updates, except that the coordinate is selected with
probability proportional to the expected decrease. We show that our approach improves
the convergence of coordinate descent methods both theoretically and
experimentally.

Comment: appearing at NeurIPS 201
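As a rough illustration of the selection scheme described in this abstract, the sketch below runs coordinate descent on a least-squares objective and samples each coordinate with probability proportional to an estimated lower bound on its decrease, |∇_i f(x)|² / (2 L_i), refreshing the estimate only for the coordinate that was played, as a bandit would. The least-squares objective, this particular bound, the staleness handling, and the name `bandit_coordinate_descent` are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def bandit_coordinate_descent(A, b, n_iters=1000, seed=0):
    """Coordinate descent for f(x) = 0.5 * ||Ax - b||^2, where coordinates
    are sampled in proportion to an estimated lower bound on the decrease
    they would produce (a bandit-style heuristic; the bound and bandit
    algorithm used in the paper may differ)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    residual = A @ x - b                      # r = Ax - b
    lipschitz = np.sum(A**2, axis=0) + 1e-12  # per-coordinate Lipschitz constants L_i
    # Initial estimates of the decrease lower bound: |grad_i|^2 / (2 L_i).
    grad = A.T @ residual
    reward = grad**2 / (2.0 * lipschitz) + 1e-12

    for _ in range(n_iters):
        # Sample a coordinate proportionally to the estimated decrease.
        probs = reward / reward.sum()
        i = rng.choice(d, p=probs)
        # Exact minimization along coordinate i (closed form for least squares).
        g_i = A[:, i] @ residual
        step = g_i / lipschitz[i]
        x[i] -= step
        residual -= step * A[:, i]
        # Refresh only the played coordinate's estimate; the others stay stale,
        # as in a bandit setting.
        g_i_new = A[:, i] @ residual
        reward[i] = g_i_new**2 / (2.0 * lipschitz[i]) + 1e-12
    return x

# Example usage on synthetic data.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
b = rng.standard_normal(200)
x_hat = bandit_coordinate_descent(A, b)
```

In this toy setting the per-coordinate Lipschitz constants are simply the squared column norms of A, so the exact coordinate minimizer is available in closed form; a general smooth objective would instead take a gradient step of size 1/L_i along the chosen coordinate.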
Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization
Consider the problem of minimizing the sum of a smooth (possibly non-convex)
and a convex (possibly nonsmooth) function involving a large number of
variables. A popular approach to solve this problem is the block coordinate
descent (BCD) method whereby at each iteration only one variable block is
updated while the remaining variables are held fixed. With recent advances in
multi-core parallel processing technology, it is desirable to parallelize the
BCD method by allowing multiple blocks to be updated simultaneously at each
iteration of the algorithm. In this work, we
propose an inexact parallel BCD approach where at each iteration, a subset of
the variables is updated in parallel by minimizing convex approximations of the
original objective function. We investigate the convergence of this parallel
BCD method for both randomized and cyclic variable selection rules. We analyze
the asymptotic and non-asymptotic convergence behavior of the algorithm for
both convex and non-convex objective functions. Numerical experiments suggest
that, for a special case of the Lasso minimization problem, the cyclic block
selection rule can outperform the randomized rule.
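The parallel update with randomized or cyclic block selection can be sketched for the Lasso special case mentioned above. In the sketch below each coordinate is treated as its own block, the convex approximation of the objective is the usual quadratic surrogate solved in closed form by soft-thresholding, and a serial loop stands in for genuinely parallel hardware. The name `parallel_bcd_lasso`, the step sizes, and the number of blocks updated per iteration are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

def parallel_bcd_lasso(A, b, lam=0.1, n_blocks_per_iter=8, n_iters=200,
                       rule="random", seed=0):
    """Inexact parallel BCD sketch for the Lasso
        min_x 0.5 * ||Ax - b||^2 + lam * ||x||_1,
    updating a subset of coordinate blocks per iteration, all computed from
    the same current iterate and then applied simultaneously."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    residual = A @ x - b
    lipschitz = np.sum(A**2, axis=0) + 1e-12  # per-block Lipschitz constants
    cyclic_pos = 0

    def soft_threshold(v, t):
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    for _ in range(n_iters):
        # Randomized or cyclic selection of the blocks to update this round.
        k = min(n_blocks_per_iter, d)
        if rule == "random":
            blocks = rng.choice(d, size=k, replace=False)
        else:  # cyclic
            blocks = (cyclic_pos + np.arange(k)) % d
            cyclic_pos = (cyclic_pos + k) % d
        # Each selected block minimizes its convex quadratic surrogate using
        # the same current iterate; a serial loop stands in for parallelism.
        new_vals = {}
        for i in blocks:
            g_i = A[:, i] @ residual                 # partial gradient of the smooth part
            z = x[i] - g_i / lipschitz[i]            # surrogate minimizer (smooth part only)
            new_vals[i] = soft_threshold(z, lam / lipschitz[i])
        # Apply all block updates simultaneously.
        for i, v in new_vals.items():
            residual += (v - x[i]) * A[:, i]
            x[i] = v
    return x
```

Note that updating many strongly correlated blocks at once with full-length surrogate steps can slow or destabilize progress; convergence analyses of parallel BCD typically require damped steps or bounds on the number of blocks updated per iteration, which this sketch omits for brevity.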