Faster Coordinate Descent via Adaptive Importance Sampling
Coordinate descent methods employ random partial updates of decision
variables in order to solve huge-scale convex optimization problems. In this
work, we introduce new adaptive rules for the random selection of their
updates. By adaptive, we mean that our selection rules are based on the dual
residual or the primal-dual gap estimates and can change at each iteration. We
theoretically characterize the performance of our selection rules, demonstrate
improvements over the state of the art, and extend our theory and
algorithms to general convex objectives. Numerical evidence with hinge-loss
support vector machines and Lasso confirms that the practice follows the theory.
Comment: appearing at AISTATS 201
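A minimal sketch of adaptive coordinate selection, assuming the Lasso objective 0.5*||Ax - b||^2 + lam*||x||_1 with nonzero columns in A. It is not the paper's dual-residual or duality-gap rule. Here coordinate i is sampled with probability proportional to how far x_i is from its exact coordinate-wise minimizer, and that score is recomputed at every iteration. The function name lasso_cd_adaptive and the violation-based score are illustrative assumptions. Recomputing every score costs as much as a full gradient, so the sketch shows only the selection logic and not the efficiency a practical implementation would need.

```python
import numpy as np


def lasso_cd_adaptive(A, b, lam, n_iters=2000, seed=0):
    """Randomized coordinate descent for 0.5*||Ax - b||^2 + lam*||x||_1.

    Hypothetical adaptive rule (illustration only): sample coordinate i with
    probability proportional to |x_i* - x_i|, where x_i* is the exact
    minimizer along coordinate i. Assumes every column of A is nonzero.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.zeros(n)
    r = b - A @ x                    # residual b - Ax, kept in sync with x
    col_sq = (A ** 2).sum(axis=0)    # coordinate-wise Lipschitz constants L_i
    for _ in range(n_iters):
        g = -A.T @ r                 # gradient of the smooth part
        z = x - g / col_sq
        # Exact coordinate minimizers via soft-thresholding at lam / L_i.
        target = np.sign(z) * np.maximum(np.abs(z) - lam / col_sq, 0.0)
        violation = np.abs(target - x)
        total = violation.sum()
        if total < 1e-12:
            break                    # every coordinate is (numerically) optimal
        i = rng.choice(n, p=violation / total)  # adaptive sampling distribution
        delta = target[i] - x[i]
        x[i] = target[i]
        r -= delta * A[:, i]         # update residual for the changed coordinate
    return x
```

Coordinates that already sit at their coordinate-wise minimizer get probability zero, so effort concentrates where the optimality conditions are most violated.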
Coordinate Descent with Bandit Sampling
Coordinate descent methods usually minimize a cost function by updating a
random decision variable (corresponding to one coordinate) at a time. Ideally,
we would update the decision variable that yields the largest decrease in the
cost function. However, finding this coordinate would require checking all of
them, which would effectively negate the improvement in computational
tractability that coordinate descent is intended to afford. To address this, we
propose a new adaptive method for selecting a coordinate. First, we find a
lower bound on the amount the cost function decreases when a coordinate is
updated. We then use a multi-armed bandit algorithm to learn which coordinates
yield the largest lower bound, interleaving this learning with conventional
coordinate descent updates in which the coordinate is selected in proportion to
its expected decrease. We show that our approach improves the convergence of
coordinate descent methods both theoretically and experimentally.
Comment: appearing at NeurIPS 201
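A minimal sketch of the bandit idea, under stated assumptions. The objective is a strongly convex quadratic f(x) = 0.5*x^T Q x - c^T x. For this objective the coordinate step x_i -= g_i / L_i with L_i = Q_ii lowers f by exactly g_i^2 / (2 L_i), which is the usual lower bound for a coordinate-wise L_i-smooth function. Each coordinate is treated as an arm whose reward is the last decrease observed when it was pulled. An epsilon-greedy rule explores uniformly with probability eps and otherwise samples coordinates in proportion to their stored rewards. The function name bandit_cd, the epsilon-greedy policy, and eps = 0.2 are assumptions and are not the bandit algorithm used in the paper.

```python
import numpy as np


def bandit_cd(Q, c, n_iters=3000, eps=0.2, seed=0):
    """Coordinate descent on f(x) = 0.5 x^T Q x - c^T x (Q symmetric positive
    definite), with coordinates chosen by a hypothetical epsilon-greedy bandit."""
    rng = np.random.default_rng(seed)
    n = len(c)
    x = np.zeros(n)
    L = np.diag(Q).copy()             # coordinate-wise smoothness constants L_i
    grad = Q @ x - c                  # full gradient, updated incrementally below
    reward = grad ** 2 / (2 * L)      # initial reward estimates: g_i^2 / (2 L_i)
    for _ in range(n_iters):
        total = reward.sum()
        if rng.random() < eps or total <= 0:
            i = rng.integers(n)       # explore: uniform coordinate
        else:
            i = rng.choice(n, p=reward / total)  # exploit: proportional to estimate
        g_i = grad[i]
        reward[i] = g_i ** 2 / (2 * L[i])  # observed decrease for the pulled arm
        step = g_i / L[i]
        x[i] -= step                  # coordinate gradient step with step 1/L_i
        grad -= step * Q[:, i]        # O(n) gradient update after changing x_i
    return x
```

Estimates for arms that have not been pulled recently go stale. The uniform exploration term makes sure every coordinate is revisited, so stale estimates are eventually corrected.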
Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent
First-order methods play a central role in large-scale machine learning. Even
though many variations exist, each suited to a particular problem, almost all
such methods fundamentally rely on two types of algorithmic steps: gradient
descent, which yields primal progress, and mirror descent, which yields dual
progress.
We observe that the performances of gradient and mirror descent are
complementary, so that faster algorithms can be designed by LINEARLY COUPLING
the two. We show how to reconstruct Nesterov's accelerated gradient methods
using linear coupling, which gives a cleaner interpretation than Nesterov's
original proofs. We also discuss the power of linear coupling by extending it
to many other settings to which Nesterov's methods do not apply.
Comment: A new section added; polished writing
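The coupling can be sketched for the simplest case. The sketch assumes an unconstrained problem and the Euclidean mirror map, so the mirror step is a plain gradient step with a larger, growing step size. It also assumes the schedule alpha_{k+1} = (k+2)/(2L) with coupling weight tau_k = 1/(alpha_{k+1} L) = 2/(k+2). Under these assumptions, a telescoping argument over the gradient-step and mirror-step inequalities gives f(y_T) - f(x*) <= 2L*||x0 - x*||^2 / (T+1)^2, which is the accelerated O(1/T^2) rate. The function name linear_coupling_agm and its signature are illustrative. Other mirror maps or constrained domains would change the z-update.

```python
import numpy as np


def linear_coupling_agm(grad, L, x0, T):
    """Accelerated method for an L-smooth convex f, built by linearly coupling
    a gradient step and a Euclidean, unconstrained mirror step (assumed setup)."""
    y = x0.copy()                    # gradient-step sequence (primal progress)
    z = x0.copy()                    # mirror-step sequence (dual progress)
    for k in range(T):
        alpha = (k + 2) / (2 * L)    # mirror step size alpha_{k+1}
        tau = 1.0 / (alpha * L)      # coupling weight, equal to 2 / (k + 2)
        x = tau * z + (1 - tau) * y  # linear coupling of the two sequences
        g = grad(x)
        y = x - g / L                # gradient descent step from x
        z = z - alpha * g            # mirror descent step with the Euclidean mirror map
    return y
```

For example, with grad = lambda x: Q @ x - c and L equal to the largest eigenvalue of Q, the returned y approaches the minimizer of 0.5*x^T Q x - c^T x at the O(1/T^2) rate stated above.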