Let's Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
Block coordinate descent (BCD) methods are widely used for large-scale
numerical optimization because of their cheap iteration costs, low memory
requirements, amenability to parallelization, and ability to exploit problem
structure. Three main algorithmic choices influence the performance of BCD
methods: the block partitioning strategy, the block selection rule, and the
block update rule. In this paper we explore all three of these building blocks
and propose variations for each that can lead to significantly faster BCD
methods. We (i) propose new greedy block-selection strategies that guarantee
more progress per iteration than the Gauss-Southwell rule; (ii) explore
practical issues like how to implement the new rules when using "variable"
blocks; (iii) explore the use of message-passing to compute matrix or Newton
updates efficiently on huge blocks for problems with a sparse dependency
between variables; and (iv) consider optimal active manifold identification,
which leads to bounds on the "active set complexity" of BCD methods and leads
to superlinear convergence for certain problems with sparse solutions (and in
some cases finite termination at an optimal solution). We support all of our
findings with numerical results for the classic machine learning problems of
least squares, logistic regression, multi-class logistic regression, label
propagation, and L1-regularization.
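
As a point of reference for the baseline named in this abstract, the classic Gauss-Southwell rule can be sketched for least squares as follows. This is a minimal illustration with single-coordinate blocks and exact coordinate minimization, not one of the paper's new rules; the function name `gauss_southwell_cd` and the toy data are assumptions made for the example.

```python
def gauss_southwell_cd(A, b, iters=200, tol=1e-12):
    """Greedy coordinate descent for min_x 0.5 * ||A x - b||^2 (A is a list of rows)."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    r = [-bi for bi in b]  # residual r = A x - b at x = 0
    # Coordinate-wise curvature: squared norm of each column of A.
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(iters):
        # Full gradient g = A^T r.
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gauss-Southwell rule: pick the coordinate with the largest |gradient|.
        j = max(range(n), key=lambda k: abs(g[k]))
        if abs(g[j]) < tol:
            break
        step = g[j] / col_sq[j]  # exact minimization along coordinate j
        x[j] -= step
        for i in range(m):
            r[i] -= step * A[i][j]  # keep the residual in sync with x
    return x


# Hypothetical toy problem: a consistent system whose solution is x = [1, 2].
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.0]
x = gauss_southwell_cd(A, b)
```

Computing the full gradient at every iteration is what makes greedy rules expensive in general; the sketch ignores that cost for the sake of brevity.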
Global and Quadratic Convergence of Newton Hard-Thresholding Pursuit
Algorithms based on the hard thresholding principle have been well studied
with sound theoretical guarantees in compressed sensing and more general
sparsity-constrained optimization. It is widely observed in existing empirical
studies that when a restricted Newton step is used (as the debiasing step),
the hard-thresholding algorithms tend to meet halting conditions in a
significantly smaller number of iterations and are very efficient. Hence, the
resulting Newton hard-thresholding algorithms call for stronger theoretical
guarantees than their simple hard-thresholding counterparts. This paper
provides a theoretical justification for the use of the restricted Newton step.
We build our theory and algorithm, Newton Hard-Thresholding Pursuit (NHTP), for
sparsity-constrained optimization. Our main result shows that NHTP is
quadratically convergent under the standard assumption of restricted strong
convexity and smoothness. We also establish its global convergence to a
stationary point under a weaker assumption. In the special case of
compressed sensing, NHTP effectively reduces to some of the existing
hard-thresholding algorithms with a Newton step. Consequently, our fast
convergence result justifies why those algorithms perform better than their
counterparts without the Newton step. The efficiency of NHTP is demonstrated on
both synthetic and real data in compressed sensing and sparse logistic regression.
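
The general idea of hard thresholding followed by a restricted Newton (debiasing) step can be sketched for the least-squares case as below. This is a generic illustration under stated assumptions, not the NHTP algorithm itself, whose support selection and step-size details are not given in the abstract; the unit gradient step, the helper `solve`, the function name `newton_hard_threshold`, and the toy data are all assumptions made for the example.

```python
def solve(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for c in reversed(range(n)):
        tail = sum(aug[c][k] * z[k] for k in range(c + 1, n))
        z[c] = (aug[c][n] - tail) / aug[c][c]
    return z


def newton_hard_threshold(A, b, s, iters=20):
    """s-sparse least squares: threshold a gradient step, then debias on the support."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        u = [x[j] - g[j] for j in range(n)]  # unit step size (an assumption)
        # Hard thresholding: keep the indices of the s largest entries of |u|.
        S = sorted(sorted(range(n), key=lambda j: -abs(u[j]))[:s])
        # Restricted Newton step: for a quadratic objective this is exactly the
        # least-squares solution on the support S.
        H = [[sum(A[i][p] * A[i][q] for i in range(m)) for q in S] for p in S]
        rhs = [sum(A[i][p] * b[i] for i in range(m)) for p in S]
        z = solve(H, rhs)
        new_x = [0.0] * n
        for p, zp in zip(S, z):
            new_x[p] = zp
        if new_x == x:  # support and values no longer change
            break
        x = new_x
    return x


# Hypothetical toy problem: b = A x_true with x_true = [0, 3, 0, -2].
A = [[1.0, 0.0, 0.0, 0.5],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
b = [-1.0, 3.0, 0.0, -2.0]
x_hat = newton_hard_threshold(A, b, s=2)
```

On this toy instance the correct support is found after the first thresholding, so the restricted solve recovers the sparse vector immediately; harder instances would need a safeguarded step size.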
Expectation-maximization for logistic regression
We present a family of expectation-maximization (EM) algorithms for binary
and negative-binomial logistic regression, drawing a sharp connection with the
variational-Bayes algorithm of Jaakkola and Jordan (2000). Indeed, our results
allow a version of this variational-Bayes approach to be re-interpreted as a
true EM algorithm. We study several interesting features of the algorithm, and
of this previously unrecognized connection with variational Bayes. We also
generalize the approach to sparsity-promoting priors, and to an online method
whose convergence properties are easily established. This latter method
compares favorably with stochastic-gradient descent in situations with marked
collinearity.
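
To make the connection with the Jaakkola and Jordan bound concrete, an EM-style iteration for binary logistic regression with a single coefficient might look like the sketch below. It assumes the familiar weight tanh(psi/2) / (2 psi) associated with that bound and a flat prior; whether the paper's algorithms take exactly this form is an assumption, and the function name `em_logistic_1d` and the toy data are hypothetical.

```python
import math


def em_logistic_1d(xs, ys, iters=100):
    """EM-style fit of P(y = 1 | x) = 1 / (1 + exp(-x * beta)) with scalar beta."""
    beta = 0.0
    kappa = [y - 0.5 for y in ys]  # centered responses
    for _ in range(iters):
        # E-step: expected latent weights at the current linear predictor.
        omegas = []
        for x in xs:
            psi = x * beta
            if abs(psi) < 1e-8:
                omegas.append(0.25)  # limit of tanh(psi/2) / (2 psi) as psi -> 0
            else:
                omegas.append(math.tanh(psi / 2.0) / (2.0 * psi))
        # M-step: weighted least-squares update (flat prior assumed).
        num = sum(x * k for x, k in zip(xs, kappa))
        den = sum(w * x * x for w, x in zip(omegas, xs))
        beta = num / den
    return beta


# Hypothetical intercept-only data: three successes in four trials, so the
# maximum-likelihood estimate is logit(3/4) = log(3).
beta_hat = em_logistic_1d([1.0, 1.0, 1.0, 1.0], [1, 1, 1, 0])
```

With several coefficients the M-step becomes a weighted least-squares solve rather than a scalar division.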