Let's Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
Block coordinate descent (BCD) methods are widely used for large-scale
numerical optimization because of their cheap iteration costs, low memory
requirements, amenability to parallelization, and ability to exploit problem
structure. Three main algorithmic choices influence the performance of BCD
methods: the block partitioning strategy, the block selection rule, and the
block update rule. In this paper we explore all three of these building blocks
and propose variations for each that can lead to significantly faster BCD
methods. We (i) propose new greedy block-selection strategies that guarantee
more progress per iteration than the Gauss-Southwell rule; (ii) explore
practical issues like how to implement the new rules when using "variable"
blocks; (iii) explore the use of message-passing to compute matrix or Newton
updates efficiently on huge blocks for problems with a sparse dependency
between variables; and (iv) consider optimal active manifold identification,
which leads to bounds on the "active set complexity" of BCD methods and to
superlinear convergence for certain problems with sparse solutions (and in
some cases finite termination at an optimal solution). We support all of our
findings with numerical results for the classic machine learning problems of
least squares, logistic regression, multi-class logistic regression, label
propagation, and L1-regularization.
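As a minimal sketch of the baseline that the new greedy rules are measured against, the Python code below runs single-coordinate descent (blocks of size one) on least squares, f(x) = 0.5 * ||A x - b||^2, selecting the coordinate with the largest gradient magnitude (the Gauss-Southwell rule) and minimizing exactly along it. The function name `gauss_southwell_cd`, the full gradient recomputation at every step, and the stopping tolerance are illustrative assumptions; the paper's faster greedy rules, variable blocks, and message-passing updates are not reproduced here.

```python
import numpy as np


def gauss_southwell_cd(A, b, iters=5000, tol=1e-10):
    """Single-coordinate descent on f(x) = 0.5 * ||A x - b||^2 with the
    Gauss-Southwell rule (update the coordinate with the largest |gradient|)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[1]
    x = np.zeros(n)
    r = A @ x - b                      # residual A x - b, kept up to date
    col_sq = np.sum(A * A, axis=0)     # L_i = ||A[:, i]||^2, coordinate-wise Lipschitz constants
    for _ in range(iters):
        g = A.T @ r                    # full gradient, recomputed for clarity
        i = int(np.argmax(np.abs(g)))  # Gauss-Southwell selection
        if abs(g[i]) < tol:
            break                      # (approximately) stationary
        step = -g[i] / col_sq[i]       # exact minimization along coordinate i
        x[i] += step
        r += step * A[:, i]            # cheap residual update: only column i changes
    return x
```

Recomputing `A.T @ r` every step keeps the sketch short; a practical implementation would maintain the gradient incrementally or exploit sparsity, since a greedy rule needs the whole gradient (or a cheap way to find its largest entry) at each iteration.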
MAP inference via Block-Coordinate Frank-Wolfe Algorithm
We present a new proximal bundle method for Maximum-A-Posteriori (MAP)
inference in structured energy minimization problems. The method optimizes a
Lagrangean relaxation of the original energy minimization problem using a
multi-plane block-coordinate Frank-Wolfe method that takes advantage of the specific
structure of the Lagrangean decomposition. We show empirically that our method
outperforms state-of-the-art Lagrangean-decomposition-based algorithms on some
challenging Markov Random Field, multi-label discrete tomography, and graph
matching problems.
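The method above builds on block-coordinate Frank-Wolfe. As a rough sketch of that basic building block only (not the proposed multi-plane proximal bundle method, and not a Lagrangean decomposition), the Python code below applies textbook block-coordinate Frank-Wolfe with a cyclic block order and exact line search to a hypothetical toy problem: minimizing 0.5 * ||M x - c||^2 where x is split into K blocks of size d, each constrained to the probability simplex. Each step calls a linear minimization oracle on one block (here, picking the simplex vertex with the smallest gradient entry) and moves toward it. The function name `bcfw_simplex_ls` and the toy objective are assumptions for illustration.

```python
import numpy as np


def bcfw_simplex_ls(M, c, K, d, iters=300):
    """Block-coordinate Frank-Wolfe (cyclic blocks, exact line search) for the toy problem
        min_x 0.5 * ||M x - c||^2  s.t. each block x[k*d:(k+1)*d] lies in the probability simplex.
    Returns the final iterate and the objective value after every step."""
    M = np.asarray(M, dtype=float)
    c = np.asarray(c, dtype=float)
    if M.shape[1] != K * d:
        raise ValueError("M must have K * d columns")
    x = np.full(K * d, 1.0 / d)  # feasible start: barycenter of every simplex
    r = M @ x - c                # residual M x - c, updated incrementally
    history = [0.5 * float(r @ r)]
    for t in range(iters):
        k = t % K                              # cyclic block choice
        blk = slice(k * d, (k + 1) * d)
        g_k = M[:, blk].T @ r                  # gradient with respect to block k
        s_k = np.zeros(d)
        s_k[np.argmin(g_k)] = 1.0              # linear minimization oracle over the simplex
        dir_k = s_k - x[blk]                   # Frank-Wolfe direction for block k
        Md = M[:, blk] @ dir_k
        denom = float(Md @ Md)
        if denom > 0.0:
            # Minimize 0.5 * ||r + gamma * Md||^2 over gamma in [0, 1].
            gamma = float(np.clip(-(r @ Md) / denom, 0.0, 1.0))
            x[blk] += gamma * dir_k
            r += gamma * Md
        history.append(0.5 * float(r @ r))
    return x, history
```

Because the step size minimizes the objective exactly on the segment [0, 1], the objective never increases, and every iterate stays feasible as a convex combination of points in each simplex.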
Discrete-Continuous ADMM for Transductive Inference in Higher-Order MRFs
This paper introduces a novel algorithm for transductive inference in
higher-order MRFs, where the unary energies are parameterized by a variable
classifier. The considered task is posed as a joint optimization problem in the
continuous classifier parameters and the discrete label variables. In contrast
to prior approaches such as convex relaxations, we propose an advantageous
decoupling of the objective function into discrete and continuous subproblems
and a novel, efficient optimization method related to ADMM. This approach
preserves integrality of the discrete label variables and guarantees global
convergence to a critical point. We demonstrate the advantages of our approach
in several experiments, including video object segmentation on the DAVIS data
set and interactive image segmentation.
Message-Passing Algorithms for Quadratic Minimization
Gaussian belief propagation (GaBP) is an iterative algorithm for computing
the mean of a multivariate Gaussian distribution, or equivalently, the minimum
of a multivariate positive definite quadratic function. Sufficient conditions,
such as walk-summability, that guarantee the convergence and correctness of
GaBP are known, but GaBP may fail to converge to the correct solution given an
arbitrary positive definite quadratic function. As was observed in previous
work, the GaBP algorithm fails to converge if the computation trees produced by
the algorithm are not positive definite. In this work, we show that the
failure modes of the GaBP algorithm can be understood via graph covers, and we
prove that a parameterized generalization of the min-sum algorithm can be used
to ensure that the computation trees remain positive definite whenever the
input matrix is positive definite. We demonstrate that the resulting algorithm
is closely related to other iterative schemes for quadratic minimization such
as the Gauss-Seidel and Jacobi algorithms. Finally, we observe empirically
that there always exists a choice of parameters such that the above
generalization of the GaBP algorithm converges.
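For reference, the Python sketch below implements plain Gaussian belief propagation (not the paper's parameterized min-sum generalization) in information form for minimizing 0.5 * x^T A x - b^T x, that is, solving A x = b. The message from node i to node j carries a precision P[i, j] = -A[i, j]^2 / P_excl and a potential h[i, j] = -A[i, j] * h_excl / P_excl, where P_excl and h_excl collect node i's own terms plus all incoming messages except the one from j. The synchronous schedule, dense message storage, and the name `gabp_solve` are assumptions; as noted above, convergence is guaranteed only under conditions such as walk-summability.

```python
import numpy as np


def gabp_solve(A, b, iters=500, tol=1e-12):
    """Plain Gaussian belief propagation in information form.

    Approximates the minimizer of 0.5 * x^T A x - b^T x (the solution of A x = b)
    for symmetric A with positive diagonal. Convergence is not guaranteed in
    general; it is guaranteed, e.g., when A is walk-summable.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(b)
    P = np.zeros((n, n))  # P[i, j]: precision carried by the message i -> j
    h = np.zeros((n, n))  # h[i, j]: potential carried by the message i -> j
    nbrs = [[j for j in range(n) if j != i and A[i, j] != 0.0] for i in range(n)]
    x = np.zeros(n)
    for _ in range(iters):
        P_new = np.zeros_like(P)
        h_new = np.zeros_like(h)
        for i in range(n):
            # Node i's own terms plus every incoming message.
            P_all = A[i, i] + P[:, i].sum()
            h_all = b[i] + h[:, i].sum()
            for j in nbrs[i]:
                # Cavity quantities: drop the message that came from j.
                P_excl = P_all - P[j, i]
                h_excl = h_all - h[j, i]
                # Integrate out x_i from the cavity distribution times the edge term.
                P_new[i, j] = -A[i, j] ** 2 / P_excl
                h_new[i, j] = -A[i, j] * h_excl / P_excl
        P, h = P_new, h_new  # synchronous ("flooding") schedule
        # Marginal means: total potential divided by total precision at each node.
        x_new = (b + h.sum(axis=0)) / (np.diag(A) + P.sum(axis=0))
        converged = np.max(np.abs(x_new - x)) < tol
        x = x_new
        if converged:
            break
    return x
```

Updating every message from the previous iteration's values loosely resembles a Jacobi sweep; updating messages in place, one node at a time, would give a schedule closer in spirit to Gauss-Seidel. On a diagonally dominant matrix (which is walk-summable), such as the 4-node cycle with 4 on the diagonal and 1 on each cycle edge, the sketch's output agrees with `numpy.linalg.solve`.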