Block-Coordinate Frank-Wolfe Optimization for Structural SVMs
We propose a randomized block-coordinate variant of the classic Frank-Wolfe
algorithm for convex optimization with block-separable constraints. Despite its
lower iteration cost, we show that it achieves a similar convergence rate in
duality gap as the full Frank-Wolfe algorithm. We also show that, when applied
to the dual structural support vector machine (SVM) objective, this yields an
online algorithm that has the same low iteration complexity as primal
stochastic subgradient methods. However, unlike stochastic subgradient methods,
the block-coordinate Frank-Wolfe algorithm allows us to compute the optimal
step-size and yields a computable duality gap guarantee. Our experiments
indicate that this simple algorithm outperforms competing structural SVM
solvers.
Comment: Appears in Proceedings of the 30th International Conference on
Machine Learning (ICML 2013). 9 pages main text + 22 pages appendix. Changes
from v3 to v4: 1) Re-organized appendix; improved & clarified duality gap
proofs; re-drew all plots; 2) Changed convention for Cf definition; 3) Added
weighted averaging experiments + convergence results; 4) Clarified main text
and relationship with appendix
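The block-coordinate update, the closed-form step size, and the computable duality gap described in this abstract can be sketched on a toy problem. The sketch below is not the paper's structural SVM solver: the objective (a least-squares fit of the sum of the blocks to a target vector), the simplex constraint on each block, and all function names are assumptions chosen for illustration.

```python
import random


def objective(total, target):
    """Toy objective: 0.5 * ||sum_i x_i - target||^2 (an assumption, not the SVM dual)."""
    return 0.5 * sum((t - g) ** 2 for t, g in zip(total, target))


def bcfw(target, n_blocks, n_iters=2000, seed=0):
    """Randomized block-coordinate Frank-Wolfe; each block lives on a probability simplex."""
    rng = random.Random(seed)
    d = len(target)
    # Every block starts at the vertex e_0 of its simplex.
    x = [[1.0 if k == 0 else 0.0 for k in range(d)] for _ in range(n_blocks)]
    total = [sum(x[i][k] for i in range(n_blocks)) for k in range(d)]
    for _ in range(n_iters):
        i = rng.randrange(n_blocks)  # pick one block uniformly at random
        grad = [total[k] - target[k] for k in range(d)]  # gradient w.r.t. block i
        # Linear minimization oracle over the simplex: the best vertex.
        k_star = min(range(d), key=lambda k: grad[k])
        direction = [(1.0 if k == k_star else 0.0) - x[i][k] for k in range(d)]
        denom = sum(v * v for v in direction)
        if denom == 0.0:
            continue  # block i already sits at the chosen vertex
        # Exact line search for a quadratic, clipped to [0, 1].
        gamma = -sum(g * v for g, v in zip(grad, direction)) / denom
        gamma = min(1.0, max(0.0, gamma))
        for k in range(d):
            x[i][k] += gamma * direction[k]
            total[k] += gamma * direction[k]
    return x, total


def duality_gap(x, total, target):
    """Frank-Wolfe gap: sum over blocks of <grad, x_i - s_i>, an upper bound on suboptimality."""
    grad = [t - g for t, g in zip(total, target)]
    g_min = min(grad)
    return sum(sum(g * v for g, v in zip(grad, xi)) - g_min for xi in x)


if __name__ == "__main__":
    tgt = [1.0, 1.0, 1.0]
    blocks, tot = bcfw(tgt, n_blocks=3)
    print("objective:", objective(tot, tgt))
    print("duality gap:", duality_gap(blocks, tot, tgt))
```

Only one block is touched per iteration, which is what keeps the per-iteration cost low, while the gap can still be evaluated at any time as a stopping criterion.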
MAP inference via Block-Coordinate Frank-Wolfe Algorithm
We present a new proximal bundle method for Maximum-A-Posteriori (MAP)
inference in structured energy minimization problems. The method optimizes a
Lagrangean relaxation of the original energy minimization problem using a
multi-plane block-coordinate Frank-Wolfe method that takes advantage of the specific
structure of the Lagrangean decomposition. We show empirically that our method
outperforms state-of-the-art Lagrangean decomposition based algorithms on some
challenging Markov Random Field, multi-label discrete tomography and graph
matching problems.
A Multi-Plane Block-Coordinate Frank-Wolfe Algorithm for Training Structural SVMs with a Costly max-Oracle
Structural support vector machines (SSVMs) are amongst the best performing
models for structured computer vision tasks, such as semantic image
segmentation or human pose estimation. Training SSVMs, however, is
computationally costly, because it requires repeated calls to a structured
prediction subroutine (called max-oracle), which has to solve an
optimization problem itself, e.g. a graph cut.
In this work, we introduce a new algorithm for SSVM training that is more
efficient than earlier techniques when the max-oracle is computationally
expensive, as is frequently the case in computer vision tasks. The main idea
is to (i) combine the recent stochastic Block-Coordinate Frank-Wolfe algorithm
with efficient hyperplane caching, and (ii) use an automatic selection rule for
deciding whether to call the exact max-oracle or to rely on an approximate one
based on the cached hyperplanes.
We show experimentally that this strategy leads to faster convergence to the
optimum with respect to the number of required oracle calls, and that this
translates into faster convergence with respect to the total runtime when the
max-oracle is slow compared to the other steps of the algorithm.
A publicly available C++ implementation is provided at
http://pub.ist.ac.at/~vnk/papers/SVM.html
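The hyperplane caching idea from this abstract can be illustrated with a small sketch. This is not the linked C++ implementation: the plane representation (a feature tuple plus an offset), the cache size limit, and the function names are all hypothetical. The paper's automatic selection rule is not reproduced; the caller passes the decision in as the flag `use_exact`.

```python
def plane_value(plane, w):
    """Value of a cached hyperplane (phi, offset) at weights w: <phi, w> + offset."""
    phi, offset = plane
    return sum(p * wi for p, wi in zip(phi, w)) + offset


def approximate_oracle(cache, w):
    """Cheap stand-in for the max-oracle: best plane among those already cached."""
    return max(cache, key=lambda plane: plane_value(plane, w))


def oracle_with_cache(exact_oracle, cache, w, use_exact, max_cache=10):
    """Call the costly exact oracle only when asked to, otherwise reuse the cache.

    `use_exact` stands in for the selection rule, which is not modelled here.
    """
    if use_exact or not cache:
        plane = exact_oracle(w)
        if plane not in cache:
            cache.append(plane)
        if len(cache) > max_cache:
            cache.pop(0)  # drop the oldest plane (a simplifying assumption)
        return plane
    return approximate_oracle(cache, w)
```

The cached lookup costs one inner product per stored plane, which is typically far cheaper than a call such as a graph cut.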
Frank-Wolfe Algorithms for Saddle Point Problems
We extend the Frank-Wolfe (FW) optimization algorithm to solve constrained
smooth convex-concave saddle point (SP) problems. Remarkably, the method only
requires access to linear minimization oracles. Leveraging recent advances in
FW optimization, we provide the first proof of convergence of a FW-type saddle
point solver over polytopes, thereby partially answering a 30-year-old
conjecture. We also survey other convergence results and highlight gaps in the
theoretical underpinnings of FW-style algorithms. Motivating applications
without known efficient alternatives are explored through structured prediction
with combinatorial penalties as well as games over matching polytopes involving
an exponential number of constraints.
Comment: Appears in: Proceedings of the 20th International Conference on
Artificial Intelligence and Statistics (AISTATS 2017). 39 pages
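A Frank-Wolfe-style saddle point iteration that uses only linear minimization oracles can be sketched as follows. The toy objective L(x, y) = (mu/2)||x||^2 + x^T M y - (mu/2)||y||^2 over two probability simplices, the step size 2/(2+t), and the function names are assumptions for illustration, not the paper's exact algorithm or its step-size rule.

```python
def lmo_simplex(grad, maximize=False):
    """Linear oracle over the simplex: the vertex minimizing (or maximizing) <grad, s>."""
    pick = max if maximize else min
    k = pick(range(len(grad)), key=lambda j: grad[j])
    return [1.0 if j == k else 0.0 for j in range(len(grad))]


def sp_fw(M, mu=1.0, n_iters=500):
    """Simultaneous Frank-Wolfe steps on the min player x and the max player y."""
    n, m = len(M), len(M[0])
    x = [1.0 / n] * n
    y = [1.0 / m] * m
    gap = 0.0
    for t in range(n_iters):
        # Partial gradients of L at the current iterate.
        gx = [mu * x[i] + sum(M[i][j] * y[j] for j in range(m)) for i in range(n)]
        gy = [sum(M[i][j] * x[i] for i in range(n)) - mu * y[j] for j in range(m)]
        sx = lmo_simplex(gx)                 # x moves toward the minimizing vertex
        sy = lmo_simplex(gy, maximize=True)  # y moves toward the maximizing vertex
        # Frank-Wolfe gap at the iterate before this step; it is always non-negative.
        gap = (sum(gx[i] * (x[i] - sx[i]) for i in range(n))
               + sum(gy[j] * (sy[j] - y[j]) for j in range(m)))
        gamma = 2.0 / (2.0 + t)  # assumed schedule
        x = [(1.0 - gamma) * a + gamma * b for a, b in zip(x, sx)]
        y = [(1.0 - gamma) * a + gamma * b for a, b in zip(y, sy)]
    return x, y, gap
```

Both iterates remain convex combinations of simplex vertices, so no projection step is needed.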
Efficient Linear Programming for Dense CRFs
The fully connected conditional random field (CRF) with Gaussian pairwise
potentials has proven popular and effective for multi-class semantic
segmentation. While the energy of a dense CRF can be minimized accurately using
a linear programming (LP) relaxation, the state-of-the-art algorithm is too
slow to be useful in practice. To alleviate this deficiency, we introduce an
efficient LP minimization algorithm for dense CRFs. To this end, we develop a
proximal minimization framework, where the dual of each proximal problem is
optimized via block coordinate descent. We show that each block of variables
can be efficiently optimized. Specifically, for one block, the problem
decomposes into significantly smaller subproblems, each of which is defined
over a single pixel. For the other block, the problem is optimized via
conditional gradient descent. This has two advantages: 1) the conditional
gradient can be computed in a time linear in the number of pixels and labels;
and 2) the optimal step size can be computed analytically. Our experiments on
standard datasets provide compelling evidence that our approach outperforms all
existing baselines including the previous LP based approach for dense CRFs.
Comment: 24 pages, 10 figures and 4 tables
Efficient Relaxations for Dense CRFs with Sparse Higher Order Potentials
Dense conditional random fields (CRFs) have become a popular framework for
modelling several problems in computer vision such as stereo correspondence and
multi-class semantic segmentation. By modelling long-range interactions, dense
CRFs provide a labelling that captures finer detail than their sparse
counterparts. Currently, the state-of-the-art algorithm performs mean-field
inference using a filter-based method but fails to provide a strong theoretical
guarantee on the quality of the solution. A question naturally arises as to
whether it is possible to obtain a maximum a posteriori (MAP) estimate of a
dense CRF using a principled method. Within this paper, we show that this is
indeed possible. We will show that, by using a filter-based method, continuous
relaxations of the MAP problem can be optimised efficiently using
state-of-the-art algorithms. Specifically, we will solve a quadratic
programming (QP) relaxation using the Frank-Wolfe algorithm and a linear
programming (LP) relaxation by developing a proximal minimisation framework. By
exploiting labelling consistency in the higher-order potentials and utilising
the filter-based method, we are able to formulate the above algorithms such
that each iteration has a complexity linear in the number of classes and random
variables. The presented algorithms can be applied to any labelling problem
using a dense CRF with sparse higher-order potentials. In this paper, we use
semantic segmentation as an example application as it demonstrates the ability
of the algorithm to scale to dense CRFs with large dimensions. We perform
experiments on the Pascal dataset to indicate that the presented algorithms are
able to attain lower energies than the mean-field inference method.