Learning efficiently with approximate inference via dual losses
Many structured prediction tasks involve complex models where inference is
computationally intractable, but where it can be well approximated using a
linear programming relaxation. Previous approaches for learning for structured
prediction (e.g., cutting-plane, subgradient methods, perceptron) repeatedly
make predictions for some of the data points. These approaches are
computationally demanding because each prediction involves solving a linear
program to optimality. We present a scalable algorithm for learning for
structured prediction. The main idea is to instead solve the dual of the
structured prediction loss. We formulate the learning task as a convex
minimization over both the weights and the dual variables corresponding to each
data point. As a result, we can begin to optimize the weights even before
completely solving any of the individual prediction problems. We show how the
dual variables can be efficiently optimized using coordinate descent. Our
algorithm is competitive with state-of-the-art methods such as stochastic
subgradient and cutting-plane
Blending Learning and Inference in Structured Prediction
In this paper we derive an efficient algorithm to learn the parameters of
structured predictors in general graphical models. This algorithm blends the
learning and inference tasks, which results in a significant speedup over
traditional approaches, such as conditional random fields and structured
support vector machines. For this purpose we utilize the structures of the
predictors to describe a low dimensional structured prediction task which
encourages local consistencies within the different structures while learning
the parameters of the model. Convexity of the learning task provides the means
to enforce the consistencies between the different parts. The
inference-learning blending algorithm that we propose is guaranteed to converge
to the optimum of the low dimensional primal and dual programs. Unlike many of
the existing approaches, the inference-learning blending allows us to
efficiently learn high-order graphical models over regions of any size and with
a very large number of parameters. We demonstrate the effectiveness of our
approach and present state-of-the-art results in stereo estimation, semantic
segmentation, shape reconstruction, and indoor scene understanding.
Block Belief Propagation for Parameter Learning in Markov Random Fields
Traditional learning methods for training Markov random fields require doing
inference over all variables to compute the likelihood gradient. The iteration
complexity for those methods therefore scales with the size of the graphical
models. In this paper, we propose block belief propagation learning
(BBPL), which uses block-coordinate updates of approximate marginals to compute
approximate gradients, removing the need to compute inference on the entire
graphical model. Thus, the iteration complexity of BBPL does not scale with the
size of the graphs. We prove that the method converges to the same solution as
that obtained by using full inference per iteration, despite these
approximations, and we empirically demonstrate its scalability improvements
over standard training methods. Comment: Accepted to AAAI 201
Bethe Projections for Non-Local Inference
Many inference problems in structured prediction are naturally solved by
augmenting a tractable dependency structure with complex, non-local auxiliary
objectives. This includes the mean field family of variational inference
algorithms, soft- or hard-constrained inference using Lagrangian relaxation or
linear programming, collective graphical models, and forms of semi-supervised
learning such as posterior regularization. We present a method to
discriminatively learn broad families of inference objectives, capturing
powerful non-local statistics of the latent variables, while maintaining
tractable and provably fast inference using non-Euclidean projected gradient
descent with a distance-generating function given by the Bethe entropy. We
demonstrate the performance and flexibility of our method by (1) extracting
structured citations from research papers by learning soft global constraints,
(2) achieving state-of-the-art results on a widely-used handwriting recognition
task using a novel learned non-convex inference procedure, and (3) providing a
fast and highly scalable algorithm for the challenging problem of inference in
a collective graphical model applied to bird migration. Comment: minor bug fix to appendix. appeared in UAI 201
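The non-Euclidean projected gradient descent mentioned in the abstract above can be illustrated with a much simpler relative: mirror descent on a single probability simplex. The sketch below is an assumption-laden toy, not the paper's method. It swaps the Bethe entropy for the plain Shannon entropy as the distance-generating function, and the function name `entropic_mirror_descent`, the quadratic objective, and the `target` vector are all hypothetical choices made for illustration.

```python
import math

def entropic_mirror_descent(grad, dim, step=0.5, iters=500):
    """Minimize a convex function over the probability simplex.

    Mirror descent with the negative Shannon entropy as the
    distance-generating function (an illustrative stand-in for the
    Bethe entropy named in the abstract).
    """
    p = [1.0 / dim] * dim  # start at the uniform distribution
    for _ in range(iters):
        g = grad(p)
        # Gradient step taken in the dual (log) space.
        logits = [math.log(pi) - step * gi for pi, gi in zip(p, g)]
        # Normalizing is the KL (Bregman) projection back onto the simplex;
        # subtracting the max keeps the exponentials numerically stable.
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]
        total = sum(weights)
        p = [w / total for w in weights]
    return p

# Hypothetical objective: f(p) = 0.5 * ||p - target||^2, minimized at p = target.
target = [0.7, 0.2, 0.1]
result = entropic_mirror_descent(
    lambda p: [pi - ti for pi, ti in zip(p, target)], dim=3
)
```

With this choice of distance-generating function the projection step reduces to a normalization, which is why entropy-based geometries are attractive when the feasible set consists of distributions.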
Learning Deep Structured Models
Many problems in real-world applications involve predicting several random
variables which are statistically related. Markov random fields (MRFs) are a
great mathematical tool to encode such relationships. The goal of this paper is
to combine MRFs with deep learning algorithms to estimate complex
representations while taking into account the dependencies between the output
random variables. Towards this goal, we propose a training algorithm that is
able to learn structured models jointly with deep features that form the MRF
potentials. Our approach is efficient as it blends learning and inference and
makes use of GPU acceleration. We demonstrate the effectiveness of our
algorithm in the tasks of predicting words from noisy images, as well as
multi-class classification of Flickr photographs. We show that joint learning
of the deep features and the MRF parameters results in significant performance
gains. Comment: 11 pages including referenc