Large Margin Boltzmann Machines and Large Margin Sigmoid Belief Networks
Current statistical models for structured prediction make simplifying
assumptions about the underlying output graph structure, such as assuming a
low-order Markov chain, because exact inference becomes intractable as the
tree-width of the underlying graph increases. Approximate inference algorithms,
on the other hand, force one to trade off representational power with
computational efficiency. In this paper, we propose two new types of
probabilistic graphical models, large margin Boltzmann machines (LMBMs) and
large margin sigmoid belief networks (LMSBNs), for structured prediction.
LMSBNs in particular allow a very fast inference algorithm for arbitrary graph
structures that runs in polynomial time with high probability. This
probability is data-distribution dependent and is maximized in learning. The
new approach overcomes the representation-efficiency trade-off in previous
models and allows fast structured prediction with complicated graph structures.
We present results from applying a fully connected model to multi-label scene
classification and demonstrate that the proposed approach can yield significant
performance gains over current state-of-the-art methods.
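To make the "large margin" part concrete, here is a minimal, generic sketch of the structured hinge objective that large-margin models of this kind optimize; the linear scoring function, toy fully connected pairwise term, and brute-force enumeration are illustrative assumptions, not the paper's LMBM/LMSBN formulation (whose point is precisely to avoid such enumeration):

```python
# Structured hinge loss for a fully connected binary output structure.
# The linear score and brute-force enumeration are toy assumptions.
import itertools
import numpy as np

def score(x, y, W):
    """Unary terms plus all pairwise interactions between output bits."""
    return y @ (W["unary"] @ x) + y @ W["pair"] @ y

def structured_hinge(x, y_true, W, n_labels):
    """max_y [Hamming(y, y_true) + score(x, y) - score(x, y_true)]_+ .
    Enumeration is exponential in n_labels; feasible only for this toy."""
    s_true = score(x, y_true, W)
    best = 0.0
    for bits in itertools.product([0, 1], repeat=n_labels):
        y = np.array(bits)
        margin = np.sum(y != y_true) + score(x, y, W) - s_true
        best = max(best, margin)
    return best

rng = np.random.default_rng(0)
n_feat, n_labels = 4, 3
W = {"unary": rng.normal(size=(n_labels, n_feat)),
     "pair": rng.normal(size=(n_labels, n_labels))}
x, y_true = rng.normal(size=n_feat), np.array([1, 0, 1])
print("structured hinge loss:", structured_hinge(x, y_true, W, n_labels))
```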
Improving Joint Training of Inference Networks and Structured Prediction Energy Networks
Deep energy-based models are powerful, but pose challenges for learning and
inference (Belanger and McCallum, 2016). Tu and Gimpel (2018) developed an
efficient framework for energy-based models by training "inference networks" to
approximate structured inference instead of using gradient descent. However,
their alternating optimization approach suffers from instabilities during
training, requiring additional loss terms and careful hyperparameter tuning. In
this paper, we contribute several strategies to stabilize and improve this
joint training of energy functions and inference networks for structured
prediction. We design a compound objective to jointly train both cost-augmented
and test-time inference networks along with the energy function. We propose
joint parameterizations for the inference networks that encourage them to
capture complementary functionality during learning. We empirically validate
our strategies on two sequence labeling tasks, showing easier paths to strong
performance than prior work, as well as further improvements with global energy
terms.
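As a rough illustration of the alternating scheme being stabilized, the following hedged sketch pits a small energy network against a cost-augmented inference network under a margin-rescaled objective; all architectures, losses, and data are toy assumptions rather than the paper's exact compound objective:

```python
# Alternating large-margin training of an energy network against a
# cost-augmented inference network (toy data, shapes, and losses).
import torch

n_feat, n_labels = 8, 5
energy_net = torch.nn.Sequential(torch.nn.Linear(n_feat + n_labels, 16),
                                 torch.nn.Softplus(), torch.nn.Linear(16, 1))
inf_net = torch.nn.Sequential(torch.nn.Linear(n_feat, n_labels),
                              torch.nn.Sigmoid())   # cost-augmented inference
opt_E = torch.optim.Adam(energy_net.parameters(), lr=1e-3)
opt_F = torch.optim.Adam(inf_net.parameters(), lr=1e-3)

def E(x, y):
    return energy_net(torch.cat([x, y], dim=-1)).squeeze(-1)

for step in range(200):
    x = torch.randn(32, n_feat)
    y_gold = (torch.rand(32, n_labels) > 0.5).float()   # toy supervision
    y_hat = inf_net(x)
    delta = (y_hat - y_gold).abs().sum(-1)              # soft Hamming cost
    violation = torch.relu(delta + E(x, y_gold) - E(x, y_hat))
    if step % 2 == 0:   # inference network seeks high-cost, low-energy outputs
        loss = -violation.mean()
        opt_F.zero_grad(); loss.backward(); opt_F.step()
    else:               # energy function closes the violations it exposed
        loss = violation.mean()
        opt_E.zero_grad(); loss.backward(); opt_E.step()
```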
Structured Prediction Energy Networks
We introduce structured prediction energy networks (SPENs), a flexible
framework for structured prediction. A deep architecture is used to define an
energy function of candidate labels, and then predictions are produced by using
back-propagation to iteratively optimize the energy with respect to the labels.
This deep architecture captures dependencies between labels that would lead to
intractable graphical models, and performs structure learning by automatically
learning discriminative features of the structured output. One natural
application of our technique is multi-label classification, which traditionally
has required strict prior assumptions about the interactions between labels to
ensure tractable learning and prediction. We are able to apply SPENs to
multi-label problems with substantially larger label sets than previous
applications of structured prediction, while modeling high-order interactions
using minimal structural assumptions. Overall, deep learning provides
remarkable tools for learning features of the inputs to a prediction problem,
and this work extends these techniques to learning features of structured
outputs. Our experiments provide impressive performance on a variety of
benchmark multi-label classification tasks, demonstrate that our technique can
be used to provide interpretable structure learning, and illuminate fundamental
trade-offs between feed-forward and iterative structured prediction.
Comment: ICML 2016
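A minimal sketch of the inference procedure described here, assuming PyTorch and a toy energy architecture: predictions are obtained by relaxing the labels to [0, 1] and running gradient descent on the energy with respect to them.

```python
# SPEN-style prediction: gradient descent on relaxed labels y in [0, 1]^L.
import torch

class ToyEnergy(torch.nn.Module):
    """E(x, y): a local compatibility term plus a global label energy."""
    def __init__(self, n_feat, n_labels, hidden=16):
        super().__init__()
        self.feat = torch.nn.Linear(n_feat, n_labels)
        self.label_net = torch.nn.Sequential(
            torch.nn.Linear(n_labels, hidden), torch.nn.Softplus(),
            torch.nn.Linear(hidden, 1))

    def forward(self, x, y):
        local = (self.feat(x) * y).sum(-1, keepdim=True)
        return local + self.label_net(y)   # deep term captures label dependencies

def predict(energy, x, n_labels, steps=50, lr=0.1):
    y = torch.full((x.shape[0], n_labels), 0.5, requires_grad=True)
    opt = torch.optim.SGD([y], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        energy(x, y).sum().backward()      # back-propagation w.r.t. the labels
        opt.step()
        with torch.no_grad():
            y.clamp_(0.0, 1.0)             # stay inside the relaxation
    return (y.detach() > 0.5).float()      # round to a discrete labeling

energy = ToyEnergy(n_feat=8, n_labels=5)
print(predict(energy, torch.randn(2, 8), n_labels=5))
```

The sketch covers prediction only; the original work trains the energy with a structured SVM loss on top of this inference routine.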
Deep Structured Prediction with Nonlinear Output Transformations
Deep structured models are widely used for tasks like semantic segmentation,
where explicit correlations between variables provide important prior
information which generally helps to reduce the data needs of deep nets.
However, current deep structured models are restricted by an often very local
neighborhood structure, which cannot be enlarged for computational complexity
reasons, and by the fact that the output configuration, or a representation
thereof, cannot be transformed further. Very recent approaches which address
those issues include graphical model inference inside deep nets so as to permit
subsequent non-linear output space transformations. However, optimization of
those formulations is challenging and not well understood. Here, we develop a
novel model which generalizes existing approaches, such as structured
prediction energy networks, and discuss a formulation which maintains
applicability of existing inference techniques.
Comment: Appearing in NIPS 2018
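The core idea of transforming the output configuration further can be sketched as follows; the toy networks and shapes are assumptions, not the paper's formulation:

```python
# Scoring a transformed output: the energy sees T(y), not y directly.
import torch

n_feat, n_labels = 8, 5
transform = torch.nn.Sequential(torch.nn.Linear(n_labels, 16), torch.nn.ReLU())
energy_head = torch.nn.Linear(16 + n_feat, 1)

def energy(x, y):
    h = transform(y)                        # nonlinear output transformation
    return energy_head(torch.cat([x, h], dim=-1))

x = torch.randn(2, n_feat)
y = torch.full((2, n_labels), 0.5, requires_grad=True)
energy(x, y).sum().backward()               # gradients still reach the labels,
print(y.grad.shape)                         # so gradient-based inference applies
```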
Benchmarking Approximate Inference Methods for Neural Structured Prediction
Exact structured inference with neural network scoring functions is
computationally challenging, but several methods have been proposed for
approximating inference. One approach is to perform gradient descent with
respect to the output structure directly (Belanger and McCallum, 2016). Another
approach, proposed recently, is to train a neural network (an "inference
network") to perform inference (Tu and Gimpel, 2018). In this paper, we compare
these two families of inference methods on three sequence labeling datasets. We
choose sequence labeling because it permits us to use exact inference as a
benchmark in terms of speed, accuracy, and search error. Across datasets, we
demonstrate that inference networks achieve a better speed/accuracy/search
error trade-off than gradient descent, while also being faster than exact
inference at similar accuracy levels. We find further benefit by combining
inference networks and gradient descent, using the former to provide a warm
start for the latter.
Comment: NAACL 2019 camera-ready version
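A hedged sketch of the combined strategy reported here, with untrained toy stand-ins for the energy function and inference network: the inference network proposes an initial output, and a few gradient steps on the energy refine it.

```python
# Warm-starting gradient-descent inference with an inference network.
import torch

def warm_start_inference(energy, inf_net, x, steps=10, lr=0.05):
    y = inf_net(x).detach().clone().requires_grad_(True)   # warm start
    opt = torch.optim.SGD([y], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        energy(x, y).sum().backward()       # refine by descending the energy
        opt.step()
        with torch.no_grad():
            y.clamp_(0.0, 1.0)
    return y.detach()

# Untrained toy stand-ins so the sketch runs end to end.
n_feat, n_labels = 8, 5
inf_net = torch.nn.Sequential(torch.nn.Linear(n_feat, n_labels),
                              torch.nn.Sigmoid())
head = torch.nn.Linear(n_feat + n_labels, 1)
energy = lambda x, y: head(torch.cat([x, y], dim=-1))
print(warm_start_inference(energy, inf_net, torch.randn(2, n_feat)))
```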
Local Perturb-and-MAP for Structured Prediction
Conditional random fields (CRFs) provide a powerful tool for structured
prediction, but pose significant challenges in both the learning and inference
steps. Approximation techniques are widely used in both steps, which should be
considered jointly to guarantee good performance (a.k.a. "inferning").
Perturb-and-MAP models provide a promising alternative to CRFs, but require
global combinatorial optimization and hence they are usable only on specific
models. In this work, we present a new Local Perturb-and-MAP (locPMAP)
framework that replaces the global optimization with a local optimization by
exploiting our observed connection between locPMAP and the pseudolikelihood of
the original CRF model. We test our approach on three different vision tasks
and show that our method achieves consistently improved performance over other
approximate inference techniques optimized to a pseudolikelihood objective.
Additionally, we demonstrate that we can integrate our method in the fully
convolutional network framework to increase our model's complexity. Finally,
our observed connection between locPMAP and the pseudolikelihood leads to a
novel perspective for understanding and using pseudolikelihood.
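For background, the classic (global) perturb-and-MAP construction that this work localizes can be demonstrated in a few lines: adding i.i.d. Gumbel noise to the log-potential of every configuration and taking the argmax yields exact samples from the Gibbs distribution. Enumerating configurations is only feasible for the toy model below.

```python
# Global perturb-and-MAP via the Gumbel-max trick on a tiny model.
import numpy as np

rng = np.random.default_rng(0)
n_configs = 2 ** 3                           # all joint states of 3 binary vars

theta = rng.normal(size=n_configs)           # log-potential per configuration
gibbs = np.exp(theta) / np.exp(theta).sum()  # target Gibbs distribution

counts = np.zeros(n_configs)
for _ in range(100_000):
    g = rng.gumbel(size=n_configs)           # one Gumbel per configuration
    counts[np.argmax(theta + g)] += 1        # MAP of the perturbed model

print(np.round(gibbs, 3))
print(np.round(counts / counts.sum(), 3))    # empirical frequencies match gibbs
```

The paper's local variant replaces this exponential global argmax with local optimizations connected to the pseudolikelihood, which is what makes the approach practical on real models.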
Learning Discriminators as Energy Networks in Adversarial Learning
We propose a novel framework for structured prediction via adversarial
learning. Existing adversarial learning methods involve two separate networks
in training: the structured prediction model and the discriminative model. The
information captured by the discriminative model complements that in the
structured prediction model, but little prior work has studied how to exploit
this information to improve structured prediction models at the inference
stage. In this work, we propose to refine the predictions of
structured prediction models by effectively integrating discriminative models
into the prediction. Discriminative models are treated as energy-based models.
As in adversarial learning, discriminative models are trained to
estimate scores which measure the quality of predicted outputs, while
structured prediction models are trained to predict contrastive outputs with
maximal energy scores. In this way, the gradient vanishing problem is
ameliorated, and thus we are able to perform inference by following the ascent
gradient directions of discriminative models to refine structured prediction
models. The proposed method is able to handle a range of tasks, e.g.,
multi-label classification and image segmentation. Empirical results on these
two tasks validate the effectiveness of our learning method.
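A minimal sketch of the inference-time refinement described above, with untrained toy networks standing in for the learned predictor and discriminator: the prediction is nudged along the ascent direction of the discriminator's quality score.

```python
# Refining a prediction by gradient ascent on a discriminator's score.
import torch

n_feat, n_labels = 8, 5
predictor = torch.nn.Sequential(torch.nn.Linear(n_feat, n_labels),
                                torch.nn.Sigmoid())
disc = torch.nn.Sequential(torch.nn.Linear(n_feat + n_labels, 16),
                           torch.nn.Tanh(), torch.nn.Linear(16, 1))

def refine(x, steps=10, lr=0.1):
    y = predictor(x).detach().clone().requires_grad_(True)
    for _ in range(steps):
        score = disc(torch.cat([x, y], dim=-1)).sum()
        (grad,) = torch.autograd.grad(score, y)
        with torch.no_grad():
            y += lr * grad                  # ascend the quality score
            y.clamp_(0.0, 1.0)
    return y.detach()

print(refine(torch.randn(2, n_feat)))
```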
Adaptive Path-Integral Autoencoder: Representation Learning and Planning for Dynamical Systems
We present a representation learning algorithm that learns a low-dimensional
latent dynamical system from high-dimensional \textit{sequential} raw data,
e.g., video. The framework builds upon recent advances in amortized inference
methods that use both an inference network and a refinement procedure to output
samples from a variational distribution given an observation sequence, and
takes advantage of the duality between control and inference to approximately
solve the intractable inference problem using the path integral control
approach. The learned dynamical model can be used to predict and plan the
future states; we also present an efficient planning method that exploits the
learned low-dimensional latent dynamics. Numerical experiments show that the
proposed path-integral control based variational inference method leads to
tighter lower bounds in statistical model learning of sequential data. The
supplementary video: https://youtu.be/xCp35crUoLQ
Comment: Neural Information Processing Systems (NeurIPS) 2018
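The path-integral control update that this approach exploits can be sketched in its standard sampling-based form: candidate control perturbations are rolled out and averaged with weights exp(-cost/lambda). The dynamics, cost, horizon, and temperature below are illustrative assumptions, and the sketch omits the paper's amortized inference and learned latent dynamics.

```python
# Sampling-based path-integral (MPPI-style) planning on a toy system.
import numpy as np

rng = np.random.default_rng(0)
horizon, n_samples, lam = 10, 256, 1.0

def rollout_cost(u_seq):
    """1-D double integrator driven toward the origin (toy dynamics)."""
    pos, vel, cost = 1.0, 0.0, 0.0
    for u in u_seq:
        vel += 0.1 * u
        pos += 0.1 * vel
        cost += pos ** 2 + 0.01 * u ** 2
    return cost

u_mean = np.zeros(horizon)
for _ in range(20):                           # iterative refinement of the plan
    noise = rng.normal(size=(n_samples, horizon))
    costs = np.array([rollout_cost(u_mean + e) for e in noise])
    w = np.exp(-(costs - costs.min()) / lam)  # path-integral weights
    u_mean += (w[:, None] * noise).sum(0) / w.sum()

print("first control:", u_mean[0], " plan cost:", rollout_cost(u_mean))
```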
End-to-end learning potentials for structured attribute prediction
We present a structured inference approach in deep neural networks for
multiple attribute prediction. In attribute prediction, a common approach is to
learn independent classifiers on top of a good feature representation. However,
such classifiers assume conditional independence on features and do not
explicitly consider the dependency between attributes in the inference process.
We propose to formulate attribute prediction in terms of marginal inference in
the conditional random field. We model potential functions by deep neural
networks and apply the sum-product algorithm to solve for the approximate
marginal distribution in feed-forward networks. Our message passing layer
implements sparse pairwise potentials by a softplus-linear function that is
equivalent to a higher-order classifier, and learns all the model parameters by
end-to-end backpropagation. Experimental results on the SUN Attributes and
CelebA datasets suggest that the structured inference improves the attribute
prediction performance, and possibly uncovers the hidden relationship between
attributes.
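As a rough, hedged illustration of differentiable structured inference over attribute potentials, the sketch below uses a few mean-field updates as a simple stand-in for the paper's sum-product message passing layer; the softplus pairwise parameterization follows the abstract, while all shapes and the mean-field substitution are assumptions.

```python
# Differentiable marginal inference over pairwise attribute potentials,
# using mean-field updates as a stand-in for sum-product message passing.
import torch

n_feat, n_attr = 16, 6
unary_net = torch.nn.Linear(n_feat, n_attr)
pair_raw = torch.nn.Parameter(torch.zeros(n_attr, n_attr))

def infer_marginals(x, iters=5):
    unary = unary_net(x)                                  # (batch, n_attr)
    pair = torch.nn.functional.softplus(pair_raw)         # nonneg. couplings
    pair = (pair + pair.T) / 2 * (1 - torch.eye(n_attr))  # symmetric, no self-loops
    q = torch.sigmoid(unary)
    for _ in range(iters):                                # fixed-point updates
        q = torch.sigmoid(unary + q @ pair)
    return q

x = torch.randn(4, n_feat)
target = (torch.rand(4, n_attr) > 0.5).float()            # toy attribute labels
loss = torch.nn.functional.binary_cross_entropy(infer_marginals(x), target)
loss.backward()                # end-to-end: gradients reach both potentials
print(pair_raw.grad is not None, unary_net.weight.grad is not None)
```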
Learning Graph-Structured Sum-Product Networks for Probabilistic Semantic Maps
We introduce Graph-Structured Sum-Product Networks (GraphSPNs), a
probabilistic approach to structured prediction for problems where dependencies
between latent variables are expressed in terms of arbitrary, dynamic graphs.
While many approaches to structured prediction place strict constraints on the
interactions between inferred variables, many real-world problems can only be
characterized using complex graph structures of varying size, often
contaminated with noise when obtained from real data. Here, we focus on one
such problem in the domain of robotics. We demonstrate how GraphSPNs can be
used to bolster inference about semantic, conceptual place descriptions using
noisy topological relations discovered by a robot exploring large-scale office
spaces. Through experiments, we show that GraphSPNs consistently outperform the
traditional approach based on undirected graphical models, successfully
disambiguating information in global semantic maps built from uncertain, noisy
local evidence. We further exploit the probabilistic nature of the model to
infer marginal distributions over semantic descriptions of as yet unexplored
places and detect spatial environment configurations that are novel and
incongruent with the known evidence.
Comment: 9 pages, 8 figures. AAAI Conference on Artificial Intelligence (AAAI 2018)
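For readers unfamiliar with sum-product networks, a minimal fixed-structure example shows what an SPN evaluates; GraphSPNs instead build such structures dynamically from a graph, so everything below is illustrative only.

```python
# A tiny fixed-structure sum-product network over two binary variables.
import numpy as np

def leaf(p_true, value):
    """Bernoulli leaf distribution over a single variable."""
    return p_true if value == 1 else 1.0 - p_true

def spn(x1, x2):
    comp1 = leaf(0.9, x1) * leaf(0.2, x2)   # product node: disjoint scopes
    comp2 = leaf(0.3, x1) * leaf(0.7, x2)
    return 0.6 * comp1 + 0.4 * comp2        # sum node: weighted mixture

total = sum(spn(a, b) for a in (0, 1) for b in (0, 1))
print("sums to one:", np.isclose(total, 1.0))
print("P(x1=1, x2=0) =", spn(1, 0))
```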