Backpropagating through Structured Argmax using a SPIGOT
We introduce the structured projection of intermediate gradients optimization
technique (SPIGOT), a new method for backpropagating through neural networks
that include hard-decision structured predictions (e.g., parsing) in
intermediate layers. SPIGOT requires no marginal inference, unlike structured
attention networks (Kim et al., 2017) and some reinforcement learning-inspired
solutions (Yogatama et al., 2017). Like so-called straight-through estimators
(Hinton, 2012), SPIGOT defines gradient-like quantities associated with
intermediate nondifferentiable operations, allowing backpropagation before and
after them; SPIGOT's proxy aims to ensure that, after a parameter update, the
intermediate structure will remain well-formed.
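To make the mechanism concrete, here is a minimal PyTorch sketch of a SPIGOT-style surrogate gradient for an unstructured argmax, where the relevant polytope is simply the probability simplex; the projection routine, the step size eta, and the toy setting are illustrative assumptions, not the paper's structured implementation.

```python
import torch

def project_simplex(v):
    # Euclidean projection onto the probability simplex
    # (sort-based algorithm of Duchi et al., 2008).
    u, _ = torch.sort(v, descending=True)
    css = torch.cumsum(u, dim=0) - 1.0
    ks = torch.arange(1, v.numel() + 1, dtype=v.dtype)
    rho = int((u - css / ks > 0).nonzero().max()) + 1
    return torch.clamp(v - css[rho - 1] / rho, min=0.0)

class SpigotArgmax(torch.autograd.Function):
    @staticmethod
    def forward(ctx, scores, eta=1.0):
        z = torch.zeros_like(scores)
        z[scores.argmax()] = 1.0          # hard one-hot decision
        ctx.save_for_backward(z)
        ctx.eta = eta
        return z

    @staticmethod
    def backward(ctx, grad_z):
        z, = ctx.saved_tensors
        # SPIGOT-style surrogate: take a gradient step in output space,
        # project back onto the feasible polytope (here, the simplex),
        # and pass the difference back as the "gradient" for the scores.
        p = project_simplex(z - ctx.eta * grad_z)
        return z - p, None
```

The projection step is what keeps the updated target inside the set of well-formed outputs, which is the intuition behind SPIGOT's proxy described above.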
We experiment on two structured NLP pipelines: syntactic-then-semantic
dependency parsing, and semantic parsing followed by sentiment classification.
We show that training with SPIGOT leads to a larger improvement on the
downstream task than a modularly-trained pipeline, the straight-through
estimator, and structured attention, reaching a new state of the art on
semantic dependency parsing.
Comment: ACL 2018
Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning
Latent structure models are a powerful tool for modeling language data: they
can mitigate the error propagation and annotation bottleneck in pipeline
systems, while simultaneously uncovering linguistic insights about the data.
One challenge with end-to-end training of these models is the argmax operation,
which has a null gradient. In this paper, we focus on surrogate gradients, a
popular strategy for dealing with this problem. We explore latent structure
learning through the lens of pulling back the downstream learning objective.
In this paradigm, we discover a principled motivation for both the
straight-through estimator (STE) and the recently proposed SPIGOT, a
variant of STE for structured models. Our perspective leads to new algorithms
in the same family. We empirically compare the known and the novel pulled-back
estimators against popular alternatives, yielding new insight for
practitioners and revealing intriguing failure cases.
Comment: EMNLP 2020
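For contrast with the SPIGOT sketch above, the plain straight-through estimator can be written as an identity backward pass through the hard decision; this minimal PyTorch sketch is illustrative rather than the paper's code.

```python
import torch

class StraightThroughArgmax(torch.autograd.Function):
    @staticmethod
    def forward(ctx, scores):
        z = torch.zeros_like(scores)
        z[scores.argmax()] = 1.0          # hard one-hot decision
        return z

    @staticmethod
    def backward(ctx, grad_z):
        # STE: pretend the argmax were the identity map and pass the
        # downstream gradient through unchanged.
        return grad_z
```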
Combinatorial Optimization for Panoptic Segmentation: A Fully Differentiable Approach
We propose a fully differentiable architecture for simultaneous semantic and
instance segmentation (a.k.a. panoptic segmentation) consisting of a
convolutional neural network and an asymmetric multiway cut problem solver. The
latter solves a combinatorial optimization problem that elegantly incorporates
semantic and boundary predictions to produce a panoptic labeling. Our
formulation allows us to directly maximize a smooth surrogate of the panoptic
quality metric by backpropagating the gradient through the optimization
problem. Experimental evaluation shows that backpropagating through the solver
yields improvements over comparable approaches on the Cityscapes and COCO
datasets. Overall, our approach shows the utility of using combinatorial
optimization in tandem with deep learning on a challenging, large-scale
real-world problem and offers insights into training such an architecture.
Comment: To be presented at NeurIPS 2021
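The asymmetric multiway cut solver itself is beyond a short example, but the general recipe of backpropagating through a blackbox combinatorial solver can be sketched as below. This follows the perturbation scheme of Vlastelica et al. (2020) as an assumed stand-in; `solve` and `lam` are hypothetical placeholders, not the paper's exact method.

```python
import torch

class BlackboxSolverLayer(torch.autograd.Function):
    @staticmethod
    def forward(ctx, costs, solve, lam):
        # `solve` is assumed to map a cost vector to the (binary)
        # indicator vector of the minimizing feasible solution.
        z = solve(costs)
        ctx.save_for_backward(costs, z)
        ctx.solve, ctx.lam = solve, lam
        return z

    @staticmethod
    def backward(ctx, grad_z):
        costs, z = ctx.saved_tensors
        # Re-solve with costs perturbed by the incoming gradient; the
        # difference of the two solutions yields a piecewise-linear
        # interpolation of the loss landscape around the optimum.
        z_pert = ctx.solve(costs + ctx.lam * grad_z)
        return -(z_pert - z) / ctx.lam, None, None

# Toy usage: a "solver" that returns the one-hot argmin of the costs.
solve = lambda c: torch.nn.functional.one_hot(c.argmin(), c.numel()).to(c.dtype)
costs = torch.randn(4, requires_grad=True)
z = BlackboxSolverLayer.apply(costs, solve, 10.0)
```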
Efficient Beam Tree Recursion
Beam Tree Recursive Neural Network (BT-RvNN) was recently proposed as a
simple extension of Gumbel Tree RvNN and was shown to achieve
state-of-the-art length generalization performance on ListOps while maintaining
comparable performance on other tasks. However, although not the worst of its
kind, BT-RvNN can still be exorbitantly expensive in memory usage. In this
paper, we identify the main bottleneck in BT-RvNN's memory usage to be the
entanglement of the scorer function and the recursive cell function. We propose
strategies to remove this bottleneck and further simplify its memory usage.
Overall, our strategies not only reduce the memory usage of BT-RvNN by
- times but also create a new state-of-the-art in ListOps while
maintaining similar performance on other tasks. In addition, we propose a
strategy to utilize the induced latent-tree node representations produced by
BT-RvNN, turning it from a sentence encoder of the form
$f: \mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{d}$ into a sequence
contextualizer of the form
$f: \mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times d}$. Thus, our
proposals not only open up a path for further scalability of RvNNs but also
standardize a way to use BT-RvNNs as another building block in the deep
learning toolkit, one that can be easily stacked or interfaced with other
popular models such as Transformers and Structured State Space models.
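As a rough illustration of the decoupling idea, assuming a greedy (beamless) variant with hypothetical module shapes, one bottom-up merge step might separate a cheap scorer from the expensive recursive cell as follows; the actual BT-RvNN uses beam search and differs in detail.

```python
import torch
import torch.nn as nn

class DecoupledBeamStep(nn.Module):
    # One greedy bottom-up merge step: a cheap scorer ranks all adjacent
    # pairs, and the expensive recursive cell composes only the winner,
    # so the heavy cell is never applied to every candidate pair.
    def __init__(self, d):
        super().__init__()
        self.scorer = nn.Linear(2 * d, 1)             # lightweight scoring head
        self.cell = nn.Sequential(                    # heavyweight composition cell
            nn.Linear(2 * d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, nodes):                         # nodes: (n, d), n >= 2
        pairs = torch.cat([nodes[:-1], nodes[1:]], dim=-1)   # (n-1, 2d)
        i = self.scorer(pairs).squeeze(-1).argmax()          # greedy pick (no beam)
        merged = self.cell(pairs[i])                         # compose winner only
        return torch.cat([nodes[:i], merged[None], nodes[i + 2:]], dim=0)
```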
Supervised Neural Clustering via Latent Structured Output Learning: Application to Question Intents
Previous pre-neural work on structured prediction has produced very effective supervised clustering algorithms using linear classifiers, e.g., structured SVM or perceptron. However, these cannot exploit the representation learning ability of neural networks, which would make supervised clustering even more powerful, i.e., able to learn general clustering patterns automatically. In this paper, we design neural networks based on a latent structured prediction loss and Transformer models to approach supervised clustering. We tested our methods on the task of automatically recreating categories of intents from publicly available question intent corpora. The results show that our approach delivers an F1 of 95.65%, outperforming the state of the art by 17.24%.
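As a loose sketch of a latent structured loss for supervised clustering, one common simplification (an assumption here, not the paper's exact objective) lets each item choose a latent link to another item, with the clustering induced by the links, and applies a margin hinge between the best gold-consistent link and the best cluster-violating link.

```python
import torch

def latent_link_hinge(scores, same_cluster):
    # scores: (n, n) pairwise link scores from, e.g., a Transformer encoder.
    # same_cluster: (n, n) bool gold matrix, True on the diagonal so every
    # item has at least one gold-consistent link (possibly to itself).
    wrong = (scores + 1.0).masked_fill(same_cluster, float('-inf'))
    gold = scores.masked_fill(~same_cluster, float('-inf'))
    # The best gold-consistent link should outscore the best violating
    # link by a margin of 1; which gold link is chosen remains latent.
    return torch.clamp(wrong.max(dim=1).values - gold.max(dim=1).values,
                       min=0.0).mean()
```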