
    Backpropagating through Structured Argmax using a SPIGOT

    We introduce the structured projection of intermediate gradients optimization technique (SPIGOT), a new method for backpropagating through neural networks that include hard-decision structured predictions (e.g., parsing) in intermediate layers. SPIGOT requires no marginal inference, unlike structured attention networks (Kim et al., 2017) and some reinforcement learning-inspired solutions (Yogatama et al., 2017). Like so-called straight-through estimators (Hinton, 2012), SPIGOT defines gradient-like quantities associated with intermediate nondifferentiable operations, allowing backpropagation before and after them; SPIGOT's proxy aims to ensure that, after a parameter update, the intermediate structure will remain well-formed. We experiment on two structured NLP pipelines: syntactic-then-semantic dependency parsing, and semantic parsing followed by sentiment classification. We show that training with SPIGOT leads to a larger improvement on the downstream task than a modularly-trained pipeline, the straight-through estimator, and structured attention, reaching a new state of the art on semantic dependency parsing. Comment: ACL 201
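The following Python sketch shows the SPIGOT backward rule in its simplest setting: a single categorical choice, where the set of valid relaxed structures is just the probability simplex. As we understand the method, the forward pass keeps the hard argmax ẑ. The backward pass replaces the score gradient with ẑ minus the projection of ẑ − η∇ẑL back onto the feasible set. For real structures such as parse trees, the simplex would be replaced by the structure's marginal polytope, which needs a dedicated projection routine not shown here. The function names and the step size `eta` are illustrative assumptions, not the authors' code.

```python
from typing import List


def project_onto_simplex(v: List[float]) -> List[float]:
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        candidate = (cumsum - 1.0) / j
        # The condition holds for a prefix of j values; the last one fixes theta.
        if u_j - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def argmax_one_hot(scores: List[float]) -> List[float]:
    """Hard forward decision: one-hot vector of the highest score."""
    k = max(range(len(scores)), key=lambda i: scores[i])
    return [1.0 if i == k else 0.0 for i in range(len(scores))]


def spigot_backward(z_hat: List[float], grad_z: List[float], eta: float = 1.0) -> List[float]:
    """Surrogate gradient w.r.t. the scores: z_hat - Proj(z_hat - eta * grad_z)."""
    target = project_onto_simplex([z - eta * g for z, g in zip(z_hat, grad_z)])
    return [z - p for z, p in zip(z_hat, target)]


if __name__ == "__main__":
    z_hat = argmax_one_hot([0.2, 0.1, -0.3])         # [1.0, 0.0, 0.0]
    print(spigot_backward(z_hat, [2.0, 0.0, 0.0]))   # [1.0, -0.5, -0.5]
```

In this example a plain straight-through estimator would pass [2, 0, 0] back unchanged. With SPIGOT, the implied target ẑ minus the surrogate, [0, 0.5, 0.5], is itself a valid point on the simplex. This matches the "remains well-formed" property described in the abstract.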

    Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning

    Latent structure models are a powerful tool for modeling language data: they can mitigate the error propagation and annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data. One challenge with end-to-end training of these models is the argmax operation, whose gradient is zero almost everywhere. In this paper, we focus on surrogate gradients, a popular strategy for dealing with this problem. We explore latent structure learning from the angle of pulling back the downstream learning objective. In this paradigm, we discover a principled motivation for both the straight-through estimator (STE) and the recently proposed SPIGOT, a variant of STE for structured models. Our perspective leads to new algorithms in the same family. We empirically compare the known and the novel pulled-back estimators against the popular alternatives, yielding new insight for practitioners and revealing intriguing failure cases. Comment: EMNLP 202
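To make the STE concrete, here is a minimal Python sketch of two commonly used straight-through variants for a categorical argmax. In both, the forward pass outputs the hard one-hot vector. The backward pass pretends the argmax was either the identity map or a softmax. These are generic variants offered for illustration. The paper's pulled-back estimators are derived differently and are not reproduced here, and all function names are our own.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward_argmax(scores: List[float]) -> List[float]:
    """Forward pass: hard one-hot decision (zero gradient almost everywhere)."""
    k = max(range(len(scores)), key=lambda i: scores[i])
    return [1.0 if i == k else 0.0 for i in range(len(scores))]


def ste_identity_backward(scores: List[float], grad_z: List[float]) -> List[float]:
    """Backward pass that treats argmax as the identity: dz/ds := I."""
    return list(grad_z)


def ste_softmax_backward(scores: List[float], grad_z: List[float]) -> List[float]:
    """Backward pass that treats argmax as softmax.

    The softmax Jacobian is J = diag(p) - p p^T (symmetric), so
    J^T g = p * (g - <p, g>), computed here without building J.
    """
    p = softmax(scores)
    inner = sum(pi * gi for pi, gi in zip(p, grad_z))
    return [pi * (gi - inner) for pi, gi in zip(p, grad_z)]
```

The identity variant forwards the downstream gradient unchanged. The softmax variant always returns a vector whose entries sum to zero, because softmax outputs are constrained to sum to one.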

    Combinatorial Optimization for Panoptic Segmentation: A Fully Differentiable Approach

    We propose a fully differentiable architecture for simultaneous semantic and instance segmentation (a.k.a. panoptic segmentation) consisting of a convolutional neural network and an asymmetric multiway cut problem solver. The latter solves a combinatorial optimization problem that elegantly incorporates semantic and boundary predictions to produce a panoptic labeling. Our formulation allows us to directly maximize a smooth surrogate of the panoptic quality metric by backpropagating the gradient through the optimization problem. Experimental evaluation on the Cityscapes and COCO datasets shows that backpropagating through the optimization problem yields improvements over comparable approaches. Overall, our approach shows the utility of using combinatorial optimization in tandem with deep learning in a challenging large-scale real-world problem and showcases the benefits of, and insights into, training such an architecture. Comment: To be presented at NeurIPS 202
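The panoptic quality (PQ) metric, whose smooth surrogate the model maximizes, is usually computed per class in three steps. Predicted and ground-truth segments are matched at IoU > 0.5. The IoU values of the matched pairs are summed. That sum is divided by |TP| + 0.5|FP| + 0.5|FN|. The Python sketch below computes this standard, non-differentiable PQ for a single class, representing segments as sets of pixel indices. It omits the void/crowd handling of official evaluators and is not the paper's smooth surrogate, whose exact form is not given here. The function names are illustrative.

```python
from typing import List, Set


def iou(a: Set[int], b: Set[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(pred: List[Set[int]], gt: List[Set[int]]) -> float:
    """Standard PQ for one class; segments are non-overlapping sets of pixel ids."""
    matched_pred = set()
    matched_gt = set()
    iou_sum = 0.0
    for i, p in enumerate(pred):
        for j, g in enumerate(gt):
            if j in matched_gt:
                continue
            value = iou(p, g)
            # With non-overlapping segments, IoU > 0.5 makes the match unique.
            if value > 0.5:
                matched_pred.add(i)
                matched_gt.add(j)
                iou_sum += value
                break
    tp = len(matched_gt)
    fp = len(pred) - len(matched_pred)
    fn = len(gt) - tp
    denom = tp + 0.5 * fp + 0.5 * fn
    # Convention assumed here: return 0.0 when the class is absent from both.
    return iou_sum / denom if denom else 0.0
```

For example, one matched pair with IoU 0.75, one unmatched prediction and one missed ground-truth segment give PQ = 0.75 / (1 + 0.5 + 0.5) = 0.375.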

    Efficient Beam Tree Recursion

    Beam Tree Recursive Neural Network (BT-RvNN) was recently proposed as a simple extension of Gumbel Tree RvNN, and it was shown to achieve state-of-the-art length generalization performance on ListOps while maintaining comparable performance on other tasks. However, although not the worst of its kind, BT-RvNN can still be exorbitantly expensive in memory usage. In this paper, we identify the main bottleneck in BT-RvNN's memory usage as the entanglement of the scorer function and the recursive cell function. We propose strategies to remove this bottleneck and further simplify its memory usage. Overall, our strategies not only reduce the memory usage of BT-RvNN by 10-16 times but also set a new state of the art on ListOps while maintaining similar performance on other tasks. In addition, we propose a strategy to use the induced latent-tree node representations produced by BT-RvNN to turn BT-RvNN from a sentence encoder of the form $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{d}$ into a sequence contextualizer of the form $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times d}$. Thus, our proposals not only open up a path for further scalability of RvNNs but also standardize a way to use BT-RvNNs as another building block in the deep learning toolkit that can be easily stacked or interfaced with other popular models such as Transformers and Structured State Space models.
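The Python sketch below illustrates the general idea of beam tree recursion as a sentence encoder $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{d}$. At each step, every beam scores all adjacent pairs of nodes. Only the top-k (beam, pair) choices are kept, and those pairs are merged by a recursive cell until one node remains. To echo the decoupling of scorer and cell described above, the hypothetical scorer looks only at the two children, so the cell runs just k times per step instead of once per candidate pair. This is our simplified reading, not the paper's implementation. `score_pair` and `compose` are hypothetical stand-ins for learned networks. A real model would also need a differentiable treatment of the top-k choice and may combine final beams differently than returning the best one.

```python
import heapq
import math
from typing import List, Tuple

Vector = List[float]


def score_pair(left: Vector, right: Vector, w: Vector) -> float:
    # Hypothetical lightweight scorer: uses only the two children,
    # so it never needs the (expensive) composed representation.
    return sum(wi * (l + r) for wi, l, r in zip(w, left, right))


def compose(left: Vector, right: Vector) -> Vector:
    # Hypothetical recursive cell; a real model would use a learned network.
    return [math.tanh(l + r) for l, r in zip(left, right)]


def beam_tree_encode(xs: List[Vector], w: Vector, beam_size: int = 2) -> Vector:
    """Encode n vectors of size d into one vector of size d."""
    if not xs:
        raise ValueError("need at least one input vector")
    beams: List[Tuple[float, List[Vector]]] = [(0.0, [list(x) for x in xs])]
    # All beams shrink by one node per step, so checking beams[0] suffices.
    while len(beams[0][1]) > 1:
        candidates = []
        for total, nodes in beams:
            for i in range(len(nodes) - 1):
                s = score_pair(nodes[i], nodes[i + 1], w)
                candidates.append((total + s, nodes, i))
        # Keep the top-k (beam, merge position) choices; only these call the cell.
        best = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
        beams = [
            (total, nodes[:i] + [compose(nodes[i], nodes[i + 1])] + nodes[i + 2:])
            for total, nodes, i in best
        ]
    # Return the root of the highest-scoring beam.
    return max(beams, key=lambda b: b[0])[1][0]
```

With `beam_size=1` this reduces to a greedy, easy-first bottom-up tree builder. Larger beams explore several merge orders at the cost of k cell calls per step.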

    Supervised Neural Clustering via Latent Structured Output Learning: Application to Question Intents

    Previous pre-neural work on structured prediction has produced very effective supervised clustering algorithms using linear classifiers, e.g., structured SVM or perceptron. However, these cannot exploit the representation learning ability of neural networks, which would make supervised clustering even more powerful, i.e., general clustering patterns could be learned automatically. In this paper, we design neural networks based on a latent structured prediction loss and Transformer models to approach supervised clustering. We tested our methods on the task of automatically recreating categories of intents from publicly available question intent corpora. The results show that our approach achieves an F1 of 95.65%, outperforming the state of the art by 17.24%.
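Latent structured approaches to supervised clustering often represent a clustering as a spanning forest over items, with edges scored by a pairwise model. Structured perceptron and latent structural SVM formulations are examples. The Python sketch below shows only a simplified decoding step, under the assumption that some model (for instance a Transformer, not shown) has already produced a score for each item pair. It adds positive edges from highest to lowest score with union-find (Kruskal-style), and each resulting tree is a cluster. This is an illustrative inference procedure, not necessarily the one used in the paper, and `decode_clusters` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def decode_clusters(n: int, scores: Dict[Tuple[int, int], float]) -> List[List[int]]:
    """Group items 0..n-1 into clusters from pairwise scores (positive = same cluster)."""
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Kruskal-style: consider edges from most to least similar; stop at non-positive scores.
    for (a, b), s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if s <= 0:
            break
        ra, rb = find(a), find(b)
        if ra != rb:  # skip edges that would create a cycle
            parent[ra] = rb

    clusters: Dict[int, List[int]] = {}
    for x in range(n):
        clusters.setdefault(find(x), []).append(x)
    return sorted(clusters.values())
```

For example, scores {(0, 1): 2.0, (2, 3): 1.5, (1, 2): -1.0} over four questions yield the two intent clusters [[0, 1], [2, 3]].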