Backpropagating through Structured Argmax using a SPIGOT
We introduce the structured projection of intermediate gradients optimization
technique (SPIGOT), a new method for backpropagating through neural networks
that include hard-decision structured predictions (e.g., parsing) in
intermediate layers. SPIGOT requires no marginal inference, unlike structured
attention networks (Kim et al., 2017) and some reinforcement learning-inspired
solutions (Yogatama et al., 2017). Like so-called straight-through estimators
(Hinton, 2012), SPIGOT defines gradient-like quantities associated with
intermediate nondifferentiable operations, allowing backpropagation before and
after them; SPIGOT's proxy aims to ensure that, after a parameter update, the
intermediate structure will remain well-formed.
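To make the mechanism concrete, here is a minimal PyTorch sketch of a SPIGOT-style surrogate gradient for an unstructured argmax, where the relevant polytope is simply the probability simplex; the projection routine, the step size eta, and the toy setting are illustrative assumptions, not the paper's structured implementation.

```python
import torch

def project_simplex(v):
    # Euclidean projection onto the probability simplex
    # (sort-based algorithm of Duchi et al., 2008).
    u, _ = torch.sort(v, descending=True)
    css = torch.cumsum(u, dim=0) - 1.0
    ks = torch.arange(1, v.numel() + 1, dtype=v.dtype)
    rho = int((u - css / ks > 0).nonzero().max()) + 1
    return torch.clamp(v - css[rho - 1] / rho, min=0.0)

class SpigotArgmax(torch.autograd.Function):
    @staticmethod
    def forward(ctx, scores, eta=1.0):
        z = torch.zeros_like(scores)
        z[scores.argmax()] = 1.0          # hard one-hot decision
        ctx.save_for_backward(z)
        ctx.eta = eta
        return z

    @staticmethod
    def backward(ctx, grad_z):
        z, = ctx.saved_tensors
        # SPIGOT-style surrogate: take a gradient step in output space,
        # project back onto the feasible polytope (here, the simplex),
        # and pass the difference back as the "gradient" for the scores.
        p = project_simplex(z - ctx.eta * grad_z)
        return z - p, None
```

The projection step is what keeps the updated target inside the set of well-formed outputs, which is the intuition behind SPIGOT's proxy described above.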
We experiment on two structured NLP pipelines: syntactic-then-semantic
dependency parsing, and semantic parsing followed by sentiment classification.
We show that training with SPIGOT leads to a larger improvement on the
downstream task than a modularly-trained pipeline, the straight-through
estimator, and structured attention, reaching a new state of the art on
semantic dependency parsing.
Comment: ACL 2018
Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning
Latent structure models are a powerful tool for modeling language data: they
can mitigate the error propagation and annotation bottleneck in pipeline
systems, while simultaneously uncovering linguistic insights about the data.
One challenge with end-to-end training of these models is the argmax operation,
which has a null gradient. In this paper, we focus on surrogate gradients, a
popular strategy for dealing with this problem. We explore latent structure
learning through the lens of pulling back the downstream learning objective.
In this paradigm, we discover a principled motivation for both the
straight-through estimator (STE) and the recently proposed SPIGOT, a
variant of STE for structured models. Our perspective leads to new algorithms
in the same family. We empirically compare the known and the novel pulled-back
estimators against popular alternatives, yielding new insight for
practitioners and revealing intriguing failure cases.
Comment: EMNLP 2020
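For contrast with the SPIGOT sketch above, the plain straight-through estimator can be written as an identity backward pass through the hard decision; this minimal PyTorch sketch is illustrative rather than the paper's code.

```python
import torch

class StraightThroughArgmax(torch.autograd.Function):
    @staticmethod
    def forward(ctx, scores):
        z = torch.zeros_like(scores)
        z[scores.argmax()] = 1.0          # hard one-hot decision
        return z

    @staticmethod
    def backward(ctx, grad_z):
        # STE: pretend the argmax were the identity map and pass the
        # downstream gradient through unchanged.
        return grad_z
```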
Combinatorial Optimization for Panoptic Segmentation: A Fully Differentiable Approach
We propose a fully differentiable architecture for simultaneous semantic and
instance segmentation (a.k.a. panoptic segmentation) consisting of a
convolutional neural network and an asymmetric multiway cut problem solver. The
latter solves a combinatorial optimization problem that elegantly incorporates
semantic and boundary predictions to produce a panoptic labeling. Our
formulation allows us to directly maximize a smooth surrogate of the panoptic
quality metric by backpropagating the gradient through the optimization
problem. Experimental evaluation shows that backpropagating through the solver
yields improvements over comparable approaches on the Cityscapes and COCO
datasets. Overall, our approach shows the utility of using combinatorial
optimization in tandem with deep learning on a challenging, large-scale
real-world problem and offers insights into training such an architecture.
Comment: To be presented at NeurIPS 2021
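The asymmetric multiway cut solver itself is beyond a short example, but the general recipe of backpropagating through a blackbox combinatorial solver can be sketched as below. This follows the perturbation scheme of Vlastelica et al. (2020) as an assumed stand-in; `solve` and `lam` are hypothetical placeholders, not the paper's exact method.

```python
import torch

class BlackboxSolverLayer(torch.autograd.Function):
    @staticmethod
    def forward(ctx, costs, solve, lam):
        # `solve` is assumed to map a cost vector to the (binary)
        # indicator vector of the minimizing feasible solution.
        z = solve(costs)
        ctx.save_for_backward(costs, z)
        ctx.solve, ctx.lam = solve, lam
        return z

    @staticmethod
    def backward(ctx, grad_z):
        costs, z = ctx.saved_tensors
        # Re-solve with costs perturbed by the incoming gradient; the
        # difference of the two solutions yields a piecewise-linear
        # interpolation of the loss landscape around the optimum.
        z_pert = ctx.solve(costs + ctx.lam * grad_z)
        return -(z_pert - z) / ctx.lam, None, None

# Toy usage: a "solver" that returns the one-hot argmin of the costs.
solve = lambda c: torch.nn.functional.one_hot(c.argmin(), c.numel()).to(c.dtype)
costs = torch.randn(4, requires_grad=True)
z = BlackboxSolverLayer.apply(costs, solve, 10.0)
```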
Efficient Beam Tree Recursion
Beam Tree Recursive Neural Network (BT-RvNN) was recently proposed as a
simple extension of Gumbel Tree RvNN and was shown to achieve
state-of-the-art length generalization performance on ListOps while maintaining
comparable performance on other tasks. However, although not the worst of its
kind, BT-RvNN can still be exorbitantly expensive in memory usage. In this
paper, we identify the main bottleneck in BT-RvNN's memory usage to be the
entanglement of the scorer function and the recursive cell function. We propose
strategies to remove this bottleneck and further simplify its memory usage.
Overall, our strategies not only reduce the memory usage of BT-RvNN by
- times but also create a new state-of-the-art in ListOps while
maintaining similar performance on other tasks. In addition, we propose a
strategy to utilize the induced latent-tree node representations produced by
BT-RvNN, turning it from a sentence encoder of the form
$f: \mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{d}$ into a sequence
contextualizer of the form
$f: \mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times d}$. Thus, our
proposals not only open up a path for further scalability of RvNNs but also
standardize a way to use BT-RvNNs as another building block in the deep
learning toolkit, one that can be easily stacked or interfaced with other
popular models such as Transformers and Structured State Space models.
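As a rough illustration of the decoupling idea, assuming a greedy (beamless) variant with hypothetical module shapes, one bottom-up merge step might separate a cheap scorer from the expensive recursive cell as follows; the actual BT-RvNN uses beam search and differs in detail.

```python
import torch
import torch.nn as nn

class DecoupledBeamStep(nn.Module):
    # One greedy bottom-up merge step: a cheap scorer ranks all adjacent
    # pairs, and the expensive recursive cell composes only the winner,
    # so the heavy cell is never applied to every candidate pair.
    def __init__(self, d):
        super().__init__()
        self.scorer = nn.Linear(2 * d, 1)             # lightweight scoring head
        self.cell = nn.Sequential(                    # heavyweight composition cell
            nn.Linear(2 * d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, nodes):                         # nodes: (n, d), n >= 2
        pairs = torch.cat([nodes[:-1], nodes[1:]], dim=-1)   # (n-1, 2d)
        i = self.scorer(pairs).squeeze(-1).argmax()          # greedy pick (no beam)
        merged = self.cell(pairs[i])                         # compose winner only
        return torch.cat([nodes[:i], merged[None], nodes[i + 2:]], dim=0)
```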
Supervised Neural Clustering via Latent Structured Output Learning: Application to Question Intents
Previous pre-neural work on structured prediction has produced very effective supervised clustering algorithms using linear classifiers, e.g., structured SVM or perceptron. However, these cannot exploit the representation learning ability of neural networks, which would make supervised clustering even more powerful, i.e., able to learn general clustering patterns automatically. In this paper, we design neural networks based on a latent structured prediction loss and Transformer models to approach supervised clustering. We tested our methods on the task of automatically recreating categories of intents from publicly available question intent corpora. The results show that our approach delivers an F1 of 95.65%, outperforming the state of the art by 17.24%.
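As a loose sketch of a latent structured loss for supervised clustering, one common simplification (an assumption here, not the paper's exact objective) lets each item choose a latent link to another item, with the clustering induced by the links, and applies a margin hinge between the best gold-consistent link and the best cluster-violating link.

```python
import torch

def latent_link_hinge(scores, same_cluster):
    # scores: (n, n) pairwise link scores from, e.g., a Transformer encoder.
    # same_cluster: (n, n) bool gold matrix, True on the diagonal so every
    # item has at least one gold-consistent link (possibly to itself).
    wrong = (scores + 1.0).masked_fill(same_cluster, float('-inf'))
    gold = scores.masked_fill(~same_cluster, float('-inf'))
    # The best gold-consistent link should outscore the best violating
    # link by a margin of 1; which gold link is chosen remains latent.
    return torch.clamp(wrong.max(dim=1).values - gold.max(dim=1).values,
                       min=0.0).mean()
```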