A Continuous Relaxation of Beam Search for End-to-end Training of Neural Sequence Models
Beam search is a desirable choice of test-time decoding algorithm for neural
sequence models because it potentially avoids search errors made by simpler
greedy methods. However, typical cross entropy training procedures for these
models do not directly consider the behaviour of the final decoding method. As
a result, for cross-entropy trained models, beam decoding can sometimes yield
reduced test performance when compared with greedy decoding. In order to train
models that can more effectively make use of beam search, we propose a new
training procedure that focuses on the final loss metric (e.g. Hamming loss)
evaluated on the output of beam search. While well-defined, this "direct loss"
objective is itself discontinuous and thus difficult to optimize. Hence, in our
approach, we form a sub-differentiable surrogate objective by introducing a
novel continuous approximation of the beam search decoding procedure. In
experiments, we show that optimizing this new training objective yields
substantially better results on two sequence tasks (Named Entity Recognition
and CCG Supertagging) when compared with both cross entropy trained greedy
decoding and cross entropy trained beam decoding baselines.
Comment: Updated for clarity and notational consistency.
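As a rough illustration of the idea (not the authors' implementation), the sketch below replaces the hard top-k candidate selection inside one beam-search step with peaked-softmax weights, so the new beam scores and state embeddings become convex combinations of the candidates and gradients can flow through the step. The shapes, the iterative suppression trick, and the temperature are assumptions.

```python
# A minimal sketch of one "soft" beam-search step: hard top-k selection of
# successor hypotheses is replaced by peaked softmax weights, keeping the
# whole step differentiable.
import torch
import torch.nn.functional as F

def soft_beam_step(beam_scores, succ_logprobs, succ_embeds, temperature=0.1):
    """beam_scores:   (k,)      cumulative scores of the current beam
       succ_logprobs: (k, V)    log-probs of each successor token
       succ_embeds:   (k, V, d) embeddings of each candidate continuation
       Returns new (k,) soft scores and (k, d) soft state embeddings."""
    k, V, d = succ_embeds.shape
    cand_scores = (beam_scores[:, None] + succ_logprobs).reshape(k * V)
    cand_embeds = succ_embeds.reshape(k * V, d)
    new_scores, new_embeds = [], []
    for _ in range(k):
        w = F.softmax(cand_scores / temperature, dim=0)  # soft "argmax"
        new_scores.append((w * cand_scores).sum())       # soft selected score
        new_embeds.append(w @ cand_embeds)               # convex combination
        # softly suppress the chosen candidate before picking the next one
        cand_scores = cand_scores - w * 1e3
    return torch.stack(new_scores), torch.stack(new_embeds)
```

Annealing the temperature toward zero recovers ordinary hard beam search, which is what makes the surrogate a relaxation of the original decoding procedure.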
Successive Halving Top-k Operator
We propose a differentiable successive-halving relaxation of the top-k
operator, making gradient-based optimization possible. A tournament-style
selection avoids applying softmax iteratively to the entire vector of scores.
As a result, the method approximates top-k more closely, and at lower
computational cost, than the previous approach.
Comment: Work in progress.
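A hedged sketch of the tournament idea, under the assumption that neighbouring elements are paired in each round (the paper's actual bracket may differ): each pair plays a "match" decided by a 2-way softmax, halving the candidate set each round until k soft winners remain, so no softmax over the full vector is ever needed.

```python
# Successive-halving-style soft top-k: repeated 2-way softmax "matches"
# instead of one softmax over all n scores.
import torch
import torch.nn.functional as F

def successive_halving_topk(scores, values, k, temperature=0.1):
    """scores: (n,), values: (n, d); n is assumed to be k * 2**r."""
    while scores.shape[0] > k:
        s = scores.reshape(-1, 2)                    # (n/2, 2) matches
        v = values.reshape(-1, 2, values.shape[-1])  # (n/2, 2, d)
        w = F.softmax(s / temperature, dim=-1)       # soft match outcome
        scores = (w * s).sum(-1)                     # winner's soft score
        values = (w.unsqueeze(-1) * v).sum(-2)       # winner's soft value
    return scores, values
```

Because two of the true top-k elements can land in the same bracket, this is an approximation; the tournament trades a little accuracy for avoiding full-vector softmax rounds.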
Sparsifying Transformer Models with Trainable Representation Pooling
We propose a novel method to sparsify attention in the Transformer model by
learning to select the most informative token representations during
training, thus focusing on task-specific parts of the input. A robust
trainable top-k operator reduces the quadratic time and memory complexity to
sublinear. For example, our experiments on a challenging long-document
summarization task show that our method is over 3 times faster and up to 16
times more memory-efficient while significantly outperforming both dense and
state-of-the-art sparse Transformer models. The method can be applied
effortlessly to many models used in NLP and CV, simultaneously with other
improvements.
Comment: Provided a formal overview. Re-evaluated with the Google Research
script.
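For illustration only, here is a plain hard-top-k variant of trainable representation pooling (the paper's operator is a more robust differentiable relaxation): a learned scorer ranks tokens, only the k best are forwarded to later layers, and multiplying the survivors by their scores keeps the scorer trainable.

```python
# Trainable top-k pooling sketch: a learned scorer selects which token
# representations later layers get to see.
import torch
import torch.nn as nn

class TopKPooling(nn.Module):
    def __init__(self, d_model: int, k: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)  # learned per-token saliency
        self.k = k

    def forward(self, x):                    # x: (batch, seq, d_model)
        scores = self.scorer(x).squeeze(-1)  # (batch, seq)
        topk = scores.topk(self.k, dim=-1)
        idx = topk.indices.unsqueeze(-1).expand(-1, -1, x.shape[-1])
        kept = x.gather(1, idx)              # (batch, k, d_model)
        # scaling by the (sigmoid) scores lets gradients reach the scorer
        return kept * torch.sigmoid(topk.values).unsqueeze(-1)
```

After pooling, downstream self-attention costs O(k^2) rather than O(n^2) in the sequence length, which is where the speed and memory gains come from.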
Learning to Summarize Videos by Contrasting Clips
Video summarization aims at choosing parts of a video that narrate a story as
closely as possible to the original one. Most existing video summarization
approaches rely on hand-crafted labels. As the number of videos grows
exponentially, there is an increasing need for methods that can learn
meaningful summaries without labeled annotations. In this paper, we aim to
maximally exploit unsupervised video summarization while limiting supervision
to a few personalized labels as an add-on. To do so, we formulate two key
requirements for informative video summarization and propose contrastive
learning as the answer to both. To further boost Contrastive video
Summarization (CSUM), we propose to contrast top-k features instead of the
mean video feature employed by existing methods, which we implement with a
differentiable top-k feature selector. Our experiments on several benchmarks
demonstrate that our approach yields meaningful and diverse summaries when no
labeled data is provided.
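A minimal sketch of contrasting top-k features rather than a mean video feature; the norm-based saliency proxy and the hard selection below are assumptions standing in for the paper's differentiable top-k selector.

```python
# Clip features pooled from the k most salient frames, compared with a
# standard InfoNCE contrastive loss.
import torch
import torch.nn.functional as F

def topk_clip_feature(frames, k):
    """frames: (num_frames, d) -> (d,) mean of the k most salient frames.
    Frame norm is an assumed saliency proxy; the paper learns selection
    with a differentiable top-k operator instead of this hard indexing."""
    saliency = frames.norm(dim=-1)
    idx = saliency.topk(k).indices
    return frames[idx].mean(0)

def infonce(anchor, positive, negatives, tau=0.07):
    """anchor, positive: (d,); negatives: (m, d). Standard InfoNCE loss."""
    a = F.normalize(anchor, dim=-1)
    pos = (a @ F.normalize(positive, dim=-1)) / tau
    neg = (a @ F.normalize(negatives, dim=-1).T) / tau
    return -pos + torch.logsumexp(torch.cat([pos[None], neg]), dim=0)
```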
Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement
The well-known Gumbel-Max trick for sampling from a categorical distribution
can be extended to sample elements without replacement. We show how to
implicitly apply this 'Gumbel-Top-k' trick to a factorized distribution over
sequences, allowing us to draw exact samples without replacement using
Stochastic Beam Search. Even for exponentially large domains, the number of
model evaluations grows only linearly in k and the maximum sampled sequence
length. The algorithm creates a theoretical connection between sampling and
(deterministic) beam search and can be used as a principled intermediate
alternative. In a translation task, the proposed method compares favourably
against alternatives to obtain diverse yet good quality translations. We show
that sequences sampled without replacement can be used to construct
low-variance estimators for expected sentence-level BLEU score and model
entropy.
Comment: ICML 2019; 13 pages, 4 figures.
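The underlying Gumbel-Top-k trick itself is simple to state: perturb each log-probability with i.i.d. Gumbel(0, 1) noise and take the k largest, which yields an exact sample of k distinct categories without replacement. A minimal sketch:

```python
# Gumbel-Top-k: sampling k categories without replacement in one shot.
import torch

def gumbel_top_k(logits, k):
    """logits: unnormalised log-probs (n,); returns k distinct indices."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits)))  # Gumbel(0, 1)
    return (logits + gumbel).topk(k).indices
```

The paper's contribution is applying this implicitly to a factorized sequence model, where the domain is too large to enumerate: Stochastic Beam Search propagates the perturbed scores down the search tree instead of materialising them all.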