Search CORE

444 research outputs found

Sparsifying Transformer Models with Trainable Representation Pooling

Author: Borchmann Łukasz
Garncarek Łukasz
Pietruszka Michał
Publication venue
Publication date: 04/02/2021
Field of study

We propose a novel method to sparsify attention in the Transformer model by learning to select the most-informative token representations during the training process, thus focusing on task-specific parts of the input. A reduction of quadratic time and memory complexity to sublinear was achieved due to a robust trainable top-k operator. For example, our experiments on a challenging summarization task of long documents show that our method is over 3 times faster and up to 16 times more memory efficient while significantly outperforming both dense and state-of-the-art sparse transformer models. The method can be effortlessly applied to many models used in NLP and CV, simultaneously with other improvements.Comment: Provided formal overview. Reevaluated with Google Research scrip

arXiv.org e-Print Archive

Jagiellonian Univeristy Repository

Blockwise Parallel Transformer for Long Context Large Models

Author: Abbeel Pieter
Liu Hao
Publication venue
Publication date: 21/08/2023
Field of study

Transformers have emerged as the cornerstone of state-of-the-art natural language processing models, showcasing exceptional performance across a wide range of AI applications. However, the memory demands posed by the self-attention mechanism and the large feedforward network in Transformers limit their ability to handle long sequences, thereby creating challenges for tasks involving multiple long sequences or long-term dependencies. We present a distinct approach, Blockwise Parallel Transformer (BPT), that leverages blockwise computation of self-attention and feedforward network fusion to minimize memory costs. By processing longer input sequences while maintaining memory efficiency, BPT enables training sequences up to 32 times longer than vanilla Transformers and 2 to 4 times longer than previous memory-efficient methods. Extensive experiments on language modeling and reinforcement learning tasks demonstrate the effectiveness of BPT in reducing memory requirements and improving performance

arXiv.org e-Print Archive

Long-Form End-to-End Speech Translation via Latent Alignment Segmentation

Author: Bojar Ondřej
Polák Peter
Publication venue
Publication date: 20/09/2023
Field of study

Current simultaneous speech translation models can process audio only up to a few seconds long. Contemporary datasets provide an oracle segmentation into sentences based on human-annotated transcripts and translations. However, the segmentation into sentences is not available in the real world. Current speech segmentation approaches either offer poor segmentation quality or have to trade latency for quality. In this paper, we propose a novel segmentation approach for a low-latency end-to-end speech translation. We leverage the existing speech translation encoder-decoder architecture with ST CTC and show that it can perform the segmentation task without supervision or additional parameters. To the best of our knowledge, our method is the first that allows an actual end-to-end simultaneous speech translation, as the same model is used for translation and segmentation at the same time. On a diverse set of language pairs and in- and out-of-domain data, we show that the proposed approach achieves state-of-the-art quality at no additional computational cost.Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibl

arXiv.org e-Print Archive