Optimizing Transformer for Low-Resource Neural Machine Translation
Language pairs with limited amounts of parallel data, also known as
low-resource languages, remain a challenge for neural machine translation.
While the Transformer model has achieved significant improvements for many
language pairs and has become the de facto mainstream architecture, its
capability under low-resource conditions has not been fully investigated yet.
Our experiments on different subsets of the IWSLT14 training data show that the
effectiveness of the Transformer under low-resource conditions is highly
dependent on the hyper-parameter settings. Using a Transformer optimized for
low-resource conditions improves translation quality by up to 7.3 BLEU points
compared to the default Transformer settings.
Comment: To be published in COLING 2020
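As a rough illustration of what such hyper-parameter tuning can look like, the sketch below builds a smaller, more strongly regularized Transformer next to the usual base configuration, reflecting the common intuition that scarce parallel data favors fewer parameters and more dropout. It assumes PyTorch, and the low-resource values shown are hypothetical placeholders, not the optimized settings reported in the paper.

```python
import torch.nn as nn

# Two illustrative hyper-parameter sets. DEFAULT_BASE mirrors the standard
# Transformer-base configuration; LOW_RESOURCE is a hypothetical smaller,
# more regularized variant, NOT the paper's reported optimized settings.
DEFAULT_BASE = dict(d_model=512, nhead=8, num_layers=6,
                    dim_feedforward=2048, dropout=0.1)
LOW_RESOURCE = dict(d_model=256, nhead=4, num_layers=5,
                    dim_feedforward=1024, dropout=0.3)  # hypothetical values

def build_transformer(cfg):
    """Build a vanilla encoder-decoder Transformer from a config dict."""
    return nn.Transformer(
        d_model=cfg["d_model"],
        nhead=cfg["nhead"],
        num_encoder_layers=cfg["num_layers"],
        num_decoder_layers=cfg["num_layers"],
        dim_feedforward=cfg["dim_feedforward"],
        dropout=cfg["dropout"],
    )

model = build_transformer(LOW_RESOURCE)
```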
A Composability-Based Transformer Pruning Framework
This thesis addresses the crucial problem of deploying large Transformer models on resource-constrained edge devices. Because training is slow, pruning and fine-tuning a Transformer separately for each of many candidate pruning configurations is tedious and time-consuming. To remedy this, the thesis proposes a novel composability-based Transformer pruning framework that significantly reduces the time required to fine-tune pruned models across configurations while maintaining model performance. Unlike traditional approaches, it exploits the composability between Transformer pruning configurations, which creates opportunities for computational reuse: work done for blocks shared by several configurations can be reused rather than repeated. The framework leverages this reuse, employs techniques similar to knowledge distillation, and automates the pruning and fine-tuning process. It demonstrably shortens the time needed to fine-tune a model for a given pruning configuration, making it a practical tool for real-world deployment on edge devices. The outcome is a method that offers a fresh perspective on Transformer model compression and a reference for future studies on pruning and fine-tuning of Transformer networks.
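As a minimal sketch of the computational-reuse idea, the toy example below caches each pruned (layer, sparsity) block so that configurations sharing per-layer choices reuse work instead of repeating it. It assumes PyTorch's magnitude pruning; the toy linear "blocks", the cache keys, and the omitted per-block fine-tuning step are illustrative simplifications, not the thesis's actual framework.

```python
import copy
from typing import Dict, Tuple

import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-ins for Transformer blocks; real blocks would be full layers.
base_layers = [nn.Linear(64, 64) for _ in range(4)]
_cache: Dict[Tuple[int, float], nn.Module] = {}

def pruned_block(layer_idx: int, sparsity: float) -> nn.Module:
    """Prepare a pruned block at most once per (layer, sparsity) pair."""
    key = (layer_idx, sparsity)
    if key not in _cache:
        block = copy.deepcopy(base_layers[layer_idx])
        prune.l1_unstructured(block, name="weight", amount=sparsity)
        # In the real framework, the block would also be fine-tuned here
        # (e.g., via distillation-like training) before being cached.
        _cache[key] = block
    return _cache[key]

def build_model(config: Dict[int, float]) -> nn.Sequential:
    """config maps layer index -> target sparsity; cached blocks are reused."""
    return nn.Sequential(*(pruned_block(i, s) for i, s in sorted(config.items())))

m1 = build_model({0: 0.3, 1: 0.3, 2: 0.5, 3: 0.5})
m2 = build_model({0: 0.3, 1: 0.5, 2: 0.5, 3: 0.5})  # reuses 3 cached blocks
```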
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
We present Perceiver-VL, a vision-and-language framework that efficiently
handles high-dimensional multimodal inputs such as long videos and text.
Powered by the iterative latent cross-attention of Perceiver, our framework
scales with linear complexity, in contrast to the quadratic complexity of
self-attention used in many state-of-the-art transformer-based models. To
further improve efficiency, we also study applying LayerDrop to
cross-attention layers and introduce a mixed-stream architecture for
cross-modal retrieval. We evaluate Perceiver-VL on diverse video-text and
image-text benchmarks, where Perceiver-VL achieves the lowest GFLOPs and
latency while maintaining competitive performance. In addition, we provide
comprehensive analyses of various aspects of our framework, including
pretraining data, scalability of latent size and input size, dropping
cross-attention layers at inference to reduce latency, modality aggregation
strategy, positional encoding, and weight initialization strategy. Our code and
checkpoints are available at: https://github.com/zinengtang/Perceiver_VL
Comment: WACV 2023 (first two authors contributed equally)
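The sketch below illustrates the iterative latent cross-attention pattern the abstract describes, assuming PyTorch: a small set of learned latents repeatedly attends to a long input sequence, so per-layer cost grows linearly with input length rather than quadratically as in full self-attention. The dimensions, depth, and residual-update details are arbitrary choices for the sketch, not the released Perceiver-VL implementation.

```python
import torch
import torch.nn as nn

class LatentCrossAttention(nn.Module):
    """Fixed-size learned latents iteratively cross-attend to a long input."""

    def __init__(self, dim=256, num_latents=64, num_heads=4, depth=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.cross = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in range(depth)
        )

    def forward(self, inputs):  # inputs: (B, N, dim), N can be very large
        B = inputs.size(0)
        z = self.latents.unsqueeze(0).expand(B, -1, -1)  # (B, L, dim)
        for attn in self.cross:
            # Latents query the inputs: each layer costs O(N * L), not O(N^2).
            out, _ = attn(query=z, key=inputs, value=inputs)
            z = z + out  # residual update of the latent array
        return z  # compact (B, L, dim) summary of the input

x = torch.randn(2, 4096, 256)        # e.g., a long video+text token stream
summary = LatentCrossAttention()(x)  # -> shape (2, 64, 256)
```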