Findings of the IWSLT 2022 Evaluation Campaign.
The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech-to-speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, and (viii) Isometric speech translation. A total of 27 teams participated in at least one of the shared tasks. This paper details, for each shared task, the purpose of the task, the data that were released, the evaluation metrics that were applied, the submissions that were received, and the results that were achieved.
Phrase-level System Combination for Machine Translation Based on Target-to-Target Decoding
In this paper, we propose a novel lattice-based MT combination methodology that we call Target-to-Target Decoding (TTD). The combination process is carried out as a "translation" from the backbone to the combination result. This perspective suggests the use of existing phrase-based MT techniques in the combination framework. We show how phrase extraction rules and confidence estimations inspired by machine translation improve results. We also propose system-specific LMs for estimating N-gram consensus. Our results show that our approach yields a strong improvement over the best single MT system and competes with other state-of-the-art combination systems.
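The paper's TTD system itself is not available, but the N-gram consensus idea can be illustrated compactly. The Python sketch below scores a combination candidate by how many of its n-grams also occur in the other systems' outputs, averaged over systems and n-gram orders; the function names and the simple averaging are illustrative assumptions, not the paper's system-specific LM formulation.

from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def consensus_score(candidate, system_outputs, max_n=4):
    """Fraction of the candidate's n-grams that also appear in each other
    system's output, averaged over systems and n-gram orders.
    A crude stand-in for the paper's system-specific LM consensus."""
    cand_tokens = candidate.split()
    scores = []
    for hyp in system_outputs:
        hyp_counts_by_n = {n: Counter(ngrams(hyp.split(), n))
                           for n in range(1, max_n + 1)}
        for n in range(1, max_n + 1):
            cand_ngrams = ngrams(cand_tokens, n)
            if not cand_ngrams:
                continue
            matched = sum(1 for g in cand_ngrams if hyp_counts_by_n[n][g] > 0)
            scores.append(matched / len(cand_ngrams))
    return sum(scores) / len(scores) if scores else 0.0

# Rerank combination candidates by agreement with the other systems.
systems = ["the cat sat on the mat", "a cat sat on a mat"]
candidates = ["the cat sat on a mat", "the dog sat on the mat"]
best = max(candidates, key=lambda c: consensus_score(c, systems))
print(best)  # -> "the cat sat on a mat"

In a full combination system such scores would enter the lattice decoder as features alongside translation-model and LM scores, rather than act as a standalone reranker.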
The IWSLT 2018 Evaluation Campaign
The International Workshop on Spoken Language Translation (IWSLT) 2018 Evaluation Campaign featured two tasks: the low-resourced machine translation task and the speech translation task. In the first task, manually transcribed speech had to be translated from Basque to English. Since this is an under-resourced language pair, participants were encouraged to use additional parallel data from related languages. In the second task, participants had to translate English audio into German text by building a full speech-translation system. In the baseline condition, participants were free to use any architecture, while in the end-to-end condition they were restricted to a single model. This year, eight research groups took part in the Basque-English translation task, and nine in the speech translation task.
End-to-End Speech Translation of Arabic to English Broadcast News
Speech translation (ST) is the task of directly translating acoustic speech signals in a source language into text in a foreign language. The ST task has long been addressed with a pipeline approach comprising two modules: an Automatic Speech Recognition (ASR) system in the source language, followed by a text-to-text Machine Translation (MT) system. In the past few years, we have seen a paradigm shift towards end-to-end approaches using sequence-to-sequence deep neural network models. This paper presents our efforts towards the development of the first Broadcast News end-to-end Arabic-to-English speech translation system. Starting from independent ASR and MT LDC releases, we were able to identify about 92 hours of Arabic audio recordings for which the manual transcription was also translated into English at the segment level. These data were used to train and compare pipeline and end-to-end speech translation systems under multiple scenarios, including transfer learning and data augmentation techniques.
Comment: Arabic Natural Language Processing Workshop 202
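For context on the cascade baseline the paper compares against, here is a minimal sketch of the two-module pipeline using Hugging Face transformers pipelines. The checkpoint names are assumed placeholders for any Arabic ASR and Arabic-to-English MT models; this is not the paper's LDC-trained system.

# A minimal cascade (pipeline) speech translation sketch.
# Checkpoint names are illustrative assumptions, not the paper's models.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition",
               model="jonatasgrosman/wav2vec2-large-xlsr-53-arabic")  # assumed
mt = pipeline("translation",
              model="Helsinki-NLP/opus-mt-ar-en")  # assumed

def cascade_st(audio_path: str) -> str:
    """Source audio -> Arabic transcript -> English text."""
    transcript = asr(audio_path)["text"]
    return mt(transcript)[0]["translation_text"]

print(cascade_st("broadcast_segment.wav"))

An end-to-end system replaces both stages with a single sequence-to-sequence model mapping audio features directly to English text, which is what the 92 hours of segment-aligned data make trainable.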
Findings of the IWSLT 2023 Evaluation Campaign
This paper reports on the shared tasks organized by the 20th IWSLT Conference. The shared tasks address 9 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, multilingual, dialect and low-resource speech translation, and formality control. The shared tasks attracted a total of 38 submissions by 31 teams. The growing interest in spoken language translation is also reflected in the constantly increasing number of shared task organizers and contributors to the overview paper, almost evenly distributed across industry and academia.
Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC
Non-autoregressive approaches aim to improve the inference speed of translation models, particularly those that generate output in a one-pass forward manner. However, these approaches often suffer from a significant drop in translation quality compared to autoregressive models. This paper introduces a series of innovative techniques to enhance the translation quality of Non-Autoregressive Translation (NAT) models while maintaining a substantial acceleration in inference speed. We propose fine-tuning Pretrained Multilingual Language Models (PMLMs) with the CTC loss to train NAT models effectively. Furthermore, we adopt the MASK insertion scheme for upsampling instead of token duplication, and we present an embedding distillation method to further enhance performance. In our experiments, our model outperforms the baseline autoregressive model (Transformer base) on multiple datasets, including WMT'14 DE→EN, WMT'16 RO→EN, and IWSLT'14 DE→EN. Notably, our model achieves better performance than the baseline autoregressive model on the IWSLT'14 En→De and WMT'16 En→Ro datasets, even without using distillation data during training. It is worth highlighting that on the IWSLT'14 DE→EN dataset, our model achieves an impressive BLEU score of 39.59, setting a new state-of-the-art performance. Additionally, our model exhibits a remarkable speed improvement of 16.35 times compared to the autoregressive model.
Comment: 12 pages, 6 figures
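To make the recipe concrete, the PyTorch sketch below shows the two mechanics the abstract names: upsampling the source by interleaving MASK tokens instead of duplicating tokens, and training the decoder output (here a random stub) with CTC loss. The dimensions, the upsampling factor of 2, and all names are illustrative assumptions; the PMLM encoder and the embedding distillation step are omitted.

# Minimal sketch of CTC training for a NAT decoder with MASK-insertion
# upsampling (one reading of the scheme, not the authors' code).
import torch
import torch.nn.functional as F

def upsample_with_masks(src_ids, mask_id, factor=2):
    """Interleave each source token with (factor-1) MASK tokens instead
    of duplicating it, giving the CTC decoder room to emit longer output."""
    batch, src_len = src_ids.shape
    out = torch.full((batch, src_len * factor), mask_id, dtype=src_ids.dtype)
    out[:, ::factor] = src_ids
    return out

# Toy dimensions; a PMLM encoder would supply the logits in practice.
vocab, blank_id, mask_id = 100, 0, 99
src = torch.randint(1, vocab - 1, (4, 10))            # (batch, src_len)
tgt = torch.randint(1, vocab - 1, (4, 12))            # (batch, tgt_len)
dec_in = upsample_with_masks(src, mask_id)            # (batch, 20)

logits = torch.randn(4, dec_in.size(1), vocab)        # stand-in decoder output
log_probs = F.log_softmax(logits, dim=-1).transpose(0, 1)  # (T, batch, vocab)

input_lengths = torch.full((4,), dec_in.size(1), dtype=torch.long)
target_lengths = torch.full((4,), tgt.size(1), dtype=torch.long)
loss = F.ctc_loss(log_probs, tgt, input_lengths, target_lengths,
                  blank=blank_id, zero_infinity=True)
print(loss.item())

Because CTC marginalizes over all monotonic alignments, the decoder can emit blanks and repeats freely, which is what lets the one-pass model avoid committing to an exact output length up front.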
Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation
End-to-end speech translation, a hot topic in recent years, aims to translate a segment of audio into a specific language with an end-to-end model. Conventional approaches employ multi-task learning and pre-training methods for this task, but they suffer from the huge gap between pre-training and fine-tuning. To address this issue, we propose a Tandem Connectionist Encoding Network (TCEN), which bridges the gap by reusing all subnets in fine-tuning, keeping the roles of the subnets consistent, and pre-training the attention module. Furthermore, we propose two simple but effective methods to guarantee that the speech encoder outputs and the MT encoder inputs are consistent in terms of semantic representation and sequence length. Experimental results show that our model outperforms baselines by 2.2 BLEU on a large benchmark dataset.
Comment: AAAI202
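One common way to reconcile the sequence-length mismatch between frame-level speech encoder outputs and token-level MT encoder inputs is to collapse frames along a CTC alignment. The sketch below is an assumed illustration of that general idea, not the TCEN implementation: consecutive frames sharing a CTC label are mean-pooled, and blank frames are dropped, so roughly one vector per token reaches the MT encoder.

# Assumed sketch of CTC-based length shrinking between a speech encoder
# and an MT encoder; names and dimensions are illustrative.
import torch

def shrink_by_ctc(frames, ctc_logits, blank_id=0):
    """Mean-pool runs of frames that CTC labels identically and drop
    blank frames, yielding token-like time steps for the MT encoder."""
    labels = ctc_logits.argmax(dim=-1)            # (T,) frame-level labels
    segments, current, prev = [], [], None
    for t, lab in enumerate(labels.tolist()):
        if lab != prev and current:               # label changed: close segment
            segments.append(torch.stack(current).mean(dim=0))
            current = []
        if lab != blank_id:                       # keep only non-blank frames
            current.append(frames[t])
        prev = lab
    if current:
        segments.append(torch.stack(current).mean(dim=0))
    return torch.stack(segments) if segments else frames[:0]

T, d, vocab = 50, 256, 30
frames = torch.randn(T, d)                        # speech encoder outputs
ctc_logits = torch.randn(T, vocab)                # frame-level CTC scores
token_level = shrink_by_ctc(frames, ctc_logits)
print(frames.shape, "->", token_level.shape)      # far fewer, token-like steps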