The ADAPT system description for the IWSLT 2018 Basque to English translation task
In this paper we present the ADAPT system built for the
Basque to English Low Resource MT Evaluation Campaign.
Basque is a low-resourced, morphologically rich language.
This poses a challenge for Neural Machine Translation (NMT) models,
which usually achieve better performance when trained
on large data sets.
Accordingly, we used synthetic data to improve the translation quality produced by a model built using only authentic
data. Our proposal uses back-translated data to: (a) create
new sentences, so the system can be trained with more data;
and (b) translate sentences that are close to the test set, so the
model can be fine-tuned to the document to be translated.
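The back-translation idea described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the `backtranslate` function stands in for a trained English-to-Basque model and is a stub here so the example runs.

```python
# Sketch of back-translation: pair monolingual target-language (English)
# sentences with machine-generated source-language (Basque) translations,
# then add those synthetic pairs to the authentic training data.

def backtranslate(english_sentence):
    # Stub standing in for a trained English->Basque NMT model.
    return "EU(" + english_sentence + ")"

def build_synthetic_corpus(monolingual_english):
    """Yield synthetic (Basque, English) pairs from monolingual English."""
    return [(backtranslate(en), en) for en in monolingual_english]

authentic = [("kaixo mundua", "hello world")]
synthetic = build_synthetic_corpus(["good morning", "see you soon"])
training_data = authentic + synthetic  # the NMT model trains on the union
```

The synthetic source side is noisier than authentic data, but the target side is genuine English, which is what makes the extra pairs useful for training.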
The IWSLT 2018 Evaluation Campaign
The International Workshop on Spoken Language Translation
(IWSLT) 2018 Evaluation Campaign featured two tasks: the
low-resourced machine translation task and the speech translation
task. In the first task, manually transcribed speech had
to be translated from Basque to English. Since this translation
direction is an under-resourced language pair, participants
were encouraged to use additional parallel data from
related languages. In the second task, participants had
to translate English audio into German text by building a full
speech-translation system. In the baseline condition, participants
were free to use any architecture, while they were restricted
to a single model for the end-to-end task.
This year, eight research groups took part in the Basque-English
translation task, and nine in the speech translation
task.
Transductive data-selection algorithms for fine-tuning neural machine translation
Machine Translation models are trained to translate a variety of documents from one language into another. However, models specifically trained for the particular characteristics of the documents tend to perform better. Fine-tuning is a technique for adapting an NMT model to some domain. In this work, we want to use this technique to adapt the model to a given test set. In particular, we use transductive data selection algorithms, which take advantage of the information in the test set to retrieve sentences from a larger parallel set.
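A transductive selection method of the kind this abstract refers to, Feature Decay Algorithms (FDA), can be sketched roughly as below. This is a simplified illustration under stated assumptions (bigram features, greedy selection, a fixed decay factor), not the authors' implementation.

```python
from collections import Counter

def ngrams(tokens, n=2):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def fda_select(candidates, test_sentences, k, decay=0.5):
    """Greedy FDA-style selection: score candidates by the n-grams they
    share with the test set, decaying each feature's value every time a
    selected sentence covers it, so later picks favour new coverage."""
    features = Counter()
    for sent in test_sentences:
        for g in ngrams(sent.split()):
            features[g] = 1.0
    selected, pool = [], list(candidates)
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda s: sum(features.get(g, 0.0)
                                           for g in ngrams(s.split())))
        selected.append(best)
        pool.remove(best)
        for g in ngrams(best.split()):
            if g in features:
                features[g] *= decay  # covered features lose value
    return selected
```

The decay step is what distinguishes FDA from plain overlap ranking: once a test-set n-gram is covered, repeating it contributes less, pushing the selection toward diverse coverage of the test set.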
Cascade or Direct Speech Translation? A Case Study
Speech translation has traditionally been tackled under a cascade approach, chaining speech recognition and machine translation components to translate from an audio source in a given language into text or speech in a target language. Leveraging deep learning approaches to natural language processing, recent studies have explored the potential of direct end-to-end neural modelling to perform the speech translation task. Though several benefits may come from end-to-end modelling, such as a reduction in latency and error propagation, the comparative merits of each approach still deserve detailed evaluation and analysis. In this work, we compare state-of-the-art cascade and direct approaches on the under-resourced Basque–Spanish language pair, which features challenging phenomena such as marked differences in morphology and word order. This case study thus complements other studies in the field, which mostly revolve around the English language. We describe and analyse in detail the mintzai-ST corpus, prepared from the sessions of the Basque Parliament, and evaluate the strengths and limitations of cascade and direct speech translation models trained on this corpus, with variants exploiting additional data as well. Our results indicate that, despite significant progress with end-to-end models, which may outperform alternatives in some cases in terms of automated metrics, a cascade approach proved optimal overall in our experiments and manual evaluations. © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
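The cascade architecture contrasted in this abstract amounts to composing two components. The sketch below only shows that composition; the two model callables are stubs standing in for trained ASR and MT systems, not the paper's actual models.

```python
# Cascade speech translation: an ASR component produces a source-language
# transcript, which a separate MT component then translates. Errors in the
# transcript propagate into the translation, which is the trade-off the
# abstract discusses against direct end-to-end modelling.

def cascade_speech_translation(audio, asr_model, mt_model):
    """Chain speech recognition and machine translation."""
    transcript = asr_model(audio)  # source-language text from audio
    return mt_model(transcript)    # target-language text

# Stubs standing in for trained Basque ASR and Basque->Spanish MT models.
asr_stub = lambda audio: "kaixo mundua"
mt_stub = lambda text: "hola mundo"
result = cascade_speech_translation(b"raw-audio-bytes", asr_stub, mt_stub)
```

A direct model would instead map audio to target text in a single network, removing the intermediate transcript and the error propagation that comes with it.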
Adaptation of machine translation models with back-translated data using transductive data selection methods
Data selection has proven its merit for improving Neural Machine Translation (NMT) when applied to authentic data. But the benefit of using synthetic data in NMT training, produced by the popular back-translation technique, raises the question of whether data selection could also be useful for synthetic data. In this work we use Infrequent n-gram Recovery (INR) and Feature Decay Algorithms (FDA), two transductive data selection methods, to obtain subsets of sentences from synthetic data. These methods ensure that selected sentences share n-grams with the test set, so the NMT model can be adapted to translate it. Performing data selection on back-translated data creates new challenges, as the source side may contain noise originated by the model used in the back-translation. Hence, finding n-grams present in the test set becomes more difficult. Despite that, in our work we show that adapting a model with a selection of synthetic data is a useful approach.
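The INR criterion named in this abstract can be sketched roughly as follows. This is a simplified single-pass illustration under stated assumptions (bigram features, a simple occurrence threshold), not the authors' implementation.

```python
from collections import Counter

def ngrams(tokens, n=2):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def inr_select(candidates, test_sentences, threshold=1):
    """INR-style selection sketch: keep a candidate only if it supplies at
    least one test-set n-gram that is still 'infrequent', i.e. seen fewer
    than `threshold` times in the sentences selected so far."""
    needed = set()
    for sent in test_sentences:
        needed.update(ngrams(sent.split()))
    counts = Counter()
    selected = []
    for cand in candidates:
        cand_grams = set(ngrams(cand.split())) & needed
        if any(counts[g] < threshold for g in cand_grams):
            selected.append(cand)
            counts.update(cand_grams)
    return selected
```

With noisy back-translated source sentences, fewer candidates contain clean test-set n-grams, which is the difficulty the abstract points out: the intersection `cand_grams` shrinks, so recovery of infrequent n-grams gets harder.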
Survey of Low-Resource Machine Translation
We present a survey covering the state of the art in low-resource machine translation (MT) research. There are currently around 7,000 languages spoken in the world, and almost all language pairs lack significant resources for training machine translation models. There has been increasing interest in research addressing the challenge of producing useful translation models when very little translated training data is available. We present a summary of this topical research field and provide a description of the techniques evaluated by researchers in several recent shared tasks in low-resource MT.