
    Basque-to-Spanish and Spanish-to-Basque machine translation for the health domain

    This project presents the initial steps towards developing a machine translation system for the health domain between Basque and Spanish. In the absence of a sufficiently large bilingual corpus, several experiments were carried out to test different Neural Machine Translation parameters on an out-of-domain corpus, while performance in the health domain was evaluated with a manually translated corpus, used to test systems trained with an increasing presence of health-domain corpora. The results obtained represent a first step towards the stated objective.
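    The abstract mentions systems trained with an increasing presence of health-domain corpora. As a minimal sketch of that kind of data mixing (not the thesis' actual pipeline), the snippet below builds training sets in which a small in-domain corpus is oversampled to reach a chosen share of the mix; the file paths, the tab-separated corpus format, and the ratio schedule are assumptions for illustration.

```python
# Minimal sketch, assuming tab-separated "source<TAB>target" corpora; not the
# thesis' actual pipeline. Paths and ratios below are hypothetical.
import random


def load_pairs(path):
    """Read one sentence pair per line, tab-separated (assumed format)."""
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t", 1)) for line in f if "\t" in line]


def build_mix(out_of_domain, in_domain, in_domain_ratio, seed=0):
    """Return a shuffled training set in which roughly `in_domain_ratio` of the
    examples come from the (oversampled) health-domain corpus."""
    rng = random.Random(seed)
    n_in = int(len(out_of_domain) * in_domain_ratio / (1.0 - in_domain_ratio))
    # The small in-domain corpus is oversampled with replacement.
    oversampled = [rng.choice(in_domain) for _ in range(n_in)] if in_domain else []
    mix = list(out_of_domain) + oversampled
    rng.shuffle(mix)
    return mix


if __name__ == "__main__":
    general = load_pairs("general_eu_es.tsv")  # hypothetical out-of-domain corpus
    health = load_pairs("health_eu_es.tsv")    # hypothetical in-domain corpus
    for ratio in (0.0, 0.1, 0.25, 0.5):        # increasing presence of health data
        mix = build_mix(general, health, ratio)
        print(f"in-domain ratio {ratio:.2f}: {len(mix)} training pairs")
```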

    Learning to translate by learning to communicate

    We formulate and test a technique to use Emergent Communication (EC) with a pretrained multilingual model to improve on modern Unsupervised NMT systems, especially for low-resource languages. It has been argued that the currently dominant paradigm in NLP of pretraining on text-only corpora will not yield robust natural language understanding systems, and the need for grounded, goal-oriented, and interactive language learning has been highlighted. In our approach, we embed a modern multilingual model (mBART; Liu et al., 2020) into an EC image-reference game, in which the model is incentivized to use multilingual generations to accomplish a vision-grounded task, with the hypothesis that this will align multiple languages to a shared task space. We present two variants of EC Fine-Tuning (Steinert-Threlkeld et al., 2022), one of which outperforms a backtranslation-based baseline in 6/8 translation settings, and proves especially beneficial for the very low-resource languages of Nepali and Sinhala.
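    For readers unfamiliar with emergent-communication reference games, the toy sketch below illustrates the general game structure the abstract refers to: a speaker produces a discrete message about a target image, a listener must pick that image out of a set of candidates, and the speaker is trained with REINFORCE on the listener's success. This is not the authors' EC Fine-Tuning code; mBART is replaced by a small GRU speaker and images by random feature vectors, all assumptions made to keep the example self-contained.

```python
# Toy sketch of an emergent-communication reference game; not the authors'
# EC Fine-Tuning code. The mBART speaker is replaced by a small GRU and the
# image encoder by a linear projection of random vectors (all assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MSG_LEN, IMG_DIM, HID = 32, 5, 64, 128


class Speaker(nn.Module):
    """Generates a discrete message describing the target image."""
    def __init__(self):
        super().__init__()
        self.img_proj = nn.Linear(IMG_DIM, HID)
        self.rnn = nn.GRUCell(VOCAB, HID)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, target_img):
        h = torch.tanh(self.img_proj(target_img))
        tok = torch.zeros(target_img.size(0), VOCAB)
        symbols, logps = [], []
        for _ in range(MSG_LEN):
            h = self.rnn(tok, h)
            dist = torch.distributions.Categorical(logits=self.out(h))
            sym = dist.sample()
            symbols.append(sym)
            logps.append(dist.log_prob(sym))
            tok = F.one_hot(sym, VOCAB).float()
        return torch.stack(symbols, 1), torch.stack(logps, 1).sum(1)


class Listener(nn.Module):
    """Scores each candidate image against the received message."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HID)
        self.img_proj = nn.Linear(IMG_DIM, HID)

    def forward(self, message, candidate_imgs):
        msg = self.embed(message).mean(1)              # (batch, HID)
        cands = self.img_proj(candidate_imgs)          # (batch, K, HID)
        return torch.einsum("bh,bkh->bk", msg, cands)  # similarity per candidate


speaker, listener = Speaker(), Listener()
opt = torch.optim.Adam(list(speaker.parameters()) + list(listener.parameters()), lr=1e-3)

for step in range(200):
    imgs = torch.randn(8, 4, IMG_DIM)                  # 8 games, 4 candidate "images" each
    target = torch.randint(0, 4, (8,))
    msg, logp = speaker(imgs[torch.arange(8), target])
    scores = listener(msg, imgs)
    listener_loss = F.cross_entropy(scores, target)    # did the listener pick the target?
    reward = (scores.argmax(1) == target).float()
    speaker_loss = -(logp * (reward - reward.mean())).mean()  # REINFORCE with a baseline
    opt.zero_grad()
    (listener_loss + speaker_loss).backward()
    opt.step()
```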

    Multilingual Neural Machine Translation: the case-study for Catalan, Spanish and Portuguese Romance Languages

    Machine translation is the task of automatically translating one language into another. This project aims to evaluate the performance of state-of-the-art deep learning systems on similar-language translation. We evaluate translation between Catalan, Spanish, and Portuguese, which are Romance languages, and see how the Transformer architecture performs on this task. We additionally make use of different techniques to improve translation between these languages. First of all, we make use of multilingual models that allow for transfer learning as well as zero-shot translation. Secondly, we apply the backtranslation technique to make use of monolingual data and improve the system's translations. Lastly, we improve domain-specific translation using fine-tuning.
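    Backtranslation, one of the techniques listed above, turns monolingual target-language text into synthetic parallel data by translating it with a reverse-direction model. A minimal sketch follows; the OPUS-MT checkpoint name, the input file, and the batch size are assumptions, not the project's actual setup.

```python
# Minimal backtranslation sketch, not the project's actual setup. The OPUS-MT
# checkpoint name, the input file, and the batch size are assumptions; the point
# is the data flow: monolingual Spanish text -> reverse model -> synthetic
# Catalan sources paired with the genuine Spanish sentences.
from transformers import pipeline

# Reverse-direction model: translates the monolingual Spanish text "back" into Catalan.
reverse_mt = pipeline("translation", model="Helsinki-NLP/opus-mt-es-ca")


def backtranslate(monolingual_targets, batch_size=16):
    synthetic_pairs = []
    for i in range(0, len(monolingual_targets), batch_size):
        batch = monolingual_targets[i:i + batch_size]
        outputs = reverse_mt(batch, max_length=256)
        for out, tgt in zip(outputs, batch):
            # Synthetic Catalan source paired with the real Spanish sentence.
            synthetic_pairs.append((out["translation_text"], tgt))
    return synthetic_pairs


if __name__ == "__main__":
    with open("monolingual_es.txt", encoding="utf-8") as f:  # hypothetical corpus
        spanish = [line.strip() for line in f if line.strip()]
    pairs = backtranslate(spanish[:100])
    # These pairs are then mixed with the real parallel data when training ca->es.
    print(pairs[:3])
```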

    Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

    Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations. In this paper, we explore ways to improve them. We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and overcome this bottleneck via language-specific components and deepening NMT architectures. We identify the off-target translation issue (i.e. translating into a wrong target language) as the major source of the inferior zero-shot performance, and propose random online backtranslation to enforce the translation of unseen training language pairs. Experiments on OPUS-100 (a novel multilingual dataset with 100 languages) show that our approach substantially narrows the performance gap with bilingual models in both one-to-many and many-to-many settings, and improves zero-shot performance by ~10 BLEU, approaching conventional pivot-based methods.
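    The sketch below is a schematic rendering of the random online backtranslation idea as described in the abstract, not the authors' implementation: for each training example, the current model translates the target sentence into a randomly chosen language, producing a synthetic pair in a direction that may never appear in the parallel data. The model interface, the language set, and the dummy model are placeholders.

```python
# Schematic sketch of random online backtranslation as described in the
# abstract, not the authors' implementation. The model interface
# (translate / train_step), the language set, and the dummy model are placeholders.
import random

LANGUAGES = ["en", "de", "fr", "zh", "ru"]  # stand-ins for the OPUS-100 languages


def robt_step(model, batch, rng):
    """For each (source, target, target_lang) example, also build a synthetic pair
    whose source is the current model's translation of the target into a randomly
    chosen language, so zero-shot directions are exercised during training."""
    synthetic = []
    for src, tgt, tgt_lang in batch:
        pivot_lang = rng.choice([l for l in LANGUAGES if l != tgt_lang])
        pivot_text = model.translate(tgt, target_lang=pivot_lang)  # online backtranslation
        synthetic.append((pivot_text, tgt, tgt_lang))
    model.train_step(batch)       # update on the real parallel batch
    model.train_step(synthetic)   # update on the synthetic zero-shot batch


class DummyModel:
    """Stand-in NMT model so the sketch runs end to end."""
    def translate(self, text, target_lang):
        return f"<{target_lang}> {text}"  # placeholder "translation"

    def train_step(self, batch):
        print(f"train_step on {len(batch)} examples")


if __name__ == "__main__":
    batch = [("Hallo Welt", "Hello world", "en"), ("Bonjour le monde", "Hello world", "en")]
    robt_step(DummyModel(), batch, random.Random(0))
```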

    Findings of the 2019 Conference on Machine Translation (WMT19)

    This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019. Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation.

    A human evaluation of English-Irish statistical and neural machine translation

    With official status in both Ireland and the EU, there is a need for high-quality English-Irish (EN-GA) machine translation (MT) systems which are suitable for use in a professional translation environment. While we have seen recent research on improving both statistical MT and neural MT for the EN-GA pair, the results of such systems have always been reported using automatic evaluation metrics. This paper provides the first human evaluation study of EN-GA MT using professional translators and in-domain (public administration) data for a more accurate depiction of the translation quality available via MT.