14 research outputs found
SMT and Hybrid systems of the QTLeap project in the WMT16 IT-task
This paper presents the description of 12 systems submitted to the WMT16 IT-task, covering six different languages, namely Basque, Bulgarian, Dutch, Czech, Portuguese and Spanish. All these systems were developed under the scope of the QTLeap project, presenting a common strategy. For each language two different systems were submitted, namely a phrase-based MT system built using Moses, and a system exploiting deep language engineering approaches, which in all languages but Bulgarian was implemented using TectoMT. For 4 of the 6 languages, the TectoMT-based system performs better than the Moses-based one.
Dictionary-based Domain Adaptation of MT Systems without Retraining
We describe our submission to the IT-domain translation task of WMT 2016.
We perform domain adaptation with dictionary data on already trained MT systems with no further retraining.
We apply our approach to two conceptually different systems developed within the QTLeap project: TectoMT and Moses, as well as Chimera, their combination.
In all settings, our method improves the translation quality.
Moreover, the basic variant of our approach is applicable to any MT system, including a black-box one.
Findings of the 2016 Conference on Machine Translation (WMT16)
This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), an automatic post-editing task, and a bilingual document alignment task. This year, 102 MT systems from 24 institutions (plus 36 anonymized online systems) were submitted to the 12 translation directions in the news translation task. The IT-domain task received 31 submissions from 12 institutions in 7 directions and the Biomedical task received 15 submissions from 5 institutions. Evaluation was both automatic and manual (relative ranking and 100-point scale assessments).
Findings of the 2016 Conference on Machine Translation.
This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), an automatic post-editing task, and a bilingual document alignment task. This year, 102 MT systems from 24 institutions (plus 36 anonymized online systems) were submitted to the 12 translation directions in the news translation task. The IT-domain task received 31 submissions from 12 institutions in 7 directions and the Biomedical task received 15 submissions from 5 institutions. Evaluation was both automatic and manual (relative ranking and 100-point scale assessments). The quality estimation task had three subtasks, with a total of 14 teams, submitting 39 entries. The automatic post-editing task had a total of 6 teams, submitting 11 entries.
Findings of the 2017 Conference on Machine Translation
This paper presents the results of the WMT17 shared tasks, which included three machine translation (MT) tasks (news, biomedical, and multimodal), two evaluation tasks (metrics and run-time estimation of MT quality), an automatic post-editing task, a neural MT training task, and a bandit learning task.
Automated Translation with Interlingual Word Representations
In this thesis we investigate the use of translation systems that employ a transfer phase with interlingual representations of words. In this way we approach the problem of lexical ambiguity in machine translation systems as two separate tasks: word sense disambiguation and lexical selection. First, the words in the source language are disambiguated on the basis of their meaning, resulting in interlingual representations of words. Next, a lexical selection module selects the most suitable word in the target language. We give a detailed description of the development and evaluation of translation systems for Dutch-English. This provides a background for the experiments in the second and third parts of this thesis. We then describe a method that determines the meaning of words. It is similar to the classic Lesk algorithm, because it uses the idea that words shared between the context of a word and its definition provide information about its meaning. However, we instead use word and sense vectors to compute the similarity between the definition of a sense and the context of a word. We additionally use our method for locating and interpreting puns. Finally, we present a model for lexical choice that selects lemmas given the abstract representations of words. We do this by converting the grammatical trees into hidden Markov trees. In this way, the optimal combination of lemmas and their context can be computed.
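The vector-based, Lesk-like disambiguation step mentioned in the abstract above can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the thesis's implementation: the toy three-dimensional vectors, the sense labels, the helper names (`mean_vector`, `cosine`, `disambiguate`), and the choice to represent a context or a definition as the average of its word vectors are all hypothetical.

```python
import math

# Hypothetical toy word vectors (assumption: real systems would use trained embeddings).
WORD_VECTORS = {
    "money": [0.9, 0.1, 0.0],
    "deposit": [0.8, 0.2, 0.1],
    "loan": [0.85, 0.15, 0.05],
    "institution": [0.7, 0.1, 0.2],
    "river": [0.0, 0.9, 0.3],
    "water": [0.1, 0.8, 0.4],
    "shore": [0.05, 0.85, 0.35],
    "land": [0.1, 0.7, 0.3],
}

# Hypothetical sense inventory: each sense maps to the words of its definition.
SENSES = {
    "bank/finance": ["institution", "money", "loan"],
    "bank/river": ["land", "river", "water"],
}


def mean_vector(words):
    """Average the vectors of the known words; return None if none are known."""
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def disambiguate(context_words, sense_definitions):
    """Pick the sense whose definition vector is most similar to the context vector."""
    context_vec = mean_vector(context_words)
    if context_vec is None:
        return None
    best_sense, best_score = None, None
    for sense, definition in sense_definitions.items():
        definition_vec = mean_vector(definition)
        if definition_vec is None:
            continue
        score = cosine(context_vec, definition_vec)
        if best_score is None or score > best_score:
            best_sense, best_score = sense, score
    return best_sense


print(disambiguate(["deposit", "money"], SENSES))
print(disambiguate(["river", "shore"], SENSES))
```

Unlike the classic Lesk algorithm, which counts literally shared words, this variant can match a context to a definition even when they share no word, as long as their vectors point in similar directions.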
Findings of the 2017 Conference on Machine Translation (WMT17)
This paper presents the results of the WMT17 shared tasks, which included three machine translation (MT) tasks (news, biomedical, and multimodal), two evaluation tasks (metrics and run-time estimation of MT quality), an automatic post-editing task, a neural MT training task, and a bandit learning task.
Itzulpen automatiko gainbegiratu gabea
192 p. Modern machine translation relies on strong supervision in the form of parallel corpora. Such a requirement greatly departs from the way in which humans acquire language, and poses a major practical problem for low-resource language pairs. In this thesis, we develop a new paradigm that removes the dependency on parallel data altogether, relying on nothing but monolingual corpora to train unsupervised machine translation systems. For that purpose, our approach first aligns separately trained word representations in different languages based on their structural similarity, and uses them to initialize either a neural or a statistical machine translation system, which is further trained through iterative back-translation. While previous attempts at learning machine translation systems from monolingual corpora had strong limitations, our work, along with other contemporaneous developments, is the first to report positive results in standard, large-scale settings, establishing the foundations of unsupervised machine translation and opening exciting opportunities for future research.