
7 research outputs found

Findings of the 2014 Workshop on Statistical Machine Translation

Author: Bojar Ondrej
Buck Christian
Federmann Christian
Haddow Barry
Koehn Philipp
Leveling Johannes
Monz Christof
Pecina Pavel
Post Matt
Saint-Amand Herve
Soricut Radu
Specia Lucia
Tamchyna Ales
Publication date: 01/01/2014

This paper presents the results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translation task, a task for run-time estimation of machine translation quality, and a metrics task. This year, 143 machine translation systems from 23 institutions were submitted to the ten translation directions in the standard translation task. An additional 6 anonymized systems were included, and all systems were then evaluated both automatically and manually. The quality estimation task had four subtasks, with a total of 10 teams submitting 57 entries.

Crossref

Edinburgh Research Explorer

Biblio at Institute of Formal and Applied Linguistics

International Migration, Integration and Social Cohesion online publications

Findings of the 2015 Workshop on Statistical Machine Translation

Author: Bojar Ondrej
Chatterjee Rajen
Federmann Christian
Haddow Barry
Hokamp Chris
Huck Matthias
Koehn Philipp
Logacheva Varvara
Monz Christof
Negri Matteo
Post Matt
Scarton Carolina
Specia Lucia
Turchi Marco
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2015

This paper presents the results of the WMT15 shared tasks, which included a standard news translation task, a metrics task, a tuning task, a task for run-time estimation of machine translation quality, and an automatic post-editing task. This year, 68 machine translation systems from 24 institutions were submitted to the ten translation directions in the standard translation task. An additional 7 anonymized systems were included, and all systems were then evaluated both automatically and manually. The quality estimation task had three subtasks, with a total of 10 teams submitting 34 entries. The pilot automatic post-editing task had a total of 4 teams submitting 7 entries.

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

Edinburgh Research Explorer

Publikationsserver der RWTH Aachen University

Biblio at Institute of Formal and Applied Linguistics

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Parallel FDA5 for fast deployment of accurate statistical machine translation systems

Author: Bicici Ergun
Liu Qun
Way Andy
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2014

We use parallel FDA5, an efficiently parameterized and optimized parallel implementation of feature decay algorithms for fast deployment of accurate statistical machine translation systems, taking only about half a day for each translation direction. We build Parallel FDA5 Moses SMT systems for all language pairs in the WMT14 translation task and obtain SMT performance close to the top Moses systems, with an average difference of 3.49 BLEU points, using significantly fewer resources for training and development.

Crossref

Irish Universities

DCU Online Research Access Service

Results of the WMT17 metrics shared task

Author: Bojar Ondřej
Graham Yvette
Kamran Amir
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2017

This paper presents the results of the WMT17 Metrics Shared Task. We asked participants of this task to score the outputs of the MT systems involved in the WMT17 news translation task and Neural MT training task. We collected scores of 14 metrics from 8 research groups. In addition, we computed scores of 7 standard metrics (BLEU, SentBLEU, NIST, WER, PER, TER and CDER) as baselines. The collected scores were evaluated in terms of system-level correlation (how well each metric's scores correlate with the WMT17 official manual ranking of systems) and in terms of segment-level correlation (how often a metric agrees with humans in judging the quality of a particular sentence). This year, we build upon two types of manual judgements: direct assessment (DA) and HUME manual semantic judgements.

Crossref

Irish Universities

DCU Online Research Access Service

Biblio at Institute of Formal and Applied Linguistics

Simple Recurrent Units for Highly Parallelizable Recurrence

Author: Artzi Yoav
Dai Hui
Lei Tao
Wang Sida I.
Zhang Yu
Publication date: 01/01/2018

Common recurrent neural architectures scale poorly due to the intrinsic difficulty in parallelizing their state computations. In this work, we propose the Simple Recurrent Unit (SRU), a light recurrent unit that balances model capacity and scalability. SRU is designed to provide expressive recurrence and enable highly parallelized implementation, and it comes with careful initialization to facilitate training of deep models. We demonstrate the effectiveness of SRU on multiple NLP tasks. SRU achieves a 5-9x speed-up over cuDNN-optimized LSTM on classification and question answering datasets, and delivers stronger results than LSTM and convolutional models. We also obtain an average of 0.7 BLEU improvement over the Transformer model on translation by incorporating SRU into the architecture. Comment: EMNL

arXiv.org e-Print Archive

Crossref

From Word Embeddings to Large Vocabulary Neural Machine Translation

Author: Jean Sébastien
Publication date: 01/04/2015

In this thesis, we examine some properties of word embeddings and propose a technique to handle large vocabularies in neural machine translation. We first look at a well-known analogy task and examine the effect of position-dependent weights, the choice of combination function and the impact of supervised learning. We then show that simple embeddings learnt with translational contexts can match or surpass the state of the art on the TOEFL synonym detection task and on the recently introduced SimLex-999 word similarity gold standard. Finally, motivated by impressive results obtained by small-vocabulary (30,000 words) neural machine translation embeddings on some word similarity tasks, we present a GPU-friendly approach to increase the vocabulary size by more than an order of magnitude. Although it was originally developed only for obtaining the embeddings, we show that this technique works quite well on actual translation tasks, especially for English to French (WMT'14).

Dépôt Institutionnel Numérique

The DCU-ICTCAS MT system at WMT 2014 on German-English Translation Task

Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2014

Crossref