From Arabic user-generated content to machine translation: integrating automatic error correction
With the widespread use of social media and online forums,
individual users have been able to actively participate in the generation
of online content in different languages and dialects. Arabic is one of the
fastest growing languages on the Internet, but dialects (like Egyptian
and Saudi Arabian) account for a large share of Arabic online content. There
are many differences between Dialectal Arabic and Modern Standard
Arabic, which pose many challenges for Machine Translation of informal
Arabic. In this paper, we investigate the use of an Automatic Error Correction method to improve the quality of Arabic User-Generated
texts and their automatic translation. Our experiments show that the new
system with an automatic correction module outperforms the baseline system by a relative improvement of nearly 22.59%.
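The abstract reports a relative gain rather than an absolute one. A minimal sketch of how such a figure is computed, assuming a higher-is-better metric such as BLEU; the function name and the scores below are hypothetical and are not results from the paper:

```python
def relative_improvement(baseline: float, system: float) -> float:
    """Return the improvement of `system` over `baseline`, in percent of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (system - baseline) / baseline * 100.0


# Hypothetical scores for illustration only (not taken from the paper).
baseline_score = 20.0
system_score = 25.0
print(f"{relative_improvement(baseline_score, system_score):.2f}%")
```

A relative figure depends on the baseline: the same absolute gain looks larger when the baseline score is low, which is common for noisy user-generated text.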
LIUM Machine Translation Systems for WMT17 News Translation Task
This paper describes LIUM submissions to WMT17 News Translation Task for
English-German, English-Turkish, English-Czech and English-Latvian language
pairs. We train BPE-based attentive Neural Machine Translation systems with and
without factored outputs using the open source nmtpy framework. Competitive
scores were obtained by ensembling various systems and exploiting the
availability of target monolingual corpora for back-translation. The impact of
back-translation quantity and quality is also analyzed for English-Turkish
where our post-deadline submission surpassed the best entry by +1.6 BLEU.
Comment: News Translation Task System Description paper for WMT1
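The BPE segmentation mentioned in the abstract above can be illustrated with a small sketch of how byte-pair-encoding merges are learned from word frequencies. This is a generic illustration, not the nmtpy implementation: the function name `learn_bpe`, the `</w>` end-of-word marker, the toy vocabulary, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Tuple[str, str]]:
    """Learn up to `num_merges` BPE merge operations from a word-frequency table."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += freq
        if not pairs:
            break
        # Most frequent pair; ties are broken by the pair itself (an assumption
        # made here so that the result is deterministic).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab: Dict[Tuple[str, ...], int] = {}
        for symbols, freq in vocab.items():
            merged: List[str] = []
            i = 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges


# Toy vocabulary for illustration only.
toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
print(learn_bpe(toy_counts, 3))
```

Frequent character sequences end up as single subword units, so rare words can be represented as sequences of known subwords instead of an unknown-word token.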
NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems
In this paper, we present nmtpy, a flexible Python toolkit based on Theano
for training Neural Machine Translation and other neural sequence-to-sequence
architectures. nmtpy decouples the specification of a network from the training
and inference utilities to simplify the addition of a new architecture and
reduce the amount of boilerplate code to be written. nmtpy has been used for
LIUM's top-ranked submissions to WMT Multimodal Machine Translation and News
Translation tasks in 2016 and 2017.
Comment: 10 pages, 3 figures
Does Multimodality Help Human and Machine for Translation and Image Captioning?
This paper presents the systems developed by LIUM and CVC for the WMT16
Multimodal Machine Translation challenge. We explored various comparative
methods, namely phrase-based systems and attentional recurrent neural network
models trained using monomodal or multimodal data. We also performed a human
evaluation in order to estimate the usefulness of multimodal data for human
machine translation and image description generation. Our systems obtained the
best results for both tasks according to the automatic evaluation metrics BLEU
and METEOR.
Comment: 7 pages, 2 figures, v4: Small clarification in section 4 title and content
LIUM-CVC Submissions for WMT17 Multimodal Translation Task
This paper describes the monomodal and multimodal Neural Machine Translation
systems developed by LIUM and CVC for WMT17 Shared Task on Multimodal
Translation. We mainly explored two multimodal architectures where either
global visual features or convolutional feature maps are integrated in order to
benefit from visual context. Our final systems ranked first for both En-De and
En-Fr language pairs according to the automatic evaluation metrics METEOR and
BLEU.
Comment: MMT System Description Paper for WMT1
Traduction automatique statistique de la langue arabe
The Arabic language has received a lot of attention in the machine translation community during the last decade. It is the official language of 25 countries and is spoken by more than 380 million people. Interest in the Arabic language and its dialects increased further after the Arab Spring and the political changes in the Arab countries. In this thesis, I worked on improving LIUM's machine translation system for Arabic-English in the framework of the BOLT project. I have extended LIUM's phrase-based statistical machine translation system in many ways. Phrase-based systems are considered to be one of the best performing approaches. They rely on two probabilistic models: a translation model and a language model. I have worked on improving translation quality by focusing on three different aspects. The first aspect is reducing the number of unknown words in the translated output. Entities such as numbers or dates can be translated efficiently by transfer rules, and I have also worked on the transliteration of named entities. The second aspect of my work is the adaptation of the translation model to the domain or genre of the translation task. Finally, I have worked on improved language modeling based on neural network language models, also called continuous space language models. They are used to rescore the n-best translation hypotheses. All the developed techniques have been thoroughly evaluated, and I took part in three international evaluations of the BOLT project.
French abstract (translated): Machine translation of Arabic text has received a lot of attention over the last decade. Arabic, the official language of more than 25 countries, is spoken by more than 290 million people. The political changes brought about by the Arab revolutions have put this language and its many dialects in the spotlight.
This work is part of the BOLT project, whose goal is to improve the performance of Arabic-English translation systems for specific domains (SMS, conversational speech, etc.). In this thesis, I have extended LIUM's phrase-based translation system in many respects. Phrase-based systems currently deliver the best performance. These systems rely on two statistical models: the translation model and the language model. To improve the quality of Arabic translation, we focused on three aspects. The first aspect is reducing the number of unknown words in the translation output. The second aspect of my thesis work is the adaptation of the translation table to the domain or task. Finally, I worked on improving language modeling with neural networks. These models are used to rescore the n-best translation hypotheses. All the developed techniques have been carefully integrated into LIUM's system and evaluated in three international evaluation campaigns within the BOLT project.
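The n-best rescoring step mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not LIUM's system: the function `rescore_nbest`, the interpolation weight, and the lookup-table scorer standing in for a continuous space language model are all hypothetical.

```python
from typing import Callable, List, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank (hypothesis, decoder_score) pairs with an extra language model score.

    The combined score is decoder_score + lm_weight * lm_logprob(hypothesis);
    the list is returned sorted from best to worst.
    """
    rescored = [(hyp, score + lm_weight * lm_logprob(hyp)) for hyp, score in nbest]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Hypothetical stand-in for a neural language model: a fixed table of log-probabilities.
toy_lm_scores = {"the house is small": -2.0, "the house small is": -6.0}


def toy_lm(hypothesis: str) -> float:
    # Unseen hypotheses get a low default score in this toy scorer.
    return toy_lm_scores.get(hypothesis, -10.0)


# Hypothetical decoder output: the less fluent hypothesis has the higher decoder score.
nbest_list = [("the house small is", -1.0), ("the house is small", -1.5)]
best_hypothesis, best_score = rescore_nbest(nbest_list, toy_lm)[0]
print(best_hypothesis, best_score)
```

Rescoring lets a model that is too costly to query inside the decoder influence the final choice, since it only has to score a short list of complete hypotheses.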
A Multi-Domain Translation Model Framework for Statistical Machine Translation
While domain adaptation techniques for SMT have proven to be effective at improving translation quality, their practicality for a multi-domain environment is often limited because of the computational and human costs of developing and maintaining multiple systems adapted to different domains. We present an architecture that delays the computation of translation model features until decoding, allowing for the application of mixture-modeling techniques at decoding time. We also describe a method for unsupervised adaptation with development and test data from multiple domains. Experimental results on two language pairs demonstrate the effectiveness of both our translation model architecture and automatic clustering, with gains of up to 1 BLEU over unadapted systems and single-domain adaptation.
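The idea of delaying feature computation until decoding can be sketched as follows. This is a minimal illustration that assumes linear interpolation of per-domain relative frequencies as the mixture technique; the abstract does not specify the exact features, and the class `MultiDomainTable`, the domain names, and the counts are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

PairCounts = Dict[Tuple[str, str], int]


class MultiDomainTable:
    """Keeps raw phrase-pair counts per domain and computes probabilities on demand."""

    def __init__(self, domain_counts: Sequence[PairCounts]) -> None:
        self.pair_counts: List[PairCounts] = list(domain_counts)
        # Per-domain totals for each source phrase, used as denominators.
        self.source_counts: List[Dict[str, int]] = []
        for counts in self.pair_counts:
            totals: Dict[str, int] = {}
            for (src, _tgt), count in counts.items():
                totals[src] = totals.get(src, 0) + count
            self.source_counts.append(totals)

    def prob(self, src: str, tgt: str, weights: Sequence[float]) -> float:
        """Return sum_i weights[i] * count_i(src, tgt) / count_i(src).

        The weights are supplied at query (decoding) time, so one stored table
        can serve several domains without retraining.
        """
        if len(weights) != len(self.pair_counts):
            raise ValueError("one weight per domain is required")
        total = 0.0
        for weight, pairs, sources in zip(weights, self.pair_counts, self.source_counts):
            denominator = sources.get(src, 0)
            if denominator:
                total += weight * pairs.get((src, tgt), 0) / denominator
        return total


# Hypothetical counts from two domains for an English-German phrase pair.
news_counts = {("bank", "Bank"): 8, ("bank", "Ufer"): 2}
travel_counts = {("bank", "Bank"): 1, ("bank", "Ufer"): 3}
table = MultiDomainTable([news_counts, travel_counts])

print(table.prob("bank", "Ufer", [0.5, 0.5]))  # equal mixture of both domains
print(table.prob("bank", "Ufer", [0.0, 1.0]))  # travel domain only
```

Because only counts are stored, changing the domain weights requires no retraining: the same table yields different probabilities for each weight vector chosen at decoding time.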