20,062 research outputs found
Experiments on domain adaptation for English-Hindi SMT
Statistical Machine Translation (SMT) systems are usually trained on large amounts of bilingual text and monolingual target language text. If a significant amount of out-of-domain data is added to the training data, the quality of translation can drop. On the other hand, training an SMT system on a small amount of training material for given indomain data leads to narrow lexical coverage which again results in a low translation quality. In this paper, (i) we explore domain-adaptation techniques to combine large out-of-domain training data with small-scale in-domain training data for English—Hindi statistical machine translation and (ii) we cluster large out-of-domain training data to extract sentences similar to in-domain sentences and apply adaptation techniques to combine clustered sub-corpora
with in-domain training data into a unified framework, achieving a 0.44 absolute corresponding to a 4.03% relative improvement in terms of BLEU over the baseline
Domain adaptation strategies in statistical machine translation: a brief overview
© Cambridge University Press, 2015.Statistical machine translation (SMT) is gaining interest given that it can easily be adapted to any pair of languages. One of the main challenges in SMT is domain adaptation because the performance in translation drops when testing conditions deviate from training conditions. Many research works are arising to face this challenge. Research is focused on trying to exploit all kinds of material, if available. This paper provides an overview of research, which copes with the domain adaptation challenge in SMT.Peer ReviewedPostprint (author's final draft
Domain Control for Neural Machine Translation
Machine translation systems are very sensitive to the domains they were
trained on. Several domain adaptation techniques have been deeply studied. We
propose a new technique for neural machine translation (NMT) that we call
domain control which is performed at runtime using a unique neural network
covering multiple domains. The presented approach shows quality improvements
when compared to dedicated domains translating on any of the covered domains
and even on out-of-domain data. In addition, model parameters do not need to be
re-estimated for each domain, making this effective to real use cases.
Evaluation is carried out on English-to-French translation for two different
testing scenarios. We first consider the case where an end-user performs
translations on a known domain. Secondly, we consider the scenario where the
domain is not known and predicted at the sentence level before translating.
Results show consistent accuracy improvements for both conditions.Comment: Published in RANLP 201
Domain adaptation for statistical machine translation of corporate and user-generated content
The growing popularity of Statistical Machine Translation (SMT) techniques in recent years has led to the development of multiple domain-specic resources and adaptation scenarios. In this thesis we address two important and industrially relevant adaptation scenarios, each suited to different kinds of content.
Initially focussing on professionally edited `enterprise-quality' corporate content, we address a specic scenario of data translation from a mixture of different domains where, for each of them domain-specific data is available. We utilise an automatic classifier to combine multiple domain-specific models and empirically show that such a configuration results in better translation quality compared to both traditional and state-of-the-art techniques for handling mixed domain translation.
In the second phase of our research we shift our focus to the translation of possibly `noisy' user-generated content in web-forums created around products and services of a multinational company. Using professionally edited translation memory (TM) data for training, we use different normalisation and data selection techniques to adapt SMT models to noisy forum content. In this scenario, we also study the effect of mixture adaptation using a combination of in-domain and out-of-domain data at different component levels of an SMT system. Finally we focus on the task of optimal supplementary training data selection from out-of-domain corpora using a novel incremental model merging mechanism to adapt TM-based models to improve forum-content translation quality
Domain adaptation problem in statistical machine translation systems
Globalization suddenly brings many people from different country to interact with each other, requiring them to be able to speak several languages. Human translators are slow and expensive, we find the necessity of developing machine
translators to automatize the task. Several approaches of Machine translation have
been develop by the researchers. In this work, we use the Statistical Machine Translation approach. Statistical Machine Translation systems perform poorly when applied on new domains. The domain adaptation problem has recently gained interest in Statistical Machine Translation. The basic idea is to improve the performance of
the system trained and tuned with different domain than the one to be translated.
This article studies different paradigms of domain adaptation. The results report improvements compared with a system trained only with in-domain data and trained with all the available data.Chinea Ríos, M.; Sanchis Trilles, G.; Casacuberta Nolla, F. (2015). Domain adaptation problem in statistical machine translation systems. En Artificial Intelligence Research and Development. IOS Press. 205-213. doi:10.3233/978-1-61499-578-4-205S20521
- …