Experiments on domain adaptation for English-Hindi SMT
Statistical Machine Translation (SMT) systems are usually trained on large amounts of bilingual text and monolingual target-language text. If a significant amount of out-of-domain data is added to the training data, translation quality can drop. On the other hand, training an SMT system on only a small amount of in-domain data leads to narrow lexical coverage, which again results in low translation quality. In this paper, (i) we explore domain-adaptation techniques to combine large out-of-domain training data with small-scale in-domain training data for English–Hindi statistical machine translation and (ii) we cluster large out-of-domain training data to extract sentences similar to in-domain sentences and apply adaptation techniques to combine the clustered sub-corpora with in-domain training data into a unified framework, achieving an absolute improvement of 0.44 BLEU points, corresponding to a 4.03% relative improvement, over the baseline.
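The two figures quoted in the abstract above (0.44 absolute, 4.03% relative) together imply a baseline score, which the abstract itself does not state. The following minimal sketch recovers it by simple arithmetic; the function name `implied_baseline` is hypothetical, and the resulting values are derived estimates, not numbers reported in the paper.

```python
def implied_baseline(absolute_gain, relative_gain_pct):
    """Return the baseline score implied by an absolute and a relative gain.

    relative_gain_pct = 100 * absolute_gain / baseline, so
    baseline = absolute_gain / (relative_gain_pct / 100).
    """
    return absolute_gain / (relative_gain_pct / 100.0)


# Figures taken from the abstract; everything computed from them is an estimate.
baseline_bleu = implied_baseline(0.44, 4.03)
adapted_bleu = baseline_bleu + 0.44

print(f"implied baseline BLEU: {baseline_bleu:.2f}")
print(f"implied adapted BLEU:  {adapted_bleu:.2f}")
```

Under these assumptions the baseline comes out at roughly 10.9 BLEU, which shows why a gain that looks small in absolute terms is a few percent in relative terms.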
Doubly-Attentive Decoder for Multi-modal Neural Machine Translation
We introduce a Multi-modal Neural Machine Translation model in which a
doubly-attentive decoder naturally incorporates spatial visual features
obtained using pre-trained convolutional neural networks, bridging the gap
between image description and translation. Our decoder learns to attend to
source-language words and parts of an image independently by means of two
separate attention mechanisms as it generates words in the target language. We
find that our model can efficiently exploit not just back-translated in-domain
multi-modal data but also large general-domain text-only MT corpora. We also
report state-of-the-art results on the Multi30k data set.
Comment: 8 pages (11 including references), 2 figures
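The idea of two independent attention mechanisms in one decoder step can be sketched as follows. This is an illustration only: it assumes plain dot-product attention and combines the contexts by concatenation, neither of which is specified in the abstract, and all function names (`attend`, `doubly_attentive_step`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Dot-product attention (an assumption): return weights and context vector."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    context = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]
    return weights, context


def doubly_attentive_step(state, source_annotations, image_regions):
    """One decoder step with separate attention over words and image regions."""
    src_weights, src_context = attend(state, source_annotations)
    img_weights, img_context = attend(state, image_regions)
    # Concatenate decoder state and both contexts (a hypothetical combination).
    combined = state + src_context + img_context
    return src_weights, img_weights, combined


# Toy inputs: a 2-dimensional decoder state, 3 source words, 2 image regions.
state = [1.0, 0.0]
source_annotations = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
image_regions = [[0.0, 1.0], [1.0, 1.0]]

src_w, img_w, combined = doubly_attentive_step(state, source_annotations, image_regions)
```

The two weight vectors are computed separately, so the model can focus on one source word and a different image region at the same time step.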
Experiments on domain adaptation for patent machine translation in the PLuTO project
The PLuTO project (Patent Language Translations Online) aims to provide a rapid solution for the online retrieval and translation of patent documents through the integration of a number of existing state-of-the-art components provided by the project partners. The paper presents some of the experiments on patent domain adaptation of the Machine Translation (MT) systems used in the PLuTO project. The experiments use the International Patent Classification for domain adaptation and focus on the English–French language pair.
MATREX: the DCU MT system for WMT 2010
This paper describes the DCU machine translation system in the evaluation campaign of the Joint Fifth Workshop on Statistical Machine Translation and Metrics in ACL-2010. We describe the modular design of our multi-engine machine translation (MT) system with particular focus on the components used in this participation.
We participated in the English–Spanish and English–Czech translation tasks, for which we employed our multi-engine
architecture. We also participated in the system combination task, which was carried out using the MBR
decoder and the confusion network decoder.
Dynamic topic adaptation for improved contextual modelling in statistical machine translation
In recent years there has been an increased interest in domain adaptation techniques
for statistical machine translation (SMT) to deal with the growing amount of data from
different sources. Topic modelling techniques applied to SMT are closely related to the
field of domain adaptation but more flexible in dealing with unstructured text. Topic
models can capture latent structure in texts and are therefore particularly suitable for
modelling structure in between and beyond corpus boundaries, which are often arbitrary.
In this thesis, the main focus is on dynamic translation model adaptation to texts of
unknown origin, which is a typical scenario for an online MT engine translating web
documents. We introduce a new bilingual topic model for SMT that takes the entire
document context into account and for the first time directly estimates topic-dependent
phrase translation probabilities in a Bayesian fashion. We demonstrate our model’s
ability to improve over several domain adaptation baselines and further provide evidence
for the advantages of bilingual topic modelling for SMT over the more common
monolingual topic modelling. We also show improved performance when deriving further
adapted translation features from the same model which measure different aspects
of topical relatedness.
We introduce another new topic model for SMT which exploits the distributional
nature of phrase pair meaning by modelling topic distributions over phrase pairs using
their distributional profiles. Using this model, we explore combinations of local and
global contextual information and demonstrate the usefulness of different levels of contextual
information, which had not been previously examined for SMT. We also show
that combining this model with a topic model trained at the document-level further improves
performance. Our dynamic topic adaptation approach performs competitively
in comparison with two supervised domain-adapted systems.
Finally, we shed light on the relationship between domain adaptation and topic
adaptation and propose to combine multi-domain adaptation and topic adaptation in a
framework that entails automatic prediction of domain labels at the document level.
We show that while each technique provides complementary benefits to the overall
performance, there is some overlap between domain and topic adaptation. This
can be exploited to build systems that require less adaptation effort at runtime.
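One common way to use topic-dependent phrase translation probabilities at translation time is to weight them by the topic distribution inferred for the current document. The sketch below illustrates that mixing step only; the formula p(t | s, d) = sum over k of p(t | s, k) * p(k | d), the function name, and all numbers are assumptions for illustration and are not taken from the thesis.

```python
def adapted_translation_prob(topic_phrase_probs, doc_topic_dist):
    """Mix per-topic phrase translation probabilities by document topic weights.

    topic_phrase_probs[k] is p(target | source, topic k);
    doc_topic_dist[k] is p(topic k | document).
    """
    if len(topic_phrase_probs) != len(doc_topic_dist):
        raise ValueError("one probability per topic is required")
    return sum(p_topic * p_phrase
               for p_topic, p_phrase in zip(doc_topic_dist, topic_phrase_probs))


# Hypothetical values: a phrase pair that is likely under topic 0, unlikely under topic 1.
per_topic = [0.9, 0.2]
finance_doc = [0.75, 0.25]   # document mostly about topic 0
sports_doc = [0.25, 0.75]    # document mostly about topic 1

p_finance = adapted_translation_prob(per_topic, finance_doc)
p_sports = adapted_translation_prob(per_topic, sports_doc)
```

The same phrase pair receives a higher score in the document dominated by topic 0, which is the sense in which the translation model adapts dynamically to text of unknown origin.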
Towards using web-crawled data for domain adaptation in statistical machine translation
This paper reports on ongoing work on domain adaptation of statistical machine translation using domain-specific data obtained by domain-focused web crawling. We present a strategy for crawling monolingual and parallel data and for exploiting them for testing, language modelling, and system tuning in a phrase-based machine translation framework. The proposed approach is evaluated on the domains of Natural Environment and Labour Legislation and on two language
pairs: English–French and English–Greek.