268 research outputs found
A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them.Comment: 44 pages, to appear in Computational Linguistic
N-gram-based statistical machine translation versus syntax augmented machine translation: comparison and system combination
In this paper we compare and contrast
two approaches to Machine Translation
(MT): the CMU-UKA Syntax Augmented
Machine Translation system (SAMT) and
UPC-TALP N-gram-based Statistical Machine
Translation (SMT). SAMT is a hierarchical
syntax-driven translation system
underlain by a phrase-based model and a
target part parse tree. In N-gram-based
SMT, the translation process is based on
bilingual units related to word-to-word
alignment and statistical modeling of the
bilingual context following a maximumentropy
framework. We provide a stepby-
step comparison of the systems and report
results in terms of automatic evaluation
metrics and required computational
resources for a smaller Arabic-to-English
translation task (1.5M tokens in the training
corpus). Human error analysis clarifies
advantages and disadvantages of the
systems under consideration. Finally, we
combine the output of both systems to
yield significant improvements in translation
quality.Postprint (published version
Exploiting alignment techniques in MATREX: the DCU machine translation system for IWSLT 2008
In this paper, we give a description of the machine translation (MT) system developed at DCU that was used for our third participation in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT 2008). In this participation, we focus on various techniques for word and phrase alignment to improve system quality. Specifically, we try out our word packing and syntax-enhanced word alignment techniques for the Chinese–English task and for the English–Chinese task for the first time. For all translation tasks except Arabic–English, we exploit linguistically motivated bilingual phrase pairs extracted from parallel treebanks. We smooth our translation tables with out-of-domain word translations for the Arabic–English and Chinese–English tasks in order to solve the problem of the high number of out of vocabulary items. We also carried out experiments combining both in-domain and out-of-domain data to improve system performance and, finally, we deploy a majority voting procedure combining a language model based method and a translation-based method for case and punctuation restoration. We participated in all the translation
tasks and translated both the single-best ASR hypotheses and
the correct recognition results. The translation results confirm that our new word and phrase alignment techniques are often helpful in improving translation quality, and the data combination method we proposed can significantly improve system performance
A hybrid system for patent translation
This work presents a HMT system for patent translation. The system exploits the high coverage of SMT and the high precision of an RBMT system based on GF to deal with specific issues of the language.
The translator is specifically developed to
translate patents and it is evaluated in the
English-French language pair. Although
the number of issues tackled by the grammar
are not extremely numerous yet, both manual and automatic evaluations consistently show their preference for the hybrid system in front of the two individual translators.Peer ReviewedPostprint (published version
Experiments on domain adaptation for patent machine translation in the PLuTO project
The PLUTO1 project (Patent Language Translations Online) aims to provide a rapid solution for the online retrieval and translation of patent documents through the integration of a number of existing state-of-the-art components provided by the project partners. The paper presents some of the experiments on patent domain adaptation of the Machine Translation (MT) systems used in the PLuTO project. The experiments use the International Patent Classification for domain adaptation and are focused on the English–French language pair
- …