A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them. Comment: 44 pages, to appear in Computational Linguistics
Using F-structures in machine translation evaluation
Despite a growing interest in automatic evaluation methods for Machine Translation (MT) quality, most existing automatic metrics are still limited to surface comparison of translation and reference strings. In this paper we
show how Lexical-Functional Grammar (LFG) labelled dependencies obtained from an automatic parse can be used to assess the quality of MT on a deeper linguistic level, giving as a result higher correlations with human judgements.
Source side pre-ordering using recurrent neural networks for English-Myanmar machine translation
Word reordering has remained one of the challenging problems for machine translation when translating between language pairs with different word orders, e.g. English and Myanmar. Without reordering between these languages, a source sentence may be translated directly with similar word order, and the translation may not be meaningful. Myanmar is a subject-object-verb (SOV) language, and effective reordering is essential for translation. In this paper, we applied a pre-ordering approach using recurrent neural networks to pre-order words of the source Myanmar sentence into target English's word order. This neural pre-ordering model is automatically derived from parallel word-aligned data with syntactic and lexical features based on dependency parse trees of the source sentences. It can generate arbitrary permutations that may be non-local in the sentence and can be integrated into English-Myanmar machine translation. We exploited the model to reorder English sentences into Myanmar-like word order as a preprocessing stage for machine translation, obtaining quality improvements comparable to a baseline rule-based pre-ordering approach on the Asian Language Treebank (ALT) corpus.
Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources
The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system mostly depends on parallel data, and phrases that are not present in the training data are not correctly translated. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data, using external morphological resources instead. A set of new phrase associations is added to the translation and reordering models; each of them corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translations, and results showed improvements in performance in terms of automatic scores (BLEU and Meteor) and a reduction of out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model. JRC.G.2-Global security and crisis management
Syntactic and semantic features for statistical and neural machine translation
Machine Translation (MT) for language pairs with long distance dependencies and
word reordering, such as German-English, is prone to producing output that is lexically
or syntactically incoherent. Statistical MT (SMT) models used explicit or latent
syntax to improve reordering, but failed at capturing other long distance dependencies.
This thesis explores how explicit sentence-level syntactic information can improve
translation for such complex linguistic phenomena. In particular, we work at the
level of the syntactic-semantic interface with representations conveying the predicate-argument
structures. These are essential to preserving semantics in translation and
SMT systems have long struggled to model them.
String-to-tree SMT systems use explicit target syntax to handle long-distance reordering,
but make strong independence assumptions which lead to inconsistent lexical
choices. To address this, we propose a Selectional Preferences feature which models
the semantic affinities between target predicates and their argument fillers using the
target dependency relations available in the decoder. We found that our feature is not
effective in a string-to-tree system for German-English and that often the conditioning
context is wrong because of mistranslated verbs.
To improve verb translation, we proposed a Neural Verb Lexicon Model (NVLM)
incorporating sentence-level syntactic context from the source which carries relevant
semantic information for verb disambiguation. When used as an extra feature for re-ranking
the output of a German-English string-to-tree system, the NVLM improved
verb translation precision by up to 2.7% and recall by up to 7.4%.
While the NVLM improved some aspects of translation, other syntactic and lexical
inconsistencies are not being addressed by a linear combination of independent models.
In contrast to SMT, neural machine translation (NMT) avoids strong independence
assumptions, thus generating more fluent translations and capturing some long-distance
dependencies. Still, incorporating additional linguistic information can improve translation
quality.
We proposed a method for tightly coupling target words and syntax in the NMT
decoder. To represent syntax explicitly, we used CCG supertags, which encode subcategorization
information, capturing long distance dependencies and attachments. Our
method improved translation quality on several difficult linguistic constructs, including
prepositional phrases which are the most frequent type of predicate arguments. These
improvements over a strong baseline NMT system were consistent across two language
pairs: 0.9 BLEU for German-English and 1.2 BLEU for Romanian-English.