A detailed analysis of phrase-based and syntax-based machine translation: the search for systematic differences
This paper describes a range of automatic and manual comparisons of phrase-based and syntax-based statistical machine translation methods applied to English-German and
English-French translation of user-generated content. The syntax-based methods underperform the phrase-based models, and relaxing syntactic constraints to broaden translation rule coverage means that these models do not necessarily generate more grammatical output than the phrase-based models. Although the
systems generate different output and could potentially
be fruitfully combined, the lack of systematic differences between these models makes the combination task more challenging.
A Robust and Efficient Three-Layered Dialogue Component for a Speech-to-Speech Translation System
We present the dialogue component of the speech-to-speech translation system
VERBMOBIL. In contrast to conventional dialogue systems, it mediates the
dialogue while processing at most 50% of the dialogue in depth. Special
requirements such as robustness and efficiency led to a three-layered hybrid
architecture for the dialogue module, using statistics, an automaton, and a
planner. A dialogue memory is constructed incrementally.
Comment: Postscript file, compressed and uuencoded, 15 pages, to appear in
Proceedings of EACL-95, Dublin
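The layered architecture described in the abstract can be sketched as a simple dispatcher: a cheap statistical layer guesses the dialogue act, an automaton tracks the dialogue state, and only selected acts go to deep (planner-level) processing. All class, table, and act names below are illustrative assumptions, not the actual VERBMOBIL implementation:

```python
class DialogueComponent:
    """Sketch of a three-layered hybrid dialogue module: statistics,
    an automaton, and a planner, with an incrementally built memory."""

    def __init__(self, act_stats, transitions, deep_acts):
        self.act_stats = act_stats      # layer 1: {keyword: dialogue act}
        self.transitions = transitions  # layer 2: {(state, act): next state}
        self.deep_acts = deep_acts      # layer 3: acts needing deep processing
        self.state = "START"
        self.memory = []                # dialogue memory, built incrementally

    def process(self, utterance):
        # layer 1: cheap statistical prediction of the dialogue act
        act = next((a for kw, a in self.act_stats.items() if kw in utterance),
                   "UNKNOWN")
        # layer 2: the automaton keeps the dialogue in a legal state
        self.state = self.transitions.get((self.state, act), self.state)
        # layer 3: flag turns that the planner would process in depth
        deep = act in self.deep_acts
        self.memory.append((utterance, act, self.state, deep))
        return act, self.state, deep
```

The point of the layering is that most turns are handled by the two cheap layers, which is how a system can mediate a dialogue while deeply processing only a fraction of it.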
Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search
We present Grid Beam Search (GBS), an algorithm which extends beam search to
allow the inclusion of pre-specified lexical constraints. The algorithm can be
used with any model that generates a sequence y = (y_1, ..., y_T), by
maximizing p(y | x). Lexical
constraints take the form of phrases or words that must be present in the
output sequence. This is a very general way to incorporate additional knowledge
into a model's output without requiring any modification of the model
parameters or training data. We demonstrate the feasibility and flexibility of
Lexically Constrained Decoding by conducting experiments on Neural
Interactive-Predictive Translation, as well as Domain Adaptation for Neural
Machine Translation. Experiments show that GBS can provide large improvements
in translation quality in interactive scenarios, and that, even without any
user input, GBS can be used to achieve significant gains in performance in
domain adaptation scenarios.
Comment: Accepted as a long paper at ACL 2017
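A much-simplified sketch of the grid idea follows, assuming single-token constraints and a toy scoring function standing in for a real seq2seq model (the paper's actual algorithm also handles multi-token phrase constraints and model-internal state). Beams are indexed by the number of constraints satisfied; a hypothesis advances either by free generation or by forcing the next unmet constraint token:

```python
import math

def grid_beam_search(step_logprobs, constraints, max_len=6, beam_size=4,
                     eos="</s>"):
    """Simplified Grid Beam Search sketch (single-token constraints only).

    step_logprobs(prefix) -> {token: logprob} must cover the full vocabulary,
    including every constraint token."""
    C = len(constraints)
    # grid[c] holds (score, tokens) hypotheses with c constraints satisfied
    grid = {0: [(0.0, [])]}
    best = None
    for _ in range(max_len):
        new_grid = {c: [] for c in range(C + 1)}
        for c, beam in grid.items():
            for score, toks in beam:
                if toks and toks[-1] == eos:
                    continue  # already finished
                probs = step_logprobs(toks)
                # free generation: constraint count stays the same
                for tok, lp in probs.items():
                    new_grid[c].append((score + lp, toks + [tok]))
                # constrained step: force the next unmet constraint token
                if c < C:
                    tok = constraints[c]
                    new_grid[c + 1].append((score + probs[tok], toks + [tok]))
        # prune each grid cell to the beam size
        grid = {c: sorted(b, reverse=True)[:beam_size]
                for c, b in new_grid.items() if b}
        # track the best finished hypothesis satisfying *all* constraints
        for score, toks in grid.get(C, []):
            if toks and toks[-1] == eos and (best is None or score > best[0]):
                best = (score, toks)
    return best
```

Because only hypotheses in the last grid row can win, the constraint is guaranteed to appear in the output even when the model itself assigns it low probability.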
Exploring different representational units in English-to-Turkish statistical machine translation
We investigate different representational granularities for sub-lexical representation in statistical machine translation work from English to Turkish. We find that (i) representing both Turkish and English at the morpheme-level but with some selective morpheme-grouping on the Turkish side of the training data, (ii) augmenting the training data with “sentences” comprising only the content words of the original training data to bias root word alignment, (iii) reranking
the n-best morpheme-sequence outputs of the decoder with a word-based language
model, and (iv) using model iteration all provide a non-trivial improvement over
a fully word-based baseline. Despite our very limited training data, we improve from 20.22 BLEU points for our simplest model to 25.08 BLEU points, an improvement of 4.86 points or 24% relative.
Examining the Tip of the Iceberg: A Data Set for Idiom Translation
Neural Machine Translation (NMT) has been widely used in recent years with
significant improvements for many language pairs. Although state-of-the-art NMT
systems are generating progressively better translations, idiom translation
remains one of the open challenges in this field. Idioms, a category of
multiword expressions, are an interesting language phenomenon where the overall
meaning of the expression cannot be composed from the meanings of its parts. A
first important challenge is the lack of dedicated data sets for learning and
evaluating idiom translation. In this paper we address this problem by creating
the first large-scale data set for idiom translation. Our data set is
automatically extracted from a widely used German-English translation corpus
and includes, for each language direction, a targeted evaluation set where all
sentences contain idioms and a regular training corpus where sentences
including idioms are marked. We release this data set and use it to perform
preliminary NMT experiments as the first step towards better idiom translation.
Comment: Accepted at LREC 2018
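The extraction idea can be sketched roughly as follows. The naive substring matching and the split policy here are simplified assumptions, not the authors' actual pipeline, which works from a curated idiom list over a large parallel corpus:

```python
def split_idiom_corpus(sentence_pairs, idioms):
    """Sketch: collect idiom-bearing pairs into a targeted evaluation set,
    and keep a per-sentence idiom mark on the full corpus."""
    def find_idiom(sentence):
        padded = " " + sentence.lower() + " "
        for idiom in idioms:
            if " " + idiom + " " in padded:
                return idiom
        return None

    eval_set, marked_corpus = [], []
    for src, tgt in sentence_pairs:
        idiom = find_idiom(src)
        marked_corpus.append((src, tgt, idiom))  # idiom mark, or None
        if idiom:
            eval_set.append((src, tgt, idiom))
    return eval_set, marked_corpus
```

Marking idiom occurrences in the training corpus, rather than removing them, is what allows later experiments to study how seen versus unseen idioms are translated.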
Findings of the 2019 Conference on Machine Translation (WMT19)
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
Participants were asked to build machine translation systems for any of 18 language pairs, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. The task was also opened up to additional test suites to probe specific aspects of translation.
Target-Side Context for Discriminative Models in Statistical Machine Translation
Discriminative translation models utilizing source context have been shown to
help statistical machine translation performance. We propose a novel extension
of this work using target context information. Surprisingly, we show that this
model can be efficiently integrated directly in the decoding process. Our
approach scales to large training data sizes and results in consistent
improvements in translation quality on four language pairs. We also provide an
analysis comparing the strengths of the baseline source-context model with our
extended source-context and target-context model and we show that our extension
allows us to better capture morphological coherence. Our work is freely
available as part of Moses.
Comment: Accepted as a long paper at ACL 2016
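The kind of feature extraction involved can be sketched as follows: alongside the usual source-context window, the model also receives target-side context features (the previously generated target words), scored by a linear model during decoding. All feature names and the toy scorer are illustrative assumptions, not the Moses implementation:

```python
def context_features(src_words, src_pos, prev_target, window=2):
    """Features for a discriminative translation model (sketch): a window of
    source words around the phrase being translated, plus the last few
    target words already generated."""
    feats = []
    lo = max(0, src_pos - window)
    hi = min(len(src_words), src_pos + window + 1)
    for i in range(lo, hi):
        feats.append(f"src[{i - src_pos}]={src_words[i]}")
    for j, w in enumerate(reversed(prev_target[-window:]), 1):
        feats.append(f"tgt[-{j}]={w}")
    return feats

def score(feats, candidate, weights):
    # linear model: sum of weights for (feature, candidate translation) pairs
    return sum(weights.get((f, candidate), 0.0) for f in feats)
```

The target-side features are what make integration into decoding non-trivial: unlike source context, they change with every partial hypothesis, so the model must be cheap enough to evaluate inside the search.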
Combining semantic and syntactic generalization in example-based machine translation
In this paper, we report our experiments in combining two EBMT systems that rely on generalized templates, Marclator and CMU-EBMT, on an English–German translation task. Our goal was to see whether a statistically significant improvement could be achieved over the individual performances of these two systems. We observed that this was not the case. However, our system consistently outperformed a lexical EBMT baseline system.