131 research outputs found
Trained MT Metrics Learn to Cope with Machine-translated References
Neural metrics trained on human evaluations of MT tend to correlate well with
human judgments, but their behavior is not fully understood. In this paper, we
perform a controlled experiment and compare a baseline metric that has not been
trained on human evaluations (Prism) to a trained version of the same metric
(Prism+FT). Surprisingly, we find that Prism+FT becomes more robust to
machine-translated references, which are a notorious problem in MT evaluation.
This suggests that the effects of metric training go beyond the intended effect
of improving overall correlation with human judgments. Comment: WMT 202
Experiments in morphosyntactic processing for translating to and from German
We describe two shared task systems and associated experiments. The German to English system used reordering rules applied to parses and morphological splitting and stemming. The English to German system used an additional translation step which recreated compound words and generated morphological inflection
A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them. Comment: 44 pages, to appear in Computational Linguistic
Learning distributional token representations from visual features
In this study, we compare token representations constructed from visual features (i.e., pixels) with standard lookup-based embeddings. Our goal is to gain insight into the challenges of encoding a text representation from low-level features, e.g. from characters or pixels. We focus on Chinese, which, as a logographic language, has properties that make a representation via visual features challenging and interesting. To train and evaluate different models for the token representation, we chose the task of character-based neural machine translation (NMT) from Chinese to English. We found that a token representation computed only from visual features can achieve results competitive with lookup embeddings. However, we also show different strengths and weaknesses in the models’ performance in a part-of-speech tagging task and also a semantic similarity task. In summary, we show that it is possible to achieve a text representation only from pixels. We hope that this is a useful stepping stone for future studies that exclusively rely on visual input, or aim at exploiting visual features of written language
The TALP-UPC phrase-based translation systems for WMT12: morphology simplification and domain adaptation
This paper describes the UPC participation in the WMT 12 evaluation campaign. All systems presented are based on standard phrase-based Moses systems. Variations adopted several improvement techniques such as morphology simplification and generation and domain adaptation. The morphology simplification overcomes the data sparsity problem when translating into morphologically rich languages such as Spanish by translating first to a morphology-simplified language and then leaving the morphology generation to an independent classification task. The domain adaptation approach improves the SMT system by adding new translation units learned from MT-output and reference alignment. Results show an improvement on TER, METEOR, NIST and BLEU scores compared to our baseline system, with the official test set benefiting more from the domain adaptation approach than from the morphological generalization method.
Peer Reviewed. Postprint (published version)
Linguistic Structure in Statistical Machine Translation
This thesis investigates the influence of linguistic structure in statistical machine translation. We develop a word reordering model based on syntactic parse trees and address the issues of pronouns and morphological agreement with a source discriminative word lexicon predicting the translation for individual words using structural features. When used in phrase-based machine translation, the models improve the translation for language pairs with different word order and morphological variation
Robust Machine Translation Evaluation with Entailment Features
Existing evaluation metrics for machine translation lack crucial robustness: their correlations with human quality judgments vary considerably across languages and genres. We believe that the main reason is their inability to properly capture meaning: A good translation candidate means the same thing as the reference translation, regardless of formulation. We propose a metric that evaluates MT output based on a rich set of features motivated by textual entailment, such as lexical-semantic (in-)compatibility and argument structure overlap. We compare this metric against a combination metric of four state-of-the-art scores (BLEU, NIST, TER, and METEOR) in two different settings. The combination metric outperforms the individual scores, but is bested by the entailment-based metric. Combining the entailment and traditional features yields further improvements.