79 research outputs found
Towards predicting post-editing productivity
Machine translation (MT) quality is generally measured via automatic metrics, producing scores that have no meaning for translators who are required to post-edit MT output or for project managers who have to plan and budget for translation projects. This paper investigates correlations between two such automatic metrics (general text matcher and translation edit rate) and post-editing productivity. For the purposes of this paper, productivity is measured via processing speed and cognitive measures of effort using eye tracking as a tool. Processing speed, average fixation time and count are found to correlate well with the scores for groups of segments. Segments with high GTM and TER scores require substantially less time and cognitive effort than medium or low-scoring segments. Future research involving score thresholds and confidence estimation is suggested.
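As a rough illustration of the kind of edit-based score discussed in the abstract above, the following sketch computes a word-level edit rate between an MT segment and a reference. It is a simplified stand-in, not the TER metric used in the paper: real TER also treats block shifts as single edits, which is not modelled here. The function name `word_edit_rate` is hypothetical, and in this sketch lower values mean fewer edits.

```python
def word_edit_rate(hypothesis: str, reference: str) -> float:
    """Word-level edit distance divided by the reference length.

    Simplified stand-in for TER: block shifts are NOT modelled,
    only insertions, deletions and substitutions of whole words.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the hyp words seen so far into ref[:j]
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert r
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1] / len(ref)
```

Scores like this could then be grouped into bands and compared against per-segment processing times, which is the style of analysis the abstract describes.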
A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them. Comment: 44 pages, to appear in Computational Linguistics
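The abstract above refers to measuring the amount of reordering in a language pair. One simple way to quantify this, shown here as an assumption rather than as the survey's own method, is the normalized Kendall tau distance of a permutation: the fraction of position pairs that appear in inverted order. The input is assumed to be a list giving, for each target position, the index of the aligned source word; the function name is hypothetical.

```python
from itertools import combinations

def kendall_tau_distance(permutation):
    """Fraction of position pairs that are out of order (0.0 = monotone)."""
    n = len(permutation)
    if n < 2:
        return 0.0
    # combinations() yields pairs in positional order, so a > b is an inversion
    inversions = sum(1 for a, b in combinations(permutation, 2) if a > b)
    return inversions / (n * (n - 1) / 2)
```

A single number like this captures how much reordering occurs but not which kinds, which is the distinction the abstract argues is important.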
MATREX: the DCU MT System for WMT 2008
In this paper, we give a description of the machine translation system developed at DCU that was used for our participation in the evaluation campaign of the Third Workshop on Statistical Machine Translation at ACL 2008. We describe the modular design of our data driven MT system with particular focus on the components used in this participation. We also describe some of the significant modules which were unused in this task. We participated in the EuroParl task for the following translation directions: Spanish-English and French-English, in which we employed our hybrid EBMT-SMT architecture to translate. We also participated in the Czech-English News and News Commentary tasks which represented a previously untested language pair for our system. We report results on the provided development and test sets.
The TALP-UPC phrase-based translation systems for WMT12: morphology simplification and domain adaptation
This paper describes the UPC participation in the WMT 12 evaluation campaign. All systems presented are based on standard phrase-based Moses systems. The variations adopt several improvement techniques, such as morphology simplification and generation and domain adaptation. The morphology simplification overcomes the data sparsity problem when translating into morphologically rich languages such as Spanish by first translating into a morphology-simplified language and then leaving the morphology generation to an independent classification task. The domain adaptation approach improves the SMT system by adding new translation units learned from MT-output and reference alignment. Results show an improvement in TER, METEOR, NIST and BLEU scores compared to our baseline system, with the domain adaptation approach yielding more benefit on the official test set than the morphological generalization method. Peer Reviewed. Postprint (published version).
Deep evaluation of hybrid architectures: simple metrics correlated with human judgments
The process of developing hybrid MT systems is guided by the evaluation method used to compare different combinations of basic subsystems. This work presents a deep evaluation experiment of a hybrid architecture that tries to get the best of both worlds, rule-based and statistical. In a first evaluation, human assessments were used to compare just the single statistical system and the hybrid one; the rule-based system was not compared by hand because the results of automatic evaluation showed a clear disadvantage. A second and wider evaluation experiment, however, surprisingly showed that according to human evaluation the best system was the rule-based one, which had achieved the worst results in automatic evaluation. An examination of sentences with controversial results suggested that linguistic well-formedness in the output should be considered in evaluation. After experimenting with 6 possible metrics, we conclude that a simple arithmetic mean of BLEU and BLEU calculated on the parts of speech of words is clearly a more human-conformant metric than lexical metrics alone. Peer Reviewed. Postprint (author's final draft).
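The combined metric named in the abstract above, an arithmetic mean of word-level BLEU and BLEU over part-of-speech tags, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: BLEU here is an unsmoothed, single-reference, segment-level variant (BLEU is normally computed over a corpus), the POS tag sequences are assumed to be supplied by some external tagger, and all function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hyp, ref, max_n=4):
    """Unsmoothed single-reference BLEU over token lists (sketch only)."""
    if not hyp:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # clipped matches: an n-gram counts at most as often as in the reference
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing: any empty precision gives a zero score
        log_sum += math.log(matched / total)
    # brevity penalty for hypotheses shorter than the reference
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_sum / max_n)

def mean_bleu_and_pos_bleu(hyp_words, ref_words, hyp_tags, ref_tags):
    """Arithmetic mean of word-level BLEU and BLEU over POS tag sequences."""
    return (bleu(hyp_words, ref_words) + bleu(hyp_tags, ref_tags)) / 2
```

Because the POS-level score rewards output whose grammatical structure matches the reference even when the word choices differ, the mean gives some credit for well-formedness that a purely lexical score would miss.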
Analyzing Error Types in English-Czech Machine Translation
This paper examines two techniques of manual evaluation that can be used to identify error types of individual machine translation systems. The first technique, "blind post-editing", has been used in WMT evaluation campaigns since 2009, and manually constructed data of this type are available for various language pairs. The second technique, explicit marking of errors, has been used in the past as well.
We propose a method for interpreting blind post-editing data at a finer level and compare the results with explicit marking of errors. While the human annotation of either of the techniques is not exactly reproducible (relatively low agreement), both techniques lead to similar observations of differences between the systems. Specifically, we are able to suggest which errors in MT output are easy and hard to correct with no access to the source, a situation experienced by users who do not understand the source language.
Improving English to Spanish out-of-domain translations by morphology generalization and generation
This paper presents a detailed study of a method for morphology generalization and generation to address out-of-domain translations in English-to-Spanish phrase-based MT. The paper studies whether the morphological richness of the target language causes poor quality translation when translating out-of-domain. In detail, this approach first translates into Spanish simplified forms and then predicts the final inflected forms through a morphology generation step based on shallow and deep-projected linguistic information available from both the source and target-language sentences. Obtained results highlight the importance of generalization, and therefore generation, for dealing with out-of-domain data. Peer Reviewed. Postprint (published version).