60 research outputs found
Improving the objective function in minimum error rate training
In Minimum Error Rate Training (MERT), the parameters of an SMT system are tuned on a certain evaluation metric to improve translation quality. In this paper, we present empirical results showing that parameters tuned on one metric (e.g. BLEU) may not lead to optimal scores on that same metric. The score can be improved significantly by tuning on an entirely different metric (e.g. METEOR, by 0.82 BLEU points or 3.38% relative improvement on the WMT08 English–French dataset). We analyse the impact of the choice of objective function in MERT and further propose three combination strategies over different metrics to reduce the bias of a single metric, obtaining parameters that receive better scores (0.99 BLEU points or 4.08% relative improvement) on evaluation metrics than those tuned on the standalone metric itself.
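The abstract does not detail the three combination strategies, but the general idea of mixing metrics to reduce single-metric bias can be sketched as a weighted linear combination of normalised scores. The metric functions and weights below are hypothetical toy stand-ins for illustration only, not the paper's actual strategies or metrics.

```python
def combined_score(hyps, refs, metrics, weights):
    """Weighted combination of several metric scores in [0, 1].

    `metrics` is a list of scoring functions (hypotheses, references) -> float;
    `weights` are mixing coefficients. Combining metrics like this is one
    plausible way to reduce the bias of tuning on a single metric.
    """
    assert len(metrics) == len(weights)
    total = sum(weights)
    return sum(w * m(hyps, refs) for m, w in zip(metrics, weights)) / total

# Hypothetical toy metrics (real systems would use BLEU, METEOR, etc.).
def exact_match_rate(hyps, refs):
    return sum(h == r for h, r in zip(hyps, refs)) / len(hyps)

def length_ratio(hyps, refs):
    return min(1.0, sum(len(h) for h in hyps) / sum(len(r) for r in refs))

hyps = ["the cat sat", "a dog ran"]
refs = ["the cat sat", "the dog ran"]
score = combined_score(hyps, refs, [exact_match_rate, length_ratio], [0.5, 0.5])
```

In MERT proper, such a combined score would serve as the objective that the parameter search maximises over the n-best lists.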
Learning labelled dependencies in machine translation evaluation
Recently, novel MT evaluation metrics have been presented which go beyond pure string matching and which correlate better with human judgements than other existing metrics. Other research in this area has presented machine learning methods which learn directly from human judgements. In this paper, we present a novel combination of dependency- and machine learning-based approaches to automatic MT evaluation, and demonstrate higher correlations with human judgement than the existing state-of-the-art methods. In addition, we examine the extent to which our novel method can be generalised across different tasks and domains.
The Borrowers: Researching the cognitive aspects of translation
The paper considers the interdisciplinary interaction of research on the cognitive aspects of translation. Examples of influence from linguistics, psychology, neuroscience, cognitive science, reading and writing research and language technology are given, with examples from specific sub-disciplines within each one. The breadth of borrowing by researchers in cognitive translatology is made apparent, but the minimal influence of cognitive translatology on the respective disciplines themselves is also highlighted. Suggestions for future developments are made, including ways in which the domain of cognitive translatology might exert greater influence on other disciplines.
Capturing lexical variation in MT evaluation using automatically built sense-cluster inventories
The strict character of most existing Machine Translation (MT) evaluation metrics prevents them from capturing lexical variation in translation. However, a central issue in MT evaluation is the high correlation that the metrics should have with human judgments of translation quality. To achieve a higher correlation, identifying sense correspondences between the compared translations becomes crucial. Given that most metrics look for exact correspondences, the evaluation results are often misleading concerning translation quality. Moreover, existing metrics do not permit a conclusive estimation of the impact of Word Sense Disambiguation techniques on MT systems. In this paper, we show how information acquired by an unsupervised semantic analysis method can be used to render MT evaluation more sensitive to lexical semantics. The sense inventories built by this data-driven method are incorporated into METEOR: they replace WordNet for evaluation in English and render METEOR’s synonymy module operable in French. The evaluation results demonstrate that the use of these inventories increases both the number of matches and the correlation with human judgments of translation quality, compared to precision-based metrics.
Towards predicting post-editing productivity
Machine translation (MT) quality is generally measured via automatic metrics, producing scores that have no meaning for translators who are required to post-edit MT output or for project managers who have to plan and budget for translation projects. This paper investigates correlations between two such automatic metrics (General Text Matcher (GTM) and Translation Edit Rate (TER)) and post-editing productivity. For the purposes of this paper, productivity is measured via processing speed and cognitive measures of effort, using eye tracking as a tool. Processing speed, average fixation time and fixation count are found to correlate well with the scores for groups of segments. Segments with high GTM and TER scores require substantially less time and cognitive effort than medium- or low-scoring segments. Future research involving score thresholds and confidence estimation is suggested.
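The kind of correlation analysis described above can be sketched with a plain Pearson coefficient between per-segment metric scores and measured post-editing speed. The data below is invented for illustration; the paper's actual figures come from eye-tracking experiments.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-segment data: GTM-style scores and post-editing speeds
# (words per minute). A strong positive r would mean higher-scoring segments
# are post-edited faster, as the paper reports for grouped segments.
gtm_scores = [0.9, 0.7, 0.5, 0.3]
pe_speed = [32.0, 25.0, 18.0, 12.0]
r = pearson(gtm_scores, pe_speed)
```

Note that for TER, which counts edits, lower scores indicate better MT output, so the expected correlation with speed has the opposite sign.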
Experiments in morphosyntactic processing for translating to and from German
We describe two shared task systems and associated experiments. The German-to-English system used reordering rules applied to parses, together with morphological splitting and stemming. The English-to-German system used an additional translation step which recreated compound words and generated morphological inflection.
- …