An empirical study of segment prioritization for incrementally retrained post-editing-based SMT

Abstract

Post-editing the output of a statistical machine translation (SMT) system to obtain high-quality translations has become an increasingly common application of SMT, which we henceforth refer to as post-editing-based SMT (PE-SMT). PE-SMT is often deployed as an incrementally retrained system that learns from human post-editing output as early as possible, augmenting the SMT models to reduce post-editing (PE) time. In this scenario, the order in which input segments are presented plays an important role in reducing overall PE time. Under an active learning (AL) framework, this paper provides an empirical study of several typical segment prioritization methods, namely cross-entropy difference (CED), n-grams, perplexity (PPL) and translation confidence, and verifies their performance on different data sets and language pairs. Experiments in a simulated setting show that translation confidence performs best, with decreases of 1.72-4.55 TER points absolute on average compared to a sequentially ordered PE-based incrementally retrained SMT system.
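The core idea of segment prioritization can be illustrated with a minimal sketch: segments are ranked by a scoring function so that the most informative ones are post-edited (and learned from) first. The scoring function below is a hypothetical stand-in for illustration only, not any of the paper's actual CED, PPL, or confidence models.

```python
# Minimal sketch of active-learning segment prioritization.
# The uncertainty score here is a hypothetical stand-in, not the
# paper's actual scoring model.

def prioritize(segments, score, descending=True):
    """Order segments so the highest-scoring ones are post-edited first.

    `score` maps a segment to a number; higher means higher priority
    when `descending` is True (e.g. low translation confidence can be
    expressed as a high uncertainty score).
    """
    return sorted(segments, key=score, reverse=descending)

# Toy in-vocabulary word list (an assumption for this sketch).
KNOWN = {"the", "machine", "translation", "system", "output"}

def uncertainty(segment):
    """Fraction of words outside the known vocabulary."""
    words = segment.lower().split()
    unknown = sum(1 for w in words if w not in KNOWN)
    return unknown / max(len(words), 1)

segments = [
    "the machine translation output",
    "incremental retraining reduces post-editing effort",
    "the system output",
]
queue = prioritize(segments, uncertainty)
# Segments containing the most out-of-vocabulary words come first,
# so post-editing them teaches the system the most new knowledge.
```

In a real PE-SMT pipeline, the scoring function would be replaced by one of the studied criteria (CED, n-gram coverage, PPL, or translation confidence), and the models would be incrementally retrained after each batch of post-edited segments.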
