Post-editing the output of a statistical machine translation (SMT) system to obtain high-quality
translations has become an increasingly common application of SMT, which we henceforth refer to as post-editing-based SMT (PE-SMT). PE-SMT is often deployed as an incrementally
retrained system that learns from human post-edits as early as possible,
augmenting the SMT models to reduce post-editing (PE) time. In this scenario, the order of input segments
plays an important role in reducing the overall PE time. Within an active learning
(AL) framework, this paper provides an empirical study of several typical segment prioritization methods, namely cross-entropy difference (CED), n-grams, perplexity (PPL) and
translation confidence, and evaluates their performance on different data sets and language pairs.
Experiments in a simulated setting show that translation confidence performs best, yielding
average absolute TER decreases of 1.72–4.55 points compared to the sequential PE-based
incrementally retrained SMT system.
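To make the prioritization idea concrete, the sketch below ranks candidate segments by a simple cross-entropy difference score in the style of Moore–Lewis data selection: each segment is scored by its per-word cross-entropy under an in-domain language model minus that under a general-domain model, and segments with lower scores (more in-domain-like) are prioritized. This is an illustrative toy implementation using add-one-smoothed unigram models, not the paper's actual system; the corpora, function names and smoothing choice are all assumptions.

```python
import math
from collections import Counter

def unigram_logprob(tokens, counts, total, vocab_size):
    # Add-one smoothed unigram log-probability of a token sequence.
    return sum(math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens)

def ced_score(segment, in_counts, in_total, gen_counts, gen_total, vocab_size):
    # Per-word cross-entropy under the in-domain model minus that under the
    # general model; lower values indicate more in-domain-like segments.
    toks = segment.split()
    h_in = -unigram_logprob(toks, in_counts, in_total, vocab_size) / len(toks)
    h_gen = -unigram_logprob(toks, gen_counts, gen_total, vocab_size) / len(toks)
    return h_in - h_gen

# Toy corpora standing in for in-domain and general-domain training text.
in_domain = "the translation model scores the translation".split()
general = "the cat sat on the mat".split()
in_counts, gen_counts = Counter(in_domain), Counter(general)
vocab_size = len(set(in_domain) | set(general))

segments = ["the cat sat", "the translation model"]
ranked = sorted(
    segments,
    key=lambda s: ced_score(s, in_counts, len(in_domain),
                            gen_counts, len(general), vocab_size),
)
print(ranked[0])  # the in-domain-like segment is prioritized first
```

In a PE-SMT pipeline, the ranked list would determine the order in which segments are shown to post-editors, so that edits on the most informative segments feed back into retraining earliest.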