Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing
We propose the use of a sequence-to-sequence paraphraser for automatic
machine translation evaluation. The paraphraser takes a human reference as
input and then force-decodes and scores an MT system output. We propose
training the aforementioned paraphraser as a multilingual NMT system, treating
paraphrasing as a zero-shot "language pair" (e.g., Russian to Russian). We
denote our paraphraser "unbiased" because the mode of our model's output
probability is centered around a copy of the input sequence, which in our case
represents the best-case scenario where the MT system output matches a human
reference. Our method is simple and intuitive, and our single model (trained in
39 languages) outperforms or statistically ties with all prior metrics on the
WMT19 segment-level shared metrics task in all languages, excluding Gujarati
where the model had no training data. We also explore using our model
conditioned on the source instead of the reference, and find that it
outperforms every "quality estimation as a metric" system from the WMT19 shared
task on quality estimation by a statistically significant margin in every
language pair.
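
The scoring step described in the abstract (condition on the human reference, then force-decode the MT output and read off its probability) can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_next_token_logprobs` is a hypothetical stand-in for a trained multilingual paraphraser, the `copy_prob` value is arbitrary, and averaging the log-probability over tokens is an assumption made here for illustration.

```python
import math

EOS = "</s>"  # end-of-sequence marker (name chosen for this sketch)


def toy_next_token_logprobs(reference, prefix, vocab, copy_prob=0.8):
    """Hypothetical stand-in for a trained paraphraser.

    At each step it puts most of its probability mass on copying the
    reference token at the current position, so the most likely output
    overall is an exact copy of the input.
    """
    position = len(prefix)
    favored = reference[position] if position < len(reference) else EOS
    others = [tok for tok in vocab if tok != favored]
    rest = (1.0 - copy_prob) / len(others)
    logprobs = {tok: math.log(rest) for tok in others}
    logprobs[favored] = math.log(copy_prob)
    return logprobs


def force_decode_score(reference, hypothesis, vocab):
    """Score a hypothesis by force-decoding it, conditioned on the reference.

    Nothing is generated: the hypothesis tokens are fed in one at a time and
    the model's log-probability for each is accumulated. The result is the
    average token log-probability (an assumption of this sketch).
    """
    total = 0.0
    prefix = []
    for token in hypothesis + [EOS]:
        total += toy_next_token_logprobs(reference, prefix, vocab)[token]
        prefix.append(token)
    return total / (len(hypothesis) + 1)


reference = "the cat sat on the mat".split()
exact_copy = list(reference)
one_word_changed = "the cat sat on a mat".split()
vocab = sorted(set(reference + one_word_changed + [EOS]))

exact_score = force_decode_score(reference, exact_copy, vocab)
changed_score = force_decode_score(reference, one_word_changed, vocab)
print(exact_score, changed_score)
```

In this toy setup an MT output identical to the reference receives the highest possible score, and any deviation lowers it, which mirrors the "unbiased" property the abstract describes. A real system would replace the toy function with the decoder of a trained sequence-to-sequence model.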