SLIDE: Reference-free Evaluation for Machine Translation using a Sliding Document Window
Reference-based metrics that operate at the sentence level typically
outperform quality estimation metrics, which have access only to the source and
system output. This is unsurprising, since references resolve ambiguities that
may be present in the source. We investigate whether additional source context
can effectively substitute for a reference. We present a metric, SLIDE (SLiding
Document Evaluator), which operates on blocks of sentences using a window that
slides over each document in the test set, feeding each chunk into an
unmodified, off-the-shelf quality estimation model. We find that SLIDE obtains
significantly higher pairwise system accuracy than its sentence-level baseline,
in some cases even eliminating the gap with reference-based metrics. This
suggests that source context may provide the same information as a human
reference.
- …
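
The windowing procedure described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the authors' implementation: the function names `sliding_windows` and `score_document`, the `window` and `stride` parameters and their defaults, joining sentences with a single space, and averaging the chunk scores into one document score (the abstract does not say how chunk scores are combined). The `score_chunk` callable stands in for an unmodified, off-the-shelf quality estimation model, and `toy_scorer` is a placeholder, not a real metric.

```python
from typing import Callable, List, Sequence, Tuple


def sliding_windows(n_sentences: int, window: int, stride: int = 1) -> List[Tuple[int, int]]:
    """Return [start, end) sentence-index pairs for a window sliding over one document.

    Assumption: a document shorter than the window is scored as a single chunk.
    Simplification: with stride > 1, trailing sentences can be left uncovered.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if n_sentences == 0:
        return []
    if n_sentences <= window:
        return [(0, n_sentences)]
    return [(start, start + window)
            for start in range(0, n_sentences - window + 1, stride)]


def score_document(
    src_sentences: Sequence[str],
    hyp_sentences: Sequence[str],
    score_chunk: Callable[[str, str], float],
    window: int = 3,
    stride: int = 1,
    sep: str = " ",
) -> float:
    """Score each (source chunk, output chunk) pair and average the results.

    `score_chunk` is a hypothetical stand-in for a reference-free quality
    estimation model that sees only the source and the system output.
    Averaging the chunk scores is an assumption of this sketch.
    """
    if len(src_sentences) != len(hyp_sentences):
        raise ValueError("source and output must be sentence-aligned")
    spans = sliding_windows(len(src_sentences), window, stride)
    if not spans:
        raise ValueError("cannot score an empty document")
    scores = [
        score_chunk(sep.join(src_sentences[a:b]), sep.join(hyp_sentences[a:b]))
        for a, b in spans
    ]
    return sum(scores) / len(scores)


def toy_scorer(src_chunk: str, hyp_chunk: str) -> float:
    """Placeholder scorer (word-count ratio); NOT a real quality estimation model."""
    n_src, n_hyp = len(src_chunk.split()), len(hyp_chunk.split())
    return min(n_src, n_hyp) / max(n_src, n_hyp, 1)
```

In practice `toy_scorer` would be replaced by a call into a real quality estimation model. Under this sketch, a system-level score would then be obtained by aggregating the document scores over the test set, which is likewise an assumption rather than something stated in the abstract.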