Sentence-level quality estimation by predicting HTER as a multi-component metric
This submission investigates alternative machine learning models for
predicting the HTER score at the sentence level. Instead of predicting the
HTER score directly, we suggest a model that jointly predicts the counts of
the four distinct post-editing operations, which are then used to calculate
the HTER score. This also makes it possible to correct invalid (e.g. negative)
predicted values before the HTER score is calculated. Without any
feature exploration, a multi-layer perceptron with four outputs yields small but
significant improvements over the baseline.

Comment: Preview for the Quality Estimation Shared Task Description Paper for
the 2nd Conference on Machine Translation
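As a minimal sketch of the recomposition step described above, assuming (since the abstract does not spell this out) that the four outputs correspond to the standard TER edit operations (insertions, deletions, substitutions, shifts) and that HTER is the total edit count divided by the reference length:

```python
def hter_from_components(insertions: float, deletions: float,
                         substitutions: float, shifts: float,
                         reference_length: int) -> float:
    """Recompose HTER from predicted counts of the four post-editing
    operations, clamping invalid (negative) predictions to zero first.

    The mapping of the model's four outputs to these edit types is an
    assumption for illustration.
    """
    edits = sum(max(0.0, x) for x in (insertions, deletions, substitutions, shifts))
    return edits / reference_length

# Example: 20 reference words, with a slightly negative predicted
# shift count that is corrected to 0 before the division.
print(hter_from_components(1.2, 0.5, 2.1, -0.3, 20))  # 0.19
```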
Fine-grained evaluation of Quality Estimation for Machine Translation based on a linguistically-motivated Test Suite
We present an alternative method for evaluating Quality Estimation systems,
based on a linguistically-motivated Test Suite. We create a test set
covering 14 linguistic error categories and gather for each of them a set of
samples with both correct and erroneous translations. Then, we measure the
performance of 5 Quality Estimation systems by checking their ability to
distinguish between the correct and the erroneous translations. The detailed
results are much more informative about the abilities of each system than a
single overall score. The fact that different Quality Estimation systems
perform differently on the various phenomena confirms the usefulness of the
Test Suite.
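A minimal sketch of this pairwise evaluation, assuming (the abstract does not specify the interface) that each Quality Estimation system exposes a sentence-level scoring function where a higher score means better predicted quality:

```python
from collections import defaultdict
from typing import Callable, Iterable, Tuple

# One test item pairs a correct and an erroneous translation for a given
# linguistic error category; this data layout and the scoring convention
# (higher = better predicted quality) are assumptions for illustration.
TestItem = Tuple[str, str, str]  # (category, correct_mt, erroneous_mt)

def per_category_accuracy(score: Callable[[str], float],
                          items: Iterable[TestItem]) -> dict:
    """Fraction of pairs per category in which the QE system ranks the
    correct translation above the erroneous one."""
    hits, totals = defaultdict(int), defaultdict(int)
    for category, good, bad in items:
        totals[category] += 1
        if score(good) > score(bad):
            hits[category] += 1
    return {c: hits[c] / totals[c] for c in totals}
```

Reporting one accuracy per error category, rather than a single corpus-level correlation, is what makes the per-phenomenon differences between systems visible.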