Evaluating NLG Evaluation Metrics: A Measurement Theory Perspective

Lai, Vivian; Liao, Q. Vera; Xiao, Ziang; Zhang, Susu

Authors: Vivian Lai, Q. Vera Liao, Ziang Xiao, Susu Zhang
Publication date: 24 May 2023

Abstract

We address the fundamental challenge in Natural Language Generation (NLG) model evaluation: the design and validation of evaluation metrics. Recognizing the limitations of existing metrics and issues with human judgment, we propose using measurement theory, the foundation of test design, as a framework for conceptualizing and evaluating the validity and reliability of NLG evaluation metrics. This approach offers a systematic method for defining "good" metrics, developing robust metrics, and assessing metric performance. In this paper, we introduce core concepts in measurement theory in the context of NLG evaluation and key methods to evaluate the performance of NLG metrics. Through this framework, we aim to promote the design, evaluation, and interpretation of valid and reliable metrics, ultimately contributing to the advancement of robust and effective NLG models in real-world settings.

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2305.14889

Last time updated on 26/05/2023