Evaluating the quality of generated text is a challenging task in natural
language processing. This difficulty arises from the inherent complexity and
diversity of text. Recently, OpenAI's ChatGPT, a powerful large language model
(LLM), has garnered significant attention due to its impressive performance in
various tasks. Therefore, we present this report to investigate the
effectiveness of LLMs, especially ChatGPT, and explore ways to optimize their
use in assessing text quality. We compare three kinds of reference-free
evaluation methods based on ChatGPT or similar LLMs. The experimental results
show that ChatGPT can evaluate text quality effectively from various
perspectives without references and outperforms most existing automatic
metrics. In particular, the Explicit Score, which utilizes
ChatGPT to generate a numeric score measuring text quality, is the most
effective and reliable method among the three explored approaches. However,
directly comparing the quality of two texts using ChatGPT may lead to
suboptimal results. We hope this report will provide valuable insights into
selecting appropriate methods for evaluating text quality with LLMs such as
ChatGPT.
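
For concreteness, the Python sketch below illustrates the Explicit Score and the direct-comparison setups described above. The prompt wording, the 1-10 scale, and the query_llm helper (any function that sends a prompt to ChatGPT and returns the reply text) are assumptions made for illustration, not the report's exact prompts.

    import re
    from typing import Callable

    # Prompt templates below are illustrative assumptions, not the
    # report's exact wording.
    SCORE_PROMPT = (
        "Score the {aspect} of the following text on a scale from 1 (worst) "
        "to 10 (best). Reply with the number only.\n\nText: {text}"
    )
    COMPARE_PROMPT = (
        "Which of the two texts below has better {aspect}? "
        "Reply with 'A' or 'B' only.\n\nText A: {a}\n\nText B: {b}"
    )

    def explicit_score(text: str, aspect: str,
                       query_llm: Callable[[str], str]) -> float:
        """Explicit Score: ask the LLM for a numeric rating and parse it."""
        reply = query_llm(SCORE_PROMPT.format(aspect=aspect, text=text))
        match = re.search(r"\d+(?:\.\d+)?", reply)  # first number in reply
        if match is None:
            raise ValueError(f"no numeric score found in reply: {reply!r}")
        return float(match.group())

    def pairwise_choice(text_a: str, text_b: str, aspect: str,
                        query_llm: Callable[[str], str]) -> str:
        """Direct comparison: ask the LLM which of two texts is better."""
        reply = query_llm(
            COMPARE_PROMPT.format(aspect=aspect, a=text_a, b=text_b))
        return "A" if reply.strip().upper().startswith("A") else "B"

Parsing only the first number (or the leading "A"/"B") keeps the extraction tolerant of verbose replies, which LLMs often produce even when asked for a bare answer.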