Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models: A Case Study on ChatGPT
Generative large language models (LLMs), e.g., ChatGPT, have demonstrated
remarkable proficiency across several NLP tasks, such as machine translation
and text summarization. Recent research (Kocmi and Federmann, 2023) has shown that
utilizing ChatGPT for assessing the quality of machine translation (MT)
achieves state-of-the-art performance at the system level but performs poorly
at the segment level. To further improve the performance of LLMs on MT quality
assessment, we investigate several prompting methods and propose a new
prompting method called Error Analysis Prompting (EAPrompt), which combines
Chain-of-Thought (Wei et al., 2022) and Error Analysis (Lu et al.,
2022). Our results on WMT22 indicate that prompting LLMs like ChatGPT with
error analysis can generate human-like MT evaluations at both the system and
segment level. Additionally, we are the first to identify several limitations
of ChatGPT as an MT evaluator: for example, changing the order of inputs may
significantly influence the judgment when multiple translations are provided
in a single query. This work provides preliminary experience in prompting LLMs
as evaluators to improve the reliability of translation evaluation metrics
under the error analysis paradigm.
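To make the error-analysis idea concrete, the sketch below shows one way such a prompt could be structured and scored: the model is asked to enumerate major and minor errors, and the counts are folded into an MQM-style score. The prompt wording, the query_llm helper, and the -5/-1 weights are illustrative assumptions, not the exact EAPrompt template from the paper.

```python
# Minimal sketch of an error-analysis-style evaluation prompt. The wording, the
# query_llm helper, and the MQM-like -5/-1 weights are illustrative assumptions,
# not the exact EAPrompt template from the paper.

EA_PROMPT = """You are evaluating a machine translation.

Source: {source}
Translation: {translation}

Step 1: Identify all major errors (errors that distort the meaning).
Step 2: Identify all minor errors (smaller fluency or terminology issues).
List each error on its own line, prefixed with "MAJOR:" or "MINOR:"."""


def ea_score(source: str, translation: str, query_llm) -> int:
    """Ask the LLM for an error analysis, then fold the error counts into a score."""
    response = query_llm(EA_PROMPT.format(source=source, translation=translation))
    lines = [line.strip() for line in response.splitlines()]
    major = sum(line.startswith("MAJOR:") for line in lines)
    minor = sum(line.startswith("MINOR:") for line in lines)
    # MQM-style weighting: each major error costs 5 points, each minor error 1.
    return -5 * major - 1 * minor
```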
Vega-MT: The JD Explore Academy Translation System for WMT22
We describe the JD Explore Academy's submission to the WMT 2022 shared
general translation task. We participated in all high-resource tracks and one
medium-resource track, including Chinese-English, German-English,
Czech-English, Russian-English, and Japanese-English. We push the limit of our
previous work -- bidirectional training for translation -- by scaling up two
main factors, i.e., language pairs and model sizes, namely the Vega-MT system.
As for language pairs, we scale the "bidirectional" setting up to a
"multidirectional" one, covering all participating languages, to exploit the
common knowledge across languages and transfer it to the downstream bilingual
tasks. As for model sizes, we scale Transformer-Big up to an extremely large
model with nearly 4.7 billion parameters, to fully enhance
the model capacity of Vega-MT. Also, we adopt data augmentation strategies,
e.g., cycle translation for monolingual data and bidirectional self-training
for bilingual and monolingual data, to comprehensively exploit the bilingual
and monolingual data. To adapt Vega-MT to the general-domain test set, we
design a generalization tuning procedure. Based on the official automatic
scores of constrained systems, in terms of sacreBLEU (shown in Figure 1), we
ranked 1st on {Zh-En (33.5), En-Zh (49.7), De-En (33.7), En-De (37.8), Cs-En
(54.9), En-Cs (41.4) and En-Ru (32.7)}, 2nd on {Ru-En (45.1) and Ja-En (25.6)},
and 3rd on {En-Ja (41.5)}; with respect to COMET, we ranked 1st on {Zh-En
(45.1), En-Zh (61.7), De-En (58.0), En-De (63.2), Cs-En (74.7), Ru-En (64.9),
En-Ru (69.6) and En-Ja (65.1)}, and 2nd on {En-Cs (95.3) and Ja-En (40.6)}.
Comment: WMT 2022. Among all constrained systems, Vega-MT won 7 first places, 2
second places and 1 third place w.r.t. sacreBLEU, and 8 first places and 2
second places w.r.t. COMET.
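As a rough illustration of the "bidirectional to multidirectional" scaling, the sketch below assembles training examples from parallel corpora by emitting both directions of every sentence pair with a target-language tag, so a single model can cover all participating languages. The <2xx> tag convention and the helper function are illustrative assumptions, not Vega-MT's actual data pipeline.

```python
# Illustrative sketch of assembling multidirectional training data from parallel
# corpora: every sentence pair contributes both translation directions, and a
# target-language tag tells a single model which language to produce.
# The "<2xx>" tag convention and this helper are assumptions for illustration.

from typing import Iterable


def multidirectional_examples(pairs: Iterable[tuple[str, str, str, str]]):
    """Yield (input, output) examples for both directions of each sentence pair.

    Each item in `pairs` is (src_lang, tgt_lang, src_text, tgt_text),
    e.g. ("de", "en", "Guten Morgen.", "Good morning.").
    """
    for src_lang, tgt_lang, src_text, tgt_text in pairs:
        # Forward direction: translate src -> tgt.
        yield f"<2{tgt_lang}> {src_text}", tgt_text
        # Reverse direction: translate tgt -> src (the "bidirectional" part).
        yield f"<2{src_lang}> {tgt_text}", src_text


# Pooling examples from all participating language pairs gives the
# "multidirectional" training set that one large model is trained on.
corpus = [
    ("zh", "en", "早上好。", "Good morning."),
    ("de", "en", "Guten Morgen.", "Good morning."),
]
training_data = list(multidirectional_examples(corpus))
```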