Training Generative Question-Answering on Synthetic Data Obtained from
  an Instruct-tuned Model

Arima, Kosuke; Ishigaki, Tatsuya; Omi, Takahiro; Takahashi, Kosuke

Publication date: 12 October 2023

Abstract

This paper presents a simple and cost-effective method for synthesizing data to train question-answering systems. Fine-tuning GPT models is common practice in resource-rich languages like English; however, it becomes challenging for non-English languages because question-answer (QA) pairs are scarce. Existing approaches use question and answer generators trained on human-authored QA pairs, which incurs substantial human cost. In contrast, we use an instruct-tuned model to generate QA pairs in a zero-shot or few-shot manner. We conduct experiments to compare various strategies for obtaining QA pairs from the instruct-tuned model. The results demonstrate that a model trained on our proposed synthetic data achieves performance comparable to a model trained on manually curated datasets, without incurring human costs.

Comment: PACLIC 2023 short paper, 4 pages (6 pages including references), 4 figures
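The zero-shot versus few-shot distinction in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the prompt wording, the `Passage:`/`Question:`/`Answer:` layout, and the names `build_prompt`, `parse_qa`, `synthesize`, and `fake_generate` are hypothetical, and `fake_generate` is a stub standing in for a call to an instruct-tuned model.

```python
from typing import Callable, List, Optional, Tuple

# (context, question, answer); this layout is an assumption for the sketch.
Example = Tuple[str, str, str]


def build_prompt(context: str, examples: Optional[List[Example]] = None) -> str:
    """Build a zero-shot prompt (no examples) or a few-shot prompt (with examples)."""
    # Hypothetical instruction text, not the prompt used in the paper.
    parts = ["Write one question and its answer based on the passage."]
    for ex_context, ex_question, ex_answer in examples or []:
        parts.append(
            f"Passage: {ex_context}\nQuestion: {ex_question}\nAnswer: {ex_answer}"
        )
    # The prompt ends with an open "Question:" slot for the model to complete.
    parts.append(f"Passage: {context}\nQuestion:")
    return "\n\n".join(parts)


def parse_qa(completion: str) -> Optional[Tuple[str, str]]:
    """Split a completion shaped like '<question>\\nAnswer: <answer>' into a pair."""
    question, sep, answer = completion.partition("Answer:")
    question, answer = question.strip(), answer.strip()
    if not sep or not question or not answer:
        return None  # discard completions that do not follow the expected shape
    return question, answer.splitlines()[0].strip()


def synthesize(
    contexts: List[str],
    generate: Callable[[str], str],
    examples: Optional[List[Example]] = None,
) -> List[Example]:
    """Collect synthetic (context, question, answer) triples, skipping bad outputs."""
    triples = []
    for context in contexts:
        qa = parse_qa(generate(build_prompt(context, examples)))
        if qa is not None:
            triples.append((context, qa[0], qa[1]))
    return triples


def fake_generate(prompt: str) -> str:
    """Stub standing in for an instruct-tuned model; returns a fixed completion."""
    return "What is the capital of Japan?\nAnswer: Tokyo"


if __name__ == "__main__":
    print(synthesize(["Tokyo is the capital of Japan."], fake_generate))
```

Passing `examples=None` gives the zero-shot setting, and passing a short list of hand-written triples gives the few-shot setting. The resulting triples would then serve as training data for the question-answering model.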

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2310.08072

Last time updated on 04/01/2024