There is an emerging line of research on multimodal instruction tuning, and a series of benchmarks has recently been proposed for evaluating the resulting models. Instead of evaluating the models directly, in this paper we evaluate the Vision-Language Instruction-Tuning (VLIT) datasets themselves and further seek a way to build a dataset suited to developing an all-powerful VLIT model, which we believe could also help establish a grounded protocol for benchmarking VLIT models.
Effective analysis of VLIT datasets remains an open question; to address it, we propose a tune-cross-evaluation paradigm: tuning a model on one dataset and evaluating it on each of the others in turn.
For each tune-evaluation set, we define the Meta Quality (MQ) as the mean score measured by a series of caption metrics, including BLEU, METEOR, and ROUGE-L, to quantify the quality of a dataset or of a single sample.
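One way to write this definition, under the assumption that the listed metrics are averaged with equal weights, is
\[
\mathrm{MQ} = \frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} m\big(\hat{Y}, Y\big), \qquad \mathcal{M} \supseteq \{\text{BLEU}, \text{METEOR}, \text{ROUGE-L}\},
\]
where $\hat{Y}$ denotes the responses generated by the tuned model and $Y$ the references of the evaluation set.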
On this basis, to evaluate the comprehensiveness of a dataset, we develop the Dataset Quality (DQ), which covers all tune-evaluation sets.
To lay the foundation for building a comprehensive dataset and for developing an all-powerful model for practical applications, we further define the Sample Quality (SQ) to quantify the all-sided quality of each sample.
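One plausible instantiation consistent with these definitions, assuming DQ averages MQ over all evaluation datasets when tuning on $D_i$ and SQ averages the sample-level MQ of $x \in D_i$ over the models tuned on the other datasets, is
\[
\mathrm{DQ}(D_i) = \frac{1}{|\mathcal{D}| - 1} \sum_{j \neq i} \mathrm{MQ}(D_i \rightarrow D_j), \qquad
\mathrm{SQ}(x) = \frac{1}{|\mathcal{D}| - 1} \sum_{j \neq i} \mathrm{MQ}_{j}(x),
\]
where $\mathrm{MQ}(D_i \rightarrow D_j)$ is the MQ obtained by tuning on $D_i$ and evaluating on $D_j$, and $\mathrm{MQ}_j(x)$ is the MQ of sample $x$ scored by the model tuned on $D_j$.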
Extensive experiments validate the rationality of the proposed evaluation paradigm. Based on the holistic evaluation, we build a new dataset, REVO-LION (REfining VisiOn-Language InstructiOn tuNing), by collecting samples with higher SQ from each dataset.
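The selection step can be sketched as follows; this is a minimal illustration rather than the released pipeline, and the function name, the per-dataset keep ratio, and the "sq" field are assumptions.

```python
# Minimal sketch of SQ-based selection for REVO-LION-style refinement.
# Assumption: each sample carries a precomputed "sq" score, and the top
# `ratio` fraction of samples is kept from every source VLIT dataset.

from typing import Dict, List


def select_by_sq(datasets: Dict[str, List[dict]], ratio: float = 0.5) -> List[dict]:
    """Keep the highest-SQ samples from each dataset and merge them."""
    refined: List[dict] = []
    for name, samples in datasets.items():
        ranked = sorted(samples, key=lambda s: s["sq"], reverse=True)
        k = max(1, int(len(ranked) * ratio))  # number of samples to keep
        refined.extend(ranked[:k])
    return refined


# Example usage with toy data (scores are illustrative only).
toy = {
    "dataset_a": [{"id": 1, "sq": 0.62}, {"id": 2, "sq": 0.31}],
    "dataset_b": [{"id": 3, "sq": 0.48}, {"id": 4, "sq": 0.57}],
}
revo_lion = select_by_sq(toy, ratio=0.5)
```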
With only half of the full data, the model trained on REVO-LION achieves performance comparable to that obtained by simply combining all VLIT datasets. In addition to supporting the development of an all-powerful model, REVO-LION also includes an evaluation set, which is expected to serve as a convenient benchmark for future research.