LXMERT Model Compression for Visual Question Answering

Eetemadi, Sauleh; Hashemi, Maryam; Kodeiri, Sara; Mahmoudi, Ghazaleh; Sheikhi, Hadi

Authors: Sauleh Eetemadi, Maryam Hashemi, Sara Kodeiri, Ghazaleh Mahmoudi, Hadi Sheikhi
Publication date: 23 October 2023

Abstract

Large-scale pretrained models such as LXMERT are becoming popular for learning cross-modal representations on text-image pairs for vision-language tasks. According to the lottery ticket hypothesis, NLP and computer vision models contain smaller subnetworks capable of being trained in isolation to full performance. In this paper, we combine these observations to evaluate whether such trainable subnetworks exist in LXMERT when fine-tuned on the VQA task. In addition, we perform a model size cost-benefit analysis by investigating how much pruning can be done without significant loss in accuracy. Our experimental results demonstrate that LXMERT can be effectively pruned by 40%-60% in size with a 3% loss in accuracy.

Comment: To appear in The Fourth Annual West Coast NLP (WeCNLP) Summi
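The abstract does not say which pruning criterion was used. As a rough illustration of the lottery-ticket idea it refers to, the sketch below assumes plain magnitude pruning on a toy weight list: the smallest-magnitude weights after fine-tuning are masked out, and the mask is applied to the starting weights to obtain a candidate subnetwork. The function names `magnitude_mask` and `apply_mask` and all numbers are hypothetical and are not taken from the paper.

```python
from typing import List


def magnitude_mask(weights: List[float], sparsity: float) -> List[int]:
    """Return a 0/1 mask that drops the `sparsity` fraction of weights
    with the smallest absolute value (hypothetical helper)."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by increasing magnitude; the first n_prune are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def apply_mask(weights: List[float], mask: List[int]) -> List[float]:
    """Zero out pruned positions; surviving weights are kept unchanged."""
    return [w * m for w, m in zip(weights, mask)]


# Toy data (assumed): the fine-tuned weights decide the mask, and the mask
# is then applied to the starting weights, which for LXMERT would be the
# pretrained checkpoint.
pretrained = [0.5, -0.2, 0.1, 0.9, -0.7, 0.05, 0.3, -0.4, 0.6, -0.15]
finetuned = [0.6, -0.1, 0.02, 1.1, -0.8, 0.01, 0.2, -0.5, 0.7, -0.05]

mask = magnitude_mask(finetuned, sparsity=0.5)
subnetwork = apply_mask(pretrained, mask)
```

In a lottery-ticket experiment the masked subnetwork would then be fine-tuned again and its accuracy compared with the unpruned model at each sparsity level, which is the kind of size versus accuracy trade-off the abstract reports.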

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2310.15325

Last time updated on 16/01/2024