The Lottery Ticket Hypothesis for Vision Transformers

Authors: Peiyan Dong, Zhenglun Kong, Xiaolong Ma, Xin Meng, Minghai Qin, Xuan Shen, Hao Tang, Yanzhi Wang, Geng Yuan
Publication date: 2 November 2022

Abstract

The conventional lottery ticket hypothesis (LTH) claims that within a dense neural network there exists a sparse subnetwork, together with a proper random initialization, called the winning ticket, that can be trained from scratch to almost the same accuracy as its dense counterpart. However, the LTH has scarcely been studied for vision transformers (ViTs). In this paper, we first show that the conventional winning ticket is hard to find at the weight level of ViTs with existing methods. Then, inspired by the input dependence of ViTs, we generalize the LTH for ViTs to input images consisting of image patches. That is, there exists a subset of input image patches such that a ViT trained from scratch on only this subset achieves accuracy similar to that of ViTs trained on all image patches. We call this subset of input patches the winning tickets, which represent a significant amount of the information in the input. Furthermore, we present a simple yet effective method to find the winning tickets in input patches for various types of ViTs, including DeiT, LV-ViT, and Swin Transformers. More specifically, we use a ticket selector to generate the winning tickets based on the informativeness of patches. For comparison, we also build randomly selected subsets of patches, and the experiments show a clear difference between the performance of models trained with winning tickets and those trained with randomly selected subsets.
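
The contrast between informativeness-based and random patch subsets can be illustrated with a minimal sketch. It is not the authors' ticket selector: the abstract does not say how informativeness is computed, so the sketch assumes a per-patch score is already available. The function names, the `keep_ratio` parameter, and the example scores are all hypothetical.

```python
import random


def num_kept(num_patches, keep_ratio):
    """Number of patches to keep for a given ratio (at least one)."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    return max(1, int(num_patches * keep_ratio))


def select_winning_tickets(scores, keep_ratio):
    """Indices of the highest-scoring patches, returned in spatial order.

    `scores` is an assumed per-patch informativeness score; how the paper
    computes it is not stated in the abstract.
    """
    k = num_kept(len(scores), keep_ratio)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def select_random_tickets(num_patches, keep_ratio, seed=None):
    """Indices of a uniformly random patch subset of the same size (baseline)."""
    k = num_kept(num_patches, keep_ratio)
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_patches), k))


def gather(patches, indices):
    """Keep only the selected patches, preserving their original order."""
    return [patches[i] for i in indices]


if __name__ == "__main__":
    # Four patches with made-up informativeness scores.
    patches = ["p0", "p1", "p2", "p3"]
    scores = [0.1, 0.9, 0.5, 0.7]
    print(gather(patches, select_winning_tickets(scores, 0.5)))  # ['p1', 'p3']
    print(gather(patches, select_random_tickets(len(patches), 0.5, seed=0)))
```

In a real experiment both subsets would be fed to identically configured ViTs trained from scratch, so that any accuracy gap could be attributed to which patches were kept rather than to how many.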

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2211.01484

Last time updated on 08/12/2022