This paper aims to develop an efficient training method for video tasks.
We make three contributions: (1) We propose Turbo training, a simple and
versatile training paradigm for Transformers on multiple video tasks. (2) We
illustrate the advantages of Turbo training on action classification,
video-language representation learning, and long-video activity classification,
showing that Turbo training can largely maintain competitive performance while
achieving an almost 4× speed-up and significantly lower memory consumption. (3)
Turbo training enables long-schedule video-language training and end-to-end
long-video training, delivering performance competitive with or superior to
previous works, which were infeasible to train under limited resources.
Comment: BMVC202