Multimodal models and large language models (LLMs) have transformed how open-world knowledge is exploited, enabling new capabilities across a wide range of tasks and applications. The video domain, in particular, has benefited substantially from these capabilities. In this paper, we present Highlight-CLIP (HL-CLIP), a method for video highlight detection that leverages the pre-trained knowledge embedded in multimodal models. By simply fine-tuning the multimodal encoder and applying our proposed saliency pooling technique, we achieve, to the best of our knowledge, state-of-the-art performance on the QVHighlight Benchmark for the highlight detection task.
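
The abstract only names the saliency pooling step without defining it, so the sketch below is purely illustrative rather than the paper's actual procedure: it assumes a CLIP-style setup in which per-clip saliency is the cosine similarity between clip and query embeddings, smoothed by average pooling over a temporal window. The function name `saliency_pooling` and the `window` parameter are hypothetical.

```python
import torch
import torch.nn.functional as F

def saliency_pooling(clip_feats: torch.Tensor,
                     query_feat: torch.Tensor,
                     window: int = 5) -> torch.Tensor:
    """Illustrative sketch (not the paper's exact method): score each video
    clip by its cosine similarity to the text query, then smooth the scores
    with average pooling over a temporal window."""
    clip_feats = F.normalize(clip_feats, dim=-1)   # (T, D) clip embeddings
    query_feat = F.normalize(query_feat, dim=-1)   # (D,)  query embedding
    saliency = clip_feats @ query_feat             # (T,)  raw per-clip scores
    # Temporal average pooling over neighbouring clips (odd window keeps length T).
    pooled = F.avg_pool1d(saliency.view(1, 1, -1),
                          kernel_size=window, stride=1,
                          padding=window // 2).view(-1)
    return pooled[: saliency.shape[0]]

# Example with random features standing in for encoder outputs.
scores = saliency_pooling(torch.randn(75, 512), torch.randn(512))
```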