Short-form videos have exploded in popularity and now dominate social media
trends. Prevailing short-video platforms, e.g., Kuaishou (Kwai), TikTok,
Instagram Reels, and YouTube Shorts, have changed the way we consume and
create content. For both video content creation and video understanding,
shot boundary detection (SBD) is an essential component across many
scenarios. In this work, we release a new public Short video sHot
bOundary deTection dataset, named SHOT, consisting of 853 complete short videos
and 11,606 shot annotations, with 2,716 high-quality shot boundary annotations
in 200 test videos. Leveraging this new data, we optimize the model design for
video SBD by conducting neural architecture search in a search space
encompassing various advanced 3D ConvNets and Transformers. Our proposed
approach, named AutoShot, achieves higher F1 scores than previous
state-of-the-art approaches, e.g., outperforming TransNetV2 by 4.2%, when
derived and evaluated on our newly constructed SHOT dataset. Moreover, to
validate the generalizability of the AutoShot architecture, we directly
evaluate it on three other public datasets, ClipShots, BBC, and RAI, where the
F1 scores of AutoShot outperform previous state-of-the-art approaches by 1.1%,
0.9%, and 1.2%, respectively. The SHOT dataset and code are available at
https://github.com/wentaozhu/AutoShot.git.

Comment: 10 pages, 5 figures, 3 tables, in CVPR 2023; Top-1 solution for scene/shot boundary detection
https://paperswithcode.com/paper/autoshot-a-short-video-dataset-and-state-o
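The F1 scores quoted above compare predicted shot boundaries against annotated ones. As a hedged, illustrative sketch only: a common convention is to count a predicted boundary as a true positive when it lies within a small frame tolerance of a not-yet-matched ground-truth boundary. The function name `boundary_f1`, the frame-index representation, the greedy one-to-one matching, and the default `tolerance=2` are all assumptions made here for illustration. They are not the evaluation protocol of SHOT, AutoShot, or TransNetV2.

```python
from typing import List, Optional, Tuple


def boundary_f1(
    predicted: List[int], ground_truth: List[int], tolerance: int = 2
) -> Tuple[float, float, float]:
    """Hypothetical helper: precision, recall, and F1 for shot boundaries.

    Boundaries are given as frame indices (an assumption). Each prediction is
    greedily matched to the closest unmatched ground-truth boundary within
    `tolerance` frames; each ground-truth boundary can be matched at most once.
    """
    gt_sorted = sorted(ground_truth)
    matched = [False] * len(gt_sorted)
    true_positives = 0

    for p in sorted(predicted):
        best_idx: Optional[int] = None
        best_dist = tolerance + 1
        # Search for the nearest unmatched ground-truth boundary within tolerance.
        for i, g in enumerate(gt_sorted):
            dist = abs(p - g)
            if not matched[i] and dist <= tolerance and dist < best_dist:
                best_idx, best_dist = i, dist
        if best_idx is not None:
            matched[best_idx] = True
            true_positives += 1

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Two of three predictions fall within 2 frames of a ground-truth boundary.
    p, r, f = boundary_f1([10, 50, 90], [11, 48, 120], tolerance=2)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In the usage example, predictions at frames 10 and 50 match ground-truth boundaries at 11 and 48, while the prediction at 90 and the boundary at 120 remain unmatched, so precision, recall, and F1 are all 2/3. Greedy nearest matching keeps the sketch short; an optimal bipartite matching could give slightly different counts when boundaries are densely packed.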