Semi-supervised Active Learning for Video Action Detection
In this work, we focus on label-efficient learning for video action
detection. We develop a novel semi-supervised active learning approach that
utilizes both labeled and unlabeled data along with informative sample
selection for action detection. Video action detection requires spatio-temporal
localization along with classification, which poses several challenges for both
informative sample selection in active learning and pseudo-label generation in
semi-supervised learning (SSL). First, we propose NoiseAug, a simple
augmentation strategy which effectively selects informative samples for video
action detection. Next, we propose fft-attention, a novel technique based on
high-pass filtering that enables effective use of pseudo labels for SSL
in video action detection by emphasizing the relevant activity regions within a
video. We evaluate the proposed approach on three different benchmark datasets,
UCF101-24, JHMDB-21, and YouTube-VOS. First, we demonstrate its effectiveness
on video action detection, where the proposed approach outperforms prior work
in semi-supervised and weakly-supervised learning, along with several baseline
approaches, on both UCF101-24 and JHMDB-21. Next, we show its effectiveness
on YouTube-VOS for video object segmentation, demonstrating its generalization
capability for other dense prediction tasks in videos. The code and models are
publicly available at:
https://github.com/AKASH2907/semi-sup-active-learning
Comment: AAAI Conference on Artificial Intelligence, Main Technical Track
(AAAI), 2024, Code: https://github.com/AKASH2907/semi-sup-active-learnin
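
The abstract describes fft-attention only as "based on high-pass filtering". As a rough illustration of that general idea, and not the authors' implementation, the sketch below applies a generic frequency-domain high-pass filter to a single grayscale frame with NumPy. The function name `high_pass_filter`, the circular mask, and the `cutoff` parameter are all assumptions made for this example; the paper's actual formulation may differ.

```python
import numpy as np


def high_pass_filter(frame: np.ndarray, cutoff: int = 4) -> np.ndarray:
    """Suppress low spatial frequencies of a 2D grayscale frame.

    Hypothetical helper for illustration only: `cutoff` is the radius
    (in frequency bins) of the low-frequency disc that is zeroed out.
    """
    # Move the zero-frequency (DC) component to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(frame))

    # Build a circular mask around the centre covering the low frequencies.
    h, w = frame.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    low_freq = (yy - cy) ** 2 + (xx - cx) ** 2 <= cutoff ** 2

    # Zero the low frequencies, keeping edges and fine detail.
    spectrum[low_freq] = 0

    # Return the magnitude of the filtered frame in the spatial domain.
    return np.abs(np.fft.ifft2(np.fft.ifftshift(spectrum)))
```

On a perfectly flat frame the output is approximately zero everywhere, while a frame containing a sharp-edged object yields a clearly non-zero response. That is the sense in which a high-pass response could serve as a weighting toward regions with structure, under the assumptions stated above.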