Efficient Human Vision Inspired Action Recognition using Adaptive Spatiotemporal Sampling
Adaptive sampling that exploits the spatiotemporal redundancy in videos is
critical for always-on action recognition on wearable devices with limited
computing and battery resources. The commonly used fixed sampling strategy is
not context-aware and may under-sample the visual content, which adversely
affects both computational efficiency and accuracy. Inspired by the concepts of
foveal vision and pre-attentive processing from the human visual perception
mechanism, we introduce a novel adaptive spatiotemporal sampling scheme for
efficient action recognition. Our system pre-scans the global scene context at
low resolution and decides whether to skip or request high-resolution features at
salient regions for further processing. We validate the system on the
EPIC-KITCHENS and UCF-101 action recognition datasets, and show that our
proposed approach can greatly speed up inference with a tolerable loss of
accuracy compared with state-of-the-art baselines. Source code is available
at https://github.com/knmac/adaptive_spatiotemporal
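
The pre-scan-then-decide idea described above can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The function names `downsample` and `decide`, the `skip_threshold` parameter, and the use of a per-cell temporal difference as a saliency proxy are all assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]  # grayscale image as rows of pixel intensities


def downsample(frame: Frame, factor: int) -> Frame:
    """Average non-overlapping factor x factor blocks (the low-resolution pre-scan)."""
    h, w = len(frame) // factor, len(frame[0]) // factor
    return [
        [
            sum(frame[r * factor + dr][c * factor + dc]
                for dr in range(factor) for dc in range(factor)) / factor ** 2
            for c in range(w)
        ]
        for r in range(h)
    ]


def decide(prev_low: Frame, cur_low: Frame, factor: int,
           skip_threshold: float) -> Optional[Tuple[int, int, int, int]]:
    """Return None to skip the frame, else a full-resolution box
    (top, left, bottom, right) around the most salient low-res cell."""
    # Assumed saliency proxy: absolute temporal difference per low-res cell.
    best, best_pos = 0.0, (0, 0)
    for r, (prev_row, cur_row) in enumerate(zip(prev_low, cur_low)):
        for c, (p, q) in enumerate(zip(prev_row, cur_row)):
            diff = abs(q - p)
            if diff > best:
                best, best_pos = diff, (r, c)
    if best < skip_threshold:
        return None  # scene barely changed: skip high-resolution processing
    r, c = best_pos
    # Map the low-res cell back to full-resolution pixel coordinates.
    return (r * factor, c * factor, (r + 1) * factor, (c + 1) * factor)
```

In this sketch, only the returned box would be read at full resolution, so the cost of high-resolution processing is paid only where and when the cheap pre-scan indicates a change. A learned saliency or attention module could replace the frame-difference heuristic without changing the overall control flow.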