Detecting firearms and accurately localizing individuals carrying them in
images or videos is of paramount importance in security, surveillance, and
content customization. However, this task presents significant challenges in
complex environments due to clutter and the diverse shapes of firearms. To
address this problem, we propose a novel approach that leverages human-firearm
interaction information, which provides valuable clues for localizing firearm
carriers. Our approach incorporates an attention mechanism that effectively
distinguishes humans and firearms from the background by focusing on relevant
areas. Additionally, we introduce a saliency-driven locality-preserving
constraint to learn essential features while preserving foreground information
in the input image. Combined, these components yield strong results on a newly
proposed dataset. To handle inputs of varying
sizes, we pass paired human-firearm instances with attention masks as channels
through a deep network for feature computation, utilizing an adaptive average
pooling layer. We extensively evaluate our approach against existing methods in
human-object interaction detection and achieve an AP of 77.8\%, compared with
63.1\% for the baseline approach. This demonstrates the
effectiveness of leveraging attention mechanisms and saliency-driven locality
preservation for accurate human-firearm interaction detection. Our findings
contribute to advancing the fields of security and surveillance, enabling more
efficient firearm localization and identification in diverse scenarios.

Comment: This paper has been accepted in IEEE Transactions on Computational
Social Systems.
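
The handling of variable-size inputs mentioned in the abstract can be illustrated with a minimal sketch of adaptive average pooling. It is written in plain Python for clarity and is not the authors' implementation. The function names `adaptive_avg_pool2d` and `pool_pair` are hypothetical, as is the assumed channel layout (image channels stacked with human and firearm attention masks). The bin boundaries follow the common floor/ceil convention for adaptive pooling, which is also an assumption here.

```python
def adaptive_avg_pool2d(channel, out_h, out_w):
    """Average-pool one 2D channel (a list of rows) to a fixed out_h x out_w grid.

    Bin i covers rows floor(i*H/out_h) .. ceil((i+1)*H/out_h) - 1, and likewise
    for columns, so every input size maps to the same output size.
    """
    in_h, in_w = len(channel), len(channel[0])
    pooled = []
    for i in range(out_h):
        r0 = (i * in_h) // out_h
        r1 = -(-((i + 1) * in_h) // out_h)  # integer ceiling division
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = -(-((j + 1) * in_w) // out_w)
            window = [channel[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


def pool_pair(channels, out_h=2, out_w=2):
    """Pool every channel of a paired human-firearm crop to the same fixed size.

    `channels` is assumed to hold the image channels followed by the attention
    masks, each as a 2D list with identical height and width.
    """
    return [adaptive_avg_pool2d(ch, out_h, out_w) for ch in channels]


if __name__ == "__main__":
    # Two crops of different sizes, each with 5 channels (e.g. RGB + 2 masks).
    crop_a = [[[1.0] * 6 for _ in range(4)] for _ in range(5)]
    crop_b = [[[1.0] * 5 for _ in range(3)] for _ in range(5)]
    for crop in (crop_a, crop_b):
        out = pool_pair(crop)
        print(len(out), len(out[0]), len(out[0][0]))
```

Because the output grid is fixed regardless of crop size, any layers that follow the pooling step can expect a constant feature dimension.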