Spherical Vision Transformer for 360-degree Video Saliency Prediction
The growing interest in omnidirectional videos (ODVs) that capture the full
field of view (FOV) has made 360-degree saliency prediction increasingly important in
computer vision. However, predicting where humans look in 360-degree scenes
presents unique challenges, including spherical distortion, high resolution,
and limited labelled data. We propose a novel vision-transformer-based model
for omnidirectional videos named SalViT360 that leverages tangent image
representations. We introduce a spherical geometry-aware spatiotemporal
self-attention mechanism that is capable of effective omnidirectional video
understanding. Furthermore, we present a consistency-based unsupervised
regularization term for projection-based 360-degree dense-prediction models to
reduce artefacts in the predictions that occur after inverse projection. Our
approach is the first to employ tangent images for omnidirectional saliency
prediction, and our experimental results on three ODV saliency datasets
demonstrate its effectiveness compared with state-of-the-art methods.
Comment: 12 pages, 4 figures, accepted to BMVC 202
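Tangent images are typically built with the gnomonic projection, which maps a spherical neighbourhood onto the plane touching the sphere at a chosen point. The sketch below is not the SalViT360 code. It implements only the standard forward and inverse gnomonic formulas on a unit sphere, together with a hypothetical helper, `tangent_patch_latlon`, that samples a square patch. The patch field of view, the grid size, and the choice of tangent points are illustrative assumptions, because the abstract does not specify them.

```python
import math


def gnomonic_forward(lat, lon, lat0, lon0):
    """Project sphere point (lat, lon) onto the plane tangent at (lat0, lon0).

    Angles are in radians on a unit sphere. Returns (x, y) on the tangent
    plane, or None for points 90 degrees or more from the tangent point,
    which the gnomonic projection cannot represent.
    """
    dlon = lon - lon0
    cos_c = (math.sin(lat0) * math.sin(lat)
             + math.cos(lat0) * math.cos(lat) * math.cos(dlon))
    if cos_c <= 0:
        return None
    x = math.cos(lat) * math.sin(dlon) / cos_c
    y = (math.cos(lat0) * math.sin(lat)
         - math.sin(lat0) * math.cos(lat) * math.cos(dlon)) / cos_c
    return x, y


def gnomonic_inverse(x, y, lat0, lon0):
    """Map tangent-plane coordinates (x, y) back to (lat, lon) on the sphere."""
    rho = math.hypot(x, y)
    if rho == 0:
        return lat0, lon0
    c = math.atan(rho)  # angular distance from the tangent point
    lat = math.asin(math.cos(c) * math.sin(lat0)
                    + y * math.sin(c) * math.cos(lat0) / rho)
    lon = lon0 + math.atan2(x * math.sin(c),
                            rho * math.cos(lat0) * math.cos(c)
                            - y * math.sin(lat0) * math.sin(c))
    return lat, lon


def tangent_patch_latlon(lat0, lon0, fov, size):
    """Hypothetical helper: (lat, lon) for a size x size tangent patch.

    The patch spans `fov` radians horizontally and vertically around
    (lat0, lon0). Row 0 is the top (+y, towards north). Requires size >= 2.
    These spherical coordinates could then be used to sample pixels from an
    equirectangular frame.
    """
    half = math.tan(fov / 2)  # half-width of the patch on the tangent plane
    rows = []
    for i in range(size):
        y = half - 2 * half * i / (size - 1)
        row = []
        for j in range(size):
            x = -half + 2 * half * j / (size - 1)
            row.append(gnomonic_inverse(x, y, lat0, lon0))
        rows.append(row)
    return rows
```

For example, with the tangent point at (0, 0), a point 45 degrees east on the equator projects to (1, 0). Points on the far hemisphere return `None`. This is one reason a 360-degree frame must be covered by several overlapping tangent images rather than a single one.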
Reproducible Evaluation of Pan-Tilt-Zoom Tracking
Tracking with a Pan-Tilt-Zoom (PTZ) camera has been a research topic in
computer vision for many years. However, it is very difficult to assess the
progress that has been made on this topic because there is no standard
evaluation methodology. The difficulty in evaluating PTZ tracking algorithms
arises from their dynamic nature. In contrast to other forms of tracking, PTZ
tracking involves both locating the target in the image and controlling the
motors of the camera to aim it so that the target stays in its field of view.
This type of tracking can only be performed online. In this paper, we propose a
new evaluation framework based on a virtual PTZ camera. With this framework,
tracking scenarios remain identical across experiments, and we are able to
replicate online PTZ camera control and behavior, including camera positioning
delays, tracker processing delays, and numerical zoom. We tested our evaluation
framework with the Camshift tracker to show its viability and to establish
baseline results.
Comment: This is an extended version of the 2015 ICIP paper "Reproducible
Evaluation of Pan-Tilt-Zoom Tracking"
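One way to picture a virtual PTZ camera is as a movable crop window over a fixed wide-angle frame. The sketch below assumes that design and is not the authors' framework. The class `VirtualPTZ` and its parameters (`init_center`, `base_size`, `out_size`, `delay_frames`) are hypothetical. Pan and tilt set the crop centre in pixels, and zoom shrinks the crop before it is resized to a fixed output size, which gives numerical zoom. Commands take effect only after a fixed number of frames to mimic camera positioning delay. Tracker processing delay is not modelled here.

```python
from collections import deque

import numpy as np


class VirtualPTZ:
    """Hypothetical virtual PTZ camera that crops views from a wide frame."""

    def __init__(self, init_center, base_size, out_size, delay_frames=2):
        self.pan, self.tilt = init_center        # crop centre (x, y) in pixels
        self.zoom = 1.0                          # 1.0 = widest view
        self.base_h, self.base_w = base_size     # crop size at zoom 1.0
        self.out_h, self.out_w = out_size        # size of the returned view
        self.delay_frames = delay_frames         # simulated positioning delay
        self.pending = deque()                   # (due_frame, pan, tilt, zoom)

    def command(self, pan, tilt, zoom, frame_idx):
        """Queue a move; it takes effect delay_frames frames later."""
        self.pending.append((frame_idx + self.delay_frames, pan, tilt, zoom))

    def view(self, wide, frame_idx):
        """Return the camera's view of `wide` at frame `frame_idx`."""
        # Apply every queued command whose delay has elapsed.
        while self.pending and self.pending[0][0] <= frame_idx:
            _, self.pan, self.tilt, self.zoom = self.pending.popleft()

        H, W = wide.shape[:2]
        ch = min(H, max(1, int(round(self.base_h / self.zoom))))
        cw = min(W, max(1, int(round(self.base_w / self.zoom))))

        # Top-left corner, clamped so the crop stays inside the wide frame.
        top = min(max(int(round(self.tilt - ch / 2)), 0), H - ch)
        left = min(max(int(round(self.pan - cw / 2)), 0), W - cw)
        crop = wide[top:top + ch, left:left + cw]

        # Nearest-neighbour resize to the fixed output size (numerical zoom).
        rows = np.arange(self.out_h) * ch // self.out_h
        cols = np.arange(self.out_w) * cw // self.out_w
        return crop[rows[:, None], cols[None, :]]
```

In a replayable experiment, the same recorded wide frames are fed to `view` on every run, and a tracker's output is turned into `command` calls. Every tracker therefore faces identical scenarios and identical control delays.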