Spherical Vision Transformer for 360-degree Video Saliency Prediction
The growing interest in omnidirectional videos (ODVs) that capture the full
field-of-view (FOV) has made 360-degree saliency prediction increasingly
important in computer vision. However, predicting where humans look in 360-degree scenes
presents unique challenges, including spherical distortion, high resolution,
and limited labelled data. We propose a novel vision-transformer-based model
for omnidirectional videos named SalViT360 that leverages tangent image
representations. We introduce a spherical geometry-aware spatiotemporal
self-attention mechanism that is capable of effective omnidirectional video
understanding. Furthermore, we present a consistency-based unsupervised
regularization term for projection-based 360-degree dense-prediction models to
reduce artefacts in the predictions that occur after inverse projection. Our
approach is the first to employ tangent images for omnidirectional saliency
prediction, and our experimental results on three ODV saliency datasets
demonstrate its effectiveness compared to the state-of-the-art.
Comment: 12 pages, 4 figures, accepted to BMVC 202
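The tangent image representation mentioned in this abstract is commonly built with the gnomonic projection, which maps points on the sphere onto a plane touching it at a chosen tangent point. The following is a minimal sketch of that standard projection only; the function name `gnomonic_project` and its argument conventions are assumptions made for illustration and are not taken from SalViT360.

```python
import math


def gnomonic_project(lon, lat, lon0, lat0):
    """Project a sphere point (lon, lat) onto the plane tangent at (lon0, lat0).

    All angles are in radians. Returns (x, y) on the tangent plane, or None
    when the point lies on or beyond the hemisphere boundary, where the
    gnomonic projection is undefined.
    """
    dlon = lon - lon0
    # Cosine of the angular distance between the point and the tangent point.
    cos_c = (math.sin(lat0) * math.sin(lat)
             + math.cos(lat0) * math.cos(lat) * math.cos(dlon))
    if cos_c <= 0.0:
        return None
    x = math.cos(lat) * math.sin(dlon) / cos_c
    y = (math.cos(lat0) * math.sin(lat)
         - math.sin(lat0) * math.cos(lat) * math.cos(dlon)) / cos_c
    return x, y
```

The tangent point itself maps to the plane origin, and distortion grows with angular distance from it, which is why such representations typically use many small tangent patches rather than one large one.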
Unsupervised Learning of Depth and Ego-Motion from Cylindrical Panoramic Video
We introduce a convolutional neural network model for unsupervised learning
of depth and ego-motion from cylindrical panoramic video. Panoramic depth
estimation is an important technology for applications such as virtual reality,
3D modeling, and autonomous robotic navigation. In contrast to previous
approaches for applying convolutional neural networks to panoramic imagery, we
use the cylindrical panoramic projection, which allows the use of
traditional CNN layers such as convolutional filters and max pooling without
modification. Our evaluation on synthetic and real data shows that unsupervised
learning of depth and ego-motion on cylindrical panoramic images can produce
high-quality depth maps and that an increased field-of-view improves ego-motion
estimation accuracy. We also introduce Headcam, a novel dataset of panoramic
video collected from a helmet-mounted camera while biking in an urban setting.
Comment: Accepted to IEEE AIVR 201
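The cylindrical panoramic projection referred to in this abstract can be illustrated with the usual mapping of a 3D camera-frame point onto a unit cylinder around the vertical axis. This is a minimal sketch under stated assumptions: the function name `cylindrical_project`, the axis convention (Y vertical, Z forward), and the height range `h_min`/`h_max` are hypothetical choices for illustration, not details from the paper.

```python
import math


def cylindrical_project(X, Y, Z, width, height, h_min=-1.0, h_max=1.0):
    """Map a 3D point to pixel coordinates on a cylindrical panorama.

    Assumes Y is the cylinder axis and Z points forward (illustrative
    convention). Returns (u, v) in pixels, or None for points on the axis.
    """
    radius = math.hypot(X, Z)
    if radius == 0.0:
        return None  # points on the cylinder axis have no defined azimuth
    theta = math.atan2(X, Z)   # azimuth in [-pi, pi]
    h = Y / radius             # height on the unit cylinder
    u = (theta + math.pi) / (2.0 * math.pi) * width
    v = (h - h_min) / (h_max - h_min) * height
    return u, v
```

Because the azimuth varies linearly along the image columns, the resulting image is a regular grid on which ordinary convolution and pooling can be applied; only the left and right borders need wrap-around handling.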