Search CORE

74 research outputs found

Spherical Vision Transformer for 360-degree Video Saliency Prediction

Author: Cokelek Mert
Erdem Aykut
Erdem Erkut
Imamoglu Nevrez
Ozcinar Cagri
Publication venue
Publication date: 24/08/2023
Field of study

The growing interest in omnidirectional videos (ODVs) that capture the full field-of-view (FOV) has gained 360-degree saliency prediction importance in computer vision. However, predicting where humans look in 360-degree scenes presents unique challenges, including spherical distortion, high resolution, and limited labelled data. We propose a novel vision-transformer-based model for omnidirectional videos named SalViT360 that leverages tangent image representations. We introduce a spherical geometry-aware spatiotemporal self-attention mechanism that is capable of effective omnidirectional video understanding. Furthermore, we present a consistency-based unsupervised regularization term for projection-based 360-degree dense-prediction models to reduce artefacts in the predictions that occur after inverse projection. Our approach is the first to employ tangent images for omnidirectional saliency prediction, and our experimental results on three ODV saliency datasets demonstrate its effectiveness compared to the state-of-the-art.Comment: 12 pages, 4 figures, accepted to BMVC 202

arXiv.org e-Print Archive

Unsupervised Learning of Depth and Ego-Motion from Cylindrical Panoramic Video

Author: Sharma Alisha
Ventura Jonathan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 29/10/2019
Field of study

We introduce a convolutional neural network model for unsupervised learning of depth and ego-motion from cylindrical panoramic video. Panoramic depth estimation is an important technology for applications such as virtual reality, 3D modeling, and autonomous robotic navigation. In contrast to previous approaches for applying convolutional neural networks to panoramic imagery, we use the cylindrical panoramic projection which allows for the use of the traditional CNN layers such as convolutional filters and max pooling without modification. Our evaluation of synthetic and real data shows that unsupervised learning of depth and ego-motion on cylindrical panoramic images can produce high-quality depth maps and that an increased field-of-view improves ego-motion estimation accuracy. We also introduce Headcam, a novel dataset of panoramic video collected from a helmet-mounted camera while biking in an urban setting.Comment: Accepted to IEEE AIVR 201

arXiv.org e-Print Archive

Crossref