OPDN: Omnidirectional Position-aware Deformable Network for Omnidirectional Image Super-Resolution
360° omnidirectional images have gained research attention due to their
immersive and interactive experience, particularly in AR/VR applications.
However, they suffer from lower angular resolution because they are captured by
fisheye lenses with the same sensor size used for capturing planar images. To
address this issue, we propose a two-stage framework for 360°
omnidirectional image super-resolution. The first stage employs two branches:
model A, which incorporates omnidirectional position-aware deformable blocks
(OPDB) and Fourier upsampling, and model B, which adds a spatial frequency
fusion module (SFF) to model A. Model A aims to improve feature extraction for
the positional information of 360° images, while Model B further
focuses on the high-frequency information of 360° images. The second stage
performs same-resolution enhancement based on the structure of model A with a
pixel unshuffle operation. In addition, we collected data from YouTube to
improve the fitting ability of the transformer, and created pseudo
low-resolution images using a degradation network. Our proposed method achieves
superior performance and wins the NTIRE 2023 challenge of 360°
omnidirectional image super-resolution.
Comment: Accepted to CVPRW 202
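The pixel unshuffle operation mentioned in the abstract above can be illustrated with a minimal pure-Python sketch. It is not the OPDN implementation: the function name `pixel_unshuffle`, the single-channel nested-list input, and the channel ordering (which follows the common convention of index `i * r + j`) are assumptions made for illustration only.

```python
def pixel_unshuffle(image, r):
    """Rearrange one (H*r, W*r) channel into r*r channels of size (H, W).

    Assumed channel order: output channel i * r + j holds the pixels at
    row offset i and column offset j inside each r x r block.
    """
    height, width = len(image), len(image[0])
    if height % r or width % r:
        raise ValueError("height and width must be divisible by r")
    channels = []
    for i in range(r):
        for j in range(r):
            channels.append(
                [[image[h * r + i][w * r + j] for w in range(width // r)]
                 for h in range(height // r)]
            )
    return channels


# A 4x4 single-channel image becomes four 2x2 channels.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
channels = pixel_unshuffle(image, 2)
```

The spatial size shrinks by the factor `r` while no pixel values are discarded, which is why the operation suits a stage that enhances an image without changing its final resolution.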
OmniZoomer: Learning to Move and Zoom in on Sphere at High-Resolution
Omnidirectional images (ODIs) have become increasingly popular, as their
large field-of-view (FoV) can offer viewers the chance to freely choose the
view directions in immersive environments such as virtual reality. The Möbius
transformation is typically employed to further provide the opportunity for
movement and zoom on ODIs, but applying it at the image level often results in
blurring and aliasing. In this paper, we propose a novel deep
learning-based approach, called OmniZoomer, to incorporate the
Möbius transformation into the network for movement and zoom on ODIs. By
learning various transformed feature maps under different conditions, the
network is enhanced to handle the increasing edge curvatures, which alleviates
the blurry effect. Moreover, to address the aliasing problem, we propose two
key components. Firstly, to compensate for the lack of pixels for describing
curves, we enhance the feature maps in the high-resolution (HR) space and
calculate the transformed index map with a spatial index generation module.
Secondly, considering that ODIs are inherently represented in the spherical
space, we propose a spherical resampling module that combines the index map and
HR feature maps to transform the feature maps for better spherical correlation.
The transformed feature maps are decoded to output a zoomed ODI. Experiments
show that our method can produce HR and high-quality ODIs with the flexibility
to move and zoom in to the object of interest. Project page is available at
http://vlislab22.github.io/OmniZoomer/.
Comment: Accepted by ICCV 202
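As background for the abstract above, the classical Möbius transformation on the sphere can be sketched as follows. This is the textbook construction (stereographic projection, a complex map f(z) = (az + b) / (cz + d), then inverse projection), not the OmniZoomer network. The function names and the choice of projecting from the north pole are assumptions made for illustration.

```python
import cmath
import math


def sphere_to_plane(lat, lon):
    """Stereographic projection from the north pole (lat = pi/2 is excluded)."""
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return complex(x, y) / (1.0 - z)


def plane_to_sphere(p):
    """Inverse stereographic projection back to (lat, lon) in radians."""
    n = abs(p) ** 2
    z = (n - 1.0) / (n + 1.0)
    lat = math.asin(max(-1.0, min(1.0, z)))
    lon = cmath.phase(p)
    return lat, lon


def mobius(p, a, b, c, d):
    """Apply f(p) = (a*p + b) / (c*p + d); requires a*d - b*c != 0."""
    if a * d - b * c == 0:
        raise ValueError("degenerate Mobius transformation")
    # Raises ZeroDivisionError when p maps to infinity (the north pole).
    return (a * p + b) / (c * p + d)


def transform_direction(lat, lon, a, b, c, d):
    """Move a viewing direction on the sphere with a Mobius transformation."""
    return plane_to_sphere(mobius(sphere_to_plane(lat, lon), a, b, c, d))


# Pure scaling (a = 2, b = c = 0, d = 1) pushes an equatorial point
# towards the projection pole and leaves its longitude unchanged.
zoomed_lat, zoomed_lon = transform_direction(0.0, 0.3, 2, 0, 0, 1)
zoomed_height = math.sin(zoomed_lat)

# The identity map leaves a direction where it was.
same_lat, same_lon = transform_direction(0.4, -1.0, 1, 0, 0, 1)
```

Applying such a map per output pixel and sampling the source image at the transformed coordinates is the image-level use that the abstract associates with blurring and aliasing.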
Spherical Vision Transformer for 360-degree Video Saliency Prediction
The growing interest in omnidirectional videos (ODVs) that capture the full
field-of-view (FOV) has made 360-degree saliency prediction increasingly
important in computer vision. However, predicting where humans look in 360-degree scenes
presents unique challenges, including spherical distortion, high resolution,
and limited labelled data. We propose a novel vision-transformer-based model
for omnidirectional videos named SalViT360 that leverages tangent image
representations. We introduce a spherical geometry-aware spatiotemporal
self-attention mechanism that is capable of effective omnidirectional video
understanding. Furthermore, we present a consistency-based unsupervised
regularization term for projection-based 360-degree dense-prediction models to
reduce artefacts in the predictions that occur after inverse projection. Our
approach is the first to employ tangent images for omnidirectional saliency
prediction, and our experimental results on three ODV saliency datasets
demonstrate its effectiveness compared to the state-of-the-art.
Comment: 12 pages, 4 figures, accepted to BMVC 202
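Tangent image representations, as named in the abstract above, are commonly built with the gnomonic projection, which maps part of the sphere onto a plane touching it at one point. The sketch below shows only that standard projection formula. It assumes gnomonic projection is what is meant here, and the function name and sample values are hypothetical rather than taken from SalViT360.

```python
import math


def gnomonic_project(lat, lon, lat0, lon0):
    """Project (lat, lon) onto the plane tangent to the unit sphere at (lat0, lon0).

    All angles are in radians. Only points on the hemisphere facing the
    tangent point can be projected.
    """
    cos_c = (math.sin(lat0) * math.sin(lat)
             + math.cos(lat0) * math.cos(lat) * math.cos(lon - lon0))
    if cos_c <= 0.0:
        raise ValueError("point is not on the hemisphere facing the tangent plane")
    x = math.cos(lat) * math.sin(lon - lon0) / cos_c
    y = (math.cos(lat0) * math.sin(lat)
         - math.sin(lat0) * math.cos(lat) * math.cos(lon - lon0)) / cos_c
    return x, y


# The tangent point itself lands at the origin of the tangent plane.
center = gnomonic_project(0.2, 1.0, 0.2, 1.0)

# A point 45 degrees east of an equatorial tangent point lands at x = tan(45 deg).
edge = gnomonic_project(0.0, math.pi / 4, 0.0, 0.0)
```

Distortion grows with distance from the tangent point, which is why a sphere is usually covered by many small tangent patches rather than a few large ones.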
Saliency in Augmented Reality
With the rapid development of multimedia technology, Augmented Reality (AR)
has become a promising next-generation mobile platform. The primary theory
underlying AR is human visual confusion, which allows users to perceive the
real-world scenes and augmented contents (virtual-world scenes) simultaneously
by superimposing them together. To achieve good Quality of Experience (QoE), it
is important to understand the interaction between the two scenes and to
display AR contents harmoniously. However, studies on how this superimposition
will influence the human visual attention are lacking. Therefore, in this
paper, we mainly analyze the interaction effect between background (BG) scenes
and AR contents, and study the saliency prediction problem in AR. Specifically,
we first construct a Saliency in AR Dataset (SARD), which contains 450 BG
images, 450 AR images, as well as 1350 superimposed images generated by
superimposing BG and AR images in pairs at three mixing levels. A large-scale
eye-tracking experiment among 60 subjects is conducted to collect eye movement
data. To better predict the saliency in AR, we propose a vector quantized
saliency prediction method and generalize it for AR saliency prediction. For
comparison, three benchmark methods are proposed and evaluated together with
our proposed method on our SARD. Experimental results demonstrate the
superiority of our proposed method on both the common saliency prediction
problem and the AR saliency prediction problem over benchmark methods. Our data
collection methodology, dataset, benchmark methods, and proposed saliency
models will be publicly available to facilitate future research.
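The dataset arithmetic in the abstract above (450 BG/AR pairs at three mixing levels giving 1350 superimposed images) can be checked with a short sketch. The abstract does not state how images are mixed or which levels are used, so linear alpha blending, the function name `superimpose`, and the three level values are all assumptions made for illustration.

```python
def superimpose(bg_pixel, ar_pixel, alpha):
    """Linearly blend one RGB pixel; alpha is the weight given to the AR content."""
    return tuple((1.0 - alpha) * b + alpha * a for b, a in zip(bg_pixel, ar_pixel))


NUM_PAIRS = 450                     # one BG image paired with one AR image
MIXING_LEVELS = (0.25, 0.5, 0.75)   # hypothetical values, not from the abstract

# Each pair is rendered once per mixing level.
num_superimposed = NUM_PAIRS * len(MIXING_LEVELS)

# Blending a grey background pixel with a coloured AR pixel at equal weight.
blended = superimpose((100, 100, 100), (200, 0, 50), 0.5)
```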
IPO-LDM: Depth-aided 360-degree Indoor RGB Panorama Outpainting via Latent Diffusion Model
Generating complete 360-degree panoramas from narrow field of view images is
an ongoing research problem, as omnidirectional RGB data is not readily available. Existing
GAN-based approaches face some barriers to achieving higher quality output, and
have poor generalization performance over different mask types. In this paper,
we present our 360-degree indoor RGB panorama outpainting model using latent
diffusion models (LDM), called IPO-LDM. We introduce a new bi-modal latent
diffusion structure that utilizes both RGB and depth panoramic data during
training, but works surprisingly well to outpaint normal depth-free RGB images
during inference. We further propose a novel technique of introducing
progressive camera rotations during each diffusion denoising step, which leads
to substantial improvement in achieving panorama wraparound consistency.
Results show that our IPO-LDM not only significantly outperforms
state-of-the-art methods on RGB panorama outpainting, but can also produce
multiple and diverse well-structured results for different types of masks.
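One way to see why camera rotation relates to wraparound consistency, as claimed in the abstract above: in an equirectangular panorama, a rotation about the vertical axis is a circular shift of the image columns, so the left and right borders move into the image interior. The sketch below shows only that shift. It assumes yaw-only rotation and an equirectangular layout; the function name, the shift direction, and the four-step schedule are hypothetical, and the diffusion denoising itself is not modelled.

```python
def rotate_yaw(panorama, degrees):
    """Circularly shift the columns of an equirectangular image (list of rows)."""
    width = len(panorama[0])
    shift = round(degrees / 360.0 * width) % width
    if shift == 0:
        return [list(row) for row in panorama]
    return [row[-shift:] + row[:-shift] for row in panorama]


panorama = [[1, 2, 3, 4]]

# A 90 degree rotation moves the last quarter of the columns to the front.
quarter_turn = rotate_yaw(panorama, 90)

# Rotating a little at every step (here 4 steps of 90 degrees) returns to the
# original orientation after a full turn.
state = panorama
for _ in range(4):
    state = rotate_yaw(state, 90)
```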