A CNN-based Post-Processor for Perceptually-Optimized Immersive Media Compression
In recent years, resolution adaptation based on deep neural networks has
enabled significant performance gains for conventional (2D) video codecs. This
paper investigates the effectiveness of spatial resolution resampling in the
context of immersive content. The proposed approach reduces the spatial
resolution of input multi-view videos before encoding, and reconstructs their
original resolution after decoding. During the up-sampling process, an advanced
CNN model is used to reduce potential re-sampling, compression, and synthesis
artifacts. This work has been fully tested with the TMIV coding standard using
a Versatile Video Coding (VVC) codec. The results demonstrate that the proposed
method achieves a significant rate-quality performance improvement for the
majority of the test sequences, with an average BD-VMAF improvement of 3.07
over all sequences.
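The down-sample, encode, decode, up-sample flow described in this abstract can be sketched roughly as follows. This is only an illustrative toy under stated assumptions: the function names are hypothetical, the 2x box filter and nearest-neighbour up-sampling are stand-ins for the paper's resampling filter and CNN, and the `codec` argument is a placeholder for the TMIV/VVC encode and decode stages, none of which are specified here.

```python
def downsample_2x(frame):
    """Halve the resolution by averaging each 2x2 block.

    `frame` is a list of rows of numbers with even height and width.
    (Assumed filter; the paper's actual down-sampling filter is not given here.)
    """
    h, w = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) / 4.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def upsample_2x(frame):
    """Double the resolution by repeating each sample.

    Nearest-neighbour repetition is a stand-in for the CNN-based up-sampler.
    """
    out = []
    for row in frame:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def resolution_adaptation(frame, codec=lambda f: f):
    """Down-sample, pass through a codec placeholder, then up-sample."""
    low_res = downsample_2x(frame)
    decoded = codec(low_res)  # hypothetical hook for encode + decode
    return upsample_2x(decoded)
```

With the default identity `codec`, the output has the same dimensions as the input, and any loss comes only from the resampling steps.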
Spherical-harmonics-based sound field decomposition and multichannel NMF for sound source separation
3D Scene Geometry Estimation from 360 Imagery: A Survey
This paper provides a comprehensive survey of pioneering and state-of-the-art 3D
scene geometry estimation methodologies based on single, two, or multiple
images captured with omnidirectional optics. We first revisit the basic
concepts of the spherical camera model, and review the most common acquisition
technologies and representation formats suitable for omnidirectional (also
called 360, spherical or panoramic) images and videos. We then survey
monocular layout and depth inference approaches, highlighting the recent
advances in learning-based solutions suited for spherical data. Classical
stereo matching is then revisited in the spherical domain, where methodologies
for detecting and describing sparse and dense features become crucial. The
stereo matching concepts are then extended to multi-view camera setups,
categorizing them among light fields, multi-view stereo, and structure from
motion (or visual simultaneous localization and mapping). We also compile and
discuss commonly adopted datasets and figures of merit indicated for each
purpose and list recent results for completeness. We conclude this paper by
pointing out current and future trends.
Comment: Published in ACM Computing Surveys
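As a small illustration of the spherical camera model mentioned in this abstract, the sketch below maps a pixel of an equirectangular image to a unit viewing ray. The function name and the axis convention (y up, z forward at the image centre, longitude increasing to the right) are assumptions made for this example; the survey itself may use a different convention.

```python
import math


def equirect_pixel_to_ray(u, v, width, height):
    """Map equirectangular pixel coordinates (u, v) to a unit direction vector.

    Assumed convention: u in [0, width) spans longitude [-pi, pi),
    v in [0, height] spans latitude from +pi/2 (top) to -pi/2 (bottom).
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)
```

Under this convention the image centre maps to the forward axis, and every returned vector has unit length by construction.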