Optimization of Occlusion-Inducing Depth Pixels in 3-D Video Coding
The optimization of occlusion-inducing depth pixels in depth map coding has
received little attention in the literature, since their associated texture
pixels are occluded in the synthesized view and their effect on the synthesized
view is considered negligible. However, the occlusion-inducing depth pixels
still consume bits during transmission and induce geometry
distortion that is inherent in the synthesized view. In this paper, we
propose an efficient depth map coding scheme specifically for the
occlusion-inducing depth pixels by using allowable depth distortions. First,
we formulate the problem of minimizing the overall geometry distortion in the
occlusion subject to a bit rate constraint, in which the depth distortion is
properly adjusted within the set of allowable depth distortions that introduce
the same disparity error as the initial depth distortion. Then, we propose a
dynamic programming solution to find the optimal depth distortion vector for
the occlusion. The proposed algorithm can improve the coding efficiency without
alteration of the occlusion order. Simulation results confirm the performance
improvement compared to other existing algorithms.
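The notion of allowable depth distortions can be illustrated with a minimal sketch. It assumes the common 1-D parallel camera model, in which an 8-bit depth value maps to disparity through the focal length, the baseline, and the near and far clipping planes. The function names, the parameter values, and the rounding precision are illustrative assumptions and are not taken from the paper.

```python
import math

def rounded_disparity(v, focal, baseline, z_near, z_far, precision=1):
    """Map an 8-bit depth value v to a disparity rounded to 1/precision pixel.

    Assumed camera model (not from the abstract):
    1/Z = v/255 * (1/z_near - 1/z_far) + 1/z_far, disparity = focal * baseline / Z.
    """
    inv_z = (v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return math.floor(focal * baseline * inv_z * precision + 0.5)

def allowable_distortions(v, e0, focal, baseline, z_near, z_far, precision=1):
    """Return every depth error e whose disparity error equals that of e0."""
    base = rounded_disparity(v, focal, baseline, z_near, z_far, precision)
    target = rounded_disparity(v + e0, focal, baseline, z_near, z_far, precision) - base
    return [
        u - v
        for u in range(256)  # all representable 8-bit depth values
        if rounded_disparity(u, focal, baseline, z_near, z_far, precision) - base == target
    ]

# Hypothetical camera parameters, chosen only for illustration.
errors = allowable_distortions(128, 3, 1000.0, 0.05, 1.0, 100.0)
print(errors)
```

Because the disparity is monotonic in the depth value under this model, the returned set is a contiguous range of errors; an encoder could pick any member of it without changing the rounded disparity error.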
Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality
Ambisonics, i.e., full-sphere surround sound, is essential alongside
360-degree visual content to provide a realistic virtual reality (VR)
experience. While 360-degree visual content capture has gained a tremendous boost
recently, the estimation of corresponding spatial sound is still challenging
due to the required sound-field microphones or information about the
sound-source locations. In this paper, we introduce a novel problem of
generating Ambisonics in 360-degree videos using the audio-visual cue. With
this aim, firstly, a novel 360-degree audio-visual video dataset of 265 videos
is introduced with annotated sound-source locations. Secondly, a pipeline is
designed for the automatic Ambisonic estimation problem. Benefiting from
deep learning-based audio-visual feature-embedding and prediction modules, our
pipeline estimates the 3D sound-source locations and then uses these locations
for encoding to the B-format. To benchmark our dataset and pipeline, we
additionally propose evaluation criteria to investigate the performance using
different 360-degree input representations. Our results demonstrate the
efficacy of the proposed pipeline and open up a new area of research in
360-degree audio-visual analysis for future investigations.
Comment: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP)
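The final encoding step, from a known source direction to the B-format, can be sketched as follows. This is the traditional first-order B-format panning equations applied to a mono signal, not the pipeline from the paper; the function name and the choice of radians for the angles are assumptions made for illustration.

```python
import math

def encode_b_format(samples, azimuth, elevation):
    """Encode a mono signal into first-order B-format channels W, X, Y, Z.

    azimuth and elevation are in radians; azimuth 0 is straight ahead and
    positive azimuth turns to the left (traditional B-format convention).
    """
    w_gain = 1.0 / math.sqrt(2.0)                    # omnidirectional component
    x_gain = math.cos(azimuth) * math.cos(elevation)  # front-back
    y_gain = math.sin(azimuth) * math.cos(elevation)  # left-right
    z_gain = math.sin(elevation)                      # up-down
    return {
        "W": [s * w_gain for s in samples],
        "X": [s * x_gain for s in samples],
        "Y": [s * y_gain for s in samples],
        "Z": [s * z_gain for s in samples],
    }

# A source directly to the left of the listener, at ear height.
channels = encode_b_format([1.0, 0.5, -0.25], math.pi / 2, 0.0)
print(channels["Y"])
```

In a setting like the one described, the azimuth and elevation would come from the estimated sound-source locations rather than being fixed by hand.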
Resolution Enhancement Based Image Compression Technique using Singular Value Decomposition and Wavelet Transforms
In this chapter, we propose a new lossy image compression technique that uses singular value decomposition (SVD) and wavelet difference reduction (WDR), followed by resolution enhancement using the discrete wavelet transform (DWT) and the stationary wavelet transform (SWT). The input image is decomposed into four frequency subbands using DWT. The low-frequency subband is then compressed using WDR and, in parallel, the high-frequency subbands are compressed using SVD, which reduces the rank by ignoring small singular values. The compression ratio is obtained by dividing the total number of bits required to represent the input image by the total number of bits obtained from WDR and SVD. Reconstruction is carried out by applying the inverse of WDR to obtain the low-frequency subband and by reconstructing the high-frequency subbands using matrix multiplications. The high-frequency subbands are enhanced by incorporating the high-frequency subbands obtained by applying SWT to the reconstructed low-frequency subband. The reconstructed low-frequency subband and the enhanced high-frequency subbands are then used to generate the reconstructed image via the inverse DWT. Visual and quantitative experimental results of the proposed technique are shown and compared with those of WDR with arithmetic coding (WDR-AC) and JPEG2000. In this comparison, the proposed image compression technique outperforms the WDR-AC and JPEG2000 techniques.
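The compression-ratio definition above can be made concrete with a small sketch. It assumes a one-level decimated DWT (so each subband has half the image height and width), the same retained rank for each of the three high-frequency subbands, and a fixed number of bits per stored SVD value. The WDR bit count is passed in as a given number, since the abstract does not describe the WDR coder in detail. All names and figures are illustrative.

```python
def svd_storage_values(rows, cols, rank):
    """Number of values needed to store a rank-truncated SVD of a rows x cols matrix.

    Each retained singular triplet needs one left vector (rows values),
    one right vector (cols values), and one singular value.
    """
    return rank * (rows + cols + 1)

def compression_ratio(height, width, bits_per_pixel, wdr_bits, rank, bits_per_value):
    """Bits of the input image divided by the bits produced by WDR and SVD."""
    original_bits = height * width * bits_per_pixel
    sub_h, sub_w = height // 2, width // 2  # subband size after one DWT level (assumed)
    svd_bits = 3 * svd_storage_values(sub_h, sub_w, rank) * bits_per_value
    return original_bits / (wdr_bits + svd_bits)

# Hypothetical 512 x 512, 8-bit image; WDR output of 100000 bits is a made-up figure.
ratio = compression_ratio(512, 512, 8, 100_000, rank=10, bits_per_value=16)
print(round(ratio, 2))
```

Lowering the retained rank shrinks the SVD term and raises the ratio, at the cost of discarding more of the small singular values in the high-frequency subbands.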
Quality-aware adaptive delivery of multi-view video
Advances in video coding and networking technologies have
paved the way for Multi-View Video (MVV) streaming.
However, large amounts of data and dynamic network conditions
result in frequent network congestion, which may prevent
video packets from being delivered on time. As a consequence,
the 3D viewing experience may be degraded significantly
unless quality-aware adaptation methods are deployed.
No existing research work discusses MVV adaptation
decision strategies or provides a detailed analysis of a dynamic
network environment. This work addresses these issues
for MVV streaming over HTTP for emerging multi-view
displays. The effects of various adaptation
decision strategies are evaluated and, as a result, a
new quality-aware adaptation method is designed. The proposed
method benefits from layer-based video coding in
such a way that high Quality of Experience (QoE) is maintained
in a cost-effective manner. Experimental
results on MVV streaming using the proposed strategy
show that the perceptual 3D video quality under adverse
network conditions is enhanced significantly as a result of the
proposed quality-aware adaptation.
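To show how layer-based coding lends itself to adaptation decisions, here is a generic sketch of one simple strategy: request as many layers as fit within a fraction of the measured throughput. This is not the method proposed in the work above, whose details the abstract does not give; the function name, the safety factor, and the bitrates are hypothetical.

```python
def select_layers(layer_bitrates_kbps, throughput_kbps, safety=0.8):
    """Pick how many layers (base first, then enhancements) to request.

    Layers are added in order while their cumulative bitrate stays within
    safety * throughput. The base layer is always requested so playback
    can continue even under poor network conditions.
    """
    budget = throughput_kbps * safety
    total = 0.0
    chosen = 0
    for rate in layer_bitrates_kbps:
        if total + rate > budget:
            break
        total += rate
        chosen += 1
    return max(chosen, 1)

# Hypothetical stream: 1000 kbps base layer plus two 500 kbps enhancement layers.
print(select_layers([1000, 500, 500], throughput_kbps=2000))
```

A quality-aware strategy would replace the pure bitrate budget with an estimate of the perceptual gain of each layer, which is the kind of decision the evaluated strategies differ on.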