71 research outputs found

    Optimization of Occlusion-Inducing Depth Pixels in 3-D Video Coding

    The optimization of occlusion-inducing depth pixels in depth map coding has received little attention in the literature, since their associated texture pixels are occluded in the synthesized view and their effect on the synthesized view is considered negligible. However, occlusion-inducing depth pixels still consume bits when transmitted, and they induce geometry distortion that inherently exists in the synthesized view. In this paper, we propose an efficient depth map coding scheme specifically for occlusion-inducing depth pixels that uses allowable depth distortions. First, we formulate the problem of minimizing the overall geometry distortion in the occlusion subject to a bit rate constraint, in which the depth distortion is adjusted within the set of allowable depth distortions that introduce the same disparity error as the initial depth distortion. Then, we propose a dynamic programming solution to find the optimal depth distortion vector for the occlusion. The proposed algorithm can improve coding efficiency without altering the occlusion order. Simulation results confirm the performance improvement over existing algorithms.

    Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality

    Ambisonics, i.e., full-sphere surround sound, is essential alongside 360-degree visual content to provide a realistic virtual reality (VR) experience. While 360-degree visual content capture has gained a tremendous boost recently, the estimation of the corresponding spatial sound is still challenging because it requires sound-field microphones or information about the sound-source locations. In this paper, we introduce a novel problem of generating Ambisonics in 360-degree videos using the audio-visual cue. With this aim, firstly, a novel 360-degree audio-visual video dataset of 265 videos is introduced with annotated sound-source locations. Secondly, a pipeline is designed for the automatic Ambisonic estimation problem. Benefiting from deep learning-based audio-visual feature-embedding and prediction modules, our pipeline estimates the 3D sound-source locations and further uses these locations to encode to the B-format. To benchmark our dataset and pipeline, we additionally propose evaluation criteria to investigate the performance using different 360-degree input representations. Our results demonstrate the efficacy of the proposed pipeline and open up a new area of research in 360-degree audio-visual analysis for future investigations. Comment: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
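
    The final step mentioned in this abstract, encoding a located source to the B-format, can be illustrated with a minimal sketch. It assumes first-order Ambisonics with the traditional weighting in which the omnidirectional W channel is scaled by 1/sqrt(2); the abstract does not state the order or the normalization the authors use, and the function name `encode_b_format` is hypothetical.

```python
import math


def encode_b_format(samples, azimuth_deg, elevation_deg):
    """Encode a mono signal into first-order B-format channels W, X, Y, Z.

    Assumptions (not taken from the abstract): azimuth is measured
    counter-clockwise from the front, elevation upward from the horizontal
    plane, and W carries the traditional 1/sqrt(2) weighting.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)

    # Direction-dependent gains for each channel.
    gain_w = 1.0 / math.sqrt(2.0)          # omnidirectional component
    gain_x = math.cos(az) * math.cos(el)   # front-back component
    gain_y = math.sin(az) * math.cos(el)   # left-right component
    gain_z = math.sin(el)                  # up-down component

    return {
        "W": [gain_w * s for s in samples],
        "X": [gain_x * s for s in samples],
        "Y": [gain_y * s for s in samples],
        "Z": [gain_z * s for s in samples],
    }


if __name__ == "__main__":
    # A short mono signal placed 90 degrees to the left, on the horizon.
    channels = encode_b_format([1.0, 0.5, -0.25], azimuth_deg=90.0, elevation_deg=0.0)
    for name, values in channels.items():
        print(name, [round(v, 4) for v in values])
```

    In a pipeline like the one described, the azimuth and elevation would come from the predicted sound-source location rather than being supplied by hand.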

    Resolution Enhancement Based Image Compression Technique using Singular Value Decomposition and Wavelet Transforms

    In this chapter, we propose a new lossy image compression technique that uses singular value decomposition (SVD) and wavelet difference reduction (WDR), followed by resolution enhancement using the discrete wavelet transform (DWT) and the stationary wavelet transform (SWT). The input image is decomposed into four frequency subbands using DWT. The low-frequency subband is then compressed using WDR and, in parallel, the high-frequency subbands are compressed using SVD, which reduces the rank by ignoring small singular values. The compression ratio is obtained by dividing the total number of bits required to represent the input image by the total number of bits produced by WDR and SVD. Reconstruction is carried out by applying the inverse of WDR to obtain the low-frequency subband and by reconstructing the high-frequency subbands through matrix multiplications. The high-frequency subbands are enhanced by incorporating the high-frequency subbands obtained by applying SWT to the reconstructed low-frequency subband. The reconstructed low-frequency subband and the enhanced high-frequency subbands are then used to generate the reconstructed image via the inverse DWT. The visual and quantitative experimental results of the proposed image compression technique are shown and compared with those of WDR with arithmetic coding (WDR-AC) and JPEG2000. According to this comparison, the proposed image compression technique outperforms the WDR-AC and JPEG2000 techniques.
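
    The first stage described in this abstract, splitting an image into four frequency subbands with a DWT, can be sketched as follows. The abstract does not name the wavelet, so this example assumes a single-level orthonormal Haar transform; the function name `haar_dwt2` and the subband labels are illustrative choices, not the authors' implementation.

```python
def haar_dwt2(image):
    """Single-level 2-D Haar DWT of an image given as a list of equal-length rows.

    Returns four subbands (ll, lh, hl, hh), each half the size of the input
    in both dimensions. Assumes an orthonormal Haar wavelet, which the
    abstract does not specify.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")

    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, cols, 2):
            # The 2x2 block:  a b
            #                 d e
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 2.0)  # average: low-frequency content
            lh_row.append((a + b - d - e) / 2.0)  # difference between the two rows
            hl_row.append((a - b + d - e) / 2.0)  # difference between the two columns
            hh_row.append((a - b - d + e) / 2.0)  # diagonal difference
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


if __name__ == "__main__":
    image = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
    ]
    for name, band in zip(("LL", "LH", "HL", "HH"), haar_dwt2(image)):
        print(name, band)
```

    In the scheme the abstract outlines, the low-frequency subband would then go to WDR and the three high-frequency subbands to rank-reduced SVD; neither of those stages is shown here.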

    Quality-aware adaptive delivery of multi-view video

    Advances in video coding and networking technologies have paved the way for Multi-View Video (MVV) streaming. However, large amounts of data and dynamic network conditions result in frequent network congestion, which may prevent video packets from being delivered on time. As a consequence, the 3D viewing experience may be degraded significantly unless quality-aware adaptation methods are deployed. No existing research discusses MVV adaptation decision strategies or provides a detailed analysis of a dynamic network environment. This work addresses these issues for MVV streaming over HTTP for emerging multi-view displays. The effects of various adaptation decision strategies are evaluated and, as a result, a new quality-aware adaptation method is designed. The proposed method benefits from layer-based video coding in such a way that high Quality of Experience (QoE) is maintained in a cost-effective manner. The experimental results on MVV streaming using the proposed strategy show that the perceptual 3D video quality under adverse network conditions is enhanced significantly by the proposed quality-aware adaptation.