2,638 research outputs found

    Intra-WZ quantization mismatch in distributed video coding

    During the past decade, Distributed Video Coding (DVC) has emerged as a new video coding paradigm, shifting the complexity from the encoder to the decoder side. This paper addresses a problem of current DVC architectures that has not been studied in the literature so far: the mismatch between the intra and Wyner-Ziv (WZ) quantization processes. Due to this mismatch, WZ rate is spent even on spatial regions that are accurately approximated by the side information. As a solution, this paper proposes side-information generation using selective unidirectional motion compensation from temporally adjacent WZ frames. Experimental results show that the proposed approach yields promising WZ rate gains of up to 7% relative to the conventional method.
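
    The mismatch described above can be pictured with a toy experiment: whenever the side information already falls into the same quantization bin as the corresponding WZ coefficient, any WZ rate spent on that position is wasted. The sketch below illustrates this under an assumed uniform scalar quantizer and synthetic pixel data; it is not the paper's codec.

    import numpy as np

    def quantize(block, step):
        """Uniform scalar quantization to bin indices."""
        return np.floor(block / step).astype(int)

    rng = np.random.default_rng(0)
    wz_block = rng.normal(128, 20, size=(16, 16))            # hypothetical WZ frame block
    side_info = wz_block + rng.normal(0, 2, size=(16, 16))   # accurate side information

    step = 8.0                                                # assumed quantization step size
    mismatch = quantize(wz_block, step) != quantize(side_info, step)
    print(f"positions needing WZ correction: {mismatch.sum()} / {mismatch.size}")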

    Overview of MV-HEVC prediction structures for light field video

    Light field video is a promising technology for delivering the six degrees of freedom required for natural content in virtual reality. Existing multi-view coding (MVC) and multi-view plus depth (MVD) formats, such as MV-HEVC and 3D-HEVC, are the most established light field video coding solutions, since they can compress video sequences captured simultaneously from multiple camera angles. 3D-HEVC treats a single view as a video sequence and the other sub-aperture views as gray-scale disparity (depth) maps. MV-HEVC, on the other hand, treats each view as a separate video sequence, which allows the use of motion-compensated algorithms similar to HEVC. While MV-HEVC and 3D-HEVC provide similar results, MV-HEVC does not require any disparity maps to be readily available, and it has a more straightforward implementation since it only uses syntax elements rather than additional prediction tools for inter-view prediction. However, there are many degrees of freedom in choosing an appropriate prediction structure, and it is currently unknown which one is optimal for a given set of application requirements. In this work, various prediction structures for MV-HEVC are implemented and tested. The findings reveal the trade-off between compression gains, distortion, and random access capabilities in MV-HEVC light field video coding. The results give an overview of the best-performing solutions developed in the context of this work and of prediction structure algorithms proposed in the state-of-the-art literature. This overview provides a useful benchmark for future development of light field video coding solutions.
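
    As a rough illustration of what such a prediction structure looks like, the sketch below builds inter-view reference lists for one hypothetical arrangement on a horizontal camera array: the central view is coded as the base view and every other view predicts from its nearest neighbour toward the centre. This is only one of many possible structures and is not a prescribed MV-HEVC configuration.

    def central_view_structure(num_views):
        """Map each view index to its inter-view reference list."""
        center = num_views // 2
        refs = {center: []}                    # base view: intra/temporal prediction only
        for v in range(num_views):
            if v != center:
                neighbour = v + 1 if v < center else v - 1
                refs[v] = [neighbour]          # predict from the neighbour closer to the centre
        return refs

    print(central_view_structure(9))           # e.g. view 0 references view 1, view 8 references view 7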

    RLFC: Random Access Light Field Compression using Key Views and Bounded Integer Encoding

    We present a new hierarchical compression scheme for encoding light field images (LFI) that is suitable for interactive rendering. Our method (RLFC) exploits redundancies in the light field images by constructing a tree structure. The top level (root) of the tree captures the common high-level details across the LFI, and the other levels (children) of the tree capture specific low-level details of the LFI. Our decompression algorithm corresponds to tree traversal operations and gathers the values stored at different levels of the tree. Furthermore, we use bounded integer sequence encoding, which provides random access and fast hardware decoding, for compressing the blocks of children of the tree. We have evaluated our method for 4D two-plane parameterized light fields. The compression rates vary from 0.08 to 2.5 bits per pixel (bpp), resulting in compression ratios of around 200:1 to 20:1 for a PSNR quality of 40 to 50 dB. The decompression times for decoding the blocks of LFI are 1 to 3 microseconds per channel on an NVIDIA GTX-960, and we can render new views with a resolution of 512x512 at 200 fps. Our overall scheme is simple to implement and involves only bit manipulations and integer arithmetic operations. Comment: Accepted for publication at the Symposium on Interactive 3D Graphics and Games (I3D '19).
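
    The fixed per-block bit width is what gives bounded integer sequence encoding its random-access property: element i of a block sits at bit offset i * width and can be read without touching its neighbours. The following sketch captures that idea under assumed block details; it is not the paper's hardware-oriented format.

    def encode_block(values):
        """Pack non-negative offsets from the block minimum at a fixed bit width."""
        lo, hi = min(values), max(values)
        width = max(1, (hi - lo).bit_length())
        bits = 0
        for i, v in enumerate(values):
            bits |= (v - lo) << (i * width)
        return lo, width, bits

    def decode_at(lo, width, bits, i):
        """Random access: read element i without decoding the rest of the block."""
        return lo + ((bits >> (i * width)) & ((1 << width) - 1))

    lo, width, bits = encode_block([70, 68, 71, 69, 75, 72])
    print(width, [decode_at(lo, width, bits, i) for i in range(6)])   # 3 [70, 68, 71, 69, 75, 72]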

    Motion Vector Estimation Search using Hexagon-Diamond Pattern for Video Sequences, Grid Point and Block-Based

    Grid- and block-based motion vector estimation techniques are proposed for motion tracking in video sequences. The grid technique refers to the hexagon-diamond pattern, while the block-based technique refers to 16 × 16-pixel blocks within a single frame of the video sequence. The hexagon and diamond patterns are applied to the 16 × 16-pixel blocks in a single frame for motion tracking in video sequences. The hexagon grid pattern conducts a search to capture the motion in a particular block of the hexagon region before the diamond grid pattern takes over for the fine search. The diamond grid pattern provides the accuracy needed to obtain the best grid vector coordinate for motion tracking. The hexagon-diamond grid vector coordinate can be used to determine whether the object is moving along the horizontal or vertical plane. The information determined at the grid vector coordinate can be used as a reference when referring to the previous frame in video sequence processing. The grid vector coordinate helps determine the area of interest to be examined based on the coordinate obtained. Besides the grid vector estimation, the Peak Signal-to-Noise Ratio (PSNR) is also applied to measure the quality of the video.
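
    The coarse-then-fine behaviour described above can be sketched as a hexagon search followed by a diamond refinement over a sum-of-absolute-differences (SAD) criterion. The pattern offsets and the synthetic test frames below are assumptions for illustration, not the exact grids used in the paper.

    import numpy as np

    HEXAGON = [(0, 0), (-2, 0), (2, 0), (-1, -2), (1, -2), (-1, 2), (1, 2)]
    DIAMOND = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

    def sad(cur, ref, x, y, dx, dy, n=16):
        h, w = ref.shape
        if not (0 <= y + dy <= h - n and 0 <= x + dx <= w - n):
            return np.inf
        return np.abs(cur[y:y+n, x:x+n] - ref[y+dy:y+dy+n, x+dx:x+dx+n]).sum()

    def hex_diamond_search(cur, ref, x, y):
        mvx = mvy = 0
        while True:                                       # coarse hexagon stage
            cands = [(sad(cur, ref, x, y, mvx+dx, mvy+dy), dx, dy) for dx, dy in HEXAGON]
            _, dx, dy = min(cands)
            if (dx, dy) == (0, 0):
                break
            mvx, mvy = mvx + dx, mvy + dy
        _, dx, dy = min((sad(cur, ref, x, y, mvx+dx, mvy+dy), dx, dy) for dx, dy in DIAMOND)
        return mvx + dx, mvy + dy                         # fine diamond refinement

    yy, xx = np.mgrid[0:64, 0:64]
    ref = 128 + 50 * np.sin(xx / 6.0) + 50 * np.cos(yy / 9.0)
    cur = np.roll(ref, shift=(1, 3), axis=(0, 1))         # content moved so the true MV is (-3, -1)
    print(hex_diamond_search(cur, ref, 16, 16))           # expected result: (-3, -1)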

    Motion Estimation and Compensation in the Redundant Wavelet Domain

    Despite being the preferred approach for still-image compression for nearly a decade, wavelet-based coding for video has been slow to emerge, due primarily to the fact that the shift variance of the discrete wavelet transform hinders the motion estimation and compensation crucial to modern video coders. Recently it has been recognized that a redundant, or overcomplete, wavelet transform is shift invariant and thus permits motion prediction in the wavelet domain. In this dissertation, other uses for the redundancy of overcomplete wavelet transforms in video coding are explored. First, it is demonstrated that the redundant-wavelet domain facilitates the placement of an irregular triangular mesh onto video images, thereby exploiting transform redundancy to implement geometries for motion estimation and compensation more general than the traditional block structure widely employed. As the second contribution of this dissertation, a new form of multihypothesis prediction, redundant wavelet multihypothesis, is presented. This new approach to motion estimation and compensation produces motion predictions that are diverse in transform phase to increase prediction accuracy. Finally, it is demonstrated that the proposed redundant-wavelet strategies complement existing advanced video-coding techniques and produce significant performance improvements in a battery of experimental results.
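
    The shift-variance issue mentioned above is easy to verify numerically. The hand-rolled one-level Haar transforms below (no wavelet library assumed) show that the undecimated, redundant transform commutes with an integer shift of the signal while the critically sampled transform does not.

    import numpy as np

    def haar_decimated(x):
        """Critically sampled one-level Haar analysis (even-length input)."""
        return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

    def haar_undecimated(x):
        """Redundant (undecimated) one-level Haar analysis with circular extension."""
        xs = np.roll(x, -1)
        return (x + xs) / np.sqrt(2), (x - xs) / np.sqrt(2)

    rng = np.random.default_rng(2)
    x = rng.normal(size=32)
    x_shifted = np.roll(x, 1)                    # shift the signal by one sample

    lo_u, _ = haar_undecimated(x)
    lo_us, _ = haar_undecimated(x_shifted)
    print("undecimated is shift-invariant:", np.allclose(np.roll(lo_u, 1), lo_us))   # True

    lo_d, _ = haar_decimated(x)
    lo_ds, _ = haar_decimated(x_shifted)
    print("decimated is shift-invariant:  ", np.allclose(np.roll(lo_d, 1), lo_ds))   # False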

    Statistical framework for video decoding complexity modeling and prediction

    Video decoding complexity modeling and prediction is an increasingly important issue for efficient resource utilization in a variety of applications, including task scheduling, receiver-driven complexity shaping, and adaptive dynamic voltage scaling. In this paper, we present a novel view of this problem based on a statistical framework. We explore the statistical structure (clustering) of the execution time required by each video decoder module (entropy decoding, motion compensation, etc.) in conjunction with complexity features that are easily extractable at encoding time (representing the properties of each module's input source data). For this purpose, we employ Gaussian mixture models (GMMs) and an expectation-maximization algorithm to estimate the joint execution-time/feature probability density function (PDF). A training set of typical video sequences is used for this purpose in an offline estimation process. The obtained GMM representation is used in conjunction with the complexity features of new video sequences to predict the execution time required for the decoding of these sequences. Several prediction approaches are discussed and compared. The potential mismatch between the training set and new video content is addressed by adaptive online joint-PDF re-estimation. An experimental comparison is performed to evaluate the different approaches and compare the proposed prediction scheme with related resource prediction schemes from the literature. The usefulness of the proposed complexity-prediction approaches is demonstrated in an application of rate-distortion-complexity optimized decoding.
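
    As a rough sketch of the prediction step, the snippet below fits a Gaussian mixture model to joint (feature, execution-time) samples offline and then predicts the decoding time of new content as the conditional expectation E[time | feature] under that joint PDF. The data are synthetic and scikit-learn/SciPy are assumed to be available; this mirrors the idea rather than reproducing the paper's exact estimator or feature set.

    import numpy as np
    from scipy.stats import norm
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(3)
    feat = np.concatenate([rng.normal(10, 2, 500), rng.normal(30, 4, 500)])   # hypothetical complexity feature
    time = 0.5 * feat + rng.normal(0, 1, 1000)                                # measured execution time
    gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
    gmm.fit(np.column_stack([feat, time]))

    def predict_time(x):
        """E[time | feature = x] under the fitted joint GMM."""
        w = np.zeros(gmm.n_components)
        cond = np.zeros(gmm.n_components)
        for k in range(gmm.n_components):
            mx, my = gmm.means_[k]
            sxx, sxy = gmm.covariances_[k][0, 0], gmm.covariances_[k][0, 1]
            w[k] = gmm.weights_[k] * norm.pdf(x, mx, np.sqrt(sxx))            # likelihood of component k
            cond[k] = my + sxy / sxx * (x - mx)                               # per-component conditional mean
        return float(np.dot(w / w.sum(), cond))

    print(predict_time(12.0))   # close to 0.5 * 12 = 6 for this synthetic data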

    Mesh-based video coding for low bit-rate communications

    In this paper, a new method for low bit-rate, content-adaptive, mesh-based video coding is proposed. Intra-frame coding in this method employs feature map extraction for node distribution at specific threshold levels, achieving a denser placement of initial nodes in regions that contain high-frequency features and, conversely, a sparser placement of initial nodes in smooth regions. Insignificant nodes are largely removed using a subsequent node elimination scheme. The Hilbert scan is then applied before quantization and entropy coding to reduce the amount of transmitted information. For moving images, both the node positions and the color parameters of only a subset of nodes may change from frame to frame, and it is sufficient to transmit only these changed parameters. The proposed method is well suited for video coding at very low bit rates, as processing results demonstrate that it provides good subjective and objective image quality with a lower number of required bits.
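
    The feature-driven node placement can be pictured with the toy pass below: a gradient-magnitude feature map decides, per location, whether mesh nodes are taken from a fine or a coarse grid. The threshold, grid sizes, and snapping rule are illustrative assumptions, not the paper's full node-distribution and elimination pipeline.

    import numpy as np

    def place_nodes(frame, threshold=20.0, coarse=16, fine=4):
        gy, gx = np.gradient(frame.astype(float))
        feature = np.hypot(gx, gy)                       # high values mark busy regions
        nodes = set()
        h, w = frame.shape
        for y in range(0, h, fine):
            for x in range(0, w, fine):
                step = fine if feature[y, x] > threshold else coarse
                nodes.add((x - x % step, y - y % step))  # snap to the chosen grid
        return sorted(nodes)

    frame = np.zeros((64, 64))
    frame[20:40, 20:40] = 200.0                          # one high-contrast square
    print(len(place_nodes(frame)))                       # dense nodes only around the square's edges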

    Super Resolution of Wavelet-Encoded Images and Videos

    In this dissertation, we address the multiframe super resolution reconstruction problem for wavelet-encoded images and videos. The goal of multiframe super resolution is to obtain one or more high resolution images by fusing a sequence of degraded or aliased low resolution images of the same scene. Since the low resolution images may be unaligned, a registration step is required before super resolution reconstruction. Therefore, we first explore in-band (i.e., wavelet-domain) image registration and then investigate super resolution. Our motivation for analyzing the image registration and super resolution problems in the wavelet domain is the growing trend toward wavelet-encoded imaging and wavelet encoding for image/video compression. Due to drawbacks of the widely used discrete cosine transform in image and video compression, a considerable amount of literature is devoted to wavelet-based methods. However, since wavelets are shift-variant, existing methods cannot utilize wavelet subbands efficiently. In order to overcome this drawback, we establish and explore the direct relationship between the subbands under a translational shift, for image registration and super resolution. We then employ our devised in-band methodology in a motion compensated video compression framework to demonstrate the effective usage of wavelet subbands. Super resolution can also be used as a post-processing step in video compression in order to decrease the size of the video files to be compressed, with downsampling added as a pre-processing step. Therefore, we present a video compression scheme that utilizes super resolution to reconstruct the high frequency information lost during downsampling. In addition, super resolution is a crucial post-processing step for satellite imagery, due to the fact that it is hard to update imaging devices after a satellite is launched. Thus, we also demonstrate the usage of our devised methods in enhancing the resolution of pansharpened multispectral images.
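
    For intuition about why fusing several aliased frames recovers detail, the sketch below performs a generic spatial-domain shift-and-add reconstruction from four low-resolution frames with known integer sub-pixel shifts. It is deliberately simplified and is not the in-band (wavelet-domain) method developed in the dissertation.

    import numpy as np

    def shift_and_add(lr_frames, shifts, factor=2):
        """Fuse shifted low-resolution frames onto a factor-times-denser grid."""
        h, w = lr_frames[0].shape
        acc = np.zeros((h * factor, w * factor))
        cnt = np.zeros_like(acc)
        for frame, (dy, dx) in zip(lr_frames, shifts):
            acc[dy::factor, dx::factor] += frame         # drop each sample at its HR position
            cnt[dy::factor, dx::factor] += 1
        cnt[cnt == 0] = 1                                # avoid division by zero for uncovered pixels
        return acc / cnt

    rng = np.random.default_rng(5)
    hr = rng.normal(128, 30, size=(32, 32))              # synthetic high-resolution scene
    shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]
    lr_frames = [hr[dy::2, dx::2] for dy, dx in shifts]  # aliased, shifted observations
    print(np.allclose(shift_and_add(lr_frames, shifts), hr))   # True: every HR sample is recovered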