1,572 research outputs found

    Complexity Analysis Of Next-Generation VVC Encoding and Decoding

    Full text link
    While the next generation video compression standard, Versatile Video Coding (VVC), provides a superior compression efficiency, its computational complexity dramatically increases. This paper thoroughly analyzes this complexity for both encoder and decoder of VVC Test Model 6, by quantifying the complexity break-down for each coding tool and measuring the complexity and memory requirements for VVC encoding/decoding. These extensive analyses are performed for six video sequences of 720p, 1080p, and 2160p, under Low-Delay (LD), Random-Access (RA), and All-Intra (AI) conditions (a total of 320 encoding/decoding). Results indicate that the VVC encoder and decoder are 5x and 1.5x more complex compared to HEVC in LD, and 31x and 1.8x in AI, respectively. Detailed analysis of coding tools reveals that in LD on average, motion estimation tools with 53%, transformation and quantization with 22%, and entropy coding with 7% dominate the encoding complexity. In decoding, loop filters with 30%, motion compensation with 20%, and entropy decoding with 16%, are the most complex modules. Moreover, the required memory bandwidth for VVC encoding/decoding are measured through memory profiling, which are 30x and 3x of HEVC. The reported results and insights are a guide for future research and implementations of energy-efficient VVC encoder/decoder.Comment: IEEE ICIP 202

    Selected topics in video coding and computer vision

    Get PDF
    Video applications ranging from multimedia communication to computer vision have been extensively studied in the past decades. However, the emergence of new applications continues to raise questions that are only partially answered by existing techniques. This thesis studies three selected topics related to video: intra prediction in block-based video coding, pedestrian detection and tracking in infrared imagery, and multi-view video alignment.;In the state-of-art video coding standard H.264/AVC, intra prediction is defined on the hierarchical quad-tree based block partitioning structure which fails to exploit the geometric constraint of edges. We propose a geometry-adaptive block partitioning structure and a new intra prediction algorithm named geometry-adaptive intra prediction (GAIP). A new texture prediction algorithm named geometry-adaptive intra displacement prediction (GAIDP) is also developed by extending the original intra displacement prediction (IDP) algorithm with the geometry-adaptive block partitions. Simulations on various test sequences demonstrate that intra coding performance of H.264/AVC can be significantly improved by incorporating the proposed geometry adaptive algorithms.;In recent years, due to the decreasing cost of thermal sensors, pedestrian detection and tracking in infrared imagery has become a topic of interest for night vision and all weather surveillance applications. We propose a novel approach for detecting and tracking pedestrians in infrared imagery based on a layered representation of infrared images. Pedestrians are detected from the foreground layer by a Principle Component Analysis (PCA) based scheme using the appearance cue. To facilitate the task of pedestrian tracking, we formulate the problem of shot segmentation and present a graph matching-based tracking algorithm. Simulations with both OSU Infrared Image Database and WVU Infrared Video Database are reported to demonstrate the accuracy and robustness of our algorithms.;Multi-view video alignment is a process to facilitate the fusion of non-synchronized multi-view video sequences for various applications including automatic video based surveillance and video metrology. In this thesis, we propose an accurate multi-view video alignment algorithm that iteratively aligns two sequences in space and time. To achieve an accurate sub-frame temporal alignment, we generalize the existing phase-correlation algorithm to 3-D case. We also present a novel method to obtain the ground-truth of the temporal alignment by using supplementary audio signals sampled at a much higher rate. The accuracy of our algorithm is verified by simulations using real-world sequences

    Depth sequence coding with hierarchical partitioning and spatial-domain quantization

    Get PDF
    Depth coding in 3D-HEVC deforms object shapes due to block-level edge-approximation and lacks efficient techniques to exploit the statistical redundancy, due to the frame-level clustering tendency in depth data, for higher coding gain at near-lossless quality. This paper presents a standalone mono-view depth sequence coder, which preserves edges implicitly by limiting quantization to the spatial-domain and exploits the frame-level clustering tendency efficiently with a novel binary tree-based decomposition (BTBD) technique. The BTBD can exploit the statistical redundancy in frame-level syntax, motion components, and residuals efficiently with fewer block-level prediction/coding modes and simpler context modeling for context-adaptive arithmetic coding. Compared with the depth coder in 3D-HEVC, the proposed one has achieved significantly lower bitrate at lossless to near-lossless quality range for mono-view coding and rendered superior quality synthetic views from the depth maps, compressed at the same bitrate, and the corresponding texture frames. © 1991-2012 IEEE

    General Multi-channel Input Video Support for Video Encoding

    Get PDF
    Traditional video coding standards process video frames in the YCbCr color format, with a single luminance channel (Y) and two chroma channels (Cb, Cr). There is a strong correlation between the three channels that can enable efficient encoding and good compression, since decisions such as motion estimation and block partitioning can be made once for the Y plane and applied to the chroma channels. Also, subsampling can be used to reduce the resolution of the chroma planes. However, these standards are unable to efficiently encode additional data such as depth, alpha, velocity, etc. that is needed for immersive applications such as virtual reality (VR), augmented reality (AR), or extended reality (XR) to provide a high quality photorealistic experience. This disclosure describes configuring the additional data as a single input that can optionally be combined with the YCbCr input. Regardless of the number of channels, the implementation of the encoder can be designed such that the complexity of the encoder is a function of one dominant plane/channel. Current video coding standards can be extended, and future video coding standards can be implemented to incorporate such input by including a provision to signal the number of input channels and their resolution. A dominant plane can also be explicitly signaled, or it can be implied to be the first plane

    Machine Learning based Efficient QT-MTT Partitioning Scheme for VVC Intra Encoders

    Full text link
    The next-generation Versatile Video Coding (VVC) standard introduces a new Multi-Type Tree (MTT) block partitioning structure that supports Binary-Tree (BT) and Ternary-Tree (TT) splits in both vertical and horizontal directions. This new approach leads to five possible splits at each block depth and thereby improves the coding efficiency of VVC over that of the preceding High Efficiency Video Coding (HEVC) standard, which only supports Quad-Tree (QT) partitioning with a single split per block depth. However, MTT also has brought a considerable impact on encoder computational complexity. In this paper, a two-stage learning-based technique is proposed to tackle the complexity overhead of MTT in VVC intra encoders. In our scheme, the input block is first processed by a Convolutional Neural Network (CNN) to predict its spatial features through a vector of probabilities describing the partition at each 4x4 edge. Subsequently, a Decision Tree (DT) model leverages this vector of spatial features to predict the most likely splits at each block. Finally, based on this prediction, only the N most likely splits are processed by the Rate-Distortion (RD) process of the encoder. In order to train our CNN and DT models on a wide range of image contents, we also propose a public VVC frame partitioning dataset based on existing image dataset encoded with the VVC reference software encoder. Our proposal relying on the top-3 configuration reaches 46.6% complexity reduction for a negligible bitrate increase of 0.86%. A top-2 configuration enables a higher complexity reduction of 69.8% for 2.57% bitrate loss. These results emphasis a better trade-off between VTM intra coding efficiency and complexity reduction compared to the state-of-the-art solutions

    Modified Three-Step Search Block Matching Motion Estimation and Weighted Finite Automata based Fractal Video Compression

    Get PDF
    The major challenge with fractal image/video coding technique is that, it requires more encoding time. Therefore, how to reduce the encoding time is the research component remains in the fractal coding. Block matching motion estimation algorithms are used, to reduce the computations performed in the process of encoding. The objective of the proposed work is to develop an approach for video coding using modified three step search (MTSS) block matching algorithm and weighted finite automata (WFA) coding with a specific focus on reducing the encoding time. The MTSS block matching algorithm are used for computing motion vectors between the two frames i.e. displacement of pixels and WFA is used for the coding as it behaves like the Fractal Coding (FC). WFA represents an image (frame or motion compensated prediction error) based on the idea of fractal that the image has self-similarity in itself. The self-similarity is sought from the symmetry of an image, so the encoding algorithm divides an image into multi-levels of quad-tree segmentations and creates an automaton from the sub-images. The proposed MTSS block matching algorithm is based on the combination of rectangular and hexagonal search pattern and compared with the existing New Three-Step Search (NTSS), Three-Step Search (TSS), and Efficient Three-Step Search (ETSS) block matching estimation algorithm. The performance of the proposed MTSS block matching algorithm is evaluated on the basis of performance evaluation parameters i.e. mean absolute difference (MAD) and average search points required per frame. Mean of absolute difference (MAD) distortion function is used as the block distortion measure (BDM). Finally, developed approaches namely, MTSS and WFA, MTSS and FC, and Plane FC (applied on every frame) are compared with each other. The experimentations are carried out on the standard uncompressed video databases, namely, akiyo, bus, mobile, suzie, traffic, football, soccer, ice etc. Developed approaches are compared on the basis of performance evaluation parameters, namely, encoding time, decoding time, compression ratio and Peak Signal to Noise Ratio (PSNR). The video compression using MTSS and WFA coding performs better than MTSS and fractal coding, and frame by frame fractal coding in terms of achieving reduced encoding time and better quality of video

    A Review on Block Matching Motion Estimation and Automata Theory based Approaches for Fractal Coding

    Get PDF
    Fractal compression is the lossy compression technique in the field of gray/color image and video compression. It gives high compression ratio, better image quality with fast decoding time but improvement in encoding time is a challenge. This review paper/article presents the analysis of most significant existing approaches in the field of fractal based gray/color images and video compression, different block matching motion estimation approaches for finding out the motion vectors in a frame based on inter-frame coding and intra-frame coding i.e. individual frame coding and automata theory based coding approaches to represent an image/sequence of images. Though different review papers exist related to fractal coding, this paper is different in many sense. One can develop the new shape pattern for motion estimation and modify the existing block matching motion estimation with automata coding to explore the fractal compression technique with specific focus on reducing the encoding time and achieving better image/video reconstruction quality. This paper is useful for the beginners in the domain of video compression
    • …
    corecore