
    Real-time complexity constrained encoding

    Complex software appliances can be deployed on hardware with limited available computational resources. This computational boundary puts an additional constraint on software applications. This can be an issue for real-time applications with a fixed time constraint, such as low-delay video encoding. In the context of High Efficiency Video Coding (HEVC), only a limited number of publications have focused on controlling the complexity of an HEVC video encoder. In this paper, a technique is proposed to control complexity by deciding between 2Nx2N merge mode and full encoding at different Coding Unit (CU) depths. The technique is demonstrated in two encoders. The results demonstrate fast convergence to a given complexity threshold and a limited loss in rate-distortion performance (on average 2.84% Bjontegaard delta rate for a 40% complexity reduction).
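    The abstract does not say how the merge/full-encoding decision is steered toward the threshold. As an illustration only, the following Python sketch assumes a simple proportional feedback loop: after each frame, the fraction of CUs allowed full encoding is nudged by the gap between measured and target complexity, and the remaining CUs are restricted to 2Nx2N merge mode. The names (`ComplexityController`, `full_ratio`, `gain`, `simulate_frame`) and the linear cost model are hypothetical and are not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class ComplexityController:
    """Hypothetical proportional controller for a per-frame complexity budget.

    target: desired complexity as a fraction of full encoding
            (e.g. 0.6 corresponds to a 40% complexity reduction).
    full_ratio: fraction of CUs that receive full encoding; the rest are
                assumed to be limited to 2Nx2N merge mode.
    """
    target: float
    gain: float = 0.5
    full_ratio: float = 1.0

    def update(self, measured: float) -> float:
        # Under budget -> allow more full encoding; over budget -> allow less.
        error = self.target - measured
        self.full_ratio = min(1.0, max(0.0, self.full_ratio + self.gain * error))
        return self.full_ratio


def simulate_frame(full_ratio: float, merge_cost: float = 0.2) -> float:
    # Assumed linear cost model: a fully encoded CU costs 1.0,
    # a merge-only CU costs merge_cost (both relative to full encoding).
    return full_ratio * 1.0 + (1.0 - full_ratio) * merge_cost


if __name__ == "__main__":
    ctrl = ComplexityController(target=0.6)
    for frame in range(10):
        measured = simulate_frame(ctrl.full_ratio)
        ratio = ctrl.update(measured)
        print(f"frame {frame}: complexity={measured:.3f}, next full_ratio={ratio:.3f}")
```

    Under this toy cost model the loop settles at `full_ratio = 0.5`, where the simulated complexity equals the 0.6 target; a real encoder would measure time or cycles per frame instead of calling `simulate_frame`.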

    Complexity Analysis Of Next-Generation VVC Encoding and Decoding

    While the next-generation video compression standard, Versatile Video Coding (VVC), provides superior compression efficiency, its computational complexity increases dramatically. This paper thoroughly analyzes this complexity for both the encoder and decoder of VVC Test Model 6, quantifying the complexity breakdown for each coding tool and measuring the complexity and memory requirements for VVC encoding/decoding. These extensive analyses are performed for six video sequences at 720p, 1080p, and 2160p, under Low-Delay (LD), Random-Access (RA), and All-Intra (AI) conditions (a total of 320 encodings/decodings). Results indicate that the VVC encoder and decoder are 5x and 1.5x more complex than HEVC in LD, and 31x and 1.8x in AI, respectively. Detailed analysis of the coding tools reveals that, on average in LD, motion estimation (53%), transformation and quantization (22%), and entropy coding (7%) dominate the encoding complexity. In decoding, loop filters (30%), motion compensation (20%), and entropy decoding (16%) are the most complex modules. Moreover, the memory bandwidth required for VVC encoding/decoding is measured through memory profiling and found to be 30x and 3x that of HEVC, respectively. The reported results and insights serve as a guide for future research and implementations of energy-efficient VVC encoders/decoders. Comment: IEEE ICIP 202
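    A per-tool breakdown like the one above is just each tool's share of the total profiled cost, and the "Nx" figures compare total run times. The Python sketch below shows that arithmetic under the assumption that "Nx more complex" is read as a plain ratio of run times; the function names and the placeholder cycle counts (chosen only so the shares match the LD encoder averages quoted above) are hypothetical, not the paper's profiling data.

```python
def complexity_breakdown(tool_cycles: dict[str, float]) -> dict[str, float]:
    """Return each tool's share of the total profiled cycles, in percent."""
    total = sum(tool_cycles.values())
    if total <= 0:
        raise ValueError("total cycle count must be positive")
    return {tool: 100.0 * cycles / total for tool, cycles in tool_cycles.items()}


def relative_complexity(vvc_time: float, hevc_time: float) -> float:
    """Ratio of VVC to HEVC run time (assumed reading of '5x more complex')."""
    return vvc_time / hevc_time


if __name__ == "__main__":
    # Placeholder counts, not measurements.
    cycles = {"motion_estimation": 530, "transform_quant": 220,
              "entropy_coding": 70, "other": 180}
    for tool, share in complexity_breakdown(cycles).items():
        print(f"{tool}: {share:.1f}%")
    print(relative_complexity(500.0, 100.0))
```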

    High Performance Multiview Video Coding

    Following the standardization of High Efficiency Video Coding (HEVC) in 2013, the multiview extension of HEVC (MV-HEVC) was published in 2014, bringing compression gains of around 50% for multiview and 3D video compared with coding each view independently with single-view HEVC. However, the extremely high computational complexity of MV-HEVC demands significant optimization of the encoder. To tackle this problem, this work investigates using modern parallel computing platforms and tools, such as single-instruction-multiple-data (SIMD) instructions, multi-core CPUs, massively parallel GPUs, and computer clusters, to significantly improve MVC encoder performance. These computing tools have very different characteristics, and misusing them may yield poor performance gains or even slowdowns. To achieve the best possible encoding performance from modern computing tools, the different levels of parallelism inside a typical MVC encoder are identified and analyzed. Novel optimization techniques are proposed at various levels of abstraction: non-aggregation massively parallel motion estimation (ME) and disparity estimation (DE) in the prediction unit (PU); fractional and bi-directional ME/DE acceleration through SIMD; quantization parameter (QP)-based early termination for the coding tree unit (CTU); optimized resource-scheduled wavefront parallel processing for CTUs; and workload-balanced, cluster-based multiple-view parallelism. The results show that the proposed parallel optimization techniques significantly improve execution time with insignificant loss in coding efficiency. This, in turn, shows that modern parallel computing platforms, combined with appropriate platform-specific algorithm design, are valuable tools for improving the performance of computationally intensive applications.
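    To make the wavefront idea concrete, the Python sketch below computes which CTUs can run concurrently under the usual HEVC wavefront dependencies: a CTU waits for its left neighbour and its upper-right neighbour (the upper neighbour in the last column). It assumes one time unit per CTU and unlimited threads, so it shows the plain wavefront pattern, not the optimized resource-scheduled variant the work proposes; `wpp_schedule` is a hypothetical helper.

```python
from collections import defaultdict


def wpp_schedule(rows: int, cols: int) -> list[list[tuple[int, int]]]:
    """Group CTUs (row, col) into waves that can be encoded concurrently.

    CTU (r, c) depends on (r, c-1) and on (r-1, c+1); in the last column the
    upper neighbour (r-1, c) is used instead. Each CTU is assumed to take one
    time unit, with enough threads for every ready CTU.
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):          # raster order: dependencies come first
        for c in range(cols):
            deps = []
            if c > 0:
                deps.append(start[r][c - 1])
            if r > 0:
                deps.append(start[r - 1][min(c + 1, cols - 1)])
            start[r][c] = 1 + max(deps) if deps else 0
    waves = defaultdict(list)
    for r in range(rows):
        for c in range(cols):
            waves[start[r][c]].append((r, c))
    return [waves[t] for t in sorted(waves)]


if __name__ == "__main__":
    for t, wave in enumerate(wpp_schedule(3, 4)):
        print(t, wave)
```

    With at least two CTU columns, CTU (r, c) lands in wave c + 2r, so a 3x4 CTU grid needs 8 steps instead of 12 sequential ones; each row trails the row above by two CTUs.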

    Efficient Coding Tree Unit (CTU) Decision Method for Scalable High-Efficiency Video Coding (SHVC) Encoder

    High Efficiency Video Coding (HEVC, or H.265) is the latest video compression standard developed by the Joint Collaborative Team on Video Coding (JCT-VC), finalized in 2013. HEVC can achieve an average bit rate reduction of 50% compared with H.264/AVC while maintaining video quality. To adapt HEVC to heterogeneous access networks, the JCT-VC approved the scalable extension of HEVC (SHVC) in July 2014. SHVC can achieve the highest coding efficiency but requires very high computational complexity, which limits its real-time application. To reduce the encoding complexity of SHVC, in this chapter we exploit the temporal-spatial and inter-layer correlations between the base layer (BL) and the enhancement layer (EL) to predict the best quadtree of the coding tree unit (CTU) for quality SHVC. Because the layers are highly correlated, we use the coded information from the CTU quadtree in the BL, including inter-layer intra/residual prediction and inter-layer motion parameter prediction, to predict the CTU quadtree in the EL. We therefore develop an efficient CTU decision method that combines a temporal-spatial searching order algorithm (TSSOA) in the BL with a fast inter-layer searching algorithm (FILSA) in the EL to speed up SHVC encoding. Simulation results show that the proposed method achieves an average time improving ratio (TIR) of about 52–78% and 47–69% for the low-delay (LD) and random-access (RA) configurations, respectively. The method thus efficiently reduces the computational complexity of the SHVC encoder with negligible loss of coding efficiency across various types of video sequences.
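    One common way to exploit inter-layer correlation is to search only CU depths near the depth chosen for the co-located BL CTU. The Python sketch below illustrates that generic idea, plus a TIR computation; it is not the TSSOA/FILSA method. The ±1 depth margin, the depth range 0 to 3 (64x64 down to 8x8 CUs), the TIR formula (t_ref - t_new) / t_ref x 100, and both function names are assumptions for illustration.

```python
def el_depth_candidates(bl_depth: int, max_depth: int = 3, margin: int = 1) -> list[int]:
    """Hypothetical EL CU-depth search range around the co-located BL depth."""
    if not 0 <= bl_depth <= max_depth:
        raise ValueError("bl_depth out of range")
    lo = max(0, bl_depth - margin)
    hi = min(max_depth, bl_depth + margin)
    return list(range(lo, hi + 1))


def time_improving_ratio(t_ref: float, t_new: float) -> float:
    """TIR in percent, assumed here to be (t_ref - t_new) / t_ref * 100."""
    return (t_ref - t_new) / t_ref * 100.0


if __name__ == "__main__":
    for d in range(4):
        print(d, el_depth_candidates(d))
    print(time_improving_ratio(100.0, 48.0))
```

    Restricting the EL search to two or three of the four depths is where the time saving would come from in this sketch; how aggressive the margin can be without hurting coding efficiency is an empirical question.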

    Steered mixture-of-experts for light field images and video: representation and coding

    Research in light field (LF) processing has increased substantially over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, 2-D regular grids are less suited for high-dimensional data such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays arriving at a certain region from any angle. The global model thus consists of a set of kernels that define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application to 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art at low-to-mid bitrates in terms of the subjective visual quality of 4-D LF images. For 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of a factor of 4 in bitrate for the same quality. At least as important, our method inherently provides functionality desirable for LF rendering that other state-of-the-art techniques lack: (1) full zero-delay random access, (2) lightweight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
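    The kernel-based model can be pictured as soft, position-dependent gating over a set of kernels. The NumPy sketch below evaluates such a model at arbitrary coordinates, assuming Gaussian kernels as gates and a single constant expert value per kernel. This is a simplification for illustration, and the actual SMoE experts and parameterization may be richer. Each sample is computed independently, which is what makes pixel-parallel reconstruction possible. `smoe_reconstruct` and its argument layout are hypothetical.

```python
import numpy as np


def smoe_reconstruct(coords, centers, covs, priors, experts):
    """Evaluate a Gaussian-gated mixture of constant experts.

    coords:  (N, D) sample positions (D=2 for image pixels)
    centers: (K, D) kernel centres
    covs:    (K, D, D) kernel covariances
    priors:  (K,) mixing weights
    experts: (K,) constant value per kernel (assumed simplification)
    """
    coords = np.asarray(coords, dtype=float)
    log_w = np.empty((coords.shape[0], len(priors)))
    for k, (mu, cov, pi) in enumerate(zip(centers, covs, priors)):
        diff = coords - np.asarray(mu, dtype=float)
        inv = np.linalg.inv(cov)
        maha = np.einsum("nd,de,ne->n", diff, inv, diff)  # Mahalanobis distance
        _, logdet = np.linalg.slogdet(cov)
        # Log of pi_k * N(x; mu_k, cov_k), dropping the shared (2*pi)^(-D/2).
        log_w[:, k] = np.log(pi) - 0.5 * (maha + logdet)
    # Softmax over kernels gives normalized gating weights per sample.
    log_w -= log_w.max(axis=1, keepdims=True)
    gates = np.exp(log_w)
    gates /= gates.sum(axis=1, keepdims=True)
    return gates @ np.asarray(experts, dtype=float)


if __name__ == "__main__":
    centers = [[0.0, 0.0], [10.0, 10.0]]
    covs = [np.eye(2), np.eye(2)]
    print(smoe_reconstruct([[0, 0], [10, 10], [5, 5]], centers, covs,
                           [0.5, 0.5], [0.0, 255.0]))
```

    Because the model is continuous, the same call can be evaluated at non-integer coordinates, which is loosely how view interpolation and super-resolution fall out of such a representation.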

    Towards Hybrid-Optimization Video Coding

    Video coding is essentially a mathematical rate-distortion optimization problem. Two popular video coding frameworks have been developed to solve this complex optimization problem: block-based hybrid video coding and end-to-end learned video coding. Rethinking video coding from the perspective of optimization, we find that these two frameworks represent two directions of optimization solutions. Block-based hybrid coding represents the discrete optimization solution, because its coding modes are discrete, mutually unrelated choices. It searches for the best one among multiple starting points (i.e., modes); however, the search is not efficient enough. End-to-end learned coding, on the other hand, represents the continuous optimization solution, because gradient descent operates on a continuous function. It optimizes a group of model parameters efficiently with a numerical algorithm; however, because it is limited to a single starting point, it easily falls into a local optimum. To better solve the optimization problem, we propose to regard video coding as a hybrid discrete and continuous optimization problem and to use both search and a numerical algorithm to solve it. Our idea is to provide multiple discrete starting points in the global space, efficiently optimize the local optimum around each point with a numerical algorithm, and finally search for the global optimum among those local optima. Guided by this hybrid optimization idea, we design a hybrid optimization video coding framework that is built entirely on continuous deep networks yet also contains some discrete modes. We conduct a comprehensive set of experiments. Compared with the continuous optimization framework, our method outperforms pure learned video coding methods; compared with the discrete optimization framework, it achieves PSNR performance comparable to the HEVC reference software HM16.10.
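    The hybrid idea (several discrete starting points, local numerical refinement from each, then a search over the refined results) can be shown on a toy 1-D problem. The Python sketch below uses plain gradient descent as the numerical algorithm and a hypothetical non-convex objective `f` standing in for a rate-distortion cost. It illustrates the optimization principle only, not the paper's network architecture.

```python
def grad_descent(grad, x0: float, lr: float = 0.01, steps: int = 1000) -> float:
    """Continuous part: plain gradient descent from a single starting point."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def hybrid_optimize(f, grad, starts):
    """Refine each discrete starting point locally, then pick the best result."""
    candidates = [grad_descent(grad, x0) for x0 in starts]
    return min(candidates, key=f)


def f(x: float) -> float:
    # Toy non-convex objective: local minimum near x = 1.13,
    # global minimum near x = -1.30.
    return x**4 - 3 * x**2 + x


def df(x: float) -> float:
    return 4 * x**3 - 6 * x + 1


if __name__ == "__main__":
    print(grad_descent(df, 2.0))                    # single start: stuck near 1.13
    print(hybrid_optimize(f, df, [-2.0, 0.5, 2.0]))  # multi-start: reaches -1.30
```

    Started only from x = 2, gradient descent settles in the local minimum; adding discrete starting points lets the final search pick the basin containing the global minimum, which mirrors the argument in the abstract.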

    Light field coding with field of view scalability and exemplar-based inter-layer prediction

    Get PDF
    Light field imaging based on microlens arrays (a.k.a. holoscopic, plenoptic, and integral imaging) has recently emerged as a feasible and promising technology for future image and video applications. However, deploying actual light field applications will require more powerful representations and coding solutions that support emerging manipulation and interaction functionalities. In this context, this paper proposes a novel scalable coding solution that supports a new type of scalability, referred to as field-of-view scalability. The proposed solution comprises a base layer compliant with the High Efficiency Video Coding (HEVC) standard, complemented by one or more enhancement layers that progressively provide richer versions of the same light field content in terms of manipulation and interaction possibilities. In addition, to achieve high compression performance in the enhancement layers, novel exemplar-based inter-layer coding tools are proposed, namely: 1) a direct prediction based on exemplar texture samples from lower layers and 2) an inter-layer compensated prediction using a reference picture built with an exemplar-based texture synthesis algorithm. Experimental results demonstrate the advantages of the proposed scalable coding solution in catering to users with different preferences/requirements in terms of interaction functionalities, while providing better rate-distortion performance (independently of the optical setup used for acquisition) than HEVC and other scalable light field coding solutions in the literature.
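    As a loose illustration of predicting a block from exemplar texture samples in a lower layer, the NumPy sketch below performs an exhaustive search for the reference patch with minimum sum of absolute differences (SAD). Plain SAD block matching is a generic technique swapped in here for clarity. The paper's exemplar-based prediction and texture-synthesis tools are not reproduced, and `best_exemplar` is a hypothetical helper.

```python
import numpy as np


def best_exemplar(reference: np.ndarray, target: np.ndarray) -> tuple[int, int]:
    """Return the (y, x) offset of the reference patch with minimum SAD to target."""
    ph, pw = target.shape
    rh, rw = reference.shape
    tgt = target.astype(np.int64)           # avoid uint8 wrap-around in differences
    best_sad, best_pos = None, (0, 0)
    for y in range(rh - ph + 1):
        for x in range(rw - pw + 1):
            patch = reference[y:y + ph, x:x + pw].astype(np.int64)
            sad = int(np.abs(patch - tgt).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_pos = sad, (y, x)
    return best_pos


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, size=(32, 32))
    block = ref[10:18, 5:13].copy()
    print(best_exemplar(ref, block))
```

    The patch found this way would serve as the prediction for the enhancement-layer block, with only the residual left to code; a practical encoder would restrict the search window rather than scan the whole reference.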