10 research outputs found

    Rate-distortion Optimization Using Adaptive Lagrange Multipliers


    Algorithms and Hardware Co-Design of HEVC Intra Encoders

    Digital video has become increasingly important over the last two decades. Due to the rapid development of information and communication technologies, demand for Ultra-High Definition (UHD) video applications keeps growing. However, H.264/AVC, the most prevalent video compression standard, released in 2003, is inefficient for UHD video. The desire for compression efficiency superior to H.264/AVC led to the standardization of High Efficiency Video Coding (HEVC). Compared with H.264/AVC, HEVC offers roughly double the compression ratio at the same video quality, or a substantial improvement in quality at the same bitrate. Although HEVC/H.265 possesses superior compression efficiency, its complexity is several times that of H.264/AVC, impeding high-throughput implementation. Most research to date has focused on algorithm-level adaptations of the HEVC/H.265 standard that reduce computational intensity without considering hardware feasibility, and the exploration of efficient hardware architecture design is far from exhaustive: only a few works have investigated efficient hardware architectures for the HEVC/H.265 standard. In this dissertation, we investigate efficient algorithm adaptations and hardware architecture design for HEVC intra encoders, and we also explore a deep learning approach to mode prediction.

    From the algorithm point of view, we propose three efficient hardware-oriented algorithm adaptations: mode reduction, fast coding unit (CU) cost estimation, and group-based CABAC (context-adaptive binary arithmetic coding) rate estimation. Mode reduction reduces the mode candidates of each prediction unit (PU) in the rate-distortion optimization (RDO) process, which is both computation-intensive and time-consuming. Fast CU cost estimation reduces the complexity of the rate-distortion (RD) calculation of each CU. Group-based CABAC rate estimation parallelizes syntax-element processing to greatly improve rate-estimation throughput.

    From the hardware design perspective, a fully parallel hardware architecture of an HEVC intra encoder is developed to sustain UHD video compression at 4K@30fps. The fully parallel architecture introduces four prediction engines (PEs), each independently performing the full cycle of mode prediction, transform, quantization, inverse quantization, inverse transform, reconstruction, and rate-distortion estimation; PUs of different sizes are processed by different PEs simultaneously. An efficient hardware implementation of the group-based CABAC rate estimator is also incorporated into the proposed HEVC intra encoder for accurate, high-throughput rate estimation.

    To take advantage of deep learning, we further propose a fully-connected-layer-based neural network (FCLNN) mode preselection scheme to reduce the number of RDO modes of luma prediction blocks. All angular prediction modes are classified into 7 prediction groups, each containing 3-5 prediction modes with similar prediction angles. A rough angle detection algorithm determines the prediction direction of the current block, and a small-scale FCLNN then refines the mode prediction.
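    As a loose illustration of the mode-preselection idea above, the sketch below groups HEVC's 33 angular modes (2-34) into 7 contiguous groups and uses a simple gradient-based rough angle detector to pick one candidate group per block. The group boundaries, the gradient-based detector, and all function names are assumptions made for illustration, not the dissertation's exact algorithm.

    ```python
    import numpy as np

    # Hypothetical grouping of HEVC angular modes 2..34 into 7 groups;
    # the dissertation's actual boundaries may differ.
    ANGULAR_GROUPS = [
        range(2, 7), range(7, 12), range(12, 17), range(17, 21),
        range(21, 25), range(25, 30), range(30, 35),
    ]

    def rough_angle_group(block: np.ndarray) -> int:
        """Pick a prediction group from the dominant gradient of a luma block.

        The dominant edge direction is perpendicular to the average image
        gradient and is mapped linearly onto the 7 groups. This is a crude
        detector, which is why a small FCLNN refinement stage makes sense.
        """
        gy, gx = np.gradient(block.astype(np.float64))
        theta = np.arctan2(gy.sum(), gx.sum()) % np.pi   # gradient angle in [0, pi)
        edge = (theta + np.pi / 2) % np.pi               # edge is perpendicular
        return int(edge / np.pi * 7) % 7

    def candidate_modes(block: np.ndarray) -> list:
        """Modes kept for full RDO: planar (0), DC (1), and the detected group."""
        return [0, 1] + list(ANGULAR_GROUPS[rough_angle_group(block)])
    ```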

    Rate-Distortion Estimation for H.264/AVC Coders


    Approximate Calculation of DCT for HEVC and JPEG Hardware Encoders

    Master's thesis, Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2015. Advisor: Hyuk-Jae Lee.

    The Discrete Cosine Transform (DCT) is widely used in image and video compression applications because of its excellent energy-compaction property. The DCT is computationally intensive and its calculations are parallelizable, so it is often implemented in hardware to speed up computation. However, due to the large DCT sizes or the multiple DCT modules required by some applications, the hardware area taken up by the DCT in image or video encoders becomes significant. The DCT required by most applications does not need to be exact. Taking advantage of this fact, a novel approach is presented here to reduce the hardware area cost of the DCT module. The DCT hardware module consists of combinational logic and memory; both components are reduced, and the complete implementation is described. The target applications are HEVC and JPEG, but the idea applies to any DCT hardware implementation. Finally, the degradation of the encoded image and video in terms of BDBR (Bjøntegaard delta bitrate) is discussed, and gate-count results from synthesis are provided.

    Contents: Chapter 1, Introduction (2D DCT hardware module; pipelining the process; approximate DCT). Chapter 2, Related Works. Chapter 3, The Moving Window Idea for Bit-Width Reduction (ML recovery for moving window). Chapter 4, Approximate DCT for HEVC (HEVC overview; HEVC encoder; DCT in the HEVC encoder; approximate DCT in HEVC: the three components of the DCT module, optimizing partial-butterfly adders/subtractors, optimizing the multiplication module via Multiple Constant Multiplication (MCM) and approximate MCM, optimizing the transpose memory). Chapter 5, Approximate DCT for JPEG (JPEG overview; approximate DCT; application of the moving window to the DCT transpose memory: ideal implementation, window position based on the first row and its failure cases, position based on the first column and its failure cases; hybrid implementation). Chapter 6, Experimental Results (HEVC; JPEG). Chapter 7, Conclusion.
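    For context on why the transpose memory is an optimization target, the following is a minimal sketch of the standard row-column decomposition of the 2D DCT that such hardware pipelines implement. It is a generic floating-point reference, not the thesis's approximate design, and HEVC's scaled integer transform matrices are omitted.

    ```python
    import numpy as np

    def dct_matrix(n: int) -> np.ndarray:
        """Orthonormal DCT-II matrix (HEVC hardware uses a scaled integer
        approximation of this matrix instead)."""
        k = np.arange(n)[:, None]   # frequency index (rows)
        i = np.arange(n)[None, :]   # sample index (columns)
        m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
        m[0, :] = np.sqrt(1.0 / n)  # DC basis row
        return m

    def dct2d_row_column(block: np.ndarray) -> np.ndarray:
        """2D DCT as two 1D passes with an intermediate transpose.

        In hardware, the intermediate results between the two passes are
        held in the 'transpose memory'; shrinking the bit-width of these
        stored values is one way to shrink that memory.
        """
        c = dct_matrix(block.shape[0])
        rows = block @ c.T            # 1D DCT applied to every row
        return (rows.T @ c.T).T       # transpose, then 1D DCT per column
    ```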

    Efficient compression of synthetic video

    Streaming of on-line gaming video is a challenging problem because of the enormous amount of video data that must be sent during game play, especially within the limits of uplink capacity. Encoding complexity is also a challenge because of the delay it adds while on-line gamers are communicating. The main goal of this research is to propose an enhanced on-line game video streaming system.

    First, the most common video coding techniques are evaluated, using both objective and subjective metrics. Three widespread video coding techniques are selected: H.264, MPEG-4 Visual, and VP8. Diverse types of video sequences were used with different frame rates and resolutions, and the effects of changing frame rate and resolution on compression efficiency and viewers' satisfaction are also presented. Results showed that both the compression process and perceptual satisfaction are strongly affected by the nature of the compressed sequence; H.264 showed the highest compression efficiency for synthetic sequences and outperformed the other codecs in the subjective evaluation tests.

    Second, a fast inter-prediction technique is devised to speed up the H.264 encoding process. On-line game streaming is a real-time application, so compression complexity significantly affects the whole streaming pipeline. Although H.264 was recommended for synthetic video coding by the comparative study, it still suffers from high encoding complexity; a low-complexity coding algorithm is therefore presented as a fast inter-coding model with a reference-management technique. Compared with a state-of-the-art method, the proposed algorithm achieves larger reductions in encoding time and bit rate with negligible loss of fidelity.

    Third, recommendations on the tradeoff between frame rate and resolution within given uplink capabilities are provided for H.264 video coding. The recommended tradeoffs result from extensive experiments using the Double Stimulus Impairment Scale (DSIS) subjective evaluation method. Experiments showed that viewers' satisfaction is profoundly affected by varying frame rates and resolutions, and that increasing frame rate or resolution does not always yield a corresponding increase in perceptual quality. Tradeoffs are therefore recommended that balance frame rate and resolution within a given bit rate to achieve the highest user satisfaction.

    Finally, for system completeness and to facilitate implementation of the proposed techniques, an efficient game video streaming management system is proposed. Compared to existing on-line live game video services, the proposed system provides improved coding efficiency, reduced complexity, and better user satisfaction.

    Macroblock level rate and distortion estimation applied to the computation of the Lagrange multiplier in H.264 compression

    The optimal value of the Lagrange multiplier, the trade-off factor between the conveyed rate and the distortion measured at signal reconstruction, has been a fundamental problem of rate-distortion theory, and of video compression in particular. The H.264 standard does not specify how to determine the optimal combination of quantization parameter (QP) values and encoding choices (motion vectors, mode decisions). To date, the encoding process is still driven by a static Lagrange multiplier with an exponential dependence on QP, as adopted by the scientific community; this static value, however, cannot accommodate the diversity of video sequences, and determining its optimal value remains a challenge for current research.

    In this thesis, we propose a novel algorithm that dynamically adapts the Lagrange multiplier to the video input by using the distribution of the transformed residuals at the macroblock level, with the aim of improving compression performance in the rate-distortion space. We apply several models (Laplace, Gaussian, and a generic probability density function) to the transformed residuals at the macroblock level to estimate the rate and distortion, and study how well they fit the actual values. We then analyze the benefits and drawbacks of a few simple models (Laplace, and a mixture of Laplace and Gaussian) from the standpoint of compression gain versus visual improvement in connection with the H.264 standard. Rather than computing the Lagrange multiplier from a model applied to the whole frame, as proposed in the state of the art, we compute it from models applied at the macroblock level: the new algorithm estimates each macroblock's rate and distortion from its transformed residuals and then combines the contributions to compute the frame's Lagrange multiplier.

    Experiments on various types of video showed that the distortion calculated at the macroblock level approaches the real value delivered by the reference software for most sequences tested, although a reliable rate model is still lacking, especially at low bit rates. Nevertheless, the results obtained from compressing various video sequences show that the proposed method performs significantly better than the H.264 Joint Model and slightly better than state-of-the-art methods.
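    For reference, the static model referred to above is the exponential QP dependence adopted in the H.264 Joint Model reference encoder; this well-known formula is quoted here for context and is not a contribution of the thesis:

    ```latex
    % Static frame-level Lagrange multipliers of the H.264 JM encoder
    \lambda_{\mathrm{mode}} = 0.85 \cdot 2^{(QP - 12)/3}, \qquad
    \lambda_{\mathrm{motion}} = \sqrt{\lambda_{\mathrm{mode}}}
    ```

    The thesis replaces this single frame-level constant with a value aggregated from macroblock-level rate and distortion estimates.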

    Efficient Motion Estimation and Mode Decision Algorithms for Advanced Video Coding

    The H.264/AVC video compression standard achieved significant improvements in coding efficiency, but the computational complexity of the H.264/AVC encoder is drastically high. The encoder's main complexity comes from variable-block-size motion estimation (ME) and rate-distortion-optimized (RDO) mode decision. This dissertation proposes three methods to reduce the computation of motion estimation. First, the computation of each distortion measure is reduced by a novel two-step edge-based partial distortion search (TS-EPDS) algorithm, in which the macroblock is divided into sub-blocks and the calculation order of partial distortion is determined by the edge strength of the sub-blocks. Second, we develop an early-termination algorithm featuring an adaptive threshold based on the statistical characteristics of the rate-distortion (RD) cost of the current block and of previously processed blocks and modes. Third, the dissertation presents a novel adaptive search-area selection method that utilizes previously computed motion vector differences (MVDs).

    In H.264/AVC intra coding, DC mode is used to predict regions with no unified direction; because all predicted pixel values are the same, smoothly varying regions are not well de-correlated. This dissertation proposes an improved DC prediction (IDCP) mode based on the distance between the predicted and reference pixels. In addition, signaling the nine prediction modes of intra 4x4 and 8x8 block units requires many overhead bits; to reduce them, an intra-mode bit-rate reduction method is suggested, together with an enhanced algorithm to estimate the most probable mode (MPM) of each block, derived from the prediction-mode directions of neighboring blocks weighted according to their positions. Finally, the dissertation suggests a fast enhanced cost function for the intra encoder's mode decision: it uses the sum of absolute Hadamard-transformed differences (SATD) and the mean absolute deviation of the residual block to estimate the distortion part of the cost, and a threshold-based count of large coefficients to estimate the bit-rate part.
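    As a concrete illustration of the SATD term in the cost function above, the sketch below computes the 4x4 Hadamard-transformed difference between an original and a predicted block. This is the generic textbook form of SATD (with the common divide-by-two normalization), not code from the dissertation.

    ```python
    import numpy as np

    # 4x4 Hadamard matrix (all entries +/-1, mutually orthogonal rows).
    H4 = np.array([
        [1,  1,  1,  1],
        [1,  1, -1, -1],
        [1, -1, -1,  1],
        [1, -1,  1, -1],
    ])

    def satd4x4(orig: np.ndarray, pred: np.ndarray) -> int:
        """Sum of absolute Hadamard-transformed differences of a 4x4 block.

        SATD tracks the residual bit cost better than plain SAD because the
        transform concentrates residual energy, mimicking the encoder's
        transform stage at a fraction of the DCT's cost.
        """
        diff = orig.astype(np.int64) - pred.astype(np.int64)
        coeffs = H4 @ diff @ H4.T              # 2D Hadamard transform
        return int(np.abs(coeffs).sum()) // 2  # common normalization
    ```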

    Low-complexity high prediction accuracy visual quality metrics and their applications in H.264/AVC encoding mode decision process

    In this thesis, we develop a new general framework for computing full-reference image quality scores in the discrete wavelet domain using the Haar wavelet. The proposed framework presents an excellent tradeoff between accuracy and complexity. In our framework, quality metrics are categorized as either map-based, which generate a quality (distortion) map to be pooled into the final score, e.g., structural similarity (SSIM), or non-map-based, which only give a final score, e.g., peak signal-to-noise ratio (PSNR). For map-based metrics, the framework defines a contrast map in the wavelet domain for pooling the quality maps. We also derive a formula that enables the framework to automatically calculate the appropriate level of wavelet decomposition for error-based metrics at a desired viewing distance. To account for the effect of very fine image details in quality assessment, the proposed method defines a multi-level edge map for each image, comprising only the most informative image subbands. To clarify the application of the framework to computing quality scores, we give examples showing how it can improve well-known metrics such as SSIM, visual information fidelity (VIF), PSNR, and absolute difference.

    We compare the complexity of the algorithms obtained with the framework to Intel IPP-based H.264 baseline-profile encoding using C/C++ implementations, and we evaluate the overall performance of the proposed metrics, including their prediction accuracy, on two well-known image quality databases and one video quality database. All the simulation results confirm the efficiency of the proposed framework and quality assessment metrics in improving prediction accuracy while reducing computational complexity; for example, using the framework, the VIF can be computed at about 5% of the complexity of its original version, yet with higher accuracy.

    We then study how H.264 coding mode decision can benefit from the developed metrics. We integrate the proposed SSEA metric as the distortion measure inside the H.264 mode decision process, using the H.264/AVC JM reference software as the implementation and verification platform, and we propose a search algorithm to determine the Lagrange multiplier value for each quantization parameter (QP). The search is applied to three types of video sequences with different motion-activity characteristics, and the resulting Lagrange multiplier values are tabulated for each. Based on the proposed framework, we also introduce a new quality metric, PSNRA, and use it to evaluate the mode decision. The simulated rate-distortion (RD) curves show that, at the same PSNRA, the SSEA-based mode decision reduces the bitrate by about 5% on average compared to the conventional SSE-based approach for sequences with low and medium motion activity. Notably, the computational complexity is not increased at all by using the proposed SSEA-based approach instead of the conventional SSE-based method, so the proposed mode decision algorithm can be used in real-time video coding.
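    To make the wavelet-domain setting concrete, here is a minimal sketch of one level of the 2D Haar decomposition on which such a framework operates. This is the generic orthonormal Haar transform with the conventional subband names, not the thesis's exact pipeline.

    ```python
    import numpy as np

    def haar2d_level(img: np.ndarray):
        """One level of the 2D Haar wavelet transform.

        Returns the approximation subband (LL) and the three detail
        subbands (LH, HL, HH); per-subband quality maps can then be
        pooled, e.g., by a contrast map, into a single quality score.
        """
        x = img.astype(np.float64)
        h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
        x = x[:h, :w]                 # crop to even dimensions
        a = x[0::2, 0::2]; b = x[0::2, 1::2]
        c = x[1::2, 0::2]; d = x[1::2, 1::2]
        ll = (a + b + c + d) / 2.0    # approximation
        lh = (a - b + c - d) / 2.0    # horizontal detail
        hl = (a + b - c - d) / 2.0    # vertical detail
        hh = (a - b - c + d) / 2.0    # diagonal detail
        return ll, lh, hl, hh
    ```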

    SSIM-Inspired Quality Assessment, Compression, and Processing for Visual Communications

    Objective Image and Video Quality Assessment (I/VQA) measures predict image/video quality as perceived by human beings, the ultimate consumers of visual data. Existing research in the area is mainly limited to benchmarking and monitoring of visual data; using I/VQA measures in the design and optimization of image/video processing algorithms and systems is more desirable, challenging, and fruitful, but has not been well explored. Among the recently proposed objective I/VQA approaches, the structural similarity (SSIM) index and its variants have emerged as promising measures: they show superior performance compared to the widely used mean squared error (MSE), are computationally simple compared with other state-of-the-art perceptual quality measures, and have a number of desirable mathematical properties for optimization tasks. The goal of this research is to break the tradition of using MSE as the optimization criterion for image and video processing algorithms. We tackle several important problems in visual communication applications by exploiting SSIM-inspired design and optimization to achieve significantly better performance.

    Firstly, the original SSIM is a full-reference IQA (FR-IQA) measure that requires access to the original reference image, making it impractical in many visual communication applications. We propose a general-purpose reduced-reference IQA (RR-IQA) method that can estimate SSIM with high accuracy with the help of a small number of RR features extracted from the original image, and we introduce and demonstrate the novel idea of partially repairing an image using RR features.

    Secondly, image processing algorithms such as image de-noising and image super-resolution are required at various stages of visual communication systems, from image acquisition to image display at the receiver. We incorporate SSIM into the frameworks of sparse signal representation and non-local means, and demonstrate improved performance in image de-noising and super-resolution.

    Thirdly, we incorporate SSIM into the framework of perceptual video compression. We propose an SSIM-based rate-distortion optimization scheme and an SSIM-inspired divisive normalization method that transforms the DCT-domain frame residuals into a perceptually uniform space. Both approaches demonstrate the potential to largely improve the rate-distortion performance of state-of-the-art video codecs.

    Finally, in real-world visual communications, end-users commonly receive video whose quality varies significantly over time due to variations in video content/complexity, codec configuration, and network conditions. How human visual quality of experience (QoE) changes with such time-varying video quality is not yet well understood. We propose a quality adaptation model that is asymmetrically tuned to increasing and decreasing quality; the model improves upon the direct SSIM approach in predicting the subjective perceptual experience of time-varying video quality.
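    For reference, the SSIM index at the core of this work is the well-known formula of Wang et al., quoted here for context:

    ```latex
    % Local SSIM between patches x and y; mu/sigma are local means,
    % variances, and covariance; C1, C2 are small stabilizing constants.
    \mathrm{SSIM}(x, y) =
      \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}
           {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
    ```

    Its simple analytic form and differentiability are among the mathematical properties that make it attractive as an optimization criterion in place of MSE.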

    Novel Statistical Modeling, Analysis and Implementation of Rate-Distortion Estimation for H.264/AVC Coders
