20 research outputs found

    Key-point Detection based Fast CU Decision for HEVC Intra Encoding

    As the most recent video coding standard, High Efficiency Video Coding (HEVC) adopts various novel techniques, including a quad-tree based coding unit (CU) structure and additional angular modes for intra encoding. These new techniques achieve a notable improvement in coding efficiency at the cost of a significant increase in computational complexity. A fast HEVC coding algorithm is therefore highly desirable. In this paper, we propose a fast intra CU decision algorithm for HEVC that reduces coding complexity and is based mainly on key-point detection. A CU block is considered to contain multiple gradients and is split early if corner points are detected inside it. Conversely, a CU block without corner points is terminated early when its RD cost is also small according to statistics from previous frames. The proposed fast algorithm achieves over 62% encoding time reduction with average BD-Rate losses of 3.66%, 2.82%, and 2.53% for the Y, U, and V components. The experimental results show that the proposed method decides CU sizes efficiently in HEVC intra coding, even though only static parameters are applied to all test sequences.
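    A minimal Python sketch of the split/terminate rule described above. It assumes a Harris corner detector as the key-point detector (the abstract does not name one) and a hypothetical RD-cost threshold taken as the mean cost of unsplit CUs at the same depth in previous frames. The corner threshold `corner_thresh` and all helper names are illustrative, not the paper's parameters.

```python
import numpy as np

def harris_response(block, k=0.04):
    """Harris corner response for a 2-D luma block (k is the usual empirical constant)."""
    img = np.asarray(block, dtype=np.float64)
    gy, gx = np.gradient(img)  # derivatives along rows (vertical) and columns (horizontal)

    def box3(a):
        # 3x3 window sum with edge padding, standing in for a Gaussian window
        p = np.pad(a, 1, mode="edge")
        h, w = a.shape
        return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))

    sxx, syy, sxy = box3(gx * gx), box3(gy * gy), box3(gx * gy)
    return sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2

def has_corner(block, corner_thresh=1e6):
    # Hypothetical absolute threshold for 8-bit samples
    return bool((harris_response(block) > corner_thresh).any())

def threshold_from_previous_frames(unsplit_costs):
    # Hypothetical statistic: mean RD cost of unsplit CUs at this depth in earlier frames
    return float(np.mean(unsplit_costs))

def cu_decision(block, rd_cost, rd_threshold, corner_thresh=1e6):
    """Return 'split', 'terminate', or 'full_rdo' for one CU."""
    if has_corner(block, corner_thresh):
        return "split"      # corners imply multiple gradient directions: split early
    if rd_cost < rd_threshold:
        return "terminate"  # no corners and a small RD cost: stop splitting
    return "full_rdo"       # otherwise fall back to the normal RDO search
```

    A pure straight edge has gradients in one direction only, so its Harris response is never positive and the rule does not force a split; only blocks where two gradient directions meet are split early.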

    Quality of Experience (QoE)-Aware Fast Coding Unit Size Selection for HEVC Intra-prediction

    The exorbitant increase in the computational complexity of modern video coding standards, such as High Efficiency Video Coding (HEVC), is a compelling challenge for resource-constrained consumer electronic devices. For instance, the brute-force evaluation of all possible combinations of available coding modes and the quadtree-based coding structure in HEVC, to determine the optimum set of coding parameters for a given content, demands a substantial amount of computational and energy resources. Thus, the resource requirements for real-time operation of HEVC have become a contributing factor to the Quality of Experience (QoE) of the end users of emerging multimedia and future internet applications. In this context, this paper proposes a content-adaptive Coding Unit (CU) size selection algorithm for HEVC intra-prediction. The proposed algorithm builds content-specific weighted Support Vector Machine (SVM) models in real time during the encoding process to provide an early estimate of CU size for a given content, avoiding the brute-force evaluation of all possible coding mode combinations in HEVC. The experimental results demonstrate an average encoding time reduction of 52.38%, with an average Bjøntegaard Delta Bit Rate (BDBR) increase of 1.19% compared to the HM16.1 reference encoder. Furthermore, perceptual visual quality assessments conducted with the Video Quality Metric (VQM) show minimal visual quality impact on the videos reconstructed by the proposed algorithm compared to state-of-the-art approaches.
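    The weighted SVM idea can be sketched as a linear SVM trained by full-batch subgradient descent on a per-sample weighted hinge loss. This is an illustration only, assuming NumPy, labels -1 (keep the CU) and +1 (split), and a hypothetical weight per sample (for instance, the RD-cost penalty of a wrong decision); the paper's features, kernel, and weighting scheme are not given in the abstract.

```python
import numpy as np

def train_weighted_linear_svm(X, y, sample_weight, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2*||w||^2 + mean(sample_weight * hinge) by subgradient descent.

    X: (n, d) CU features; y: labels in {-1, +1}; sample_weight: per-sample cost.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    sw = np.asarray(sample_weight, dtype=np.float64)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1.0       # samples violating the margin
        coeff = sw[active] * y[active]       # weighted hinge subgradient terms
        w -= lr * (lam * w - coeff @ X[active] / n)
        b -= lr * (-coeff.sum() / n)
    return w, b

def predict(w, b, X):
    """+1 = split the CU early, -1 = keep it (hypothetical label convention)."""
    return np.where(np.asarray(X, dtype=np.float64) @ w + b >= 0.0, 1, -1)
```

    In an online setting, the first few frames could be encoded with full RDO to collect (features, decision, weight) triples, after which the trained model supplies early CU size estimates; how often to retrain is a design choice left open here.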

    CTU Depth Decision Algorithms for HEVC: A Survey

    High-Efficiency Video Coding (HEVC) surpasses its predecessors in encoding efficiency by introducing new coding tools at the cost of increased encoding time complexity. The Coding Tree Unit (CTU) is the main building block used in HEVC. In the HEVC standard, frames are divided into CTUs with a predetermined size of up to 64x64 pixels. Each CTU is then recursively divided, following a quadtree, into square areas known as Coding Units (CUs), each split producing four equally sized sub-blocks. Although this diversity of frame partitioning increases encoding efficiency, it also increases time complexity, because many more partitionings must be examined to find the optimal one. To address this complexity, numerous algorithms have been proposed that eliminate unnecessary searches during CTU partitioning by exploiting correlation in the video. In this paper, existing CTU depth decision algorithms for HEVC are surveyed. These algorithms are categorized into two groups, namely statistics and machine learning approaches. Statistics approaches are further subdivided into neighboring and inherent approaches. Neighboring approaches exploit the similarity between adjacent CTUs to limit the depth range of the current CTU, while inherent approaches use only the information available within the current CTU. Machine learning approaches try to extract and exploit similarities implicitly. Traditional methods such as support vector machines or random forests use manually selected features, while recently proposed deep learning methods learn features during training. Finally, this paper discusses extending these methods to more recent video coding formats such as Versatile Video Coding (VVC) and AOMedia Video 1 (AV1).
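    The neighboring category can be illustrated with a small helper that restricts the depth search of the current CTU to the range observed in already-coded adjacent CTUs. This is a hypothetical sketch assuming 64x64 CTUs and 8x8 minimum CUs (depths 0 to 3); the optional `margin` widens the range and is not taken from any surveyed method.

```python
from typing import Iterable, Tuple

MAX_DEPTH = 3  # 64x64 CTU down to 8x8 CUs

def neighbor_depth_range(neighbor_depths: Iterable[int], margin: int = 0) -> Tuple[int, int]:
    """Return (min_depth, max_depth) to search in the current CTU.

    neighbor_depths: CU depths observed in the available neighbouring CTUs
    (e.g. left, above, above-left, above-right). An empty input means no
    neighbour is available, so the full range is searched.
    """
    depths = list(neighbor_depths)
    if not depths:
        return 0, MAX_DEPTH
    lo = max(0, min(depths) - margin)
    hi = min(MAX_DEPTH, max(depths) + margin)
    return lo, hi
```

    The encoder would then skip RDO at depths outside the returned range; a larger `margin` trades speed for a lower risk of missing the optimal depth.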

    Efficient coding unit size selection based on texture analysis for HEVC intra prediction

    Determining the best partitioning structure for a given Coding Tree Unit (CTU) is one of the most time-consuming operations in the HEVC encoder. The brute-force search through the quadtree hierarchy has a significant impact on the encoding time of high definition (HD) videos. This paper presents a fast coding unit size decision algorithm for intra prediction in HEVC. The proposed algorithm uses a low-complexity texture analysis technique based on the local range property of a pixel in a given neighborhood. Simulation results show that the proposed algorithm achieves an average encoding time efficiency improvement of 72.24% with similar rate-distortion performance compared to the HEVC reference software HM12.0 for HD videos.
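    A local range filter (maximum minus minimum over each pixel's neighbourhood) is straightforward to compute. The sketch assumes a 3x3 neighbourhood and a hypothetical rule that splits a CU when its mean local range exceeds `texture_threshold`; the paper's actual neighbourhood size, thresholds, and decision logic may differ.

```python
import numpy as np

def local_range(block, radius=1):
    """Max minus min over a (2*radius+1)^2 neighbourhood of each pixel (edge-padded)."""
    img = np.asarray(block, dtype=np.float64)
    p = np.pad(img, radius, mode="edge")
    h, w = img.shape
    size = 2 * radius + 1
    # Stack every shifted view of the padded block, then reduce per pixel
    stack = np.stack([p[dy:dy + h, dx:dx + w] for dy in range(size) for dx in range(size)])
    return stack.max(axis=0) - stack.min(axis=0)

def should_split(block, texture_threshold=10.0):
    """Hypothetical rule: split a CU whose mean local range marks it as textured."""
    return bool(local_range(block).mean() > texture_threshold)
```

    Homogeneous CUs have a local range near zero everywhere and would be kept at the current size without evaluating deeper partitions.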

    Fast Intra-frame Coding Algorithm for HEVC Based on TCM and Machine Learning

    High Efficiency Video Coding (HEVC) is the latest video coding standard. Compared with the previous standard, H.264/AVC, it can reduce the bit-rate by around 50% while maintaining the same perceptual quality. This compression gain is achieved mainly by supporting larger Coding Unit (CU) sizes and more prediction modes. However, because the encoder needs to traverse all possible choices to find the best way of encoding the data, this flexibility in block size and prediction modes has caused a tremendous increase in encoding time. Intra-frame coding is a fundamental part of HEVC and is widely used in all configurations, so fast algorithms are needed to alleviate the computational complexity of HEVC intra-frame coding. In this thesis, a fast intra-frame coding algorithm based on machine learning is proposed to predict CU decisions, so that the computational complexity can be significantly reduced with negligible loss in coding efficiency. Machine learning models such as Bayes decision and Support Vector Machines (SVMs) are used as decision makers, while the Laplacian Transparent Composite Model (LPTCM) is selected as a feature extraction tool. In the main version of the proposed algorithm, a set of features named Summation of Binarized Outlier Coefficients (SBOC) is extracted to train SVM models. An online training structure and a performance control method are introduced to enhance the robustness of the decision makers. When applied to the All Intra Main (AIM) full test and compared with HM 16.3, the main version of the proposed algorithm achieves, on average, a 48% time reduction with a 0.78% BD-rate increase. By adjusting parameter settings, the algorithm can change the trade-off between encoding time and coding efficiency, generating a performance curve that meets different requirements.
By testing different methods on the same machine, the proposed method was shown to outperform all CU-decision-based HEVC fast intra-frame algorithms in the benchmarks.
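    One plausible reading of the SBOC feature name is sketched here: transform a block with an orthonormal 2-D DCT, mark AC coefficients whose magnitude exceeds an outlier cutoff as 1 and the rest as 0, and sum the result. This is an assumption, not the thesis's definition. In LPTCM the cutoff separating the Laplacian body from the outliers is estimated from data, whereas here `cutoff` is simply passed in, and excluding the DC term is a further illustrative choice.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m

def sboc(block, cutoff):
    """Count AC coefficients whose magnitude exceeds the outlier cutoff (hypothetical SBOC)."""
    x = np.asarray(block, dtype=np.float64)
    d = dct_matrix(x.shape[0])
    coeffs = d @ x @ d.T                 # separable 2-D DCT of a square block
    outliers = np.abs(coeffs) > cutoff   # binarize: True = outlier, False = Laplacian body
    outliers[0, 0] = False               # ignore the DC coefficient
    return int(outliers.sum())
```

    A smooth block yields a count near zero, while a textured block produces many outlier coefficients, which is the kind of signal an SVM could use to favour splitting.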

    CNN-based Prediction of Partition Path for VVC Fast Inter Partitioning Using Motion Fields

    The Versatile Video Coding (VVC) standard has recently been finalized by the Joint Video Experts Team (JVET). Compared to the High Efficiency Video Coding (HEVC) standard, VVC offers about 50% compression efficiency gain, in terms of Bjøntegaard Delta-Rate (BD-rate), at the cost of a 10-fold increase in encoding complexity. In this paper, we propose a method based on a Convolutional Neural Network (CNN) to speed up the inter partitioning process in VVC. Firstly, a novel representation of the quadtree with nested multi-type tree (QTMT) partition, derived from the partition path, is introduced. Secondly, we develop a U-Net-based CNN that takes a multi-scale motion vector field as input at the Coding Tree Unit (CTU) level. The purpose of CNN inference is to predict the optimal partition path during the Rate-Distortion Optimization (RDO) process. To achieve this, we divide the CTU into grids and predict the Quaternary Tree (QT) depth and Multi-type Tree (MT) split decisions for each cell of the grid. Thirdly, an efficient partition pruning algorithm is introduced that employs the CNN predictions at each partitioning level to skip RDO evaluations of unnecessary partition paths. Finally, an adaptive threshold selection scheme is designed, making the trade-off between complexity and efficiency scalable. Experiments show that the proposed method achieves acceleration ranging from 16.5% to 60.2% under the Random Access Group Of Pictures 32 (RAGOP32) configuration with a reasonable efficiency drop ranging from 0.44% to 4.59% in terms of BD-rate, which surpasses other state-of-the-art solutions. Additionally, our method stands out as one of the lightest approaches in the field, which ensures its applicability to other encoders.
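    The pruning step can be illustrated with a simplified, hypothetical rule that considers only quadtree splits: for a square CU at QT depth `depth`, look up the CNN's predicted QT depths for the grid cells it covers and decide which RDO checks to run. The threshold `tau` (restricted here to the interval (0, 0.5]) mimics the complexity/efficiency trade-off, and a smaller `tau` runs both checks more often. Multi-type tree decisions and the paper's actual threshold scheme are not modelled.

```python
import numpy as np

def qt_split_decisions(pred_depth, x0, y0, size, cell, depth, tau=0.5):
    """Decide which RDO checks to run for one square CU (hypothetical rule).

    pred_depth: 2-D array of predicted QT depths, one value per grid cell.
    (x0, y0, size): CU position and size in pixels inside the CTU.
    cell: grid cell size in pixels. Returns (try_no_split, try_qt_split).
    """
    if not 0.0 < tau <= 0.5:
        raise ValueError("tau must lie in (0, 0.5]")
    r0, c0 = y0 // cell, x0 // cell
    n = max(1, size // cell)                 # number of cells covered per side
    region = pred_depth[r0:r0 + n, c0:c0 + n]
    try_qt_split = bool(region.max() >= depth + tau)      # some cell wants to go deeper
    try_no_split = bool(region.min() <= depth + 1 - tau)  # some cell is content at this depth
    return try_no_split, try_qt_split
```

    Because the no-split bound `depth + 1 - tau` never falls below the split bound `depth + tau`, at least one of the two checks is always run, so the rule can prune but never leave a CU without a candidate.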

    Learned-based Intra Coding Tools for Video Compression.

    PhD Theses. The increase in demand for video rendering on 4K and beyond displays, as well as for immersive video formats, requires efficient compression techniques. In this thesis, novel methods for enhancing the efficiency of current and next-generation video codecs are investigated. Several aspects that influence the way conventional video coding methods work are considered. The methods proposed in this thesis use Neural Networks (NNs) trained for regression tasks to predict data. In particular, Convolutional Neural Networks (CNNs) are used to predict Rate-Distortion (RD) data for intra-coded frames. Moreover, novel intra-prediction methods are proposed with the aim of providing new ways to exploit redundancies overlooked by traditional intra-prediction tools. Additionally, it is shown how such methods can be simplified to derive less resource-demanding tools.

    3D Medical Image Lossless Compressor Using Deep Learning Approaches

    Accelerated information processing, communication, and storage are major requirements of the big-data era, and their importance keeps growing. With the extensive rise in data availability, easier information acquisition, and growing data rates, handling this data efficiently has become a critical challenge. Even with advanced hardware developments and the availability of multiple Graphics Processing Units (GPUs), there remains a strong need to use these technologies effectively. Healthcare systems are one of the domains with explosive data growth, especially given the capabilities of modern scanners, which every year produce higher-resolution and more densely sampled medical images that require massive storage capacity. An effective compression method is essential to relieve the bottleneck in data transmission and storage. Since medical information is critical and plays an influential role in diagnostic accuracy, exact reconstruction with no loss in quality, which is the main objective of any lossless compression algorithm, is strongly encouraged. The revolutionary impact of Deep Learning (DL) methods, which achieve state-of-the-art results on many tasks including data compression, opens tremendous opportunities for contributions. While considerable effort has been devoted to lossy compression using learning-based approaches, less attention has been paid to lossless compression. This PhD thesis investigates and proposes novel learning-based approaches for compressing 3D medical images losslessly. Firstly, we formulate the lossless compression task as a supervised sequential prediction problem, whereby a model learns a projection function to predict a target voxel given a sequence of samples from its spatially surrounding voxels.
By using such 3D local sampling information, this prediction paradigm efficiently exploits spatial similarities and redundancies in a volumetric medical context. The proposed NN-based data predictor is trained to minimise the differences from the original data values, while the residual errors are encoded using arithmetic coding to allow lossless reconstruction. Following this, we explore the effectiveness of Recurrent Neural Networks (RNNs) as a 3D predictor for learning the mapping function from the spatial medical domain (16-bit depth). We analyse the generalisability and robustness of Long Short-Term Memory (LSTM) models in capturing the 3D spatial dependencies of a voxel's neighbourhood while utilising samples taken from various scanning settings. We evaluate our proposed MedZip models on compressing unseen Computerized Tomography (CT) and Magnetic Resonance Imaging (MRI) modalities losslessly, in comparison with other state-of-the-art lossless compression standards. This work investigates input configurations and sampling schemes for a many-to-one sequence prediction model, specifically for compressing 3D medical images (16-bit depth) losslessly. The main objective is to determine the best practice for enabling the proposed LSTM model to achieve a high compression ratio and fast encoding-decoding performance. A solution to the problem of non-deterministic environments was also proposed, allowing models to run in parallel without a significant drop in compression performance. Experimental evaluations against well-known lossless codecs were carried out on datasets acquired by different hospitals, representing different body segments and distinct scanning modalities (i.e. CT and MRI). To conclude, we present a novel data-driven sampling scheme utilising weighted gradient scores for training LSTM prediction-based models.
The objective is to determine whether some training samples are significantly more informative than others, specifically in medical domains where samples are available on a scale of billions. The effectiveness of models trained with the presented importance sampling scheme was evaluated against alternative strategies such as uniform, Gaussian, and slice-based sampling.
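    The predict-then-code-the-residual principle behind lossless predictive compression can be shown with a fixed predictor in place of the learned one. This sketch swaps in the 3-D Lorenzo predictor (a simple linear predictor over causal neighbours) for the thesis's LSTM, and reports the empirical entropy of the residuals as a rough proxy for what an arithmetic coder would spend; it does not implement arithmetic coding itself.

```python
import numpy as np

def lorenzo_residuals(vol):
    """Residuals of the 3-D Lorenzo predictor; voxels outside the volume count as 0."""
    p = np.pad(np.asarray(vol, dtype=np.int64), ((1, 0), (1, 0), (1, 0)))
    pred = (p[1:, 1:, :-1] + p[1:, :-1, 1:] + p[:-1, 1:, 1:]
            - p[1:, :-1, :-1] - p[:-1, 1:, :-1] - p[:-1, :-1, 1:]
            + p[:-1, :-1, :-1])
    return p[1:, 1:, 1:] - pred

def lorenzo_decode(residuals):
    """Rebuild the volume voxel by voxel in raster order from the residuals."""
    Z, Y, X = residuals.shape
    p = np.zeros((Z + 1, Y + 1, X + 1), dtype=np.int64)
    for z in range(Z):
        for y in range(Y):
            for x in range(X):
                # Same causal neighbours as the encoder, read from already decoded voxels
                pred = (p[z + 1, y + 1, x] + p[z + 1, y, x + 1] + p[z, y + 1, x + 1]
                        - p[z + 1, y, x] - p[z, y + 1, x] - p[z, y, x + 1] + p[z, y, x])
                p[z + 1, y + 1, x + 1] = pred + residuals[z, y, x]
    return p[1:, 1:, 1:]

def empirical_entropy_bits(values):
    """Zeroth-order entropy in bits per symbol, a proxy for entropy-coded size."""
    _, counts = np.unique(values, return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum())
```

    Decoding must run in raster order because each prediction depends on previously reconstructed voxels; integer arithmetic keeps reconstruction exact for 16-bit data, and a better predictor (such as a trained network) lowers the residual entropy.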

    Moving object detection for automobiles by the shared use of H.264/AVC motion vectors: innovation report.

    Cost is one of the obstacles to wider adoption of Advanced Driver Assistance Systems (ADAS) in China. The objective of this research project is to develop a low-cost ADAS through the shared use of motion vectors (MVs) from an H.264/AVC video encoder that was originally designed for video recording only. Few studies have used MVs from video encoders on a moving platform for moving object detection. The main contribution of this research is a novel algorithm that addresses the problems of moving object detection when MVs from an H.264/AVC encoder are used. It is suitable for mass-produced in-vehicle devices because it combines the recording encoder with MV-based moving object detection, which reduces the cost and complexity of the system and provides the recording function by default at no extra cost. The estimated cost of the proposed system is 50% lower than that of a system using the optical flow approach. To reduce the region of interest and to meet the real-time computation requirement, a new block-based region growth algorithm is used for road region detection. To account for the small amplitude and limited precision of H.264/AVC MVs on relatively slow-moving objects, the detection task separates the region of interest into relatively fast and relatively slow speed regions by examining the amplitude of the MVs, the position of the focus of expansion, and the result of road region detection. Relatively slow-moving objects are detected and tracked using generic horizontal and vertical contours of rear-view vehicles. This method addresses the limited precision and erroneous motion vectors that H.264/AVC encoders produce for relatively slow-moving objects and for regions near the focus of expansion. Relatively fast-moving objects are detected by a two-stage approach consisting of a Hypothesis Generation (HG) stage and a Hypothesis Verification (HV) stage.
This approach addresses the problem that H.264/AVC MVs are generated for coding efficiency rather than for minimising the motion error of objects. The HG stage reports a potential moving object by clustering the planar parallax residuals that satisfy the constraints set out in the algorithm. The HV stage verifies the existence of the moving object based on the temporal consistency of its displacement in successive frames. The test results show a vehicle detection rate higher than 90%, on a par with methods proposed by other authors, and a computation cost low enough to meet the real-time performance requirement. An invention patent, one international journal paper, and two international conference papers have been published or accepted, showing the originality of the work in this project. One further international journal paper is under preparation.
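    Block-based region growing, as used for the road detection step, can be sketched as a breadth-first search over a grid of blocks. The feature used here (a per-block mean luma value), the seed placement, and the tolerance `tol` are hypothetical; the report's actual growth criterion is not stated in the abstract.

```python
from collections import deque

import numpy as np

def grow_road_region(block_means, seeds, tol=10.0):
    """Block-level region growing over a 2-D grid of per-block feature values.

    Starting from seed blocks, a 4-connected neighbour joins the region when
    its value differs from the running region mean by less than tol.
    """
    block_means = np.asarray(block_means, dtype=np.float64)
    rows, cols = block_means.shape
    region = np.zeros((rows, cols), dtype=bool)
    for r, c in seeds:
        region[r, c] = True
    total = float(sum(block_means[r, c] for r, c in seeds))
    count = len(seeds)
    queue = deque(seeds)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not region[nr, nc]:
                if abs(block_means[nr, nc] - total / count) < tol:
                    region[nr, nc] = True
                    total += float(block_means[nr, nc])
                    count += 1
                    queue.append((nr, nc))
    return region
```

    Working on blocks rather than pixels keeps the cost proportional to the number of macroblocks, which matches the real-time motivation given in the abstract.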