    Algorithms and Hardware Co-Design of HEVC Intra Encoders

    Digital video has grown greatly in importance over the last two decades. With the rapid development of information and communication technologies, demand for Ultra-High Definition (UHD) video applications keeps growing. However, H.264/AVC, the most prevalent video compression standard, released in 2003, is inefficient for UHD video. The desire for compression efficiency beyond H.264/AVC led to the standardization of High Efficiency Video Coding (HEVC). Compared with H.264/AVC, HEVC offers double the compression ratio at the same video quality, or substantially better video quality at the same bitrate. Although HEVC/H.265 has superior compression efficiency, its complexity is several times that of H.264/AVC, which impedes high-throughput implementation. Most researchers so far have focused only on algorithm-level adaptations of the HEVC/H.265 standard to reduce computational intensity, without considering hardware feasibility. Moreover, efficient hardware architectures have not been explored exhaustively: only a few works address hardware architectures for the HEVC/H.265 standard. In this dissertation, we investigate efficient algorithm adaptations and hardware architecture design of HEVC intra encoders. We also explore a deep learning approach to mode prediction. On the algorithm side, we propose three efficient hardware-oriented algorithm adaptations: mode reduction, fast coding unit (CU) cost estimation, and group-based CABAC (context-adaptive binary arithmetic coding) rate estimation. Mode reduction reduces the mode candidates of each prediction unit (PU) in the rate-distortion optimization (RDO) process, which is both computation-intensive and time-consuming.
Fast CU cost estimation reduces the complexity of the rate-distortion (RD) calculation for each CU. Group-based CABAC rate estimation parallelizes the processing of syntax elements to greatly improve rate estimation throughput. On the hardware side, a fully parallel hardware architecture for an HEVC intra encoder is developed to sustain UHD video compression at 4K@30fps. The architecture introduces four prediction engines (PEs), each of which independently performs the full cycle of mode prediction, transform, quantization, inverse quantization, inverse transform, reconstruction, and rate-distortion estimation. PU blocks of different sizes are processed by different PEs simultaneously. In addition, an efficient hardware implementation of the group-based CABAC rate estimator is incorporated into the proposed HEVC intra encoder for accurate, high-throughput rate estimation. To take advantage of deep learning, we also propose a fully connected layer based neural network (FCLNN) mode preselection scheme that reduces the number of RDO modes for luma prediction blocks. All angular prediction modes are classified into 7 prediction groups. Each group contains 3-5 prediction modes with similar prediction angles. A rough angle detection algorithm determines the prediction direction of the current block, and a small-scale FCLNN then refines the mode prediction.
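
    The grouping idea in the abstract above can be illustrated with a minimal sketch. HEVC defines 35 intra modes (planar, DC, and angular modes 2 to 34); everything else here is an assumption made for illustration: the contiguous split into group sizes (5, 5, 5, 5, 5, 4, 4), the choice to always keep planar and DC as candidates, and the function names. The dissertation's actual grouping, its rough angle detector, and the FCLNN refinement are not reproduced.

```python
# Hypothetical sketch of group-based intra mode preselection.
# The group sizes and the "always keep planar and DC" rule are assumptions,
# not the grouping used in the dissertation.

PLANAR, DC = 0, 1
ANGULAR_MODES = list(range(2, 35))  # HEVC angular intra modes 2..34 (33 modes)


def build_groups(sizes=(5, 5, 5, 5, 5, 4, 4)):
    """Split the angular modes into contiguous groups of neighbouring angles."""
    if sum(sizes) != len(ANGULAR_MODES):
        raise ValueError("group sizes must cover all 33 angular modes")
    groups, start = [], 0
    for size in sizes:
        groups.append(ANGULAR_MODES[start:start + size])
        start += size
    return groups


def candidate_modes(group_index, groups):
    """Modes handed to RDO once a direction group has been detected."""
    return [PLANAR, DC] + groups[group_index]


groups = build_groups()
# Pretend a rough angle detector picked group 3 for the current block.
candidates = candidate_modes(3, groups)
print(candidates)
```

    Under these assumptions, RDO evaluates at most 7 modes per block instead of all 35; a refinement stage such as the FCLNN described above would narrow the list further.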

    Contributions to the solution of the rate-distortion optimization problem in video coding

    In the last two decades, we have witnessed significant changes in the demand for video codecs. The diversity of services has increased significantly, high definition (HD) and beyond-HD resolutions have become a reality, video traffic from mobile devices and tablets is increasing, and video-on-demand services now play a prominent role. All of these advances have converged to demand more powerful standard video codecs, the most recent being H.264/Advanced Video Coding (H.264/AVC) and the latest High Efficiency Video Coding (HEVC), both developed jointly by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), the latter through their Joint Collaborative Team on Video Coding (JCT-VC). These two standards (and many others, starting with ITU-T H.261) rely on a hybrid model known as the Differential Pulse Code Modulation (DPCM)/Discrete Cosine Transform (DCT) hybrid video coder, which involves a motion estimation and compensation phase followed by transform and quantization stages and an entropy coder. Moreover, each of these main subsystems is made up of a number of interdependent, parametric modules that can be adapted to the particular video content. The main problem arising from this approach is how best to choose the combination of parametrizations that codes the current content most efficiently.
To solve this problem, one of the proposed solutions (and the one adopted in both the H.264/AVC and the HEVC reference encoder implementations) is the process referred to as rate-distortion optimization, which chooses a parametrization of the encoder by minimizing a cost function that considers the trade-off between rate and distortion, weighted by a Lagrange multiplier (λ). This multiplier has been obtained empirically for both the H.264/AVC and the HEVC reference encoder implementations, aiming to provide a robust solution for a variety of video contents. In this PhD thesis, an exhaustive study of the influence of this Lagrangian parameter on different video sequences reveals that some common features appear frequently in video sequences for which the adopted λ model (the reference model) becomes ineffective. Furthermore, we have found a notable margin for improving the coding efficiency of both coders by using a more adequate model for the Lagrangian parameter. Thus, the contributions of this thesis are the following: (i) to prove that the reference Lagrangian model becomes ineffective in certain common situations; and (ii) to propose generalized solutions that improve the robustness of the reference model, for both the H.264/AVC and the HEVC standards, obtaining important improvements in coding efficiency.
In both proposals, changes in the nature of the content over the video sequence are taken into account, proposing models that adapt to the video content while minimizing the increase in computational complexity.
Official Doctoral Program in Multimedia and Communications (Programa Oficial de Doctorado en Multimedia y Comunicaciones). Committee: José Prades Nebot (President), Carmen Peláez Moreno (Secretary), Julián Cabrera Quesad (Member)
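
    The role of the Lagrange multiplier in rate-distortion optimization can be illustrated with a small sketch of the standard Lagrangian cost J = D + λ·R, where D is distortion and R is rate. The candidate options, their distortion and rate values, and the function names below are made-up illustrations, not values or code from the thesis or from any reference encoder.

```python
# Minimal sketch of Lagrangian rate-distortion optimization: J = D + lambda * R.
# The candidate list and its numbers are toy values chosen for illustration.

def rd_cost(distortion, rate_bits, lam):
    """Lagrangian cost that trades distortion against rate."""
    return distortion + lam * rate_bits


def choose_best(options, lam):
    """Return the coding option with the smallest Lagrangian cost."""
    return min(options, key=lambda o: rd_cost(o["distortion"], o["rate_bits"], lam))


options = [
    {"name": "A", "distortion": 100.0, "rate_bits": 10},  # cheap, low quality
    {"name": "B", "distortion": 60.0, "rate_bits": 40},   # middle ground
    {"name": "C", "distortion": 40.0, "rate_bits": 90},   # costly, high quality
]

for lam in (0.1, 1.0, 5.0):
    print(lam, choose_best(options, lam)["name"])
```

    With these toy numbers the selected option changes as λ changes, which is why a λ model that does not suit the content can lead the encoder to suboptimal decisions.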

    Efficient video coding using visual sensitive information for HEVC coding standard

    The latest high efficiency video coding (HEVC) standard introduces a large number of inter-mode block partitioning modes. The HEVC reference test model (HM) uses partially exhaustive tree-structured mode selection, which still explores a large number of prediction unit (PU) modes for each coding unit (CU). The resulting increase in encoding time prevents many electronic devices with limited processing resources from using various features of HEVC. By analyzing homogeneity, residuals, and statistical correlations among modes, many researchers have sped up the encoding process by reducing the number of PU modes. However, these approaches could not match the rate-distortion (RD) performance of the HM because they depend on the existing Lagrangian cost function (LCF) within the HEVC framework. In this paper, to avoid complete dependency on the LCF in the initial phase, we exploit a visually sensitive foreground motion and spatial salient metric (FMSSM) in a block. To capture its motion and saliency features, we use dynamic background modeling and visual saliency modeling, respectively. According to the FMSSM values, a subset of PU modes is then explored for encoding the CU. This preprocessing phase is independent of the existing LCF. Because the proposed coding technique further reduces the number of PU modes using two simple criteria (motion and saliency), it encodes faster than the HM. Because it also encodes uncovered and static background areas using the dynamic background frame as a substitute reference frame, it does not sacrifice quality. Test results show that the proposed method reduces the encoding time of the HM by 32% on average without any quality loss for a wide range of videos.

    Error resilience and concealment techniques for high-efficiency video coding

    This thesis investigates the problem of robust coding and error concealment in High Efficiency Video Coding (HEVC). After a review of the current state of the art, a simulation study of error robustness revealed that HEVC has weak protection against network losses, which significantly degrade video quality. Based on this evidence, the first contribution of this work is a new method that reduces the temporal dependencies between motion vectors, improving the decoded video quality without compromising compression efficiency. The second contribution of this thesis is a two-stage approach for reducing the mismatch of temporal predictions when video streams are received with errors or lost data. At the encoding stage, the reference pictures are dynamically distributed based on a constrained Lagrangian rate-distortion optimization to reduce the number of predictions from a single reference. At the streaming stage, a prioritization algorithm based on spatial dependencies selects a reduced set of motion vectors to be transmitted as side information, reducing mismatched motion predictions at the decoder. The problem of error concealment-aware video coding is also investigated to enhance the overall error robustness. A new approach based on scalable coding and optimal error concealment selection is proposed, where the optimal error concealment modes are found by simulating transmission losses, followed by a saliency-weighted optimisation. Moreover, recovery residual information is encoded using a rate-controlled enhancement layer. Both are transmitted to the decoder to be used in case of data loss. Finally, an adaptive error resilience scheme is proposed to dynamically predict the video stream that achieves the highest decoded quality for a particular loss case.
A neural network selects among the various video streams, encoded with different levels of compression efficiency and error protection, based on information from the video signal, the coded stream, and the transmission network. Overall, the new robust video coding methods investigated in this thesis yield consistent quality gains over other existing methods, including those implemented in the HEVC reference software. Furthermore, the proposed methods also achieve a better trade-off between coding efficiency and error robustness.

    Improved depth coding for HEVC focusing on depth edge approximation

    The latest High Efficiency Video Coding (HEVC) standard has greatly improved coding efficiency compared to its predecessor, H.264. An important share of this gain comes from the adoption of hierarchical block partitioning structures and an extended number of modes. The structure of the existing inter-modes is appropriate mainly for rectangular and square aligned motion patterns. However, these modes may not suit the block partitioning of depth objects that have partial foreground motion with irregular edges and background. In such cases, the HEVC reference test model (HM) normally explores finer-level block partitioning, which requires more bits and encoding time to compensate for large residuals. Since motion detection is the underlying criterion for mode selection, in this work we use the energy concentration ratio feature of phase correlation to capture different types of motion in depth objects. For better motion modeling at depth edges, the proposed technique also uses an extra pattern mode comprising a group of templates with various rectangular and non-rectangular object shapes and edges. Because the pattern mode can save bits by encoding only the foreground areas, and can beat all other inter-modes in a block once selected, the proposed technique can improve rate-distortion performance. It can also reduce encoding time by skipping further branching when the pattern mode is used and by selecting a subset of modes using innovative pre-processing criteria. In experiments, it saves 29% of encoding time on average and improves the Bjontegaard Delta peak signal-to-noise ratio by 0.10 dB compared to the HM.

    Weighted Combination of Sample Based and Block Based Intra Prediction in Video Coding

    The latest video compression standard, HEVC/H.265, was released in 2013 and provides a significant improvement over its predecessor, AVC/H.264. However, with constantly increasing demand for high definition video and streaming of large video files, there are still improvements to be made. Difficult content in video sequences, such as smoke, leaves, and water that moves irregularly, is hard to predict and can be troublesome at the prediction stage of video compression. In this thesis, carried out at Ericsson in Stockholm, the combination of sample based intra prediction (SBIP) and block based intra prediction (BBIP) is tested to see whether it can improve the prediction of video sequences containing difficult content, here focusing on water. The combined methods are compared to HEVC intra prediction. All implementations were done in Matlab. The results show that a combination reduces the Mean Squared Error (MSE) and can also improve the Visual Information Fidelity (VIF) and the mean Structural Similarity (MSSIM). Moreover, the visual quality improved, with more detail and fewer blocking artefacts.
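
    The idea of weighting two predictors can be illustrated with a minimal sketch. It assumes a single scalar weight applied per sample and uses made-up toy sample values; the thesis's actual weighting scheme and its Matlab implementation are not reproduced, and the function names are hypothetical.

```python
# Hypothetical sketch: blend two intra predictions with one scalar weight
# and compare mean squared error. All sample values are toy data.

def combine(sbip, bbip, weight):
    """Per-sample blend: weight * SBIP + (1 - weight) * BBIP."""
    return [weight * s + (1.0 - weight) * b for s, b in zip(sbip, bbip)]


def mse(original, predicted):
    """Mean squared error between original and predicted samples."""
    return sum((o - p) ** 2 for o, p in zip(original, predicted)) / len(original)


original = [10, 12, 14, 16]
sbip = [12, 14, 16, 18]  # toy sample based prediction, overshoots by 2
bbip = [8, 10, 12, 14]   # toy block based prediction, undershoots by 2

combined = combine(sbip, bbip, 0.5)
print(mse(original, sbip), mse(original, bbip), mse(original, combined))
```

    In this contrived case the two predictors err in opposite directions, so an equal-weight blend cancels the error entirely; real content would show a smaller gain that depends on how the weight is chosen.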