38 research outputs found

    Algorithms and Hardware Co-Design of HEVC Intra Encoders

    Get PDF
    Digital video is becoming extremely important nowadays and its importance has greatly increased in the last two decades. Due to the rapid development of information and communication technologies, the demand for Ultra-High Definition (UHD) video applications is becoming stronger. However, the most prevalent video compression standard H.264/AVC released in 2003 is inefficient when it comes to UHD videos. The increasing desire for superior compression efficiency to H.264/AVC leads to the standardization of High Efficiency Video Coding (HEVC). Compared with the H.264/AVC standard, HEVC offers a double compression ratio at the same level of video quality or substantial improvement of video quality at the same video bitrate. Yet, HE-VC/H.265 possesses superior compression efficiency, its complexity is several times more than H.264/AVC, impeding its high throughput implementation. Currently, most of the researchers have focused merely on algorithm level adaptations of HEVC/H.265 standard to reduce computational intensity without considering the hardware feasibility. What’s more, the exploration of efficient hardware architecture design is not exhaustive. Only a few research works have been conducted to explore efficient hardware architectures of HEVC/H.265 standard. In this dissertation, we investigate efficient algorithm adaptations and hardware architecture design of HEVC intra encoders. We also explore the deep learning approach in mode prediction. From the algorithm point of view, we propose three efficient hardware-oriented algorithm adaptations, including mode reduction, fast coding unit (CU) cost estimation, and group-based CABAC (context-adaptive binary arithmetic coding) rate estimation. Mode reduction aims to reduce mode candidates of each prediction unit (PU) in the rate-distortion optimization (RDO) process, which is both computation-intensive and time-consuming. Fast CU cost estimation is applied to reduce the complexity in rate-distortion (RD) calculation of each CU. Group-based CABAC rate estimation is proposed to parallelize syntax elements processing to greatly improve rate estimation throughput. From the hardware design perspective, a fully parallel hardware architecture of HEVC intra encoder is developed to sustain UHD video compression at 4K@30fps. The fully parallel architecture introduces four prediction engines (PE) and each PE performs the full cycle of mode prediction, transform, quantization, inverse quantization, inverse transform, reconstruction, rate-distortion estimation independently. PU blocks with different PU sizes will be processed by the different prediction engines (PE) simultaneously. Also, an efficient hardware implementation of a group-based CABAC rate estimator is incorporated into the proposed HEVC intra encoder for accurate and high-throughput rate estimation. To take advantage of the deep learning approach, we also propose a fully connected layer based neural network (FCLNN) mode preselection scheme to reduce the number of RDO modes of luma prediction blocks. All angular prediction modes are classified into 7 prediction groups. Each group contains 3-5 prediction modes that exhibit a similar prediction angle. A rough angle detection algorithm is designed to determine the prediction direction of the current block, then a small scale FCLNN is exploited to refine the mode prediction

    System-on-Chip design of a high performance low power full hardware cabac encoder in H.264/AVC

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Efficient algorithms for scalable video coding

    Get PDF
    A scalable video bitstream specifically designed for the needs of various client terminals, network conditions, and user demands is much desired in current and future video transmission and storage systems. The scalable extension of the H.264/AVC standard (SVC) has been developed to satisfy the new challenges posed by heterogeneous environments, as it permits a single video stream to be decoded fully or partially with variable quality, resolution, and frame rate in order to adapt to a specific application. This thesis presents novel improved algorithms for SVC, including: 1) a fast inter-frame and inter-layer coding mode selection algorithm based on motion activity; 2) a hierarchical fast mode selection algorithm; 3) a two-part Rate Distortion (RD) model targeting the properties of different prediction modes for the SVC rate control scheme; and 4) an optimised Mean Absolute Difference (MAD) prediction model. The proposed fast inter-frame and inter-layer mode selection algorithm is based on the empirical observation that a macroblock (MB) with slow movement is more likely to be best matched by one in the same resolution layer. However, for a macroblock with fast movement, motion estimation between layers is required. Simulation results show that the algorithm can reduce the encoding time by up to 40%, with negligible degradation in RD performance. The proposed hierarchical fast mode selection scheme comprises four levels and makes full use of inter-layer, temporal and spatial correlation aswell as the texture information of each macroblock. Overall, the new technique demonstrates the same coding performance in terms of picture quality and compression ratio as that of the SVC standard, yet produces a saving in encoding time of up to 84%. Compared with state-of-the-art SVC fast mode selection algorithms, the proposed algorithm achieves a superior computational time reduction under very similar RD performance conditions. The existing SVC rate distortion model cannot accurately represent the RD properties of the prediction modes, because it is influenced by the use of inter-layer prediction. A separate RD model for inter-layer prediction coding in the enhancement layer(s) is therefore introduced. Overall, the proposed algorithms improve the average PSNR by up to 0.34dB or produce an average saving in bit rate of up to 7.78%. Furthermore, the control accuracy is maintained to within 0.07% on average. As aMADprediction error always exists and cannot be avoided, an optimisedMADprediction model for the spatial enhancement layers is proposed that considers the MAD from previous temporal frames and previous spatial frames together, to achieve a more accurateMADprediction. Simulation results indicate that the proposedMADprediction model reduces the MAD prediction error by up to 79% compared with the JVT-W043 implementation

    Video Stream Adaptation In Computer Vision Systems

    Get PDF
    Computer Vision (CV) has been deployed recently in a wide range of applications, including surveillance and automotive industries. According to a recent report, the market for CV technologies will grow to $33.3 billion by 2019. Surveillance and automotive industries share over 20% of this market. This dissertation considers the design of real-time CV systems with live video streaming, especially those over wireless and mobile networks. Such systems include video cameras/sensors and monitoring stations. The cameras should adapt their captured videos based on the events and/or available resources and time requirement. The monitoring station receives video streams from all cameras and run CV algorithms for decisions, warnings, control, and/or other actions. Real-time CV systems have constraints in power, computational, and communicational resources. Most video adaptation techniques considered the video distortion as the primary metric. In CV systems, however, the main objective is enhancing the event/object detection/recognition/tracking accuracy. The accuracy can essentially be thought of as the quality perceived by machines, as opposed to the human perceptual quality. High-Efficiency Video Coding (HEVC) is a recent encoding standard that seeks to address the limited communication bandwidth problem as a result of the popularity of High Definition (HD) videos. Unfortunately, HEVC adopts algorithms that greatly slow down the encoding process, and thus results in complications in real-time systems. This dissertation presents a method for adapting live video streams to limited and varying network bandwidth and energy resources. It analyzes and compares the rate-accuracy and rate-energy characteristics of various video streams adaptation techniques in CV systems. We model the video capturing, encoding, and transmission aspects and then provide an overall model of the power consumed by the video cameras and/or sensors. In addition to modeling the power consumption, we model the achieved bitrate of video encoding. We validate and analyze the power consumption models of each phase as well as the aggregate power consumption model through extensive experiments. The analysis includes examining individual parameters separately and examining the impacts of changing more than one parameter at a time. For HEVC, we develop an algorithm that predicts the size of the block without iterating through the exhaustive Rate Distortion Optimization (RDO) method. We demonstrate the effectiveness of the proposed algorithm in comparison with existing algorithms. The proposed algorithm achieves approximately 5 times the encoding speed of the RDO algorithm and 1.42 times the encoding speed of the fastest analyzed algorithm

    Architectures for Adaptive Low-Power Embedded Multimedia Systems

    Get PDF
    This Ph.D. thesis describes novel hardware/software architectures for adaptive low-power embedded multimedia systems. Novel techniques for run-time adaptive energy management are proposed, such that both HW & SW adapt together to react to the unpredictable scenarios. A complete power-aware H.264 video encoder was developed. Comparison with state-of-the-art demonstrates significant energy savings while meeting the performance constraint and keeping the video quality degradation unnoticeable

    Nouvelles méthodes de prédiction inter-images pour la compression d’images et de vidéos

    Get PDF
    Due to the large availability of video cameras and new social media practices, as well as the emergence of cloud services, images and videosconstitute today a significant amount of the total data that is transmitted over the internet. Video streaming applications account for more than 70% of the world internet bandwidth. Whereas billions of images are already stored in the cloud and millions are uploaded every day. The ever growing streaming and storage requirements of these media require the constant improvements of image and video coding tools. This thesis aims at exploring novel approaches for improving current inter-prediction methods. Such methods leverage redundancies between similar frames, and were originally developed in the context of video compression. In a first approach, novel global and local inter-prediction tools are associated to improve the efficiency of image sets compression schemes based on video codecs. By leveraging a global geometric and photometric compensation with a locally linear prediction, significant improvements can be obtained. A second approach is then proposed which introduces a region-based inter-prediction scheme. The proposed method is able to improve the coding performances compared to existing solutions by estimating and compensating geometric and photometric distortions on a semi-local level. This approach is then adapted and validated in the context of video compression. Bit-rate improvements are obtained, especially for sequences displaying complex real-world motions such as zooms and rotations. The last part of the thesis focuses on deep learning approaches for inter-prediction. Deep neural networks have shown striking results for a large number of computer vision tasks over the last years. Deep learning based methods proposed for frame interpolation applications are studied here in the context of video compression. Coding performance improvements over traditional motion estimation and compensation methods highlight the potential of these deep architectures.En raison de la grande disponibilité des dispositifs de capture vidéo et des nouvelles pratiques liées aux réseaux sociaux, ainsi qu’à l’émergence desservices en ligne, les images et les vidéos constituent aujourd’hui une partie importante de données transmises sur internet. Les applications de streaming vidéo représentent ainsi plus de 70% de la bande passante totale de l’internet. Des milliards d’images sont déjà stockées dans le cloud et des millions y sont téléchargés chaque jour. Les besoins toujours croissants en streaming et stockage nécessitent donc une amélioration constante des outils de compression d’image et de vidéo. Cette thèse vise à explorer des nouvelles approches pour améliorer les méthodes actuelles de prédiction inter-images. De telles méthodes tirent parti des redondances entre images similaires, et ont été développées à l’origine dans le contexte de la vidéo compression. Dans une première partie, de nouveaux outils de prédiction inter globaux et locaux sont associés pour améliorer l’efficacité des schémas de compression de bases de données d’image. En associant une compensation géométrique et photométrique globale avec une prédiction linéaire locale, des améliorations significatives peuvent être obtenues. Une seconde approche est ensuite proposée qui introduit un schéma deprédiction inter par régions. La méthode proposée est en mesure d’améliorer les performances de codage par rapport aux solutions existantes en estimant et en compensant les distorsions géométriques et photométriques à une échelle semi locale. Cette approche est ensuite adaptée et validée dans le cadre de la compression vidéo. Des améliorations en réduction de débit sont obtenues, en particulier pour les séquences présentant des mouvements complexes réels tels que des zooms et des rotations. La dernière partie de la thèse se concentre sur l’étude des méthodes d’apprentissage en profondeur dans le cadre de la prédiction inter. Ces dernières années, les réseaux de neurones profonds ont obtenu des résultats impressionnants pour un grand nombre de tâches de vision par ordinateur. Les méthodes basées sur l’apprentissage en profondeur proposéesà l’origine pour de l’interpolation d’images sont étudiées ici dans le contexte de la compression vidéo. Des améliorations en terme de performances de codage sont obtenues par rapport aux méthodes d’estimation et de compensation de mouvements traditionnelles. Ces résultats mettent en évidence le fort potentiel de ces architectures profondes dans le domaine de la compression vidéo
    corecore