1,917 research outputs found

    Improved CNN-based Learning of Interpolation Filters for Low-Complexity Inter Prediction in Video Coding

    The versatility of recent machine learning approaches makes them ideal for improving next-generation video compression solutions. Unfortunately, these approaches typically bring significant increases in computational complexity and are difficult to translate into explainable models, limiting their potential for implementation within practical video coding applications. This paper introduces a novel explainable neural network-based inter-prediction scheme that improves the interpolation of the reference samples needed for fractional-precision motion compensation. The approach requires only a single neural network to be trained, from which a full quarter-pixel interpolation filter set is derived, as the network is easily interpretable thanks to its linear structure. A novel training framework enables each network branch to model a specific fractional shift. This practical solution makes it very efficient to use alongside conventional video coding schemes. When implemented in the context of the state-of-the-art Versatile Video Coding (VVC) test model, average BD-rate savings of 0.77%, 1.27% and 2.25% are achieved for lower-resolution sequences under the random access, low-delay B and low-delay P configurations, respectively, while the complexity of the learned interpolation schemes is significantly reduced compared to interpolation with full CNNs.
    Comment: IEEE Open Journal of Signal Processing, Special Issue on Applied AI and Machine Learning for Video Coding and Streaming, June 202
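    The key idea that a purely linear network collapses into fixed interpolation filters can be illustrated with a minimal sketch (the two kernels below are invented placeholders, not the paper's trained weights): stacking convolutions without activations is itself a convolution, so a trained linear branch reduces to a single FIR filter that a conventional codec can apply directly, with no network at run time.

    ```python
    import numpy as np

    # Two stacked 1-D convolutions with no activation form a linear branch;
    # the kernel values here are illustrative, not trained coefficients.
    k1 = np.array([0.25, 0.5, 0.25])
    k2 = np.array([-0.1, 1.2, -0.1])

    # The branch collapses into a single FIR filter: the convolution of its kernels.
    effective = np.convolve(k1, k2)

    # Running the two convolutions in sequence equals one pass with the
    # collapsed filter, so a full filter set can be read off the network.
    ref = np.linspace(0.0, 1.0, 32)
    two_step = np.convolve(np.convolve(ref, k1), k2)
    one_step = np.convolve(ref, effective)
    assert np.allclose(two_step, one_step)
    ```

    The same collapse applies per branch, which is how one trained network can yield a complete quarter-pixel filter set.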

    Segmentation-based mesh design for motion estimation

    In most standard video codecs, motion estimation between two images is generally performed with the Block Matching Algorithm (BMA). BMA represents the evolution of image content by decomposing each image into 2-D blocks in translational motion. This prediction technique usually leads to severe block-artefact distortions when motion is large. Moreover, the systematic decomposition into regular blocks takes no account whatsoever of image content, and some useless parameters associated with the blocks must still be transmitted, which increases the transmission bit rate. To remedy these shortcomings of BMA, we consider the two important objectives in video coding: achieving good quality on one hand and transmitting at very low bit rates on the other. To combine these two almost contradictory requirements, a motion-compensation technique is needed that, as a transformation, yields good subjective characteristics and requires only the motion information to be transmitted. This thesis proposes a motion-compensation technique that designs 2-D triangular meshes from a segmentation of the image. The mesh decomposition is built from nodes distributed irregularly along the contours of the image, so the resulting decomposition is based on image content. Moreover, since the same node-selection method is applied at the encoder and at the decoder, the only information required is the nodes' motion vectors, and very low transmission bit rates can thus be achieved. Compared with BMA, our approach improves both subjective and objective quality with far less motion information. Chapter 1 presents an introduction to the project.
    Chapter 2 analyzes some compression techniques of the standard codecs, above all the popular BMA and its shortcomings. Chapter 3 discusses in detail our proposed algorithm, called segmentation-based active mesh design. Motion estimation and compensation are then described in Chapter 4. Finally, Chapter 5 presents the simulation results and the conclusion.
    Abstract: In most video compression standards today, the generally accepted method for temporal prediction is motion compensation using the block matching algorithm (BMA). BMA represents scene-content evolution with 2-D rigid translational moving blocks. This kind of predictive scheme usually leads to distortions such as block artefacts, especially when motion is large. The two most important aims in video coding are to obtain good quality on one hand and a low bit rate on the other. This thesis proposes a motion-compensation scheme using a segmentation-based 2-D triangular mesh design method. The mesh is constructed from irregularly spread nodal points selected along the image contours; the generated mesh is therefore, to a great extent, based on image content. Moreover, the nodes are selected with the same method on the encoder and decoder sides, so the only information that has to be transmitted is their motion vectors, and thus a very low bit rate can be achieved. Compared with BMA, our approach improves subjective and objective quality with much less motion information."--Abridged abstract by UM
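    The block matching that this thesis compares against can be sketched as an exhaustive SAD search over candidate displacements (block size, search range, and test data below are illustrative, not the thesis's settings):

    ```python
    import numpy as np

    def block_match(ref, cur, top, left, bsize=8, search=4):
        """Full-search block matching: return the motion vector (dy, dx)
        minimizing the sum of absolute differences (SAD) between the
        current block and candidate blocks in the reference frame."""
        block = cur[top:top + bsize, left:left + bsize].astype(np.int64)
        best_sad, best_mv = None, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = top + dy, left + dx
                if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                    continue  # candidate falls outside the reference frame
                cand = ref[y:y + bsize, x:x + bsize].astype(np.int64)
                sad = int(np.abs(block - cand).sum())
                if best_sad is None or sad < best_sad:
                    best_sad, best_mv = sad, (dy, dx)
        return best_mv, best_sad

    # Usage: content shifted down by 1 row and right by 2 columns is found
    # at displacement (-1, -2) in the reference, with zero SAD.
    ref = np.arange(1024).reshape(32, 32)
    cur = np.roll(ref, (1, 2), axis=(0, 1))
    mv, sad = block_match(ref, cur, top=8, left=8)
    # mv == (-1, -2), sad == 0
    ```

    The per-block motion vectors produced this way are exactly the side information whose volume the mesh-based approach reduces.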

    Multi-standard reconfigurable motion estimation processor for hybrid video codecs


    H.264 Motion Estimation and Applications


    A Research on Enhancing Reconstructed Frames in Video Codecs

    A series of video codecs, combining encoder and decoder, have been developed to improve the video-on-demand experience: higher-quality videos at lower bitrates. Despite leading the compression race, the High Efficiency Video Coding (HEVC, or H.265) standard, the latest Versatile Video Coding (VVC) standard, and compressive sensing (CS) still rely on lossy compression. Lossy compression algorithms approximate input signals with smaller file sizes but degrade the reconstructed data, leaving room for further improvement. This work aims to develop hybrid codecs that take advantage of both state-of-the-art video coding technologies and deep learning techniques: traditional non-learning components are either replaced or combined with various deep learning models. Since related studies have not made the most of the available coding information, this work studies and exploits more of the resources in both the encoder and the decoder to further improve different codecs.

    In the encoder, motion-compensated prediction (MCP) is one of the key components that gives video codecs their high compression ratios. To improve MCP performance, modern video codecs offer interpolation filters for fractional motion. However, these handcrafted fractional interpolation filters are designed for idealized signals, which limits the codecs when dealing with real-world video data. This work introduces a deep learning approach covering all luma and chroma fractional pixels, aiming for more accurate motion compensation and higher coding efficiency.

    One extraordinary feature of CS compared to other codecs is that the decoder can recover multiple images by applying different algorithms to one and the same coded data. Since related works have not made use of this property, this work develops a deep learning-based compressive sensing image enhancement framework that uses multiple reconstructed signals. Learning to enhance from multiple reconstructed images provides a valuable mechanism for training deep neural networks while requiring no additional transmitted data.

    In the encoder and decoder of modern video coding standards, in-loop filters (ILF) play the most important role in determining the final reconstructed image quality and compression rate. This work introduces a deep learning approach for improving the handcrafted ILFs of modern video coding standards. We first exploit various coding resources and present a novel deep learning-based ILF. Related works perform rate-distortion-based ILF mode selection at the coding-tree-unit (CTU) level to further enhance deep learning-based ILFs, with the corresponding bits encoded and transmitted to the decoder. This work goes a step further: a reinforcement learning-based autonomous ILF mode-selection scheme is presented that can adapt to different coding-unit (CU) levels. With this approach, no additional bits are required, while the best image quality is ensured at local levels finer than the CTU.

    While this research mainly targets improving the recent VVC standard and sparsity-based CS, it is also designed to adapt to previous and future video coding standards with minor modifications. 博士(工学) (Doctor of Engineering), 法政大学 (Hosei University)
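    The rate-distortion mode selection mentioned above amounts to minimizing the Lagrangian cost J = D + λ·R over candidate filter modes; a minimal sketch (mode names, distortions, rates, and λ below are illustrative, not the codec's values):

    ```python
    def rd_select(candidates, lam):
        """Return the candidate minimizing J = D + lam * R.

        candidates: iterable of (mode_name, distortion, rate_bits) tuples;
        lam is the Lagrange multiplier trading distortion against rate."""
        return min(candidates, key=lambda c: c[1] + lam * c[2])

    # Example: the learned ILF costs one extra signalling bit but removes
    # more distortion, so it wins at this lambda (values are illustrative).
    modes = [("handcrafted_ilf", 120.0, 0), ("learned_ilf", 90.0, 1)]
    best_mode, d, r = rd_select(modes, lam=10.0)
    # best_mode == "learned_ilf"  (J = 90 + 10*1 = 100 vs 120)
    ```

    Replacing this explicit, signalled selection with a learned policy is what removes the per-CTU signalling bits in the reinforcement-learning scheme described above.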

    High-Level Synthesis Based VLSI Architectures for Video Coding

    High Efficiency Video Coding (HEVC) is the state-of-the-art video coding standard. Emerging applications like free-viewpoint video, 360-degree video, augmented reality and 3D movies require standardized extensions of HEVC: HEVC Scalable Video Coding (SHVC), HEVC Multiview Video Coding (MV-HEVC), MV-HEVC plus depth (3D-HEVC) and HEVC Screen Content Coding. 3D-HEVC is used for applications such as view-synthesis generation and free-viewpoint video. The depth maps coded and transmitted in 3D-HEVC are used for virtual view synthesis by algorithms such as Depth Image Based Rendering (DIBR). As a first step, we profiled the 3D-HEVC standard and identified its computationally intensive parts for efficient hardware implementation. One of the computationally intensive parts of 3D-HEVC, HEVC and H.264/AVC is the interpolation filtering used for Fractional Motion Estimation (FME). The hardware implementation of the interpolation filtering is carried out using High-Level Synthesis (HLS) tools; the Xilinx Vivado Design Suite is used for the HLS implementation of the HEVC and H.264/AVC interpolation filters. As the complexity of digital systems has grown greatly, High-Level Synthesis offers major benefits: late architectural or functional changes without time-consuming rewriting of RTL code, early testing and evaluation of algorithms in the design cycle, and the development of accurate models against which the final hardware can be verified.
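    The interpolation filtering being synthesized is, at its core, a short FIR filter per fractional position. As a software sketch of the operation such an architecture pipelines, the HEVC half-sample luma filter can be applied as follows (the scalar loop stands in for the hardware datapath; border handling is omitted):

    ```python
    # HEVC 8-tap half-pel luma interpolation filter; the taps sum to 64,
    # so the result is renormalized with a 6-bit shift after rounding.
    HALF_PEL = [-1, 4, -11, 40, 40, -11, 4, -1]

    def interp_half(samples, i):
        """Half-sample value between samples[i] and samples[i + 1]
        (8-bit output; reads samples[i - 3] .. samples[i + 4])."""
        acc = sum(c * samples[i - 3 + k] for k, c in enumerate(HALF_PEL))
        return min(max((acc + 32) >> 6, 0), 255)  # round, normalize, clip

    # A flat signal interpolates to itself: interp_half([100] * 16, 7) == 100
    ```

    In an HLS flow, the multiply-accumulate over the eight taps is the inner loop that gets unrolled and pipelined into the datapath.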

    Improvement and optimization of H.264 video codec.

    Tang, Kai Lam. Thesis (M.Phil.)--Chinese University of Hong Kong, 2007. Includes bibliographical references. Abstracts in English and Chinese.
    Acknowledgement --- p.i
    Abstract --- p.ii
    Contents --- p.iv
    Publication List --- p.vii
    Chapter 1: Introduction --- p.1-1
      1.1 Video Coding --- p.1-1
        1.1.1 Temporal Prediction --- p.1-5
        1.1.2 Transform Coding --- p.1-9
        1.1.3 Quantization --- p.1-12
        1.1.4 Entropy Coding --- p.1-14
      1.2 H.264/MPEG-4 Part 10 --- p.1-15
        1.2.1 Overview --- p.1-16
        1.2.2 Intra Prediction --- p.1-19
        1.2.3 Inter Prediction --- p.1-20
        1.2.4 Transform and Quantization --- p.1-23
        1.2.5 Entropy Coding --- p.1-25
        1.2.6 Deblocking Filter --- p.1-29
      1.3 Organization of the Thesis --- p.1-32
        1.3.1 Review of Motion Estimation Techniques --- p.1-32
        1.3.2 The Proposed Algorithms --- p.1-33
        1.3.3 Optimization of the Codec --- p.1-34
      1.4 Contributions --- p.1-35
    Chapter 2: Review of Motion Estimation Techniques --- p.2-1
      2.1 Fast Full Search --- p.2-2
      2.2 Hybrid Unsymmetrical-cross Multi-Hexagon-grid Search --- p.2-4
      2.3 Center-biased Fractional Pel Search --- p.2-6
      2.4 Enhanced Predictive Zonal Search --- p.2-7
    Chapter 3: Enhancement Techniques for Intra Block Matching --- p.3-1
      3.1 Introduction --- p.3-1
        3.1.1 Fundamental Principles --- p.3-1
        3.1.2 Variable Block Size Intra Block Matching --- p.3-3
      3.2 Proposed Techniques --- p.3-5
        3.2.1 Padding --- p.3-5
        3.2.2 Modes --- p.3-9
        3.2.3 Performance Enhancement Tools --- p.3-12
          3.2.3.1 Multiple Best Matches --- p.3-12
          3.2.3.2 Adaptive Integer and Sub-pixel Intra Block Matching --- p.3-13
        3.2.4 Pseudo Intra Block Matching --- p.3-14
      3.3 Proposed Fast Algorithms --- p.3-16
        3.3.1 Fast Intra Block Matching Decision --- p.3-16
        3.3.2 Skipping some Intra Block Matching Processes --- p.3-18
        3.3.3 Early Termination --- p.3-19
        3.3.4 SAD Reuse Techniques --- p.3-21
      3.4 Experimental Results --- p.3-22
    Chapter 4: Enhanced SAD Reuse Fast Motion Estimation --- p.4-1
      4.1 Introduction --- p.4-1
      4.2 Proposed Fast Motion Estimation Algorithm --- p.4-3
        4.2.1 Best Initial Motion Vector --- p.4-3
        4.2.2 Initial Search Pattern --- p.4-4
        4.2.3 Initial Search Process and Search Pattern Improvement Process --- p.4-7
          4.2.3.1 BISPCSP Motion Estimation or Refinement Process Decision --- p.4-8
          4.2.3.2 ISP Motion Estimation or Refinement Process Decision --- p.4-9
        4.2.4 Motion Estimation Process and Refinement Process --- p.4-9
          4.2.4.1 Motion Estimation Process --- p.4-9
          4.2.4.2 Refinement Process --- p.4-11
        4.2.5 Motion Estimation Skip Process for B Pictures --- p.4-12
      4.3 Experimental Results --- p.4-13
    Chapter 5: Development of Real-Time H.264 Codec on Pocket PC --- p.5-1
      5.1 Algorithmic Optimizations --- p.5-2
        5.1.1 Fast Sub-Pixel Motion Estimation --- p.5-2
        5.1.2 Interpolation --- p.5-5
          5.1.2.1 Revision of Luma Interpolation --- p.5-5
          5.1.2.2 Fast Interpolation --- p.5-8
        5.1.3 Skipping Inverse ICT and Inverse Quantization Depending on Coded Block Pattern --- p.5-10
      5.2 Code Level Optimizations --- p.5-12
        5.2.1 Merging Loops --- p.5-12
        5.2.2 Moving Independent Code outside the Loop --- p.5-13
        5.2.3 Unrolling Loops --- p.5-14
      5.3 Experimental Results --- p.5-16
      5.4 Applications --- p.5-26
    Chapter 6: Conclusions and Future Development --- p.6-1
      6.1 Conclusions --- p.6-1
        6.1.1 Enhancement Techniques for Intra Block Matching --- p.6-1
        6.1.2 Enhanced SAD Reuse Fast Motion Estimation --- p.6-1
        6.1.3 Development of Real-Time H.264 Codec on Pocket PC --- p.6-2
      6.2 Future Development --- p.6-3
    Bibliography --- p.

    State-of-the-Art and Trends in Scalable Video Compression with Wavelet Based Approaches

    Scalable Video Coding (SVC) differs from traditional single-point approaches mainly in that it encodes, in a single bit stream, several working points corresponding to different qualities, picture sizes and frame rates. This work describes the current state of the art in SVC, focusing on wavelet-based motion-compensated approaches (WSVC). It reviews the individual components that have been designed to address the problem over the years and how such components are typically combined into meaningful WSVC architectures. Coding schemes that differ mainly in the space-time order in which the wavelet transforms operate are compared, and the strengths and weaknesses of the resulting implementations are discussed. An evaluation of the achievable coding performance is provided for the reference architectures studied and developed by ISO/MPEG in its exploration of WSVC. The paper also attempts to draw up a list of the major differences between wavelet-based solutions and the SVC standard jointly targeted by ITU and ISO/MPEG. Major emphasis is devoted to a promising WSVC solution, named STP-tool, which presents architectural similarities with the SVC standard. The paper ends by drawing some evolution trends for WSVC systems and giving insights into video coding applications that could benefit from a wavelet-based approach.
    Adami, Nicola; Signoroni, Alberto; Leonardi, Riccard
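    The motion-compensated temporal filtering at the heart of WSVC can be illustrated, in its simplest Haar form and with motion alignment omitted for clarity, by the two lifting steps below (an integer-to-integer sketch, not any specific MPEG exploration codec):

    ```python
    import numpy as np

    def haar_lift(A, B):
        """Temporal lifting of a frame pair: predict then update."""
        H = B - A          # predict step: high-pass residual frame
        L = A + H // 2     # update step: low-pass frame for the next level
        return L, H

    def haar_unlift(L, H):
        """Exact inverse of the lifting, giving lossless reconstruction."""
        A = L - H // 2     # undo the update
        B = A + H          # undo the prediction
        return A, B

    # Lifting is invertible even in integer arithmetic: the decoder recovers
    # both frames exactly from the low-pass and high-pass subbands.
    A = np.array([[10, 20], [30, 40]])
    B = np.array([[12, 18], [33, 37]])
    L, H = haar_lift(A, B)
    A2, B2 = haar_unlift(L, H)
    assert (A2 == A).all() and (B2 == B).all()
    ```

    Temporal scalability falls out of this structure: dropping the high-pass subbands still leaves a decodable low-pass frame sequence at half the frame rate.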