40 research outputs found

    High-Level Synthesis Based VLSI Architectures for Video Coding

    Get PDF
    High Efficiency Video Coding (HEVC) is state-of-the-art video coding standard. Emerging applications like free-viewpoint video, 360degree video, augmented reality, 3D movies etc. require standardized extensions of HEVC. The standardized extensions of HEVC include HEVC Scalable Video Coding (SHVC), HEVC Multiview Video Coding (MV-HEVC), MV-HEVC+ Depth (3D-HEVC) and HEVC Screen Content Coding. 3D-HEVC is used for applications like view synthesis generation, free-viewpoint video. Coding and transmission of depth maps in 3D-HEVC is used for the virtual view synthesis by the algorithms like Depth Image Based Rendering (DIBR). As first step, we performed the profiling of the 3D-HEVC standard. Computational intensive parts of the standard are identified for the efficient hardware implementation. One of the computational intensive part of the 3D-HEVC, HEVC and H.264/AVC is the Interpolation Filtering used for Fractional Motion Estimation (FME). The hardware implementation of the interpolation filtering is carried out using High-Level Synthesis (HLS) tools. Xilinx Vivado Design Suite is used for the HLS implementation of the interpolation filters of HEVC and H.264/AVC. The complexity of the digital systems is greatly increased. High-Level Synthesis is the methodology which offers great benefits such as late architectural or functional changes without time consuming in rewriting of RTL-code, algorithms can be tested and evaluated early in the design cycle and development of accurate models against which the final hardware can be verified

    Compression vidéo basée sur l'exploitation d'un décodeur intelligent

    Get PDF
    This Ph.D. thesis studies the novel concept of Smart Decoder (SDec) where the decoder is given the ability to simulate the encoder and is able to conduct the R-D competition similarly as in the encoder. The proposed technique aims to reduce the signaling of competing coding modes and parameters. The general SDec coding scheme and several practical applications are proposed, followed by a long-term approach exploiting machine learning concept in video coding. The SDec coding scheme exploits a complex decoder able to reproduce the choice of the encoder based on causal references, eliminating thus the need to signal coding modes and associated parameters. Several practical applications of the general outline of the SDec scheme are tested, using different coding modes during the competition on the reference blocs. Despite the choice for the SDec reference block being still simple and limited, interesting gains are observed. The long-term research presents an innovative method that further makes use of the processing capacity of the decoder. Machine learning techniques are exploited in video coding with the purpose of reducing the signaling overhead. Practical applications are given, using a classifier based on support vector machine to predict coding modes of a block. The block classification uses causal descriptors which consist of different types of histograms. Significant bit rate savings are obtained, which confirms the potential of the approach.Cette thèse de doctorat étudie le nouveau concept de décodeur intelligent (SDec) dans lequel le décodeur est doté de la possibilité de simuler l’encodeur et est capable de mener la compétition R-D de la même manière qu’au niveau de l’encodeur. Cette technique vise à réduire la signalisation des modes et des paramètres de codage en compétition. Le schéma général de codage SDec ainsi que plusieurs applications pratiques sont proposées, suivis d’une approche en amont qui exploite l’apprentissage automatique pour le codage vidéo. Le schéma de codage SDec exploite un décodeur complexe capable de reproduire le choix de l’encodeur calculé sur des blocs de référence causaux, éliminant ainsi la nécessité de signaler les modes de codage et les paramètres associés. Plusieurs applications pratiques du schéma SDec sont testées, en utilisant différents modes de codage lors de la compétition sur les blocs de référence. Malgré un choix encore simple et limité des blocs de référence, les gains intéressants sont observés. La recherche en amont présente une méthode innovante qui permet d’exploiter davantage la capacité de traitement d’un décodeur. Les techniques d’apprentissage automatique sont exploitées pour but de réduire la signalisation. Les applications pratiques sont données, utilisant un classificateur basé sur les machines à vecteurs de support pour prédire les modes de codage d’un bloc. La classification des blocs utilise des descripteurs causaux qui sont formés à partir de différents types d’histogrammes. Des gains significatifs en débit sont obtenus, confirmant ainsi le potentiel de l’approche

    3D coding tools final report

    Get PDF
    Livrable D4.3 du projet ANR PERSEECe rapport a été réalisé dans le cadre du projet ANR PERSEE (n° ANR-09-BLAN-0170). Exactement il correspond au livrable D4.3 du projet. Son titre : 3D coding tools final repor

    Analysis of affine motion-compensated prediction and its application in aerial video coding

    Get PDF
    Motion-compensated prediction is used in video coding standards like High Efficiency Video Coding (HEVC) as one key element of data compression. Commonly, a purely translational motion model is employed. In order to also cover non-translational motion types like rotation or scaling (zoom) contained in aerial video sequences such as captured from unmanned aerial vehicles, an affine motion model can be applied. In this work, a model for affine motion-compensated prediction in video coding is derived by extending a model of purely translational motion-compensated prediction. Using the rate-distortion theory and the displacement estimation error caused by inaccurate affine motion parameter estimation, the minimum required bit rate for encoding the prediction error is determined. In this model, the affine transformation parameters are assumed to be affected by statistically independent estimation errors, which all follow a zero-mean Gaussian distributed probability density function (pdf). The joint pdf of the estimation errors is derived and transformed into the pdf of the location-dependent displacement estimation error in the image. The latter is related to the minimum required bit rate for encoding the prediction error. Similar to the derivations of the fully affine motion model, a four-parameter simplified affine model is investigated. It is of particular interest since such a model is considered for the upcoming video coding standard Versatile Video Coding (VVC) succeeding HEVC. As the simplified affine motion model is able to describe most motions contained in aerial surveillance videos, its application in video coding is justified. Both models provide valuable information about the minimum bit rate for encoding the prediction error as a function of affine estimation accuracies. Although the bit rate in motion-compensated prediction can be considerably reduced by using a motion model which is able to describe motion types occurring in the scene, the total video bit rate may remain quite high, depending on the motion estimation accuracy. Thus, at the example of aerial surveillance sequences, a codec independent region of interest- ( ROI -) based aerial video coding system is proposed that exploits the characteristic of such sequences. Assuming the captured scene to be planar, one frame can be projected into another using global motion compensation. Consequently, only new emerging areas have to be encoded. At the decoder, all new areas are registered into a so-called mosaic. From this, reconstructed frames are extracted and concatenated as a video sequence. To also preserve moving objects in the reconstructed video, local motion is detected and encoded in addition to the new areas. The proposed general ROI coding system was evaluated for very low and low bit rates between 100 and 5000 kbit/s for aerial sequences of HD resolution. It is able to reduce the bit rate by 90% compared to common HEVC coding of similar quality. Subjective tests confirm that the overall image quality of the ROI coding system exceeds that of a common HEVC encoder especially at very low bit rates below 1 Mbit/s. To prevent discontinuities introduced by inaccurate global motion estimation, as may be caused by radial lens distortion, a fully automatic in-loop radial distortion compensation is proposed. For this purpose, an unknown radial distortion compensation parameter that is constant for a group of frames is jointly estimated with the global motion. This parameter is optimized to minimize the distortions of the projections of frames in the mosaic. By this approach, the global motion compensation was improved by 0.27dB and discontinuities in the frames extracted from the mosaic are diminished. As an additional benefit, the generation of long-term mosaics becomes possible, constructed by more than 1500 aerial frames with unknown radial lens distortion and without any calibration or manual lens distortion compensation.Bewegungskompensierte Prädiktion wird in Videocodierstandards wie High Efficiency Video Coding (HEVC) als ein Schlüsselelement zur Datenkompression verwendet. Typischerweise kommt dabei ein rein translatorisches Bewegungsmodell zum Einsatz. Um auch nicht-translatorische Bewegungen wie Rotation oder Skalierung (Zoom) beschreiben zu können, welche beispielsweise in von unbemannten Luftfahrzeugen aufgezeichneten Luftbildvideosequenzen enthalten sind, kann ein affines Bewegungsmodell verwendet werden. In dieser Arbeit wird aufbauend auf einem rein translatorischen Bewegungsmodell ein Modell für affine bewegungskompensierte Prädiktion hergeleitet. Unter Verwendung der Raten-Verzerrungs-Theorie und des Verschiebungsschätzfehlers, welcher aus einer inexakten affinen Bewegungsschätzung resultiert, wird die minimal erforderliche Bitrate zur Codierung des Prädiktionsfehlers hergeleitet. Für die Modellierung wird angenommen, dass die sechs Parameter einer affinen Transformation durch statistisch unabhängige Schätzfehler gestört sind. Für jeden dieser Schätzfehler wird angenommen, dass die Wahrscheinlichkeitsdichteverteilung einer mittelwertfreien Gaußverteilung entspricht. Aus der Verbundwahrscheinlichkeitsdichte der Schätzfehler wird die Wahrscheinlichkeitsdichte des ortsabhängigen Verschiebungsschätzfehlers im Bild berechnet. Letztere wird schließlich zu der minimalen Bitrate in Beziehung gesetzt, welche für die Codierung des Prädiktionsfehlers benötigt wird. Analog zur obigen Ableitung des Modells für das voll-affine Bewegungsmodell wird ein vereinfachtes affines Bewegungsmodell mit vier Freiheitsgraden untersucht. Ein solches Modell wird derzeit auch im Rahmen der Standardisierung des HEVC-Nachfolgestandards Versatile Video Coding (VVC) evaluiert. Da das vereinfachte Modell bereits die meisten in Luftbildvideosequenzen vorkommenden Bewegungen abbilden kann, ist der Einsatz des vereinfachten affinen Modells in der Videocodierung gerechtfertigt. Beide Modelle liefern wertvolle Informationen über die minimal benötigte Bitrate zur Codierung des Prädiktionsfehlers in Abhängigkeit von der affinen Schätzgenauigkeit. Zwar kann die Bitrate mittels bewegungskompensierter Prädiktion durch Wahl eines geeigneten Bewegungsmodells und akkurater affiner Bewegungsschätzung stark reduziert werden, die verbleibende Gesamtbitrate kann allerdings dennoch relativ hoch sein. Deshalb wird am Beispiel von Luftbildvideosequenzen ein Regionen-von-Interesse- (ROI-) basiertes Codiersystem vorgeschlagen, welches spezielle Eigenschaften solcher Sequenzen ausnutzt. Unter der Annahme, dass eine aufgenommene Szene planar ist, kann ein Bild durch globale Bewegungskompensation in ein anderes projiziert werden. Deshalb müssen vom aktuellen Bild prinzipiell nur noch neu im Bild erscheinende Bereiche codiert werden. Am Decoder werden alle neuen Bildbereiche in einem gemeinsamen Mosaikbild registriert, aus dem schließlich die Einzelbilder der Videosequenz rekonstruiert werden können. Um auch lokale Bewegungen abzubilden, werden bewegte Objekte detektiert und zusätzlich zu neuen Bildbereichen als ROI codiert. Die Leistungsfähigkeit des ROI-Codiersystems wurde insbesondere für sehr niedrige und niedrige Bitraten von 100 bis 5000 kbit/s für Bilder in HD-Auflösung evaluiert. Im Vergleich zu einer gewöhnlichen HEVC-Codierung kann die Bitrate um 90% reduziert werden. Durch subjektive Tests wurde bestätigt, dass das ROI-Codiersystem insbesondere für sehr niedrige Bitraten von unter 1 Mbit/s deutlich leistungsfähiger in Bezug auf Detailauflösung und Gesamteindruck ist als ein herkömmliches HEVC-Referenzsystem. Um Diskontinuitäten in den rekonstruierten Videobildern zu vermeiden, die durch eine durch Linsenverzeichnungen induzierte ungenaue globale Bewegungsschätzung entstehen können, wird eine automatische Radialverzeichnungskorrektur vorgeschlagen. Dabei wird ein unbekannter, jedoch über mehrere Bilder konstanter Korrekturparameter gemeinsam mit der globalen Bewegung geschätzt. Dieser Parameter wird derart optimiert, dass die Projektionen der Bilder in das Mosaik möglichst wenig verzerrt werden. Daraus resultiert eine um 0,27dB verbesserte globale Bewegungskompensation, wodurch weniger Diskontinuitäten in den aus dem Mosaik rekonstruierten Bildern entstehen. Dieses Verfahren ermöglicht zusätzlich die Erstellung von Langzeitmosaiken aus über 1500 Luftbildern mit unbekannter Radialverzeichnung und ohne manuelle Korrektur

    Video processing for panoramic streaming using HEVC and its scalable extensions

    Get PDF
    Panoramic streaming is a particular way of video streaming where an arbitrary Region-of-Interest (RoI) is transmitted from a high-spatial resolution video, i.e. a video covering a very “wide-angle” (much larger than the human field-of-view – e.g. 360°). Some transport schemes for panoramic video delivery have been proposed and demonstrated within the past decade, which allow users to navigate interactively within the high-resolution videos. With the recent advances of head mounted displays, consumers may soon have immersive and sufficiently convenient end devices at reach, which could lead to an increasing demand for panoramic video experiences. The solution proposed within this paper is built upon tile-based panoramic streaming, where users receive a set of tiles that match their RoI, and consists in a low-complexity compressed domain video processing technique for using H.265/HEVC and its scalable extensions (H.265/SHVC and H.265/MV-HEVC). The proposed technique generates a single video bitstream out of the selected tiles so that a single hardware decoder can be used. It overcomes the scalability issue of previous solutions not using tiles and the battery consumption issue inherent of tile-based panorama streaming, where multiple parallel software decoders are used. In addition, the described technique is capable of reducing peak streaming bitrate during changes of the RoI, which is crucial for allowing a truly immersive and low latency video experience. Besides, it makes it possible to use Open GOP structures without incurring any playback interruption at switching events, which provides a better compression efficiency compared to closed GOP structures

    Análise do HEVC escalável : desempenho e controlo de débito

    Get PDF
    Mestrado em Engenharia Eletrónica e TelecomunicaçõesEsta dissertação apresenta um estudo da norma de codificação de vídeo de alta eficiência (HEVC) e a sua extensão para vídeo escalável, SHVC. A norma de vídeo SHVC proporciona um melhor desempenho quando codifica várias camadas em simultâneo do que quando se usa o codificador HEVC numa configuração simulcast. Ambos os codificadores de referência, tanto para a camada base como para a camada superior usam o mesmo modelo de controlo de débito, modelo R-λ, que foi otimizado para o HEVC. Nenhuma otimização de alocação de débito entre camadas foi até ao momento proposto para o modelo de testes (SHM 8) para a escalabilidade do HEVC (SHVC). Derivamos um novo modelo R-λ apropriado para a camada superior e para o caso de escalabilidade espacial, que conduziu a um ganho de BD-débito de 1,81% e de BD-PSNR de 0,025 em relação ao modelo de débito-distorção existente no SHM do SHVC. Todavia, mostrou-se também nesta dissertação que o proposto modelo de R-λ não deve ser usado na camada inferior (camada base) no SHVC e por conseguinte no HEVC.This dissertation provides a study of the High Efficiency Video Coding standard (HEVC) and its scalable extension, SHVC. The SHVC provides a better performance when encoding several layers simultaneously than using an HEVC encoder in a simulcast configuration. Both reference encoders, in the base layer and in the enhancement layer use the same rate control model, R-λ model, which was optimized for HEVC. No optimal bitrate partitioning amongst layers is proposed in scalable HEVC (SHVC) test model (SHM 8). We derived a new R-λ model for the enhancement layer and for the spatial case which led to a DB-rate gain of 1.81% and DB-PSNR gain of 0.025 in relation to the rate-distortion model of SHM-SHVC. Nevertheless, we also show in this dissertation that the proposed model of R-λ should not be used neither in the base layer nor in HEVC

    Analysis of Affine Motion-Compensated Prediction in Video Coding

    Get PDF
    Motion-compensated prediction is used in video coding standards like High Efficiency Video Coding (HEVC) as one key element of data compression. Commonly, a purely translational motion model is employed. In order to also cover non-translational motion types like rotation or scaling (zoom), e. g. contained in aerial video sequences such as captured from unmanned aerial vehicles (UAV), an affine motion model can be applied. In this work, a model for affine motion-compensated prediction in video coding is derived. Using the rate-distortion theory and the displacement estimation error caused by inaccurate affine motion parameter estimation, the minimum required bit rate for encoding the prediction error is determined. In this model, the affine transformation parameters are assumed to be affected by statistically independent estimation errors, which all follow a zero-mean Gaussian distributed probability density function (pdf). The joint pdf of the estimation errors is derived and transformed into the pdfof the location-dependent displacement estimation error in the image. The latter is related to the minimum required bit rate for encoding the prediction error. Similar to the derivations of the fully affine motion model, a four-parameter simplified affine model is investigated. Both models are of particular interest since they are considered for the upcoming video coding standard Versatile Video Coding (VVC) succeeding HEVC. Both models provide valuable information about the minimum bit rate for encoding the prediction error as a function of affine estimation accuracies. © 1992-2012 IEEE

    Depth-based Multi-View 3D Video Coding

    Get PDF
    corecore