9 research outputs found

    Analysis of Affine Motion-Compensated Prediction in Video Coding

    Motion-compensated prediction is used in video coding standards like High Efficiency Video Coding (HEVC) as one key element of data compression. Commonly, a purely translational motion model is employed. In order to also cover non-translational motion types like rotation or scaling (zoom), e.g. contained in aerial video sequences such as those captured from unmanned aerial vehicles (UAV), an affine motion model can be applied. In this work, a model for affine motion-compensated prediction in video coding is derived. Using the rate-distortion theory and the displacement estimation error caused by inaccurate affine motion parameter estimation, the minimum required bit rate for encoding the prediction error is determined. In this model, the affine transformation parameters are assumed to be affected by statistically independent estimation errors, which all follow a zero-mean Gaussian distributed probability density function (pdf). The joint pdf of the estimation errors is derived and transformed into the pdf of the location-dependent displacement estimation error in the image. The latter is related to the minimum required bit rate for encoding the prediction error. Similar to the derivations of the fully affine motion model, a four-parameter simplified affine model is investigated. Both models are of particular interest since they are considered for the upcoming video coding standard Versatile Video Coding (VVC) succeeding HEVC. Both models provide valuable information about the minimum bit rate for encoding the prediction error as a function of affine estimation accuracies.
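The step from independent Gaussian parameter errors to a location-dependent displacement error can be illustrated with a minimal sketch. It assumes a six-parameter model x' = a11*x + a12*y + tx, y' = a21*x + a22*y + ty; this parametrization, the function name, and the example standard deviations are illustrative assumptions and are not taken from the paper. Under these assumptions each displacement error component is a sum of independent zero-mean Gaussians, so it is itself Gaussian with a variance that grows with distance from the coordinate origin.

```python
def displacement_error_variance(x, y, sigmas):
    """Variance of the x- and y-displacement error at pixel (x, y).

    Assumed (hypothetical) parametrization:
        x' = a11*x + a12*y + tx
        y' = a21*x + a22*y + ty
    sigmas = standard deviations of the independent zero-mean Gaussian
    errors on (a11, a12, tx, a21, a22, ty).
    """
    s_a11, s_a12, s_tx, s_a21, s_a22, s_ty = sigmas
    # Error in x' is e_a11*x + e_a12*y + e_tx; variances of independent
    # terms add, and scaling by x or y scales the variance by x^2 or y^2.
    var_dx = (s_a11 * x) ** 2 + (s_a12 * y) ** 2 + s_tx ** 2
    var_dy = (s_a21 * x) ** 2 + (s_a22 * y) ** 2 + s_ty ** 2
    return var_dx, var_dy


# Illustrative values only: small errors on the matrix entries,
# larger errors on the translation.
example_sigmas = (1e-3, 1e-3, 0.1, 1e-3, 1e-3, 0.1)
for position in [(0, 0), (100, 0), (500, 300)]:
    print(position, displacement_error_variance(*position, example_sigmas))
```

At the origin only the translational errors contribute; further away the errors of the matrix entries dominate, which is why the displacement error depends on the location in the image.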

    Low bit-rate image sequence coding


    Analysis of affine motion-compensated prediction and its application in aerial video coding

    Motion-compensated prediction is used in video coding standards like High Efficiency Video Coding (HEVC) as one key element of data compression. Commonly, a purely translational motion model is employed. In order to also cover non-translational motion types like rotation or scaling (zoom) contained in aerial video sequences such as captured from unmanned aerial vehicles, an affine motion model can be applied. In this work, a model for affine motion-compensated prediction in video coding is derived by extending a model of purely translational motion-compensated prediction. Using the rate-distortion theory and the displacement estimation error caused by inaccurate affine motion parameter estimation, the minimum required bit rate for encoding the prediction error is determined. In this model, the affine transformation parameters are assumed to be affected by statistically independent estimation errors, which all follow a zero-mean Gaussian distributed probability density function (pdf). The joint pdf of the estimation errors is derived and transformed into the pdf of the location-dependent displacement estimation error in the image. The latter is related to the minimum required bit rate for encoding the prediction error. Similar to the derivations of the fully affine motion model, a four-parameter simplified affine model is investigated. It is of particular interest since such a model is considered for the upcoming video coding standard Versatile Video Coding (VVC) succeeding HEVC. As the simplified affine motion model is able to describe most motions contained in aerial surveillance videos, its application in video coding is justified. Both models provide valuable information about the minimum bit rate for encoding the prediction error as a function of affine estimation accuracies. 
Although the bit rate in motion-compensated prediction can be considerably reduced by using a motion model that is able to describe the motion types occurring in the scene, the total video bit rate may remain quite high, depending on the motion estimation accuracy. Thus, using aerial surveillance sequences as an example, a codec-independent region-of-interest (ROI) based aerial video coding system is proposed that exploits the characteristics of such sequences. Assuming the captured scene to be planar, one frame can be projected into another using global motion compensation. Consequently, only newly emerging areas have to be encoded. At the decoder, all new areas are registered into a so-called mosaic. From this, reconstructed frames are extracted and concatenated as a video sequence. To also preserve moving objects in the reconstructed video, local motion is detected and encoded in addition to the new areas. The proposed general ROI coding system was evaluated for very low and low bit rates between 100 and 5000 kbit/s for aerial sequences of HD resolution. It is able to reduce the bit rate by 90% compared to common HEVC coding of similar quality. Subjective tests confirm that the overall image quality of the ROI coding system exceeds that of a common HEVC encoder, especially at very low bit rates below 1 Mbit/s. To prevent discontinuities introduced by inaccurate global motion estimation, as may be caused by radial lens distortion, a fully automatic in-loop radial distortion compensation is proposed. For this purpose, an unknown radial distortion compensation parameter that is constant for a group of frames is jointly estimated with the global motion. This parameter is optimized to minimize the distortions of the projections of frames in the mosaic. With this approach, the global motion compensation was improved by 0.27 dB and discontinuities in the frames extracted from the mosaic are diminished.
As an additional benefit, the generation of long-term mosaics becomes possible, constructed from more than 1500 aerial frames with unknown radial lens distortion and without any calibration or manual lens distortion compensation.

    Motion compensated interpolation for subband coding of moving images

    Thesis (M.S.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994. By Mark Daniel Polomski. Includes bibliographical references (leaves 108-119).

    Study of communications data compression methods

    A simple monochrome conditional replenishment system was extended to higher compression and to higher motion levels, by incorporating spatially adaptive quantizers and field repeating. Conditional replenishment combines intraframe and interframe compression, and both areas are investigated. The gain of conditional replenishment depends on the fraction of the image changing, since only changed parts of the image need to be transmitted. If the transmission rate is set so that only one fourth of the image can be transmitted in each field, greater change fractions will overload the system. A computer simulation was prepared which incorporated (1) field repeat of changes, (2) a variable change threshold, (3) frame repeat for high change, and (4) two mode, variable rate Hadamard intraframe quantizers. The field repeat gives 2:1 compression in moving areas without noticeable degradation. Variable change threshold allows some flexibility in dealing with varying change rates, but the threshold variation must be limited for acceptable performance.
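The basic idea of conditional replenishment described above (transmit only what changed, and overload once the changed fraction exceeds the channel capacity) can be sketched as follows. This is a minimal per-pixel illustration, not the simulated system from the report: the function names, the flat list representation of a field, and the default capacity of one fourth are assumptions made for the example. The sketch only reports an overload; handling it (for instance by frame repeat, as the abstract mentions) is left out.

```python
def changed_fraction(prev_field, curr_field, threshold):
    """Fraction of pixels whose absolute change exceeds the threshold."""
    changed = sum(
        1 for p, c in zip(prev_field, curr_field) if abs(c - p) > threshold
    )
    return changed / len(curr_field)


def replenish(prev_field, curr_field, threshold, capacity=0.25):
    """Update only the changed pixels and report channel overload.

    Returns (reconstructed_field, overloaded). Pixels whose change is at
    or below the threshold are not transmitted and keep their old value.
    """
    fraction = changed_fraction(prev_field, curr_field, threshold)
    overloaded = fraction > capacity
    reconstructed = [
        c if abs(c - p) > threshold else p
        for p, c in zip(prev_field, curr_field)
    ]
    return reconstructed, overloaded


previous = [10, 10, 10, 10, 10, 10, 10, 10]
current = [10, 12, 30, 10, 10, 10, 10, 10]
print(replenish(previous, current, threshold=5))
```

Raising the threshold lowers the changed fraction at the cost of ignoring small changes, which is the trade-off behind the variable change threshold mentioned in the abstract.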

    Analog parallel processor solutions for video encoding

    This thesis deals with Cellular Nonlinear Network (CNN) analog parallel processor networks and their implementations in current video coding standards. The target applications are low-power video encoders within 3rd generation mobile terminals. The video codecs of such mobile terminals are defined by either the MPEG-4/H.263 or H.264 video standard. All of these standards are based on the block-based hybrid approach. As block-based motion estimation (ME) is responsible for most of the power consumption of such hybrid video encoders, this thesis deals mostly with low-power ME implementations. Low-power solutions are introduced at both the algorithmic and hardware levels. On the algorithmic level, the introduced implementations are derived from a segmentation algorithm, which has previously been partly realized. The first introduced algorithm reduces the computational complexity of ME within an object-based MPEG-4 encoder. The use of this algorithm enables a 60% drop in the power consumption of Full Search ME. The second algorithm calculates a near-optimal block-size partition for H.264 motion estimation. With this algorithm, the use of computationally complex Lagrange optimization in H.264 ME is not required. The third algorithm reduces the shape bit-rate of an object-based MPEG-4 encoder. On the hardware level a CNN-type ME architecture is introduced. The architecture includes connections and circuitry to fully realize block-based ME. The analog ME implemented with this architecture is capable of lower power than comparable digital realizations. A 9×9 test chip has also been realized. Additionally implemented is a digital predictive ME realization that takes advantage of the introduced partition algorithm. Although the IC layout of the ME algorithm was drawn, the design was verified on an FPGA.
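To make clear why Full Search ME dominates the encoder's workload, the following is a minimal sketch of generic full-search block matching with the sum of absolute differences (SAD) as the cost. It is a textbook illustration, not the algorithm or hardware architecture of the thesis; the function names, block size, search range, and the synthetic test frames are assumptions made for the example.

```python
import random


def block_sad(ref, cur, ref_y, ref_x, cur_y, cur_x, block):
    """Sum of absolute differences between two block x block regions."""
    return sum(
        abs(cur[cur_y + i][cur_x + j] - ref[ref_y + i][ref_x + j])
        for i in range(block)
        for j in range(block)
    )


def full_search(ref, cur, top, left, block=4, search=3):
    """Test every offset in a +/-search window; return (dy, dx, sad)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ref_y, ref_x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if ref_y < 0 or ref_x < 0:
                continue
            if ref_y + block > height or ref_x + block > width:
                continue
            cost = block_sad(ref, cur, ref_y, ref_x, top, left, block)
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best


# Synthetic frames: the current frame is the reference frame shifted
# down by 2 rows and left by 1 column (with wrap-around at the borders).
rng = random.Random(0)
size = 16
reference = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]
current = [
    [reference[(y - 2) % size][(x + 1) % size] for x in range(size)]
    for y in range(size)
]
print(full_search(reference, current, top=6, left=6))
```

Every block requires (2*search + 1)^2 SAD evaluations of block^2 pixels each, so the cost grows quadratically with the search range, which is the kind of load that reduced-complexity ME schemes try to avoid.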

    Rate-distortion analysis and traffic modeling of scalable video coders

    In this work, we focus on two important goals of the transmission of scalable video over the Internet. The first goal is to provide high quality video to end users and the second one is to properly design networks and predict network performance for video transmission based on the characteristics of existing video traffic. Rate-distortion (R-D) based schemes are often applied to improve and stabilize video quality; however, the lack of R-D modeling of scalable coders limits their applications in scalable streaming. Thus, in the first part of this work, we analyze R-D curves of scalable video coders and propose a novel operational R-D model. We evaluate and demonstrate the accuracy of our R-D function in various scalable coders, such as Fine Granular Scalable (FGS) and Progressive FGS coders. Furthermore, due to the time-constraint nature of Internet streaming, we propose another operational R-D model, which is accurate yet with low computational cost, and apply it to streaming applications for quality control purposes. The Internet is a changing environment; however, most quality control approaches only consider constant bit rate (CBR) channels and no specific studies have been conducted for quality control in variable bit rate (VBR) channels. To fill this void, we examine an asymptotically stable congestion control mechanism and combine it with our R-D model to present smooth visual quality to end users under various network conditions. Our second focus in this work concerns the modeling and analysis of video traffic, which is crucial to protocol design and efficient network utilization for video transmission. Although scalable video traffic is expected to be an important source for the Internet, we find that little work has been done on analyzing or modeling it. In this regard, we develop a frame-level hybrid framework for modeling multi-layer VBR video traffic. 
In the proposed framework, the base layer is modeled using a combination of wavelet and time-domain methods and the enhancement layer is linearly predicted from the base layer using the cross-layer correlation.

    Image data compression based on a multiresolution signal model

    Image data compression is an important topic within the general field of image processing. It has practical applications varying from medical imagery to video telephones, and provides significant implications for image modelling theory. In this thesis a new class of linear signal models, linear interpolative multiresolution models, is presented and applied to the data compression of a range of natural images. The key property of these models is that whilst they are non-causal in the two spatial dimensions they are causal in a third dimension, the scale dimension. This leads to computationally efficient predictors which form the basis of the data compression algorithms. Models of varying complexity are presented, ranging from a simple stationary form to one which models visually important features such as lines and edges in terms of scale and orientation. In addition to theoretical results such as related rate distortion functions, the results of applying the compression algorithms to a variety of images are presented. These results compare favourably, particularly at high compression ratios, with many of the techniques described in the literature, both in terms of mean squared quantisation noise and, more meaningfully, in terms of perceived visual quality. In particular the use of local orientation over various scales within the consistent spatial interpolative framework of the model significantly reduces perceptually important distortions such as the blocking artefacts often seen with high compression coders. A new algorithm for fast computation of the orientation information required by the adaptive coder is presented which results in an overall computational complexity for the coder which is broadly comparable to that of the simpler non-adaptive coder. This thesis is concluded with a discussion of some of the important issues raised by the work.