13 research outputs found

    A comprehensive video codec comparison

    Get PDF
    In this paper, we compare the video codecs AV1 (version 1.0.0-2242 from August 2019), HEVC (HM and x265), AVC (x264), the exploration software JEM which is based on HEVC, and the VVC (successor of HEVC) test model VTM (version 4.0 from February 2019) under two fair and balanced configurations: All Intra for the assessment of intra coding and Maximum Coding Efficiency with all codecs being tuned for their best coding efficiency settings. VTM achieves the highest coding efficiency in both configurations, followed by JEM and AV1. The worst coding efficiency is achieved by x264 and x265, even in the placebo preset for highest coding efficiency. AV1 gained a lot in terms of coding efficiency compared to previous versions and now outperforms HM by 24% BD-Rate gains. VTM gains 5% over AV1 in terms of BD-Rates. By reporting separate numbers for JVET and AOM test sequences, it is ensured that no bias in the test sequences exists. When comparing only intra coding tools, it is observed that the complexity increases exponentially for linearly increasing coding efficiency

    Visual Content Characterization Based on Encoding Rate-Distortion Analysis

    Get PDF
    Visual content characterization is a fundamentally important but under exploited step in dataset construction, which is essential in solving many image processing and computer vision problems. In the era of machine learning, this has become ever more important, because with the explosion of image and video content nowadays, scrutinizing all potential content is impossible and source content selection has become increasingly difficult. In particular, in the area of image/video coding and quality assessment, it is highly desirable to characterize/select source content and subsequently construct image/video datasets that demonstrate strong representativeness and diversity of the visual world, such that the visual coding and quality assessment methods developed from and validated using such datasets exhibit strong generalizability. Encoding Rate-Distortion (RD) analysis is essential for many multimedia applications. Examples of applications that explicitly use RD analysis include image encoder RD optimization, video quality assessment (VQA), and Quality of Experience (QoE) optimization of streaming videos etc. However, encoding RD analysis has not been well investigated in the context of visual content characterization. This thesis focuses on applying encoding RD analysis as a visual source content characterization method with image/video coding and quality assessment applications in mind. We first conduct a video quality subjective evaluation experiment for state-of-the-art video encoder performance analysis and comparison, where our observations reveal severe problems that motivate the needs of better source content characterization and selection methods. Then the effectiveness of RD analysis in visual source content characterization is demonstrated through a proposed quality control mechanism for video coding by eigen analysis in the space of General Quality Parameter (GQP) functions. Finally, by combining encoding RD analysis with submodular set function optimization, we propose a novel method for automating the process of representative source content selection, which helps boost the RD performance of visual encoders trained with the selected visual contents

    Learned-based Intra Coding Tools for Video Compression.

    Get PDF
    PhD Theses.The increase in demand for video rendering in 4K and beyond displays, as well as immersive video formats, requires the use of e cient compression techniques. In this thesis novel methods for enhancing the e ciency of current and next generation video codecs are investigated. Several aspects that in uence the way conventional video coding methods work are considered. The methods proposed in this thesis utilise Neural Networks (NNs) trained for regression tasks in order to predict data. In particular, Convolutional Neural Networks (CNNs) are used to predict Rate-Distortion (RD) data for intra-coded frames. Moreover, a novel intra-prediction methods are proposed with the aim of providing new ways to exploit redundancies overlooked by traditional intraprediction tools. Additionally, it is shown how such methods can be simpli ed in order to derive less resource-demanding tools

    Analysis of affine motion-compensated prediction and its application in aerial video coding

    Get PDF
    Motion-compensated prediction is used in video coding standards like High Efficiency Video Coding (HEVC) as one key element of data compression. Commonly, a purely translational motion model is employed. In order to also cover non-translational motion types like rotation or scaling (zoom) contained in aerial video sequences such as captured from unmanned aerial vehicles, an affine motion model can be applied. In this work, a model for affine motion-compensated prediction in video coding is derived by extending a model of purely translational motion-compensated prediction. Using the rate-distortion theory and the displacement estimation error caused by inaccurate affine motion parameter estimation, the minimum required bit rate for encoding the prediction error is determined. In this model, the affine transformation parameters are assumed to be affected by statistically independent estimation errors, which all follow a zero-mean Gaussian distributed probability density function (pdf). The joint pdf of the estimation errors is derived and transformed into the pdf of the location-dependent displacement estimation error in the image. The latter is related to the minimum required bit rate for encoding the prediction error. Similar to the derivations of the fully affine motion model, a four-parameter simplified affine model is investigated. It is of particular interest since such a model is considered for the upcoming video coding standard Versatile Video Coding (VVC) succeeding HEVC. As the simplified affine motion model is able to describe most motions contained in aerial surveillance videos, its application in video coding is justified. Both models provide valuable information about the minimum bit rate for encoding the prediction error as a function of affine estimation accuracies. Although the bit rate in motion-compensated prediction can be considerably reduced by using a motion model which is able to describe motion types occurring in the scene, the total video bit rate may remain quite high, depending on the motion estimation accuracy. Thus, at the example of aerial surveillance sequences, a codec independent region of interest- ( ROI -) based aerial video coding system is proposed that exploits the characteristic of such sequences. Assuming the captured scene to be planar, one frame can be projected into another using global motion compensation. Consequently, only new emerging areas have to be encoded. At the decoder, all new areas are registered into a so-called mosaic. From this, reconstructed frames are extracted and concatenated as a video sequence. To also preserve moving objects in the reconstructed video, local motion is detected and encoded in addition to the new areas. The proposed general ROI coding system was evaluated for very low and low bit rates between 100 and 5000 kbit/s for aerial sequences of HD resolution. It is able to reduce the bit rate by 90% compared to common HEVC coding of similar quality. Subjective tests confirm that the overall image quality of the ROI coding system exceeds that of a common HEVC encoder especially at very low bit rates below 1 Mbit/s. To prevent discontinuities introduced by inaccurate global motion estimation, as may be caused by radial lens distortion, a fully automatic in-loop radial distortion compensation is proposed. For this purpose, an unknown radial distortion compensation parameter that is constant for a group of frames is jointly estimated with the global motion. This parameter is optimized to minimize the distortions of the projections of frames in the mosaic. By this approach, the global motion compensation was improved by 0.27dB and discontinuities in the frames extracted from the mosaic are diminished. As an additional benefit, the generation of long-term mosaics becomes possible, constructed by more than 1500 aerial frames with unknown radial lens distortion and without any calibration or manual lens distortion compensation.Bewegungskompensierte Prädiktion wird in Videocodierstandards wie High Efficiency Video Coding (HEVC) als ein Schlüsselelement zur Datenkompression verwendet. Typischerweise kommt dabei ein rein translatorisches Bewegungsmodell zum Einsatz. Um auch nicht-translatorische Bewegungen wie Rotation oder Skalierung (Zoom) beschreiben zu können, welche beispielsweise in von unbemannten Luftfahrzeugen aufgezeichneten Luftbildvideosequenzen enthalten sind, kann ein affines Bewegungsmodell verwendet werden. In dieser Arbeit wird aufbauend auf einem rein translatorischen Bewegungsmodell ein Modell für affine bewegungskompensierte Prädiktion hergeleitet. Unter Verwendung der Raten-Verzerrungs-Theorie und des Verschiebungsschätzfehlers, welcher aus einer inexakten affinen Bewegungsschätzung resultiert, wird die minimal erforderliche Bitrate zur Codierung des Prädiktionsfehlers hergeleitet. Für die Modellierung wird angenommen, dass die sechs Parameter einer affinen Transformation durch statistisch unabhängige Schätzfehler gestört sind. Für jeden dieser Schätzfehler wird angenommen, dass die Wahrscheinlichkeitsdichteverteilung einer mittelwertfreien Gaußverteilung entspricht. Aus der Verbundwahrscheinlichkeitsdichte der Schätzfehler wird die Wahrscheinlichkeitsdichte des ortsabhängigen Verschiebungsschätzfehlers im Bild berechnet. Letztere wird schließlich zu der minimalen Bitrate in Beziehung gesetzt, welche für die Codierung des Prädiktionsfehlers benötigt wird. Analog zur obigen Ableitung des Modells für das voll-affine Bewegungsmodell wird ein vereinfachtes affines Bewegungsmodell mit vier Freiheitsgraden untersucht. Ein solches Modell wird derzeit auch im Rahmen der Standardisierung des HEVC-Nachfolgestandards Versatile Video Coding (VVC) evaluiert. Da das vereinfachte Modell bereits die meisten in Luftbildvideosequenzen vorkommenden Bewegungen abbilden kann, ist der Einsatz des vereinfachten affinen Modells in der Videocodierung gerechtfertigt. Beide Modelle liefern wertvolle Informationen über die minimal benötigte Bitrate zur Codierung des Prädiktionsfehlers in Abhängigkeit von der affinen Schätzgenauigkeit. Zwar kann die Bitrate mittels bewegungskompensierter Prädiktion durch Wahl eines geeigneten Bewegungsmodells und akkurater affiner Bewegungsschätzung stark reduziert werden, die verbleibende Gesamtbitrate kann allerdings dennoch relativ hoch sein. Deshalb wird am Beispiel von Luftbildvideosequenzen ein Regionen-von-Interesse- (ROI-) basiertes Codiersystem vorgeschlagen, welches spezielle Eigenschaften solcher Sequenzen ausnutzt. Unter der Annahme, dass eine aufgenommene Szene planar ist, kann ein Bild durch globale Bewegungskompensation in ein anderes projiziert werden. Deshalb müssen vom aktuellen Bild prinzipiell nur noch neu im Bild erscheinende Bereiche codiert werden. Am Decoder werden alle neuen Bildbereiche in einem gemeinsamen Mosaikbild registriert, aus dem schließlich die Einzelbilder der Videosequenz rekonstruiert werden können. Um auch lokale Bewegungen abzubilden, werden bewegte Objekte detektiert und zusätzlich zu neuen Bildbereichen als ROI codiert. Die Leistungsfähigkeit des ROI-Codiersystems wurde insbesondere für sehr niedrige und niedrige Bitraten von 100 bis 5000 kbit/s für Bilder in HD-Auflösung evaluiert. Im Vergleich zu einer gewöhnlichen HEVC-Codierung kann die Bitrate um 90% reduziert werden. Durch subjektive Tests wurde bestätigt, dass das ROI-Codiersystem insbesondere für sehr niedrige Bitraten von unter 1 Mbit/s deutlich leistungsfähiger in Bezug auf Detailauflösung und Gesamteindruck ist als ein herkömmliches HEVC-Referenzsystem. Um Diskontinuitäten in den rekonstruierten Videobildern zu vermeiden, die durch eine durch Linsenverzeichnungen induzierte ungenaue globale Bewegungsschätzung entstehen können, wird eine automatische Radialverzeichnungskorrektur vorgeschlagen. Dabei wird ein unbekannter, jedoch über mehrere Bilder konstanter Korrekturparameter gemeinsam mit der globalen Bewegung geschätzt. Dieser Parameter wird derart optimiert, dass die Projektionen der Bilder in das Mosaik möglichst wenig verzerrt werden. Daraus resultiert eine um 0,27dB verbesserte globale Bewegungskompensation, wodurch weniger Diskontinuitäten in den aus dem Mosaik rekonstruierten Bildern entstehen. Dieses Verfahren ermöglicht zusätzlich die Erstellung von Langzeitmosaiken aus über 1500 Luftbildern mit unbekannter Radialverzeichnung und ohne manuelle Korrektur

    Efficient streaming for high fidelity imaging

    Get PDF
    Researchers and practitioners of graphics, visualisation and imaging have an ever-expanding list of technologies to account for, including (but not limited to) HDR, VR, 4K, 360°, light field and wide colour gamut. As these technologies move from theory to practice, the methods of encoding and transmitting this information need to become more advanced and capable year on year, placing greater demands on latency, bandwidth, and encoding performance. High dynamic range (HDR) video is still in its infancy; the tools for capture, transmission and display of true HDR content are still restricted to professional technicians. Meanwhile, computer graphics are nowadays near-ubiquitous, but to achieve the highest fidelity in real or even reasonable time a user must be located at or near a supercomputer or other specialist workstation. These physical requirements mean that it is not always possible to demonstrate these graphics in any given place at any time, and when the graphics in question are intended to provide a virtual reality experience, the constrains on performance and latency are even tighter. This thesis presents an overall framework for adapting upcoming imaging technologies for efficient streaming, constituting novel work across three areas of imaging technology. Over the course of the thesis, high dynamic range capture, transmission and display is considered, before specifically focusing on the transmission and display of high fidelity rendered graphics, including HDR graphics. Finally, this thesis considers the technical challenges posed by incoming head-mounted displays (HMDs). In addition, a full literature review is presented across all three of these areas, detailing state-of-the-art methods for approaching all three problem sets. In the area of high dynamic range capture, transmission and display, a framework is presented and evaluated for efficient processing, streaming and encoding of high dynamic range video using general-purpose graphics processing unit (GPGPU) technologies. For remote rendering, state-of-the-art methods of augmenting a streamed graphical render are adapted to incorporate HDR video and high fidelity graphics rendering, specifically with regards to path tracing. Finally, a novel method is proposed for streaming graphics to a HMD for virtual reality (VR). This method utilises 360° projections to transmit and reproject stereo imagery to a HMD with minimal latency, with an adaptation for the rapid local production of depth maps

    Arquitectura híbrida para el estándar de compresión de vídeo H.265/HEVC

    Get PDF
    En las últimas décadas las aplicaciones multimedia han evolucionado enormemente. Los estándares de compresión de vídeo han contribuido notablemente a este avance convirtiéndose en esenciales para la transmisión de datos multimedia. Desde 1980, los estándares de compresión de vídeo han enfocado sus principales esfuerzos a reducir la tasa de datos generada y proporcionar un alto nivel de calidad visual. Debido a la creciente demanda de contenido digital, hoy en día resulta de vital importancia optimizar el ancho de banda utilizado para la transmisión de vídeo digital, buscando obtener grandes tasas de compresión de datos sin pérdidas apreciables en la calidad del servicio. Veinte años atrás se consideraba a la compresión de vídeo como algo novedoso, actualmente es algo ubicuo. La mayoría de dispositivos que constituyen la electrónica de consumo hacen uso de una manera u otra de la compresión de vídeo. Aunque la mayoría de usuarios asume que estas mejoras son el resultado de los avances generales que se han producido en la electrónica de consumo, lo cierto es que son principalmente debidas a la compresión de vídeo..

    Smart Sensor Technologies for IoT

    Get PDF
    The recent development in wireless networks and devices has led to novel services that will utilize wireless communication on a new level. Much effort and resources have been dedicated to establishing new communication networks that will support machine-to-machine communication and the Internet of Things (IoT). In these systems, various smart and sensory devices are deployed and connected, enabling large amounts of data to be streamed. Smart services represent new trends in mobile services, i.e., a completely new spectrum of context-aware, personalized, and intelligent services and applications. A variety of existing services utilize information about the position of the user or mobile device. The position of mobile devices is often achieved using the Global Navigation Satellite System (GNSS) chips that are integrated into all modern mobile devices (smartphones). However, GNSS is not always a reliable source of position estimates due to multipath propagation and signal blockage. Moreover, integrating GNSS chips into all devices might have a negative impact on the battery life of future IoT applications. Therefore, alternative solutions to position estimation should be investigated and implemented in IoT applications. This Special Issue, “Smart Sensor Technologies for IoT” aims to report on some of the recent research efforts on this increasingly important topic. The twelve accepted papers in this issue cover various aspects of Smart Sensor Technologies for IoT

    Using Radio Frequency and Motion Sensing to Improve Camera Sensor Systems

    Get PDF
    Camera-based sensor systems have advanced significantly in recent years. This advancement is a combination of camera CMOS (complementary metal-oxide-semiconductor) hardware technology improvement and new computer vision (CV) algorithms that can better process the rich information captured. As the world becoming more connected and digitized through increased deployment of various sensors, cameras have become a cost-effective solution with the advantages of small sensor size, intuitive sensing results, rich visual information, and neural network-friendly. The increased deployment and advantages of camera-based sensor systems have fueled applications such as surveillance, object detection, person re-identification, scene reconstruction, visual tracking, pose estimation, and localization. However, camera-based sensor systems have fundamental limitations such as extreme power consumption, privacy-intrusive, and inability to see-through obstacles and other non-ideal visual conditions such as darkness, smoke, and fog. In this dissertation, we aim to improve the capability and performance of camera-based sensor systems by utilizing additional sensing modalities such as commodity WiFi and mmWave (millimeter wave) radios, and ultra-low-power and low-cost sensors such as inertial measurement units (IMU). In particular, we set out to study three problems: (1) power and storage consumption of continuous-vision wearable cameras, (2) human presence detection, localization, and re-identification in both indoor and outdoor spaces, and (3) augmenting the sensing capability of camera-based systems in non-ideal situations. We propose to use an ultra-low-power, low-cost IMU sensor, along with readily available camera information, to solve the first problem. WiFi devices will be utilized in the second problem, where our goal is to reduce the hardware deployment cost and leverage existing WiFi infrastructure as much as possible. Finally, we will use a low-cost, off-the-shelf mmWave radar to extend the sensing capability of a camera in non-ideal visual sensing situations.Doctor of Philosoph

    Actas de las XIV Jornadas de Ingeniería Telemática (JITEL 2019) Zaragoza (España) 22-24 de octubre de 2019

    Get PDF
    En esta ocasión, es la ciudad de Zaragoza la encargada de servir de anfitriona a las XIV Jornadas de Ingeniería Telemática (JITEL 2019), que se celebrarán del 22 al 24 de octubre de 2019. Las Jornadas de Ingeniería Telemática (JITEL), organizadas por la Asociación de Telemática (ATEL), constituyen un foro propicio de reunión, debate y divulgación para los grupos que imparten docencia e investigan en temas relacionados con las redes y los servicios telemáticos. Con la organización de este evento se pretende fomentar, por un lado el intercambio de experiencias y resultados, además de la comunicación y cooperación entre los grupos de investigación que trabajan en temas relacionados con la telemática. En paralelo a las tradicionales sesiones que caracterizan los congresos científicos, se desea potenciar actividades más abiertas, que estimulen el intercambio de ideas entre los investigadores experimentados y los noveles, así como la creación de vínculos y puntos de encuentro entre los diferentes grupos o equipos de investigación. Para ello, además de invitar a personas relevantes en los campos correspondientes, se van a incluir sesiones de presentación y debate de las líneas y proyectos activos de los mencionados equipos
    corecore