Search CORE

206 research outputs found

Deep-learning based precoding techniques for next-generation video compression

Author: Andreopoulos Yiannis
Bourtsoulatze Eirina
Chadha Aaron
Giotsas Vasileios
Grce Sergio
Publication venue
Publication date: 13/09/2019
Field of study

Several research groups worldwide are currently investigating how deep learning may advance the state-of-the-art in image and video coding. An open question is how to make deep neural networks work in conjunction with existing (and upcoming) video codecs, such as MPEG AVC/H.264, HEVC, VVC, Google VP9 and AOMedia AV1, as well as existing container and transport formats. Such compatibility is a crucial aspect, as the video content industry and hardware manufacturers are expected to remain committed to supporting these standards for the foreseeable future. We propose deep neural networks as precoding components for current and future codec ecosystems. In our current deployments for DASH/HLS adaptive streaming, this comprises downscaling neural networks. Precoding via deep learning allows for full compatibility to current and future codec and transport standards while providing for significant savings. Our results with HD content show that 23%-43% rate reduction takes place under a range of state-of-the-art video codec implementations. The use of precoding can also lead to significant encoding complexity reduction, which is essential for the cloud deployment of complex encoders like AV1 and MPEG VVC. Therefore, beyond bitrate saving, deep-learning based precoding may reduce the required cloud resources for video transcoding and make cloud-based solutions competitive or superior to state-of-the-art captive deployments

Lancaster E-Prints

Actes du 11ème Atelier en Évaluation de Performances

Author: Ayesta Urtzi
Prabhu Balakrishna
Verloop Ina Maria,
Publication venue: HAL CCSD
Publication date: 23/03/2016
Field of study

International audienceLe présent document contient les actes du 11ème Atelier en Évaluation des Performances qui s'est tenu les 15-17 Mars 2016 au LAAS-CNRS, Toulouse. L’Atelier en Évaluation de Performances est une réunion destinée à faire s’exprimer et se rencontrer les jeunes chercheurs (doctorants et postdoctorants) dans le domaine de la Modélisation et de l’Évaluation de Performances, une discipline consacrée à l’étude et l’optimisation de systèmes dynamiques stochastiques et/ou temporisés apparaissant en Informatique, Télécommunications, Productique et Robotique entre autres. La présentation informelle de travaux, même en cours, y est encouragée afin de renforcer les interactions entre jeunes chercheurs et préparer des soumissions de nouveaux projets scientifiques. Des exposés de synthèse sur des domaines de recherche d’actualité, donnés par des chercheurs confirmés du domaine renforcent la partie formation de l’atelier

Scientific Publications of the University of Toulouse II Le Mirail

HAL-INSA Toulouse

Subjective quality assessment of longer duration video sequences delivered over HTTP adaptive streaming to tablet devices

Author: Claeys Maxim
De Cock Jan
De Meulenaere Jonas
De Turck Filip
Demeester Piet
Staelens Nicolas
Van de Walle Rik
Van den Broeck Wendy
Van Wallendael Glenn
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

HTTP adaptive streaming facilitates video streaming to mobile devices connected through heterogeneous networks without the need for a dedicated streaming infrastructure. By splitting different encoded versions of the same video into small segments, clients can continuously decide which segments to download based on available network resources and device characteristics. These encoded versions can, for example, differ in terms of bitrate and spatial or temporal resolution. However, as a result of dynamically selecting video segments, perceived video quality can fluctuate during playback which will impact end-users' quality of experience. Subjective studies have already been conducted to assess the influence of video delivery using HTTP Adaptive Streaming to mobile devices. Nevertheless, existing studies are limited to the evaluation of short video sequences in controlled environments. Research has already shown that video duration and assessment environment influence quality perception. Therefore, in this article, we go beyond the traditional ways for subjective quality evaluation by conducting novel experiments on tablet devices in more ecologically valid testing environments using longer duration video sequences. As such, we want to mimic realistic viewing behavior as much as possible. Our results show that both video content and the range of quality switches significantly influence end-users' rating behavior. In general, quality level switches are only perceived in high motion sequences or in case switching occurs between high and low quality video segments. Moreover, we also found that video stallings should be avoided during playback at all times

Ghent University Academic Bibliography

Deep Video Precoding

Author: Andreopoulos Yiannis
Bourtsoulatze Eirina
Chadha Aaron
Fadeev Ilya
Giotsas Vasileios
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/12/2019
Field of study

Several groups worldwide are currently investigating how deep learning may advance the state-of-the-art in image and video coding. An open question is how to make deep neural networks work in conjunction with existing (and upcoming) video codecs, such as MPEG H.264/AVC, H.265/HEVC, VVC, Google VP9 and AOMedia AV1, AV2, as well as existing container and transport formats, without imposing any changes at the client side. Such compatibility is a crucial aspect when it comes to practical deployment, especially when considering the fact that the video content industry and hardware manufacturers are expected to remain committed to supporting these standards for the foreseeable future. We propose to use deep neural networks as precoders for current and future video codecs and adaptive video streaming systems. In our current design, the core precoding component comprises a cascaded structure of downscaling neural networks that operates during video encoding, prior to transmission. This is coupled with a precoding mode selection algorithm for each independently-decodable stream segment, which adjusts the downscaling factor according to scene characteristics, the utilized encoder, and the desired bitrate and encoding configuration. Our framework is compatible with all current and future codec and transport standards, as our deep precoding network structure is trained in conjunction with linear upscaling filters (e.g., the bilinear filter), which are supported by all web video players. Extensive evaluation on FHD (1080p) and UHD (2160p) content and with widely-used H.264/AVC, H.265/HEVC and VP9 encoders, as well as a preliminary evaluation with the current test model of VVC (v.6.2rc1), shows that coupling such standards with the proposed deep video precoding allows for 8% to 52% rate reduction under encoding configurations and bitrates suitable for video-on-demand adaptive streaming systems. The use of precoding can also lead to encoding complexity reduction, which is essential for cost-effective cloud deployment of complex encoders like H.265/HEVC, VP9 and VVC, especially when considering the prominence of high-resolution adaptive video streaming

University of Essex Research Repository

arXiv.org e-Print Archive

Lancaster E-Prints

Content-Aware Multimedia Communications

Author: Eichhorn Alexander
Publication venue
Publication date: 01/07/2008
Field of study

The demands for fast, economic and reliable dissemination of multimedia information are steadily growing within our society. While people and economy increasingly rely on communication technologies, engineers still struggle with their growing complexity. Complexity in multimedia communication originates from several sources. The most prominent is the unreliability of packet networks like the Internet. Recent advances in scheduling and error control mechanisms for streaming protocols have shown that the quality and robustness of multimedia delivery can be improved significantly when protocols are aware of the content they deliver. However, the proposed mechanisms require close cooperation between transport systems and application layers which increases the overall system complexity. Current approaches also require expensive metrics and focus on special encoding formats only. A general and efficient model is missing so far. This thesis presents efficient and format-independent solutions to support cross-layer coordination in system architectures. In particular, the first contribution of this work is a generic dependency model that enables transport layers to access content-specific properties of media streams, such as dependencies between data units and their importance. The second contribution is the design of a programming model for streaming communication and its implementation as a middleware architecture. The programming model hides the complexity of protocol stacks behind simple programming abstractions, but exposes cross-layer control and monitoring options to application programmers. For example, our interfaces allow programmers to choose appropriate failure semantics at design time while they can refine error protection and visibility of low-level errors at run-time. Based on some examples we show how our middleware simplifies the integration of stream-based communication into large-scale application architectures. An important result of this work is that despite cross-layer cooperation, neither application nor transport protocol designers experience an increase in complexity. Application programmers can even reuse existing streaming protocols which effectively increases system robustness.Der Bedarf unsere Gesellschaft nach kostengünstiger und zuverlässiger Kommunikation wächst stetig. Während wir uns selbst immer mehr von modernen Kommunikationstechnologien abhängig machen, müssen die Ingenieure dieser Technologien sowohl den Bedarf nach schneller Einführung neuer Produkte befriedigen als auch die wachsende Komplexität der Systeme beherrschen. Gerade die Übertragung multimedialer Inhalte wie Video und Audiodaten ist nicht trivial. Einer der prominentesten Gründe dafür ist die Unzuverlässigkeit heutiger Netzwerke, wie z.B.~dem Internet. Paketverluste und schwankende Laufzeiten können die Darstellungsqualität massiv beeinträchtigen. Wie jüngste Entwicklungen im Bereich der Streaming-Protokolle zeigen, sind jedoch Qualität und Robustheit der Übertragung effizient kontrollierbar, wenn Streamingprotokolle Informationen über den Inhalt der transportierten Daten ausnutzen. Existierende Ansätze, die den Inhalt von Multimediadatenströmen beschreiben, sind allerdings meist auf einzelne Kompressionsverfahren spezialisiert und verwenden berechnungsintensive Metriken. Das reduziert ihren praktischen Nutzen deutlich. Außerdem erfordert der Informationsaustausch eine enge Kooperation zwischen Applikationen und Transportschichten. Da allerdings die Schnittstellen aktueller Systemarchitekturen nicht darauf vorbereitet sind, müssen entweder die Schnittstellen erweitert oder alternative Architekturkonzepte geschaffen werden. Die Gefahr beider Varianten ist jedoch, dass sich die Komplexität eines Systems dadurch weiter erhöhen kann. Das zentrale Ziel dieser Dissertation ist es deshalb, schichtenübergreifende Koordination bei gleichzeitiger Reduzierung der Komplexität zu erreichen. Hier leistet die Arbeit zwei Beträge zum aktuellen Stand der Forschung. Erstens definiert sie ein universelles Modell zur Beschreibung von Inhaltsattributen, wie Wichtigkeiten und Abhängigkeitsbeziehungen innerhalb eines Datenstroms. Transportschichten können dieses Wissen zur effizienten Fehlerkontrolle verwenden. Zweitens beschreibt die Arbeit das Noja Programmiermodell für multimediale Middleware. Noja definiert Abstraktionen zur Übertragung und Kontrolle multimedialer Ströme, die die Koordination von Streamingprotokollen mit Applikationen ermöglichen. Zum Beispiel können Programmierer geeignete Fehlersemantiken und Kommunikationstopologien auswählen und den konkreten Fehlerschutz dann zur Laufzeit verfeinern und kontrolliere

Digitale Bibliothek Thüringen

Recommended from our members

End-to-end 3D video communication over heterogeneous networks

Author: Mohib Hamdullah
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2014
Field of study

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Three-dimensional technology, more commonly referred to as 3D technology, has revolutionised many fields including entertainment, medicine, and communications to name a few. In addition to 3D films, games, and sports channels, 3D perception has made tele-medicine a reality. By the year 2015, 30% of the all HD panels at home will be 3D enabled, predicted by consumer electronics manufacturers. Stereoscopic cameras, a comparatively mature technology compared to other 3D systems, are now being used by ordinary citizens to produce 3D content and share at a click of a button just like they do with the 2D counterparts via sites like YouTube. But technical challenges still exist, including with autostereoscopic multiview displays. 3D content requires many complex considerations--including how to represent it, and deciphering what is the best compression format--when considering transmission or storage, because of its increased amount of data. Any decision must be taken in the light of the available bandwidth or storage capacity, quality and user expectations. Free viewpoint navigation also remains partly unsolved. The most pressing issue getting in the way of widespread uptake of consumer 3D systems is the ability to deliver 3D content to heterogeneous consumer displays over the heterogeneous networks. Optimising 3D video communication solutions must consider the entire pipeline, starting with optimisation at the video source to the end display and transmission optimisation. Multi-view offers the most compelling solution for 3D videos with motion parallax and freedom from wearing headgear for 3D video perception. Optimising multi-view video for delivery and display could increase the demand for true 3D in the consumer market. This thesis focuses on an end-to-end quality optimisation in 3D video communication/transmission, offering solutions for optimisation at the compression, transmission, and decoder levels.Brunel University - Isambard Research Scholarshi

Brunel University Research Archive