14 research outputs found
Video processing for panoramic streaming using HEVC and its scalable extensions
Panoramic streaming is a particular form of video streaming in which an arbitrary Region-of-Interest (RoI) is transmitted from a high-spatial-resolution video, i.e. a video covering a very wide angle (much larger than the human field of view, e.g. 360°). Several transport schemes for panoramic video delivery have been proposed and demonstrated within the past decade, which allow users to navigate interactively within the high-resolution videos. With the recent advances in head-mounted displays, consumers may soon have immersive and sufficiently convenient end devices at hand, which could lead to an increasing demand for panoramic video experiences. The solution proposed in this paper is built upon tile-based panoramic streaming, where users receive a set of tiles matching their RoI, and consists of a low-complexity compressed-domain video processing technique using H.265/HEVC and its scalable extensions (H.265/SHVC and H.265/MV-HEVC). The proposed technique generates a single video bitstream out of the selected tiles so that a single hardware decoder can be used. It overcomes the scalability issue of previous solutions that do not use tiles, as well as the battery consumption issue inherent in tile-based panorama streaming with multiple parallel software decoders. In addition, the described technique can reduce the peak streaming bitrate during changes of the RoI, which is crucial for a truly immersive and low-latency video experience. It also makes it possible to use open GOP structures without incurring playback interruptions at switching events, which provides better compression efficiency than closed GOP structures.
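The tile-selection step of such a scheme can be sketched in a few lines. This is an illustrative re-implementation, not the paper's code; the panorama size, tile grid, and RoI below are hypothetical example values.

```python
def tiles_for_roi(pano_w, pano_h, cols, rows, roi):
    """Return (col, row) indices of every tile intersecting the RoI.

    roi is (x, y, width, height) in panorama pixel coordinates;
    the panorama is cut into a uniform cols x rows tile grid.
    """
    tile_w, tile_h = pano_w / cols, pano_h / rows
    x, y, w, h = roi
    c0 = max(0, int(x // tile_w))
    c1 = min(cols - 1, int((x + w - 1) // tile_w))
    r0 = max(0, int(y // tile_h))
    r1 = min(rows - 1, int((y + h - 1) // tile_h))
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

# A 3840x1920 panorama cut into an 8x4 grid; the viewport is a
# 960x960 window starting at (1200, 500).
selected = tiles_for_roi(3840, 1920, 8, 4, (1200, 500, 960, 960))
# -> 9 tiles: columns 2-4, rows 1-3
```

Only the selected tiles would then be rewritten into the single conformant bitstream that the hardware decoder consumes.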
Is Smaller Always Better? - Evaluating Video Compression Techniques for Simulation Ensembles
We provide an evaluation of the applicability of video compression techniques for compressing visualization image databases that are often used for in situ visualization. Considering practical implementation aspects, we identify the relevant compression parameters and evaluate video compression for several test cases involving different data sets and visualization methods, using three different video codecs. To quantify the benefits and drawbacks of video compression, we employ metrics for image quality, compression rate, and performance. The experiments discussed provide insight into parameter choices that work well in the considered cases.
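Two of the metric families named above are simple enough to state directly. The following is a minimal illustration (not the paper's exact tooling) of a full-reference quality metric (PSNR) and the compression rate:

```python
import numpy as np

def psnr(ref, dist, max_val=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a
    distorted image (higher is better; identical images give infinity)."""
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def compression_rate(raw_bytes, encoded_bytes):
    """Ratio of uncompressed to compressed size, e.g. 10.0 means 10:1."""
    return raw_bytes / encoded_bytes
```

In an actual evaluation these would be computed per frame over each codec's output and aggregated across the ensemble.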
Non-disruptive use of light fields in image and video processing
In the age of computational imaging, cameras capture not only an image but also data. This additional captured data is best used for photo-realistic renderings, facilitating numerous post-processing possibilities such as perspective shift, depth scaling, digital refocus, 3D reconstruction, and much more. In computational photography, light field imaging technology captures the complete volumetric information of a scene. This technology has the highest potential to accelerate immersive experiences towards close-to-reality, and it has gained significance in both commercial and research domains. However, due to the lack of coding and storage formats, and the incompatibility of the tools needed to process the data, light fields are not exploited to their full potential. This dissertation addresses the integration of light field data into image and video processing. Towards this goal, the representation of light fields using advanced file formats designed for 2D image assemblies is addressed, to facilitate asset re-usability and interoperability between applications and devices. The novel 5D light field acquisition and the ongoing research on coding frameworks are presented. Multiple techniques for optimized sequencing of light field data are also proposed. As light fields contain the complete 3D information of a scene, large amounts of highly redundant data are captured. Hence, by pre-processing the data using the proposed approaches, excellent coding performance can be achieved.
Semi-Automatic Video Object Extraction Using Alpha Matting Based on Motion Estimation
Object extraction is an important task in video editing applications, because independent objects are needed for the compositing process. The extraction is performed by image matting: a manual scribble is first defined to mark the foreground and background regions, and alpha estimation determines the unknown region.

The problems in image matting are that pixels in the unknown region do not belong unambiguously to the foreground or the background, and that, in the temporal domain, it is not feasible to define scribbles independently for every frame. To overcome these problems, an object extraction method is proposed with stages for adaptive threshold estimation for alpha matting, accuracy improvement of the image matting, and temporal constraint estimation for scribble propagation. The Fuzzy C-Means (FCM) and Otsu algorithms are applied for adaptive threshold estimation.

With FCM, evaluation using the Mean Squared Error (MSE) shows that the average pixel error per frame is reduced from 30,325.10 to 26,999.33, while with Otsu it is reduced to 28,921.70. The matting quality degraded by intensity changes in compressed images is restored using the 2D Discrete Cosine Transform (DCT-2D); this algorithm reduces the Root Mean Squared Error (RMSE) from 16.68 to 11.44. Temporal constraint estimation for scribble propagation is performed by predicting the motion vector from the current frame to the next. The exhaustive-search motion vector prediction is improved by defining a search matrix whose size adapts to the scribble; the motion vector is determined by the Sum of Absolute Differences (SAD) between the current and next frames. Applied in the RGB color space, this reduces the average pixel error per frame from 3,058.55 to 1,533.35, and to 1,662.83 in the HSV color space.

The proposed framework, KiMoHar, comprises three contributions. First, image matting with an FCM adaptive threshold improves accuracy by 11.05%. Second, the DCT-2D restoration of matting quality on compressed images improves accuracy by 31.41%. Third, temporal constraint estimation improves accuracy by 56.30% in the RGB color space and by 52.61% in HSV.
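The SAD-based exhaustive search for a motion vector can be sketched as follows. This is an illustrative re-implementation, not the thesis code; the block position, block size, and search radius are example values, and the real method additionally adapts the search matrix to the scribble size.

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized blocks."""
    return np.abs(block_a.astype(np.int32) - block_b.astype(np.int32)).sum()

def exhaustive_search(cur, nxt, top, left, bh, bw, radius):
    """Find the motion vector (dy, dx) within +/-radius that minimizes
    the SAD between a block in the current frame and the next frame."""
    block = cur[top:top + bh, left:left + bw]
    best_cost, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bh > nxt.shape[0] or x + bw > nxt.shape[1]:
                continue  # candidate block falls outside the frame
            cost = sad(block, nxt[y:y + bh, x:x + bw])
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv
```

Propagating a scribble then amounts to shifting its pixels by the motion vector found for the region it covers.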
Fast H.264 to HEVC transcoding based on motion propagation and a postfix traversal of coding tree units
In 2013, the ITU-T and ISO jointly published the most recent video compression standard, called HEVC. Compared to its predecessor, H.264, this new standard reduces the bitrate by about 50% for similar video quality. To benefit from this greater coding efficiency, and to ensure interoperability between systems, many H.264 video sequences must be transcoded (converted) into HEVC sequences. The simplest way to perform this operation is to fully decode the source H.264 sequence and then fully re-encode it with an HEVC encoder. This approach, called pixel-domain cascaded transcoding (TCDP), produces efficient coding and offers maximum flexibility, notably with respect to the configuration of the output sequence. However, it is very computationally complex. To reduce this complexity, several approaches reuse coding information (motion vectors, coding modes, residual data, etc.) extracted from the H.264 sequence in order to accelerate certain steps of the HEVC encoding. Most of these approaches preserve coding efficiency but achieve limited speed-ups (usually between 2x and 4x, depending on the approach).
In this thesis, we propose an H.264 to HEVC transcoding approach that is faster than those presented in the literature. Our solution combines a motion propagation algorithm with a method to reduce the number of HEVC modes to test. The motion propagation algorithm builds a list of candidate motion vectors at the coding tree unit (CTU) level and then selects the best candidate at the prediction unit level. This method eliminates redundant computations by precomputing the prediction error of each candidate at the CTU level and reusing this information across the different partition sizes. The mode reduction algorithm, for its part, is based on a postfix traversal of the processed CTU. In particular, it allows the processing of a mode judged unpromising to be stopped early.
Compared to a TCDP transcoding approach, our experimental results show that the proposed solution is on average 7.81 times faster, for an average BD-Rate increase of 2.05%. Our experiments also show that the obtained results are significantly better than the state of the art.
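The postfix (post-order) CTU traversal with early termination can be illustrated on a generic quadtree. This is a hypothetical sketch: real HEVC rate-distortion costs and CU structures are far richer, but the control flow is the same, namely that children are costed before their parent, so an unpromising split can be abandoned early.

```python
def postorder_cost(node, rd_cost):
    """Return the best cost for a quadtree node: either coding it unsplit,
    or the sum of its four children's best costs.

    node: dict with an optional "children" list of four sub-nodes.
    rd_cost: callable returning the cost of coding a node unsplit.
    """
    own = rd_cost(node)
    children = node.get("children")
    if not children:
        return own  # leaf: no split possible
    split = 0
    for child in children:
        split += postorder_cost(child, rd_cost)
        if split >= own:
            return own  # early termination: the split is already worse
    return min(own, split)
```

With `rd_cost = lambda n: n["cost"]`, a node of cost 10 whose four leaf children each cost 2 resolves to 8 (split wins), while a node of cost 5 with the same children stops after the third child and keeps 5.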
Subjective and objective quality evaluation of synthetic and high dynamic range images
Recent years have seen a huge growth in the acquisition, transmission, and storage of videos. The visual data consists of both natural scenes and synthetic scenes, such as animated movies, cartoons, and video games. In all these cases, the ultimate goal is to provide viewers with a satisfactory quality of experience. In addition to traditional 8-bit images, high dynamic range imaging is also becoming popular because of its ability to represent real-world luminances more realistically. Devising objective image quality assessment algorithms for these applications is an interesting research problem. In this work, I have developed a synthetic image quality database by introducing varying degrees of different types of distortions, and conducted a subjective experiment in order to obtain the ground-truth data. I evaluated the performance of state-of-the-art image quality assessment algorithms (typically meant for natural images) on this database, especially no-reference algorithms that had not been applied to the domain of computer graphics images before. I identified the top-performing algorithms and analyzed the types of distortions on which the present algorithms show a less impressive performance. For high dynamic range (HDR) images, I have designed two new full-reference image quality assessment algorithms to judge the quality of tonemapped HDR images using statistical features extracted from them. I have also conducted a massive online crowd-sourced subjective test for HDR image artifacts arising from tonemapping, multiple-exposure fusion, and post-processing. To the best of our knowledge, this is presently the largest HDR image database in the world, involving the largest number of source images and human evaluations.
Based on the subjective evaluations obtained, I have also proposed machine-learning-based no-reference image quality assessment algorithms to predict the perceptual quality of HDR images.
A comprehensive study of tone mapping of high dynamic range images with subjective tests
A high dynamic range (HDR) image has a very wide range of luminance levels that
traditional low dynamic range (LDR) displays cannot visualize. For this reason, HDR
images are usually transformed to 8-bit representations, so that the alpha channel for
each pixel is used as an exponent value, sometimes referred to as exponential notation
[43]. Tone mapping operators (TMOs) transform the high dynamic range to the
low dynamic range domain by compressing pixel values so that traditional LDR displays can
visualize them. The purpose of this thesis is to identify and analyse differences and
similarities between the wide range of tone mapping operators that are available in the
literature. Each TMO has been analyzed using subjective studies considering different
conditions, which include environment, luminance, and colour. Also, several inverse
tone mapping operators, HDR mappings with exposure fusion, histogram adjustment,
and retinex have been analysed in this study. 19 different TMOs have been examined
using a variety of HDR images. The mean opinion score (MOS) is calculated for the selected
TMOs from the opinions of 25 independent observers, taking into account the candidates'
age, vision, and colour blindness.
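As an example of the kind of global operator compared in such studies, Reinhard's global TMO can be sketched in a few lines. This is an illustrative sketch only: the thesis examines 19 operators, and the key value 0.18 here is a conventional default, not a value from the thesis.

```python
import numpy as np

def reinhard_global(lum, key=0.18, eps=1e-6):
    """Reinhard's global tone curve L_d = L / (1 + L), applied after
    scaling the input so its log-average luminance maps to `key`."""
    log_avg = np.exp(np.mean(np.log(lum + eps)))  # geometric mean luminance
    scaled = (key / log_avg) * lum
    return scaled / (1.0 + scaled)                # output in [0, 1)
```

The curve compresses high luminances asymptotically towards 1 while leaving the mid-tones around `key` roughly linear, which is exactly the behavior subjective studies probe under different viewing conditions.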
Artificial Intelligence in the Creative Industries: A Review
This paper reviews the current state of the art in Artificial Intelligence
(AI) technologies and applications in the context of the creative industries. A
brief background of AI, and specifically Machine Learning (ML) algorithms, is
provided, including Convolutional Neural Networks (CNNs), Generative Adversarial
Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement
Learning (DRL). We categorise creative applications into five groups related to
how AI technologies are used: i) content creation, ii) information analysis,
iii) content enhancement and post production workflows, iv) information
extraction and enhancement, and v) data compression. We critically examine the
successes and limitations of this rapidly advancing technology in each of these
areas. We further differentiate between the use of AI as a creative tool and
its potential as a creator in its own right. We foresee that, in the near
future, machine learning-based AI will be adopted widely as a tool or
collaborative assistant for creativity. In contrast, we observe that the
successes of machine learning in domains with fewer constraints, where AI is
the `creator', remain modest. The potential of AI (or its developers) to win
awards for its original creations in competition with human creatives is also
limited, based on contemporary technologies. We therefore conclude that, in the
context of creative industries, maximum benefit from AI will be derived where
its focus is human centric -- where it is designed to augment, rather than
replace, human creativity.
Encoding high dynamic range and wide color gamut imagery
This dissertation introduces a staged high dynamic range (HDR) and wide color gamut (WCG) video data set and presents models for encoding HDR and WCG imagery.
The objective and visual evaluation of new HDR and WCG image processing algorithms, compression methods, and displays requires a high-quality reference data set. A new HDR and WCG video data set with a dynamic range of up to 18 photographic stops is therefore introduced. It contains staged and documentary scenes. The individual scenes are designed to challenge tone mapping operators, gamut mapping algorithms, compression codecs, and HDR and WCG displays. The scenes were shot with professional lighting, make-up, and film equipment. To obtain a cinematic look, digital film cameras with a Super-35 mm sensor size were used.
The additional information content of HDR and WCG video signals requires, compared to signals with conventional dynamic range, a new and more efficient signal encoding. A color space for HDR and WCG video should not only quantize efficiently, but, because of the differing display characteristics on the receiver side, should also be suitable for dynamic range and gamut adaptation. Methods for quantizing HDR luminance signals have been proposed before, but a corresponding model for color difference signals is still missing. Two new color spaces are therefore introduced that are suitable both for the efficient encoding of HDR and WCG signals and for dynamic range and gamut adaptation. These color spaces are compared with existing state-of-the-art HDR and WCG color signal encodings. The presented coding schemes make it possible to quantize HDR and WCG video with three color channels at 12 bits of tonal resolution without visible quantization artifacts.
While storage and transmission of HDR and WCG video with a 12-bit depth per channel is the goal, widely deployed file formats, video interfaces, and compression codecs often support only lower bit depths. To use this existing infrastructure for HDR video transmission and storage, a new content-dependent quantization scheme is introduced. This quantization method uses image properties such as noise and texture to estimate the tonal resolution required for visually lossless quantization. The presented method makes it possible to quantize HDR video at a bit depth of 10 bits without visible differences from the original, and requires less computing power than current HDR image difference metrics.
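Luminance quantization of this kind builds on perceptual transfer curves; the best-known example is the SMPTE ST 2084 (PQ) inverse EOTF, sketched below with a 12-bit code mapping. This is illustrative background only: the dissertation proposes its own color spaces and a content-adaptive scheme, not this fixed curve.

```python
import numpy as np

# SMPTE ST 2084 (PQ) constants
M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32

def pq_encode(y):
    """Map linear luminance y (1.0 = 10,000 cd/m^2) to a PQ signal in [0, 1]."""
    yp = np.power(y, M1)
    return np.power((C1 + C2 * yp) / (1.0 + C3 * yp), M2)

def to_code(signal, bits=12):
    """Quantize a [0, 1] signal to an integer code of the given bit depth."""
    return np.round(signal * (2 ** bits - 1)).astype(int)
```

The curve allocates code words roughly following contrast sensitivity, which is why a fixed 12-bit (or, with content-adaptive refinements as in this work, 10-bit) quantization can remain visually lossless.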
Quality-aware Content Adaptation in Digital Video Streaming
User-generated video has attracted a lot of attention due to the success of video sharing sites such as YouTube and online social networks. Recently, a shift towards live consumption of these videos is observable. The content is captured and instantly shared over the Internet using smart mobile devices such as smartphones. Large-scale platforms such as YouTube Live, YouNow, or Facebook Live arise, which enable users' smartphones to livestream to the public. These platforms achieve the parallel distribution of tens of thousands of low-resolution videos to remote viewers. Nonetheless, the providers cannot guarantee an efficient collection and distribution of high-quality video streams. As a result, the user experience is often degraded, and the required infrastructure installations are huge. Efficient methods are required to cope with the increasing demand for these video streams, and an understanding is needed of how to capture, process, and distribute the videos to guarantee a high-quality experience for viewers. This thesis addresses the quality awareness of user-generated videos by leveraging the concept of content adaptation. Two types of content adaptation, adaptive video streaming and video composition, are discussed in this thesis. A novel approach for the given scenario of live upload from mobile devices, the processing of video streams, and their distribution is then presented. This thesis demonstrates that content adaptation applied to each step of this scenario, from upload to consumption, can significantly improve the quality for the viewer. At the same time, if content adaptation is planned wisely, data traffic can be reduced while keeping the quality for viewers high. The first contribution of this thesis is a better understanding of the perceived quality of user-generated video and its influencing factors.
Subjective studies are performed to understand what affects human perception, leading to first-of-their-kind quality models. The developed quality models are used for the second contribution of this work: novel quality assessment algorithms. A unique attribute of these algorithms is the use of multiple features from different sensors. Whereas classical video quality assessment algorithms focus on the visual information, the proposed algorithms reduce the runtime by an order of magnitude by using data from other sensors in video capturing devices. Still, the scalability of quality assessment is limited when the algorithms execute on a single server. This is solved with the proposed placement and selection component, which allows the distribution of quality assessment tasks to mobile devices and thus increases the scalability of existing approaches by up to 33.71% when using the resources of only 15 mobile devices. These three contributions are required to provide a real-time understanding of the perceived quality of the video streams produced on mobile devices. The upload of video streams is the fourth contribution of this work. It relies on content and mechanism adaptation. The thesis introduces the first prototypically evaluated adaptive video upload protocol (LiViU), which transcodes multiple video representations in real time and copes with changing network conditions. In addition, a mechanism adaptation is integrated into LiViU to react to changing application scenarios, such as streaming high-quality videos to remote viewers or distributing video with minimal delay to close-by recipients. A second type of content adaptation is discussed in the fifth contribution of this work: an automatic video composition application which enables live composition from multiple user-generated video streams.
The proposed application is the first of its kind, allowing the in-time composition of high-quality video streams by inspecting the quality of the individual streams, their recording locations, and cinematographic rules. As a last contribution, the content-aware adaptive distribution of video streams to mobile devices is introduced with the Video Adaptation Service (VAS). The VAS analyzes the streamed video content to understand which adaptations are most beneficial for a viewer. It maximizes the perceived quality of each video stream individually and at the same time tries to produce as little data traffic as possible, achieving a data traffic reduction of more than 80%.
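The adaptation decision such a service makes can be caricatured as a bitrate-ladder selection. This is purely illustrative: the actual VAS uses per-stream perceptual quality models rather than bitrate alone, and the ladder below is a hypothetical example.

```python
def select_representation(ladder, bandwidth_kbps):
    """Pick the highest-bitrate representation that fits the measured
    bandwidth; fall back to the lowest rung if none fits."""
    feasible = [r for r in ladder if r["kbps"] <= bandwidth_kbps]
    if feasible:
        return max(feasible, key=lambda r: r["kbps"])
    return min(ladder, key=lambda r: r["kbps"])

ladder = [
    {"name": "240p", "kbps": 400},
    {"name": "480p", "kbps": 1500},
    {"name": "1080p", "kbps": 4000},
]
```

A quality-aware variant would replace the `kbps` ordering with a predicted perceived-quality score per representation, which is exactly where the content analysis of the VAS comes in.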