14 research outputs found
Video processing for panoramic streaming using HEVC and its scalable extensions
Panoramic streaming is a particular form of video streaming in which an arbitrary Region-of-Interest (RoI) is transmitted from a high-spatial-resolution video, i.e. a video covering a very wide angle (much larger than the human field of view, e.g. 360°). Several transport schemes for panoramic video delivery have been proposed and demonstrated within the past decade, which allow users to navigate interactively within the high-resolution videos. With the recent advances in head-mounted displays, consumers may soon have immersive and sufficiently convenient end devices at hand, which could lead to an increasing demand for panoramic video experiences. The solution proposed in this paper is built upon tile-based panoramic streaming, where users receive a set of tiles matching their RoI, and consists of a low-complexity compressed-domain video processing technique using H.265/HEVC and its scalable extensions (H.265/SHVC and H.265/MV-HEVC). The proposed technique generates a single video bitstream out of the selected tiles so that a single hardware decoder can be used. It overcomes the scalability issue of previous solutions that do not use tiles, as well as the battery consumption issue inherent in tile-based panorama streaming with multiple parallel software decoders. In addition, the described technique can reduce the peak streaming bitrate during changes of the RoI, which is crucial for a truly immersive and low-latency video experience. It also makes it possible to use open GOP structures without incurring playback interruptions at switching events, which provides better compression efficiency than closed GOP structures.
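The tile-selection step of such a scheme can be sketched in a few lines. This is an illustrative re-implementation, not the paper's code; the panorama size, tile grid, and RoI below are hypothetical example values.

```python
def tiles_for_roi(pano_w, pano_h, cols, rows, roi):
    """Return (col, row) indices of every tile intersecting the RoI.

    roi is (x, y, width, height) in panorama pixel coordinates;
    the panorama is cut into a uniform cols x rows tile grid.
    """
    tile_w, tile_h = pano_w / cols, pano_h / rows
    x, y, w, h = roi
    c0 = max(0, int(x // tile_w))
    c1 = min(cols - 1, int((x + w - 1) // tile_w))
    r0 = max(0, int(y // tile_h))
    r1 = min(rows - 1, int((y + h - 1) // tile_h))
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

# A 3840x1920 panorama cut into an 8x4 grid; the viewport is a
# 960x960 window starting at (1200, 500).
selected = tiles_for_roi(3840, 1920, 8, 4, (1200, 500, 960, 960))
# -> 9 tiles: columns 2-4, rows 1-3
```

Only the selected tiles would then be rewritten into the single conformant bitstream that the hardware decoder consumes.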
Is Smaller Always Better? - Evaluating Video Compression Techniques for Simulation Ensembles
We provide an evaluation of the applicability of video compression techniques for compressing visualization image databases that are often used for in situ visualization. Considering practical implementation aspects, we identify the relevant compression parameters and evaluate video compression for several test cases involving different data sets and visualization methods, using three different video codecs. To quantify the benefits and drawbacks of video compression, we employ metrics for image quality, compression rate, and performance. The experiments discussed provide insight into parameter choices that work well in the considered cases.
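Two of the metric families named above are simple enough to state directly. The following is a minimal illustration (not the paper's exact tooling) of a full-reference quality metric (PSNR) and the compression rate:

```python
import numpy as np

def psnr(ref, dist, max_val=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a
    distorted image (higher is better; identical images give infinity)."""
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def compression_rate(raw_bytes, encoded_bytes):
    """Ratio of uncompressed to compressed size, e.g. 10.0 means 10:1."""
    return raw_bytes / encoded_bytes
```

In an actual evaluation these would be computed per frame over each codec's output and aggregated across the ensemble.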
Non-disruptive use of light fields in image and video processing
In the age of computational imaging, cameras capture not only an image but also data. This additional captured data is best used for photo-realistic renderings, facilitating numerous post-processing possibilities such as perspective shift, depth scaling, digital refocus, 3D reconstruction, and much more. In computational photography, light field imaging technology captures the complete volumetric information of a scene. This technology has the highest potential to accelerate immersive experiences towards close-to-reality, and it has gained significance in both commercial and research domains. However, due to the lack of coding and storage formats, and the incompatibility of the tools needed to process the data, light fields are not exploited to their full potential. This dissertation addresses the integration of light field data into image and video processing. Towards this goal, the representation of light fields using advanced file formats designed for 2D image assemblies is addressed, to facilitate asset re-usability and interoperability between applications and devices. The novel 5D light field acquisition and the ongoing research on coding frameworks are presented. Multiple techniques for optimized sequencing of light field data are also proposed. As light fields contain the complete 3D information of a scene, large amounts of highly redundant data are captured. Hence, by pre-processing the data using the proposed approaches, excellent coding performance can be achieved.
Semi-Automatic Video Object Extraction Using Alpha Matting Based on Motion Estimation
Object extraction is an important task in video editing applications, because independent objects are needed for the compositing process. The extraction is performed by image matting: a manual scribble is first defined to mark the foreground and background regions, and alpha estimation determines the unknown region.

The problems in image matting are that pixels in the unknown region do not belong unambiguously to the foreground or the background, and that, in the temporal domain, it is not feasible to define scribbles independently for every frame. To overcome these problems, an object extraction method is proposed with stages for adaptive threshold estimation for alpha matting, accuracy improvement of the image matting, and temporal constraint estimation for scribble propagation. The Fuzzy C-Means (FCM) and Otsu algorithms are applied for adaptive threshold estimation.

With FCM, evaluation using the Mean Squared Error (MSE) shows that the average pixel error per frame is reduced from 30,325.10 to 26,999.33, while with Otsu it is reduced to 28,921.70. The matting quality degraded by intensity changes in compressed images is restored using the 2D Discrete Cosine Transform (DCT-2D); this algorithm reduces the Root Mean Squared Error (RMSE) from 16.68 to 11.44. Temporal constraint estimation for scribble propagation is performed by predicting the motion vector from the current frame to the next. The exhaustive-search motion vector prediction is improved by defining a search matrix whose size adapts to the scribble; the motion vector is determined by the Sum of Absolute Differences (SAD) between the current and next frames. Applied in the RGB color space, this reduces the average pixel error per frame from 3,058.55 to 1,533.35, and to 1,662.83 in the HSV color space.

The proposed framework, KiMoHar, comprises three contributions. First, image matting with an FCM adaptive threshold improves accuracy by 11.05%. Second, the DCT-2D restoration of matting quality on compressed images improves accuracy by 31.41%. Third, temporal constraint estimation improves accuracy by 56.30% in the RGB color space and by 52.61% in HSV.
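The SAD-based exhaustive search for a motion vector can be sketched as follows. This is an illustrative re-implementation, not the thesis code; the block position, block size, and search radius are example values, and the real method additionally adapts the search matrix to the scribble size.

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized blocks."""
    return np.abs(block_a.astype(np.int32) - block_b.astype(np.int32)).sum()

def exhaustive_search(cur, nxt, top, left, bh, bw, radius):
    """Find the motion vector (dy, dx) within +/-radius that minimizes
    the SAD between a block in the current frame and the next frame."""
    block = cur[top:top + bh, left:left + bw]
    best_cost, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bh > nxt.shape[0] or x + bw > nxt.shape[1]:
                continue  # candidate block falls outside the frame
            cost = sad(block, nxt[y:y + bh, x:x + bw])
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv
```

Propagating a scribble then amounts to shifting its pixels by the motion vector found for the region it covers.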
Fast H.264 to HEVC transcoding based on motion propagation and a postfix traversal of coding tree units
In 2013, the ITU-T and ISO jointly published the most recent video compression standard, called HEVC. Compared to its predecessor, H.264, this new standard reduces the bitrate by about 50% for similar video quality. To benefit from this greater coding efficiency, and to ensure interoperability between systems, many H.264 video sequences must be transcoded (converted) into HEVC sequences. The simplest way to perform this operation is to fully decode the source H.264 sequence and then fully re-encode it with an HEVC encoder. This approach, called pixel-domain cascaded transcoding (TCDP), produces efficient coding and offers maximum flexibility, notably with respect to the configuration of the output sequence. However, it is very computationally complex. To reduce this complexity, several approaches reuse coding information (motion vectors, coding modes, residual data, etc.) extracted from the H.264 sequence in order to accelerate certain steps of the HEVC encoding. Most of these approaches preserve coding efficiency but achieve limited speed-ups (usually between 2x and 4x, depending on the approach).
In this thesis, we propose an H.264 to HEVC transcoding approach that is faster than those presented in the literature. Our solution combines a motion propagation algorithm with a method to reduce the number of HEVC modes to test. The motion propagation algorithm builds a list of candidate motion vectors at the coding tree unit (CTU) level and then selects the best candidate at the prediction unit level. This method eliminates redundant computations by precomputing the prediction error of each candidate at the CTU level and reusing this information across the different partition sizes. The mode reduction algorithm, for its part, is based on a postfix traversal of the processed CTU. In particular, it allows the processing of a mode judged unpromising to be stopped early.
Compared to a TCDP transcoding approach, our experimental results show that the proposed solution is on average 7.81 times faster, for an average BD-Rate increase of 2.05%. Our experiments also show that the obtained results are significantly better than the state of the art.
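The postfix (post-order) CTU traversal with early termination can be illustrated on a generic quadtree. This is a hypothetical sketch: real HEVC rate-distortion costs and CU structures are far richer, but the control flow is the same, namely that children are costed before their parent, so an unpromising split can be abandoned early.

```python
def postorder_cost(node, rd_cost):
    """Return the best cost for a quadtree node: either coding it unsplit,
    or the sum of its four children's best costs.

    node: dict with an optional "children" list of four sub-nodes.
    rd_cost: callable returning the cost of coding a node unsplit.
    """
    own = rd_cost(node)
    children = node.get("children")
    if not children:
        return own  # leaf: no split possible
    split = 0
    for child in children:
        split += postorder_cost(child, rd_cost)
        if split >= own:
            return own  # early termination: the split is already worse
    return min(own, split)
```

With `rd_cost = lambda n: n["cost"]`, a node of cost 10 whose four leaf children each cost 2 resolves to 8 (split wins), while a node of cost 5 with the same children stops after the third child and keeps 5.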
Subjective and objective quality evaluation of synthetic and high dynamic range images
Recent years have seen a huge growth in the acquisition, transmission, and storage of videos. The visual data consists of both natural scenes and synthetic scenes, such as animated movies, cartoons, and video games. In all these cases, the ultimate goal is to provide viewers with a satisfactory quality of experience. In addition to traditional 8-bit images, high dynamic range imaging is also becoming popular because of its ability to represent real-world luminances more realistically. Devising objective image quality assessment algorithms for these applications is an interesting research problem. In this work, I have developed a synthetic image quality database by introducing varying degrees of different types of distortions, and conducted a subjective experiment in order to obtain the ground-truth data. I evaluated the performance of state-of-the-art image quality assessment algorithms (typically meant for natural images) on this database, especially no-reference algorithms that had not been applied to the domain of computer graphics images before. I identified the top-performing algorithms and analyzed the types of distortions on which the present algorithms show a less impressive performance. For high dynamic range (HDR) images, I have designed two new full-reference image quality assessment algorithms to judge the quality of tonemapped HDR images using statistical features extracted from them. I have also conducted a massive online crowd-sourced subjective test for HDR image artifacts arising from tonemapping, multiple-exposure fusion, and post-processing. To the best of our knowledge, this is presently the largest HDR image database in the world, involving the largest number of source images and human evaluations.
Based on the subjective evaluations obtained, I have also proposed machine-learning-based no-reference image quality assessment algorithms to predict the perceptual quality of HDR images.
A comprehensive study of tone mapping of high dynamic range images with subjective tests
A high dynamic range (HDR) image has a very wide range of luminance levels that
traditional low dynamic range (LDR) displays cannot visualize. For this reason, HDR
images are usually transformed to 8-bit representations, so that the alpha channel for
each pixel is used as an exponent value, sometimes referred to as exponential notation
[43]. Tone mapping operators (TMOs) transform the high dynamic range to the
low dynamic range domain by compressing pixel values so that traditional LDR displays can
visualize them. The purpose of this thesis is to identify and analyse differences and
similarities between the wide range of tone mapping operators that are available in the
literature. Each TMO has been analyzed using subjective studies considering different
conditions, which include environment, luminance, and colour. Also, several inverse
tone mapping operators, HDR mappings with exposure fusion, histogram adjustment,
and retinex have been analysed in this study. 19 different TMOs have been examined
using a variety of HDR images. The mean opinion score (MOS) is calculated for the selected
TMOs from the opinions of 25 independent observers, taking into account the candidates'
age, vision, and colour blindness.
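As an example of the kind of global operator compared in such studies, Reinhard's global TMO can be sketched in a few lines. This is an illustrative sketch only: the thesis examines 19 operators, and the key value 0.18 here is a conventional default, not a value from the thesis.

```python
import numpy as np

def reinhard_global(lum, key=0.18, eps=1e-6):
    """Reinhard's global tone curve L_d = L / (1 + L), applied after
    scaling the input so its log-average luminance maps to `key`."""
    log_avg = np.exp(np.mean(np.log(lum + eps)))  # geometric mean luminance
    scaled = (key / log_avg) * lum
    return scaled / (1.0 + scaled)                # output in [0, 1)
```

The curve compresses high luminances asymptotically towards 1 while leaving the mid-tones around `key` roughly linear, which is exactly the behavior subjective studies probe under different viewing conditions.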
Artificial Intelligence in the Creative Industries: A Review
This paper reviews the current state of the art in Artificial Intelligence
(AI) technologies and applications in the context of the creative industries. A
brief background of AI, and specifically Machine Learning (ML) algorithms, is
provided, including Convolutional Neural Networks (CNNs), Generative Adversarial
Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement
Learning (DRL). We categorise creative applications into five groups related to
how AI technologies are used: i) content creation, ii) information analysis,
iii) content enhancement and post production workflows, iv) information
extraction and enhancement, and v) data compression. We critically examine the
successes and limitations of this rapidly advancing technology in each of these
areas. We further differentiate between the use of AI as a creative tool and
its potential as a creator in its own right. We foresee that, in the near
future, machine learning-based AI will be adopted widely as a tool or
collaborative assistant for creativity. In contrast, we observe that the
successes of machine learning in domains with fewer constraints, where AI is
the `creator', remain modest. The potential of AI (or its developers) to win
awards for its original creations in competition with human creatives is also
limited, based on contemporary technologies. We therefore conclude that, in the
context of creative industries, maximum benefit from AI will be derived where
its focus is human centric -- where it is designed to augment, rather than
replace, human creativity.
Encoding high dynamic range and wide color gamut imagery
This dissertation introduces a staged high dynamic range (HDR) and wide color gamut (WCG) video data set and presents models for encoding HDR and WCG imagery.
The objective and visual evaluation of new HDR and WCG image processing algorithms, compression methods, and displays requires a high-quality reference data set. A new HDR and WCG video data set with a dynamic range of up to 18 photographic stops is therefore introduced. It contains staged and documentary scenes. The individual scenes are designed to challenge tone mapping operators, gamut mapping algorithms, compression codecs, and HDR and WCG displays. The scenes were shot with professional lighting, make-up, and film equipment. To obtain a cinematic look, digital film cameras with a Super-35 mm sensor size were used.
The additional information content of HDR and WCG video signals requires, compared to signals with conventional dynamic range, a new and more efficient signal encoding. A color space for HDR and WCG video should not only quantize efficiently, but, because of the differing display characteristics on the receiver side, should also be suitable for dynamic range and gamut adaptation. Methods for quantizing HDR luminance signals have been proposed before, but a corresponding model for color difference signals is still missing. Two new color spaces are therefore introduced that are suitable both for the efficient encoding of HDR and WCG signals and for dynamic range and gamut adaptation. These color spaces are compared with existing state-of-the-art HDR and WCG color signal encodings. The presented coding schemes make it possible to quantize HDR and WCG video with three color channels at 12 bits of tonal resolution without visible quantization artifacts.
While storage and transmission of HDR and WCG video with a 12-bit depth per channel is the goal, widely deployed file formats, video interfaces, and compression codecs often support only lower bit depths. To use this existing infrastructure for HDR video transmission and storage, a new content-dependent quantization scheme is introduced. This quantization method uses image properties such as noise and texture to estimate the tonal resolution required for visually lossless quantization. The presented method makes it possible to quantize HDR video at a bit depth of 10 bits without visible differences from the original, and requires less computing power than current HDR image difference metrics.
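Luminance quantization of this kind builds on perceptual transfer curves; the best-known example is the SMPTE ST 2084 (PQ) inverse EOTF, sketched below with a 12-bit code mapping. This is illustrative background only: the dissertation proposes its own color spaces and a content-adaptive scheme, not this fixed curve.

```python
import numpy as np

# SMPTE ST 2084 (PQ) constants
M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32

def pq_encode(y):
    """Map linear luminance y (1.0 = 10,000 cd/m^2) to a PQ signal in [0, 1]."""
    yp = np.power(y, M1)
    return np.power((C1 + C2 * yp) / (1.0 + C3 * yp), M2)

def to_code(signal, bits=12):
    """Quantize a [0, 1] signal to an integer code of the given bit depth."""
    return np.round(signal * (2 ** bits - 1)).astype(int)
```

The curve allocates code words roughly following contrast sensitivity, which is why a fixed 12-bit (or, with content-adaptive refinements as in this work, 10-bit) quantization can remain visually lossless.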
Quality-aware Content Adaptation in Digital Video Streaming
User-generated video has attracted a lot of attention due to the success of video sharing sites such as YouTube and online social networks. Recently, a shift towards live consumption of these videos is observable. The content is captured and instantly shared over the Internet using smart mobile devices such as smartphones. Large-scale platforms such as YouTube Live, YouNow, or Facebook Live arise, which enable users' smartphones to livestream to the public. These platforms achieve the parallel distribution of tens of thousands of low-resolution videos to remote viewers. Nonetheless, the providers cannot guarantee an efficient collection and distribution of high-quality video streams. As a result, the user experience is often degraded, and the required infrastructure installations are huge. Efficient methods are required to cope with the increasing demand for these video streams, and an understanding is needed of how to capture, process, and distribute the videos to guarantee a high-quality experience for viewers. This thesis addresses the quality awareness of user-generated videos by leveraging the concept of content adaptation. Two types of content adaptation, adaptive video streaming and video composition, are discussed in this thesis. A novel approach for the given scenario of live upload from mobile devices, the processing of video streams, and their distribution is then presented. This thesis demonstrates that content adaptation applied to each step of this scenario, from upload to consumption, can significantly improve the quality for the viewer. At the same time, if content adaptation is planned wisely, data traffic can be reduced while keeping the quality for viewers high. The first contribution of this thesis is a better understanding of the perceived quality of user-generated video and its influencing factors.
Subjective studies are performed to understand what affects human perception, leading to first-of-their-kind quality models. The developed quality models are used for the second contribution of this work: novel quality assessment algorithms. A unique attribute of these algorithms is the use of multiple features from different sensors. Whereas classical video quality assessment algorithms focus on the visual information, the proposed algorithms reduce the runtime by an order of magnitude by using data from other sensors in video capturing devices. Still, the scalability of quality assessment is limited when the algorithms execute on a single server. This is solved with the proposed placement and selection component, which allows the distribution of quality assessment tasks to mobile devices and thus increases the scalability of existing approaches by up to 33.71% when using the resources of only 15 mobile devices. These three contributions are required to provide a real-time understanding of the perceived quality of the video streams produced on mobile devices. The upload of video streams is the fourth contribution of this work. It relies on content and mechanism adaptation. The thesis introduces the first prototypically evaluated adaptive video upload protocol (LiViU), which transcodes multiple video representations in real time and copes with changing network conditions. In addition, a mechanism adaptation is integrated into LiViU to react to changing application scenarios, such as streaming high-quality videos to remote viewers or distributing video with minimal delay to close-by recipients. A second type of content adaptation is discussed in the fifth contribution of this work: an automatic video composition application which enables live composition from multiple user-generated video streams.
The proposed application is the first of its kind, allowing the in-time composition of high-quality video streams by inspecting the quality of the individual streams, their recording locations, and cinematographic rules. As a last contribution, the content-aware adaptive distribution of video streams to mobile devices is introduced with the Video Adaptation Service (VAS). The VAS analyzes the streamed video content to understand which adaptations are most beneficial for a viewer. It maximizes the perceived quality of each video stream individually and at the same time tries to produce as little data traffic as possible, achieving a data traffic reduction of more than 80%.
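The adaptation decision such a service makes can be caricatured as a bitrate-ladder selection. This is purely illustrative: the actual VAS uses per-stream perceptual quality models rather than bitrate alone, and the ladder below is a hypothetical example.

```python
def select_representation(ladder, bandwidth_kbps):
    """Pick the highest-bitrate representation that fits the measured
    bandwidth; fall back to the lowest rung if none fits."""
    feasible = [r for r in ladder if r["kbps"] <= bandwidth_kbps]
    if feasible:
        return max(feasible, key=lambda r: r["kbps"])
    return min(ladder, key=lambda r: r["kbps"])

ladder = [
    {"name": "240p", "kbps": 400},
    {"name": "480p", "kbps": 1500},
    {"name": "1080p", "kbps": 4000},
]
```

A quality-aware variant would replace the `kbps` ordering with a predicted perceived-quality score per representation, which is exactly where the content analysis of the VAS comes in.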