14 research outputs found

    Video processing for panoramic streaming using HEVC and its scalable extensions

    Panoramic streaming is a particular form of video streaming in which an arbitrary Region-of-Interest (RoI) is transmitted from a high-spatial-resolution video, i.e. a video covering a very wide angle, much larger than the human field of view (e.g. 360°). Several transport schemes for panoramic video delivery have been proposed and demonstrated within the past decade, allowing users to navigate interactively within high-resolution videos. With the recent advances in head-mounted displays, consumers may soon have immersive and sufficiently convenient end devices within reach, which could lead to an increasing demand for panoramic video experiences. The solution proposed in this paper is built upon tile-based panoramic streaming, where users receive a set of tiles that match their RoI, and consists of a low-complexity compressed-domain video processing technique using H.265/HEVC and its scalable extensions (H.265/SHVC and H.265/MV-HEVC). The proposed technique generates a single video bitstream out of the selected tiles so that a single hardware decoder can be used. It overcomes the scalability issue of previous solutions that do not use tiles and the battery consumption issue inherent in tile-based panoramic streaming with multiple parallel software decoders. In addition, the described technique is capable of reducing the peak streaming bitrate during changes of the RoI, which is crucial for a truly immersive and low-latency video experience. It also makes it possible to use open GOP structures without incurring any playback interruption at switching events, which provides better compression efficiency than closed GOP structures.
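
    The tile-selection step at the heart of such a scheme can be illustrated with a short sketch. This is not the paper's implementation; the panorama resolution, the 16x8 tile grid, and the wrap-around handling at the 360° seam are assumptions made purely for illustration.

```python
# Illustrative sketch: given a viewer's Region-of-Interest (RoI) on an
# equirectangular panorama split into a fixed grid of tiles, compute which
# tiles must be fetched. Grid dimensions and seam handling are assumptions.

def tiles_for_roi(roi_x, roi_y, roi_w, roi_h,
                  pano_w=7680, pano_h=3840, cols=16, rows=8):
    """Return the set of (col, row) tile indices overlapping the RoI.

    roi_x/roi_y: top-left corner of the RoI in panorama pixels.
    Horizontal wrap-around (the 360° seam) is handled by taking columns
    modulo the number of tile columns; rows are clamped to the panorama.
    """
    tile_w, tile_h = pano_w / cols, pano_h / rows
    first_col = int(roi_x // tile_w)
    last_col = int((roi_x + roi_w - 1) // tile_w)
    first_row = max(0, int(roi_y // tile_h))
    last_row = min(rows - 1, int((roi_y + roi_h - 1) // tile_h))
    return {(c % cols, r)
            for c in range(first_col, last_col + 1)
            for r in range(first_row, last_row + 1)}

# Example: a viewport centred near the seam needs tiles from both edges.
print(sorted(tiles_for_roi(roi_x=7200, roi_y=1400, roi_w=1920, roi_h=1080)))
```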

    Is Smaller Always Better? - Evaluating Video Compression Techniques for Simulation Ensembles

    We provide an evaluation of the applicability of video compression techniques for compressing the visualization image databases that are often used for in situ visualization. Considering practical implementation aspects, we identify the relevant compression parameters and evaluate video compression for several test cases involving different data sets and visualization methods, using three different video codecs. To quantify the benefits and drawbacks of video compression, we employ metrics for image quality, compression rate, and performance. The experiments provide insight into parameter values that work well in the considered cases.
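
    As a minimal sketch of two of the metrics named above, the following snippet computes PSNR as an image-quality measure and the compression rate for a decoded frame against its raw original. The frame size, the 8-bit value range, and the synthetic example data are assumptions, not the paper's test cases.

```python
import numpy as np

def psnr(original: np.ndarray, decoded: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    mse = np.mean((original.astype(np.float64) - decoded.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def compression_rate(raw_bytes: int, encoded_bytes: int) -> float:
    """Ratio of uncompressed to compressed size (higher is better)."""
    return raw_bytes / encoded_bytes

# Example with synthetic data: a noisy reconstruction of a random frame.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(1080, 1920, 3), dtype=np.uint8)
noisy = np.clip(frame + rng.normal(0, 2, frame.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(frame, noisy):.2f} dB")
print(f"Rate: {compression_rate(frame.nbytes, 350_000):.1f}:1")
```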

    Non-disruptive use of light fields in image and video processing

    In the age of computational imaging, cameras capture not only an image but also data. This additional captured data is best used for photo-realistic rendering, enabling numerous post-processing possibilities such as perspective shift, depth scaling, digital refocus, 3D reconstruction, and much more. In computational photography, light field imaging technology captures the complete volumetric information of a scene. This technology has the highest potential to accelerate immersive experiences towards close-to-reality and has gained significance in both commercial and research domains. However, due to the lack of coding and storage formats, and the incompatibility of the tools needed to process and deliver the data, light fields are not yet exploited to their full potential. This dissertation addresses the integration of light field data into image and video processing. Towards this goal, it covers the representation of light fields using advanced file formats designed for 2D image assemblies, facilitating asset re-usability and interoperability between applications and devices. The novel 5D light field acquisition and the ongoing research on coding frameworks are presented. Multiple techniques for optimised sequencing of light field data are also proposed. As light fields contain the complete 3D information of a scene, large amounts of highly redundant data are captured. Hence, by pre-processing the data using the proposed approaches, excellent coding performance can be achieved.
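
    To make the data layout concrete, here is a small sketch of a light field stored as a grid of sub-aperture views, together with a naive shift-and-add refocus. The angular and spatial resolutions are assumed values, and the sketch does not follow the file formats or coding frameworks proposed in the dissertation.

```python
import numpy as np

U, V, S, T, C = 9, 9, 256, 256, 3          # angular and spatial resolution (assumed)
lf = np.zeros((U, V, S, T, C), dtype=np.uint8)

centre_view = lf[U // 2, V // 2]           # one sub-aperture image, shape (S, T, C)
one_ray = lf[3, 4, 120, 200]               # radiance sample for a single (u, v, s, t)

def refocus(light_field: np.ndarray, shift: float) -> np.ndarray:
    """Naive shift-and-add refocus: translate each view proportionally to its
    angular offset from the centre view, then average. Integer shifts only."""
    U, V, S, T, C = light_field.shape
    acc = np.zeros((S, T, C), dtype=np.float64)
    for u in range(U):
        for v in range(V):
            du = int(round(shift * (u - U // 2)))
            dv = int(round(shift * (v - V // 2)))
            acc += np.roll(light_field[u, v], (du, dv), axis=(0, 1))
    return (acc / (U * V)).astype(np.uint8)

refocused = refocus(lf, shift=1.0)
```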

    Semi-Automatic Video Object Extraction Using Alpha Matting Based on Motion Estimation

    Object extraction is an important task in video editing applications, because independent objects are required for compositing. The extraction process is performed by image matting: manual scribbles are first defined to mark the foreground and background regions, and the unknown region is resolved by alpha estimation. Image matting faces two problems: pixels in the unknown region do not belong unambiguously to either the foreground or the background, and in the temporal domain it is not feasible to define scribbles independently in every frame. To address these problems, an object extraction method is proposed with three stages: adaptive threshold estimation for alpha matting, accuracy improvement of the image matting, and temporal constraint estimation for scribble propagation. The Fuzzy C-Means (FCM) and Otsu algorithms are applied for adaptive threshold estimation. Evaluation with the Mean Squared Error (MSE) shows that FCM reduces the average pixel error per frame from 30,325.10 to 26,999.33, while Otsu reduces it to 28,921.70. The matting quality lost to intensity changes in compressed images is restored using the 2D Discrete Cosine Transform (DCT-2D), which lowers the Root Mean Squared Error (RMSE) from 16.68 to 11.44. Temporal constraint estimation for scribble propagation is performed by predicting the motion vector from the current frame to the next. The exhaustive-search motion vector prediction is improved by defining a matrix whose size adapts to the scribble; the motion vector is determined by the Sum of Absolute Differences (SAD) between the current and the next frame. Applied in the RGB colour space, this reduces the average pixel error per frame from 3,058.55 to 1,533.35, and to 1,662.83 in the HSV colour space. The proposed framework, KiMoHar, comprises three contributions: image matting with an adaptive FCM threshold improves accuracy by 11.05%; matting-quality improvement on compressed images with DCT-2D improves accuracy by 31.41%; and temporal constraint estimation improves accuracy by 56.30% in the RGB colour space and by 52.61% in HSV.
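
    The exhaustive SAD search that drives the scribble propagation can be sketched in a few lines. The block position, block size, and search range below are illustrative assumptions, and the sketch omits the dynamic scribble-sized matrix described in the abstract.

```python
import numpy as np

def sad(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of Absolute Differences between two equally sized blocks."""
    return float(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def motion_vector(cur: np.ndarray, nxt: np.ndarray,
                  top: int, left: int, h: int, w: int,
                  search: int = 8) -> tuple[int, int]:
    """Exhaustive search: return (dy, dx) minimising the SAD between the block
    at (top, left) in `cur` and candidate blocks in `nxt`."""
    block = cur[top:top + h, left:left + w]
    best, best_mv = float("inf"), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and y + h <= nxt.shape[0] and 0 <= x and x + w <= nxt.shape[1]:
                cost = sad(block, nxt[y:y + h, x:x + w])
                if cost < best:
                    best, best_mv = cost, (dy, dx)
    return best_mv

# Example: a bright square moves 3 px down and 2 px right between frames.
cur = np.zeros((64, 64), dtype=np.uint8); cur[20:30, 20:30] = 255
nxt = np.zeros((64, 64), dtype=np.uint8); nxt[23:33, 22:32] = 255
print(motion_vector(cur, nxt, top=20, left=20, h=10, w=10))   # -> (3, 2)
```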

    Fast H.264-to-HEVC transcoding based on motion propagation and postfix traversal of coding tree units

    In 2013, the ITU-T and ISO jointly published the most recent video compression standard, called HEVC. Compared with its predecessor, H.264, this new standard reduces the bitrate by about 50% for similar video quality. To benefit from this greater coding efficiency, and to ensure interoperability between systems, many H.264 video sequences must be transcoded (converted) into HEVC sequences. The simplest way to perform this operation is to fully decode the H.264 source sequence and then fully re-encode it with an HEVC encoder. This approach, called cascaded pixel-domain transcoding (CPDT), produces efficient coding and offers maximum flexibility, notably with respect to the configuration of the output sequence. However, it is computationally very complex. To reduce this complexity, several approaches reuse coding information (motion vectors, coding modes, residual data, etc.) extracted from the H.264 sequence to accelerate certain steps of HEVC encoding. Most of these approaches preserve coding efficiency but achieve only limited speed-ups (usually between 2x and 4x, depending on the approach). In this thesis, we propose an H.264-to-HEVC transcoding approach that is faster than those reported in the literature. Our solution consists of a motion propagation algorithm and a method for reducing the number of HEVC modes to test. The motion propagation algorithm builds a list of candidate motion vectors at the coding tree unit (CTU) level and then selects the best candidate at the prediction unit level. This method eliminates redundant computation by precomputing the prediction error of each candidate at the CTU level and reusing this information for different partition sizes. The mode reduction algorithm, in turn, is based on a postfix traversal of the CTU being processed; in particular, it allows the processing of a mode judged unpromising to be stopped early. Compared with a CPDT approach, our experimental results show that the proposed solution is on average 7.81 times faster, for an average BD-Rate increase of 2.05%. Our experiments also show that the obtained results are significantly better than the state of the art.
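
    A rough sketch of the motion-propagation idea follows: prediction errors of the candidate motion vectors are computed once per CTU and then reused for every prediction-unit size. The 64x64 CTU, the cost function, and the surrounding geometry are simplifications for illustration, not the thesis implementation.

```python
import numpy as np

def collect_candidates(h264_mvs, ctu_x, ctu_y, ctu_size=64):
    """Keep the distinct H.264 MVs whose block origin lies inside this CTU.
    h264_mvs: iterable of (block_x, block_y, mv_x, mv_y)."""
    return sorted({(mx, my) for bx, by, mx, my in h264_mvs
                   if ctu_x <= bx < ctu_x + ctu_size and ctu_y <= by < ctu_y + ctu_size})

def candidate_error_maps(cur_ctu, ref_frame, ctu_x, ctu_y, candidates):
    """Precompute, once per CTU, the per-pixel absolute prediction error of
    each candidate MV. Any PU cost is then just a sum over a sub-rectangle."""
    h, w = cur_ctu.shape
    maps = {}
    for mx, my in candidates:
        x, y = ctu_x + mx, ctu_y + my
        if 0 <= y and y + h <= ref_frame.shape[0] and 0 <= x and x + w <= ref_frame.shape[1]:
            ref = ref_frame[y:y + h, x:x + w].astype(np.int32)
            maps[(mx, my)] = np.abs(cur_ctu.astype(np.int32) - ref)
    return maps

def best_candidate_for_pu(error_maps, pu_rect):
    """Reuse the cached maps: the SAD of a PU is the sum of the cached errors
    inside the PU rectangle, evaluated for every candidate."""
    ox, oy, w, h = pu_rect
    return min(((int(m[oy:oy + h, ox:ox + w].sum()), mv) for mv, m in error_maps.items()),
               default=(float("inf"), (0, 0)))

# Example: a frame with a global (1, 2) motion; the propagated MV (-2, -1) wins.
ref = np.random.default_rng(2).integers(0, 256, (128, 128), dtype=np.uint8)
cur = np.roll(ref, (1, 2), axis=(0, 1))
maps = candidate_error_maps(cur[32:96, 32:96], ref, 32, 32, [(0, 0), (-2, -1), (2, 1)])
print(best_candidate_for_pu(maps, (0, 0, 32, 32)))   # -> (0, (-2, -1))
```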

    PĂ”hjalik uuring ĂŒlisuure dĂŒnaamilise ulatusega piltide toonivastendamisest koos subjektiivsete testidega

    A high dynamic range (HDR) image has a very wide range of luminance levels that traditional low dynamic range (LDR) displays cannot visualize. For this reason, HDR images are usually transformed into 8-bit representations in which the alpha channel of each pixel is used as an exponent value, sometimes referred to as exponential notation [43]. Tone mapping operators (TMOs) are used to transform the high dynamic range domain into the low dynamic range domain by compressing pixel values so that traditional LDR displays can visualize them. The purpose of this thesis is to identify and analyse differences and similarities between the wide range of tone mapping operators available in the literature. Each TMO has been analysed in subjective studies under different conditions, including environment, luminance, and colour. Several inverse tone mapping operators, HDR mappings with exposure fusion, histogram adjustment, and retinex have also been analysed in this study. 19 different TMOs have been examined using a variety of HDR images. The mean opinion score (MOS) is calculated for the selected TMOs from the opinions of 25 independent observers, taking into account their age, vision, and colour blindness.
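
    For readers unfamiliar with tone mapping, the sketch below shows a simple global Reinhard-style operator and the mean opinion score computation used to rank operators. It is not one of the 19 TMOs examined in the thesis; the key value of 0.18 and the Rec. 709 luminance weights are conventional choices.

```python
import numpy as np

def reinhard_tmo(hdr: np.ndarray, a: float = 0.18, eps: float = 1e-6) -> np.ndarray:
    """Map linear HDR RGB (float, arbitrary scale) to displayable [0, 1] RGB."""
    lum = 0.2126 * hdr[..., 0] + 0.7152 * hdr[..., 1] + 0.0722 * hdr[..., 2]
    log_avg = np.exp(np.mean(np.log(lum + eps)))        # geometric mean luminance
    scaled = a * lum / log_avg                          # key-value scaling
    mapped = scaled / (1.0 + scaled)                    # compress to [0, 1)
    ratio = mapped / (lum + eps)                        # per-pixel luminance gain
    return np.clip(hdr * ratio[..., None], 0.0, 1.0)

def mean_opinion_score(ratings: list[int]) -> float:
    """MOS: average of the observers' ratings (e.g. on a 1-5 scale)."""
    return sum(ratings) / len(ratings)

hdr = np.random.default_rng(1).uniform(0.0, 5000.0, size=(4, 4, 3))
ldr = reinhard_tmo(hdr)
print(mean_opinion_score([4, 5, 3, 4, 4]))
```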

    Artificial Intelligence in the Creative Industries: A Review

    This paper reviews the current state of the art in Artificial Intelligence (AI) technologies and applications in the context of the creative industries. A brief background of AI, and specifically Machine Learning (ML) algorithms, is provided, including Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement Learning (DRL). We categorise creative applications into five groups related to how AI technologies are used: i) content creation, ii) information analysis, iii) content enhancement and post-production workflows, iv) information extraction and enhancement, and v) data compression. We critically examine the successes and limitations of this rapidly advancing technology in each of these areas. We further differentiate between the use of AI as a creative tool and its potential as a creator in its own right. We foresee that, in the near future, machine learning-based AI will be adopted widely as a tool or collaborative assistant for creativity. In contrast, we observe that the successes of machine learning in domains with fewer constraints, where AI is the 'creator', remain modest. The potential of AI (or its developers) to win awards for its original creations in competition with human creatives is also limited, based on contemporary technologies. We therefore conclude that, in the context of the creative industries, maximum benefit from AI will be derived where its focus is human centric: where it is designed to augment, rather than replace, human creativity.

    Encoding high dynamic range and wide color gamut imagery

    This dissertation introduces a cinematic video data set with high dynamic range (HDR) and wide color gamut (WCG) and presents models for encoding HDR and WCG imagery. The objective and visual evaluation of new HDR and WCG image processing algorithms, compression schemes and displays requires a high-quality reference data set. A new HDR and WCG video data set with a dynamic range of up to 18 photographic stops is therefore introduced. It contains staged and documentary scenes, each designed to challenge tone mapping operators, gamut mapping algorithms, compression codecs, and HDR and WCG displays. The scenes were shot with professional lighting, make-up and film equipment; digital film cameras with a 'Super-35 mm' sensor size were used to obtain a cinematic look. The additional information content of HDR and WCG video signals requires a new and more efficient signal encoding compared with conventional dynamic range signals. A color space for HDR and WCG video should not only quantize efficiently but, because of the differing display capabilities on the receiver side, also be suitable for dynamic range and gamut mapping. Methods for quantizing HDR luminance signals have been proposed, but a corresponding model for color difference signals is still missing. Two new color spaces are therefore introduced that are suitable both for the efficient encoding of HDR and WCG signals and for dynamic range and gamut mapping. These color spaces are compared with existing state-of-the-art HDR and WCG color signal encodings. The presented encoding schemes allow HDR and WCG video to be quantized with three color channels at 12 bits of tonal resolution without visible quantization artifacts. While the storage and transmission of HDR and WCG video at 12 bits per channel is the goal, currently widespread file formats, video interfaces and compression codecs often support only lower bit depths. To make this existing infrastructure usable for HDR video transmission and storage, a new content-dependent quantization scheme is introduced. This quantization method uses image properties such as noise and texture to estimate the tonal resolution required for visually lossless quantization. The presented method allows HDR video to be quantized at a bit depth of 10 bits without visible differences from the original, and requires less computation than current HDR image difference metrics.
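
    As an illustration of what 12-bit tonal resolution for HDR luminance means in practice, the sketch below quantises absolute luminance with the SMPTE ST 2084 perceptual quantiser (PQ) inverse EOTF. PQ is an existing standard curve, not one of the two colour spaces proposed in the dissertation.

```python
import numpy as np

# SMPTE ST 2084 (PQ) constants.
M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32

def pq_inverse_eotf(luminance_cd_m2: np.ndarray) -> np.ndarray:
    """Map absolute luminance (0..10000 cd/m^2) to a normalised PQ signal in [0, 1]."""
    y = np.clip(luminance_cd_m2 / 10000.0, 0.0, 1.0) ** M1
    return ((C1 + C2 * y) / (1.0 + C3 * y)) ** M2

def quantise(signal: np.ndarray, bits: int = 12) -> np.ndarray:
    """Full-range uniform quantisation of the normalised signal."""
    return np.round(signal * (2 ** bits - 1)).astype(np.uint16)

# 18 photographic stops above 0.005 cd/m^2 fit comfortably into 12-bit codes.
luminance = 0.005 * 2.0 ** np.arange(0, 19)
print(quantise(pq_inverse_eotf(luminance)))
```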

    Quality-aware Content Adaptation in Digital Video Streaming

    User-generated video has attracted a lot of attention due to the success of video sharing sites such as YouTube and online social networks. Recently, a shift towards live consumption of these videos has become observable. The content is captured and instantly shared over the Internet using smart mobile devices such as smartphones. Large-scale platforms such as YouTube Live, YouNow, and Facebook Live have emerged that enable users' smartphones to livestream to the public. These platforms achieve the distribution of tens of thousands of low-resolution videos to remote viewers in parallel. Nonetheless, the providers cannot guarantee an efficient collection and distribution of high-quality video streams. As a result, the user experience is often degraded, and the required infrastructure installations are huge. Efficient methods are required to cope with the increasing demand for these video streams, and an understanding is needed of how to capture, process, and distribute the videos to guarantee a high-quality experience for viewers. This thesis addresses the quality awareness of user-generated videos by leveraging the concept of content adaptation. Two types of content adaptation, adaptive video streaming and video composition, are discussed in this thesis. Then, a novel approach for the given scenario of a live upload from mobile devices, the processing of video streams, and their distribution is presented. This thesis demonstrates that content adaptation applied to each step of this scenario, from upload to consumption, can significantly improve the quality for the viewer. At the same time, if content adaptation is planned wisely, the data traffic can be reduced while keeping the quality for the viewers high. The first contribution of this thesis is a better understanding of the perceived quality of user-generated video and its influencing factors. Subjective studies are performed to understand what affects human perception, leading to the first quality models of their kind. The developed quality models are used for the second contribution of this work: novel quality assessment algorithms. A unique attribute of these algorithms is the use of multiple features from different sensors. Whereas classical video quality assessment algorithms focus on visual information, the proposed algorithms reduce the runtime by an order of magnitude by using data from other sensors in video capturing devices. Still, the scalability of quality assessment is limited when algorithms are executed on a single server. This is solved with the proposed placement and selection component, which allows the distribution of quality assessment tasks to mobile devices and thus increases the scalability of existing approaches by up to 33.71% when using the resources of only 15 mobile devices. These three contributions are required to provide a real-time understanding of the perceived quality of the video streams produced on mobile devices. The upload of video streams is the fourth contribution of this work. It relies on content and mechanism adaptation. The thesis introduces the first prototypically evaluated adaptive video upload protocol (LiViU), which transcodes multiple video representations in real time and copes with changing network conditions. In addition, a mechanism adaptation is integrated into LiViU to react to changing application scenarios, such as streaming high-quality videos to remote viewers or distributing video with minimal delay to close-by recipients. A second type of content adaptation is discussed in the fifth contribution of this work: an automatic video composition application that enables live composition from multiple user-generated video streams. The proposed application is the first of its kind, allowing the in-time composition of high-quality video streams by inspecting the quality of the individual video streams, the recording locations, and cinematographic rules. As a last contribution, the content-aware adaptive distribution of video streams to mobile devices is introduced by the Video Adaptation Service (VAS). The VAS analyzes the streamed video content to understand which adaptations are most beneficial for a viewer. It maximizes the perceived quality of each video stream individually while trying to produce as little data traffic as possible, achieving data traffic reductions of more than 80%.
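
    As a minimal sketch of the adaptive-streaming side of content adaptation (not LiViU or the VAS themselves), the snippet below picks the highest representation whose bitrate fits the measured throughput with a safety margin. The representation ladder and the 0.8 margin are assumptions.

```python
REPRESENTATIONS = [                 # (name, bitrate in kbit/s), assumed ladder
    ("240p", 400),
    ("360p", 800),
    ("480p", 1500),
    ("720p", 3000),
    ("1080p", 6000),
]

def select_representation(throughput_kbps: float, margin: float = 0.8) -> str:
    """Return the best representation whose bitrate does not exceed
    margin * throughput; fall back to the lowest representation otherwise."""
    budget = margin * throughput_kbps
    chosen = REPRESENTATIONS[0][0]
    for name, bitrate in REPRESENTATIONS:
        if bitrate <= budget:
            chosen = name
    return chosen

print(select_representation(4200))   # -> "720p"
```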