96 research outputs found

    Non-disruptive use of light fields in image and video processing

    Get PDF
    In the age of computational imaging, cameras capture not only an image but also data. This captured additional data can be best used for photo-realistic renderings facilitating numerous post-processing possibilities such as perspective shift, depth scaling, digital refocus, 3D reconstruction, and much more. In computational photography, the light field imaging technology captures the complete volumetric information of a scene. This technology has the highest potential to accelerate immersive experiences towards close-toreality. It has gained significance in both commercial and research domains. However, due to lack of coding and storage formats and also the incompatibility of the tools to process and enable the data, light fields are not exploited to its full potential. This dissertation approaches the integration of light field data to image and video processing. Towards this goal, the representation of light fields using advanced file formats designed for 2D image assemblies to facilitate asset re-usability and interoperability between applications and devices is addressed. The novel 5D light field acquisition and the on-going research on coding frameworks are presented. Multiple techniques for optimised sequencing of light field data are also proposed. As light fields contain complete 3D information of a scene, large amounts of data is captured and is highly redundant in nature. Hence, by pre-processing the data using the proposed approaches, excellent coding performance can be achieved.Im Zeitalter der computergestĂŒtzten Bildgebung erfassen Kameras nicht mehr nur ein Bild, sondern vielmehr auch Daten. Diese erfassten Zusatzdaten lassen sich optimal fĂŒr fotorealistische Renderings nutzen und erlauben zahlreiche Nachbearbeitungsmöglichkeiten, wie Perspektivwechsel, Tiefenskalierung, digitale Nachfokussierung, 3D-Rekonstruktion und vieles mehr. In der computergestĂŒtzten Fotografie erfasst die Lichtfeld-Abbildungstechnologie die vollstĂ€ndige volumetrische Information einer Szene. Diese Technologie bietet dabei das grĂ¶ĂŸte Potenzial, immersive Erlebnisse zu mehr RealitĂ€tsnĂ€he zu beschleunigen. Deshalb gewinnt sie sowohl im kommerziellen Sektor als auch im Forschungsbereich zunehmend an Bedeutung. Aufgrund fehlender Kompressions- und Speicherformate sowie der InkompatibilitĂ€t derWerkzeuge zur Verarbeitung und Freigabe der Daten, wird das Potenzial der Lichtfelder nicht voll ausgeschöpft. Diese Dissertation ermöglicht die Integration von Lichtfelddaten in die Bild- und Videoverarbeitung. Hierzu wird die Darstellung von Lichtfeldern mit Hilfe von fortschrittlichen fĂŒr 2D-Bilder entwickelten Dateiformaten erarbeitet, um die Wiederverwendbarkeit von Assets- Dateien und die KompatibilitĂ€t zwischen Anwendungen und GerĂ€ten zu erleichtern. Die neuartige 5D-Lichtfeldaufnahme und die aktuelle Forschung an Kompressions-Rahmenbedingungen werden vorgestellt. Es werden zudem verschiedene Techniken fĂŒr eine optimierte Sequenzierung von Lichtfelddaten vorgeschlagen. Da Lichtfelder die vollstĂ€ndige 3D-Information einer Szene beinhalten, wird eine große Menge an Daten, die in hohem Maße redundant sind, erfasst. Die hier vorgeschlagenen AnsĂ€tze zur Datenvorverarbeitung erreichen dabei eine ausgezeichnete Komprimierleistung

    A prediction-based dynamic content adaptation framework for enterprise documents applied to collaborative mobile web conferencing

    Get PDF
    Enterprise documents, created in applications such as PowerPoint and Word, can be used and shared using ubiquitousWeb-enabled terminals connected to the Internet. In the context ofWeb conferencing, enterprise documents, particularly presentation slides, are hosted on the server and presented to the meeting participants synchronously. When mobile devices are involved in such meeting conferencing applications, the content (e.g.: presentation slides) should be adapted to meet the target mobile terminal constraints, but more importantly, to provide the end-user with the best experience possible. Globally, two major trends in content adaptation have been studied: static and dynamic. In static content adaptation, the content is adapted into a set of versions using different transcoding parameter combinations. At runtime, when the content is requested, the optimal of those versions, based on a given quality criterion, is selected for delivery. The performance of these solutions is based on the granularity in use; the number of created versions. In dynamic content adaptation, also called just-in-time adaptation, based on the mobile device context, a customized version is created on-the-fly, while the end-user is still waiting. Dynamically identifying the optimal transcoding parameters, without performing any transcoding operation, is very challenging. In this thesis, we propose a novel dynamic adaptation framework that estimates, without performing transcoding, near-optimal transcoding parameters (format, scaling parameter and quality factor). The output formats considered in this research are JPEG- and XHTML-based Web pages. Firstly, we define a quality of experience measure to quantify the quality of the adapted content as experienced by the end-user. This measure takes into account the visual aspect of the content as well as its transport quality, which is mostly affected by the network conditions. Secondly, we propose a dynamic adaptation framework capable of selecting dynamically and with very little computational complexity, near-optimal adapted content that meets the best compromise between its visual quality and delivery time based on the proposed quality of experience measure. It uses predictors of file size and visual quality of JPEG images subject to changing their scaling parameter and quality factor proposed in recent researches. Our framework is comprised of five adaptation methods with increased quality and complexity. The first one, requiring one transcoding operation, estimates near-optimal adapted content, whereas the other four methods improve its prediction accuracy by allowing the system to perform more than one transcoding operation. The performance of the proposed dynamic framework was tested with a static exhaustive system and a typical dynamic system. Globally, the obtained results were very close to optimality and far better than the typical dynamic system. Besides, we were able to reach optimality on a large number of tested documents. The proposed dynamic framework has been applied to OpenOffice Impress presentations. It is designed to be general, but future work can be carried out to validate its applicability to other enterprise documents types such as Word (text) and Excel (spreadsheet)

    Receiver-Driven Video Adaptation

    Get PDF
    In the span of a single generation, video technology has made an incredible impact on daily life. Modern use cases for video are wildly diverse, including teleconferencing, live streaming, virtual reality, home entertainment, social networking, surveillance, body cameras, cloud gaming, and autonomous driving. As these applications continue to grow more sophisticated and heterogeneous, a single representation of video data can no longer satisfy all receivers. Instead, the initial encoding must be adapted to each receiver's unique needs. Existing adaptation strategies are fundamentally flawed, however, because they discard the video's initial representation and force the content to be re-encoded from scratch. This process is computationally expensive, does not scale well with the number of videos produced, and throws away important information embedded in the initial encoding. Therefore, a compelling need exists for the development of new strategies that can adapt video content without fully re-encoding it. To better support the unique needs of smart receivers, diverse displays, and advanced applications, general-use video systems should produce and offer receivers a more flexible compressed representation that supports top-down adaptation strategies from an original, compressed-domain ground truth. This dissertation proposes an alternate model for video adaptation that addresses these challenges. The key idea is to treat the initial compressed representation of a video as the ground truth, and allow receivers to drive adaptation by dynamically selecting which subsets of the captured data to receive. In support of this model, three strategies for top-down, receiver-driven adaptation are proposed. First, a novel, content-agnostic entropy coding technique is implemented in which symbols are selectively dropped from an input abstract symbol stream based on their estimated probability distributions to hit a target bit rate. Receivers are able to guide the symbol dropping process by supplying the encoder with an appropriate rate controller algorithm that fits their application needs and available bandwidths. Next, a domain-specific adaptation strategy is implemented for H.265/HEVC coded video in which the prediction data from the original source is reused directly in the adapted stream, but the residual data is recomputed as directed by the receiver. By tracking the changes made to the residual, the encoder can compensate for decoder drift to achieve near-optimal rate-distortion performance. Finally, a fully receiver-driven strategy is proposed in which the syntax elements of a pre-coded video are cataloged and exposed directly to clients through an HTTP API. Instead of requesting the entire stream at once, clients identify the exact syntax elements they wish to receive using a carefully designed query language. Although an implementation of this concept is not provided, an initial analysis shows that such a system could save bandwidth and computation when used by certain targeted applications.Doctor of Philosoph

    Image coding using redundant dictionaries

    Get PDF
    This chapter discusses the problem of coding images using very redundant libraries of waveforms, also referred to as dictionaries. We start with a discussion of the shortcomings of classical approaches based on orthonormal bases. More specifically, we show why these redundant dictionaries provide an interesting alternative for image representation. We then introduce a special dictionary of 2-D primitives called anisotropic refinement atoms that are well suited for representing edge dominated images. Using a simple greedy algorithm, we design an image coder that performs very well at low bit rate. We finally discuss its performance and particular features such as geometric adaptativity and rate scalability

    Predicting and Optimizing Image Compression

    Get PDF
    Image compression is a core task for mobile devices, social media and cloud storage backend services. Key evaluation criteria for compression are: the quality of the output, the compression ratio achieved and the computational time (and energy) expended. Predicting the effectiveness of standard compression implementations like libjpeg and WebP on a novel image is challenging, and often leads to non-optimal compression. This paper presents a machine learning-based technique to accurately model the outcome of image compression for arbitrary new images in terms of quality and compression ratio, without requiring significant additional computational time and energy. Using this model, we can actively adapt the aggressiveness of compression on a per image basis to accurately fit user requirements, leading to a more optimal compression.Postprin

    Image and Video Coding/Transcoding: A Rate Distortion Approach

    Get PDF
    Due to the lossy nature of image/video compression and the expensive bandwidth and computation resources in a multimedia system, one of the key design issues for image and video coding/transcoding is to optimize trade-off among distortion, rate, and/or complexity. This thesis studies the application of rate distortion (RD) optimization approaches to image and video coding/transcoding for exploring the best RD performance of a video codec compatible to the newest video coding standard H.264 and for designing computationally efficient down-sampling algorithms with high visual fidelity in the discrete Cosine transform (DCT) domain. RD optimization for video coding in this thesis considers two objectives, i.e., to achieve the best encoding efficiency in terms of minimizing the actual RD cost and to maintain decoding compatibility with the newest video coding standard H.264. By the actual RD cost, we mean a cost based on the final reconstruction error and the entire coding rate. Specifically, an operational RD method is proposed based on a soft decision quantization (SDQ) mechanism, which has its root in a fundamental RD theoretic study on fixed-slope lossy data compression. Using SDQ instead of hard decision quantization, we establish a general framework in which motion prediction, quantization, and entropy coding in a hybrid video coding scheme such as H.264 are jointly designed to minimize the actual RD cost on a frame basis. The proposed framework is applicable to optimize any hybrid video coding scheme, provided that specific algorithms are designed corresponding to coding syntaxes of a given standard codec, so as to maintain compatibility with the standard. Corresponding to the baseline profile syntaxes and the main profile syntaxes of H.264, respectively, we have proposed three RD algorithms---a graph-based algorithm for SDQ given motion prediction and quantization step sizes, an algorithm for residual coding optimization given motion prediction, and an iterative overall algorithm for jointly optimizing motion prediction, quantization, and entropy coding---with them embedded in the indicated order. Among the three algorithms, the SDQ design is the core, which is developed based on a given entropy coding method. Specifically, two SDQ algorithms have been developed based on the context adaptive variable length coding (CAVLC) in H.264 baseline profile and the context adaptive binary arithmetic coding (CABAC) in H.264 main profile, respectively. Experimental results for the H.264 baseline codec optimization show that for a set of typical testing sequences, the proposed RD method for H.264 baseline coding achieves a better trade-off between rate and distortion, i.e., 12\% rate reduction on average at the same distortion (ranging from 30dB to 38dB by PSNR) when compared with the RD optimization method implemented in H.264 baseline reference codec. Experimental results for optimizing H.264 main profile coding with CABAC show 10\% rate reduction over a main profile reference codec using CABAC, which also suggests 20\% rate reduction over the RD optimization method implemented in H.264 baseline reference codec, leading to our claim of having developed the best codec in terms of RD performance, while maintaining the compatibility with H.264. By investigating trade-off between distortion and complexity, we have also proposed a designing framework for image/video transcoding with spatial resolution reduction, i.e., to down-sample compressed images/video with an arbitrary ratio in the DCT domain. First, we derive a set of DCT-domain down-sampling methods, which can be represented by a linear transform with double-sided matrix multiplication (LTDS) in the DCT domain. Then, for a pre-selected pixel-domain down-sampling method, we formulate an optimization problem for finding an LTDS to approximate the given pixel-domain method to achieve the best trade-off between visual quality and computational complexity. The problem is then solved by modeling an LTDS with a multi-layer perceptron network and using a structural learning with forgetting algorithm for training the network. Finally, by selecting a pixel-domain reference method with the popular Butterworth lowpass filtering and cubic B-spline interpolation, the proposed framework discovers an LTDS with better visual quality and lower computational complexity when compared with state-of-the-art methods in the literature

    Bit rate transcoding of H.264/AVC based on rate shaping and requantization

    Get PDF

    Using machine learning to select and optimise multiple objectives in media compression

    Get PDF
    The growing complexity of emerging image and video compression standards means additional demands on computational time and energy resources in a variety of environments. Additionally, the steady increase in sensor resolution, display resolution, and the demand for increasingly high-quality media in consumer and professional applications also mean that there is an increasing quantity of media being compressed. This work focuses on a methodology for improving and understanding the quality of media compression algorithms using an empirical approach. Consequently, the outcomes of this research can be deployed on existing standard compression algorithms, but are also likely to be applicable to future standards without substantial redevelopment, increasing productivity and decreasing time-to-market. Using machine learning techniques, this thesis proposes a means of using past information about how images and videos are compressed in terms of content, and leveraging this information to guide and improve industry standard media compressors in order to achieve the desired outcome in a time and energy e cient way. The methodology is implemented and evaluated on JPEG, WebP and x265 codecs, allowing the system to automatically target multiple performance characteristics like le size, image quality, compression time and e ciency, based on user preferences. Compared to previous work, this system is able to achieve a prediction error three times smaller for quality and size for JPEG, and a speed up of compression of four times for WebP, targeting the same objectives. For x265 video compression, the system allows multiple objectives to be considered simultaneously, allowing speedier encoding for similar levels of quality
    • 

    corecore