
    Non-disruptive use of light fields in image and video processing

    In the age of computational imaging, cameras capture not only an image but also data. This additional captured data is best used for photo-realistic rendering, enabling numerous post-processing possibilities such as perspective shift, depth scaling, digital refocus, 3D reconstruction, and much more. In computational photography, light field imaging technology captures the complete volumetric information of a scene. This technology has great potential to accelerate immersive experiences towards close-to-reality, and it has gained significance in both commercial and research domains. However, due to the lack of coding and storage formats, and the incompatibility of the tools needed to process and handle the data, light fields are not exploited to their full potential. This dissertation addresses the integration of light field data into image and video processing. Towards this goal, the representation of light fields using advanced file formats designed for 2D image assemblies is addressed, facilitating asset re-usability and interoperability between applications and devices. A novel 5D light field acquisition approach and on-going research on coding frameworks are presented. Multiple techniques for optimised sequencing of light field data are also proposed. As light fields contain the complete 3D information of a scene, large amounts of highly redundant data are captured. Hence, by pre-processing the data using the proposed approaches, excellent coding performance can be achieved.
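    As a concrete (and hypothetical) illustration of light field sequencing, the sketch below orders the sub-aperture views of a 4D light field as a pseudo-video sequence using a serpentine scan, so that a standard 2D video codec can exploit the redundancy between neighbouring views; the scan order is an illustrative choice, not necessarily the one proposed in the dissertation.

```python
# A minimal sketch, assuming a (U, V, H, W, C) light field array: U, V index
# the angular (camera-grid) dimensions, and each light_field[u, v] is one
# sub-aperture view. The serpentine scan keeps consecutive frames spatially
# adjacent, which helps inter-frame prediction in a standard video encoder.
import numpy as np

def serpentine_sequence(light_field: np.ndarray) -> list:
    """Flatten the angular grid into a list of (H, W, C) frames."""
    U, V = light_field.shape[:2]
    frames = []
    for u in range(U):
        # Alternate the column direction on every row ("snake" order).
        cols = range(V) if u % 2 == 0 else range(V - 1, -1, -1)
        for v in cols:
            frames.append(light_field[u, v])
    return frames

# Example: a synthetic 9x9 grid of 64x64 RGB views, ready to feed a video
# encoder frame by frame.
lf = np.zeros((9, 9, 64, 64, 3), dtype=np.uint8)
assert len(serpentine_sequence(lf)) == 81
```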

    Interactive Video Coding

    Project carried out within a mobility programme with Linköpings Universitet. One of the main challenges when transcoding video is to find a proper balance between good quality and a manageable final size of the video file. A possible solution is a trial-and-error procedure, transcoding the video several times until a good result is found. Besides being tedious work, this does not guarantee an optimal solution. In this project, a full system is developed to solve this problem. The solution is based on obtaining a complexity estimation of the video and performing several short trial transcodings at the most complex parts, testing different coding parameters. In addition, the user can steer the whole process, selecting which parts to transcode and which parameters to test, as many times as desired, after reviewing the system's results.
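    A hedged sketch of the segment-selection step follows: estimate a per-segment complexity score, then pick the most complex segments for short trial transcodings. The complexity measure (mean absolute frame difference) and the segment granularity are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def segment_complexity(frames: np.ndarray, segment_len: int) -> list:
    """frames: (N, H, W) grayscale video; returns one score per segment."""
    # Mean absolute difference between consecutive frames, a cheap proxy
    # for motion/coding complexity.
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=(1, 2))
    return [float(diffs[i:i + segment_len].mean())
            for i in range(0, len(diffs), segment_len)]

def pick_trial_segments(scores: list, k: int = 3) -> list:
    """Indices of the k most complex segments, most complex first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# The system would then trial-transcode only those segments with candidate
# parameter sets and show the results to the user for the next iteration.
```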

    Predicting and Optimizing Image Compression

    Image compression is a core task for mobile devices, social media and cloud storage backend services. Key evaluation criteria for compression are the quality of the output, the compression ratio achieved, and the computational time (and energy) expended. Predicting the effectiveness of standard compression implementations like libjpeg and WebP on a novel image is challenging, and often leads to non-optimal compression. This paper presents a machine learning-based technique to accurately model the outcome of image compression for arbitrary new images, in terms of quality and compression ratio, without requiring significant additional computational time and energy. Using this model, the aggressiveness of compression can be actively adapted on a per-image basis to accurately fit user requirements, leading to more nearly optimal compression.
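    As a rough illustration of how such a model could drive per-image adaptation, the sketch below feeds cheap content features to an assumed pre-trained predictor and picks the lowest JPEG quality setting that still meets a user-supplied quality target; the feature set and predictor interface are placeholders, not the paper's actual design.

```python
import numpy as np

def features(img: np.ndarray) -> np.ndarray:
    """Cheap content descriptors for a grayscale image: mean, std,
    and mean gradient magnitude (a proxy for detail/texture)."""
    gy, gx = np.gradient(img.astype(np.float32))
    return np.array([img.mean(), img.std(),
                     np.abs(gx).mean() + np.abs(gy).mean()])

def select_quality(img: np.ndarray, predictor, min_quality: float = 0.95) -> int:
    """predictor(feature_vector, q) -> (predicted_quality, predicted_ratio).

    Sweep candidate JPEG quality settings and return the lowest one whose
    predicted output quality meets the target -- no trial compression needed.
    """
    for q in range(40, 101, 5):
        pred_quality, _pred_ratio = predictor(features(img), q)
        if pred_quality >= min_quality:
            return q
    return 100  # fall back to maximum quality
```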

    A prediction-based dynamic content adaptation framework for enterprise documents applied to collaborative mobile web conferencing

    Enterprise documents, created in applications such as PowerPoint and Word, can be used and shared using ubiquitous Web-enabled terminals connected to the Internet. In the context of Web conferencing, enterprise documents, particularly presentation slides, are hosted on the server and presented to the meeting participants synchronously. When mobile devices are involved in such conferencing applications, the content (e.g., presentation slides) should be adapted to meet the target mobile terminal's constraints but, more importantly, to provide the end-user with the best experience possible. Globally, two major trends in content adaptation have been studied: static and dynamic. In static content adaptation, the content is adapted into a set of versions using different transcoding parameter combinations; at runtime, when the content is requested, the optimal of those versions, based on a given quality criterion, is selected for delivery. The performance of these solutions depends on the granularity in use, i.e., the number of created versions. In dynamic content adaptation, also called just-in-time adaptation, a customized version is created on-the-fly, based on the mobile device context, while the end-user waits. Dynamically identifying the optimal transcoding parameters, without performing any transcoding operation, is very challenging. In this thesis, we propose a novel dynamic adaptation framework that estimates, without performing transcoding, near-optimal transcoding parameters (format, scaling parameter and quality factor). The output formats considered in this research are JPEG- and XHTML-based Web pages. Firstly, we define a quality-of-experience measure to quantify the quality of the adapted content as experienced by the end-user. This measure takes into account the visual aspect of the content as well as its transport quality, which is mostly affected by network conditions. Secondly, we propose a dynamic adaptation framework capable of selecting, dynamically and with very little computational complexity, near-optimal adapted content offering the best compromise between visual quality and delivery time according to the proposed quality-of-experience measure. It uses predictors of the file size and visual quality of JPEG images subject to changes of their scaling parameter and quality factor, proposed in recent research. Our framework comprises five adaptation methods of increasing quality and complexity. The first, requiring one transcoding operation, estimates near-optimal adapted content, whereas the other four methods improve its prediction accuracy by allowing the system to perform more than one transcoding operation. The performance of the proposed dynamic framework was compared against a static exhaustive system and a typical dynamic system. Globally, the obtained results were very close to optimality and far better than those of the typical dynamic system; moreover, optimality was reached on a large number of the tested documents. The proposed dynamic framework has been applied to OpenOffice Impress presentations. It is designed to be general, but future work can be carried out to validate its applicability to other enterprise document types such as Word (text) and Excel (spreadsheet).
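    A hedged sketch of the selection step: given predictors of JPEG file size and visual quality as functions of the scaling parameter and quality factor, each candidate is scored by a quality-of-experience measure trading visual quality against delivery time. The QoE form below is an illustrative weighted combination, not the thesis's exact definition.

```python
def select_parameters(predict_quality, predict_size,
                      bandwidth_bps: float, alpha: float = 0.7):
    """Return the (scaling, quality_factor) pair maximizing an assumed QoE.

    predict_quality(z, qf) -> predicted visual quality in [0, 1]
    predict_size(z, qf)    -> predicted file size in bytes
    """
    candidates = [(z / 10, qf) for z in range(1, 11)
                  for qf in range(10, 101, 10)]

    def qoe(z, qf):
        delay = 8 * predict_size(z, qf) / bandwidth_bps  # delivery time (s)
        transport = 1.0 / (1.0 + delay)                  # decays with delay
        return alpha * predict_quality(z, qf) + (1 - alpha) * transport

    return max(candidates, key=lambda p: qoe(*p))
```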

    Receiver-Driven Video Adaptation

    In the span of a single generation, video technology has made an incredible impact on daily life. Modern use cases for video are wildly diverse, including teleconferencing, live streaming, virtual reality, home entertainment, social networking, surveillance, body cameras, cloud gaming, and autonomous driving. As these applications continue to grow more sophisticated and heterogeneous, a single representation of video data can no longer satisfy all receivers. Instead, the initial encoding must be adapted to each receiver's unique needs. Existing adaptation strategies are fundamentally flawed, however, because they discard the video's initial representation and force the content to be re-encoded from scratch. This process is computationally expensive, does not scale well with the number of videos produced, and throws away important information embedded in the initial encoding. Therefore, a compelling need exists for new strategies that can adapt video content without fully re-encoding it. To better support the unique needs of smart receivers, diverse displays, and advanced applications, general-use video systems should produce and offer receivers a more flexible compressed representation that supports top-down adaptation strategies from an original, compressed-domain ground truth. This dissertation proposes an alternative model for video adaptation that addresses these challenges. The key idea is to treat the initial compressed representation of a video as the ground truth, and to allow receivers to drive adaptation by dynamically selecting which subsets of the captured data to receive. In support of this model, three strategies for top-down, receiver-driven adaptation are proposed. First, a novel, content-agnostic entropy coding technique is implemented in which symbols are selectively dropped from an input abstract symbol stream, based on their estimated probability distributions, to hit a target bit rate. Receivers guide the symbol-dropping process by supplying the encoder with a rate-controller algorithm that fits their application needs and available bandwidth. Next, a domain-specific adaptation strategy is implemented for H.265/HEVC coded video in which the prediction data from the original source is reused directly in the adapted stream, but the residual data is recomputed as directed by the receiver. By tracking the changes made to the residual, the encoder can compensate for decoder drift and achieve near-optimal rate-distortion performance. Finally, a fully receiver-driven strategy is proposed in which the syntax elements of a pre-coded video are cataloged and exposed directly to clients through an HTTP API. Instead of requesting the entire stream at once, clients identify the exact syntax elements they wish to receive using a carefully designed query language. Although an implementation of this concept is not provided, an initial analysis shows that such a system could save bandwidth and computation when used by certain targeted applications.
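    The first strategy's core loop might look like the sketch below: each symbol is costed at its self-information under the estimated probability model, and symbols are greedily dropped until the stream fits the receiver's budget. The greedy policy shown is one possible rate controller, not the dissertation's exact algorithm.

```python
import math

def drop_symbols(symbols: list, probs: list, target_bits: float) -> list:
    """Keep a subset of symbols whose estimated entropy-coded size
    fits within target_bits, preserving the original order."""
    costs = [-math.log2(p) for p in probs]  # self-information, bits/symbol
    total = sum(costs)
    # Drop high-probability (low-information) symbols first: under this
    # assumed policy they are the cheapest to lose.
    order = sorted(range(len(symbols)), key=lambda i: costs[i])
    dropped = set()
    for i in order:
        if total <= target_bits:
            break
        dropped.add(i)
        total -= costs[i]
    return [s for i, s in enumerate(symbols) if i not in dropped]
```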

    Toward a General Parametric Model for Assessing the Impact of Video Transcoding on Objective Video Quality

    Video transcoding can cause degradation to an original video. Currently, there is no general model that assesses the impact of video transcoding on video quality. Such a model could play a critical role in evaluating the quality of transcoded video, and thereby in optimizing the delivery of video to end-users while meeting their expectations. The main contribution of this research is the development and substantiation of a general parametric model, called the Video Transcoding Objective-quality Model (VTOM), that provides an extensible video transcoding service selection mechanism, taking into account both the format and characteristics of the original video and the desired output, i.e., the viewing format with the preferred quality of service. VTOM is a mathematical function that uses a set of media-related parameters for the original video and the desired output, including codec, bit rate, frame rate, and frame size, to predict the quality of the transcoded video generated by a specific transcoding. VTOM includes four quality sub-models, each describing the impact of one of these parameters on objective video quality, as well as a weighted-product aggregation function that combines these quality sub-models with four additional error sub-models in a single function for assessing overall video quality. I compared the predicted quality results generated by VTOM with quality values generated by an existing objective-quality metric; these comparisons showed good correlations, with low error values. VTOM helps researchers and developers of video delivery systems and applications to calculate the degradation that video transcoding causes on the fly, rather than evaluating it offline with statistical methods that only consider the desired output. Because VTOM takes into account the quality of the input video, i.e., the video format and characteristics, and the desired quality of the output video, it can be used for dynamic video transcoding service selection and composition. A number of quality metrics were examined and used in the development of VTOM and its assessment. However, this research discovered that, to date, there are no suitable metrics in the literature for comparing two videos with different frame rates. Therefore, this dissertation defines a new metric, called the Frame Rate Metric (FRM), as part of its contributions. FRM can use any frame-based quality metric for comparing frames from the two videos. Finally, this research presents and adapts four Quality of Service (QoS)-aware video transcoding service selection algorithms. Experimental results showed that these four algorithms achieved good results in terms of time complexity, success ratio, and user satisfaction rate.
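    The weighted-product aggregation at VTOM's heart reduces to a short formula; the sketch below uses placeholder sub-model scores and weights purely for illustration, whereas the dissertation fits its own sub-models and error terms.

```python
def vtom_score(submodel_scores: dict, weights: dict) -> float:
    """Overall predicted quality = product of q_i ** w_i over sub-models."""
    score = 1.0
    for name, q in submodel_scores.items():
        score *= q ** weights[name]
    return score

# Illustrative values for a hypothetical transcoding (not fitted values):
q = vtom_score(
    {"codec": 0.90, "bitrate": 0.80, "framerate": 0.95, "framesize": 0.85},
    {"codec": 1.0, "bitrate": 1.2, "framerate": 0.8, "framesize": 1.0},
)
```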

    A Framework for pervasive web content delivery


    Using machine learning to select and optimise multiple objectives in media compression

    The growing complexity of emerging image and video compression standards places additional demands on computational time and energy resources in a variety of environments. Additionally, the steady increase in sensor resolution, display resolution, and the demand for increasingly high-quality media in consumer and professional applications mean that an ever-greater quantity of media is being compressed. This work focuses on a methodology for improving and understanding the quality of media compression algorithms using an empirical approach. Consequently, the outcomes of this research can be deployed on existing standard compression algorithms, but are also likely to be applicable to future standards without substantial redevelopment, increasing productivity and decreasing time-to-market. Using machine learning techniques, this thesis proposes a means of using past information about how images and videos are compressed in terms of content, and leveraging this information to guide and improve industry-standard media compressors in order to achieve the desired outcome in a time- and energy-efficient way. The methodology is implemented and evaluated on the JPEG, WebP and x265 codecs, allowing the system to automatically target multiple performance characteristics such as file size, image quality, compression time and efficiency, based on user preferences. Compared to previous work, this system achieves a prediction error three times smaller for quality and size for JPEG, and a four-fold speed-up of compression for WebP, targeting the same objectives. For x265 video compression, the system allows multiple objectives to be considered simultaneously, enabling speedier encoding at similar levels of quality.
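    A minimal sketch of the multi-objective selection described above: learned predictors estimate each objective for a candidate codec setting, and a user-supplied preference weighting picks the setting with the best predicted trade-off. The predictor models and scalarized utility are assumptions for illustration, not the thesis's trained models.

```python
def best_setting(settings: list, predictors: dict, prefs: dict) -> dict:
    """settings:   candidate parameter dicts, e.g. {"quality": 80}
    predictors: {objective: fn(setting) -> normalized score in [0, 1]}
    prefs:      {objective: weight}; higher weight = more important."""
    def utility(s):
        return sum(w * predictors[obj](s) for obj, w in prefs.items())
    return max(settings, key=utility)

# Example: prefer small files twice as much as encoding speed, with
# placeholder predictor models standing in for the learned ones.
choice = best_setting(
    [{"quality": q} for q in range(50, 96, 5)],
    {"size":  lambda s: 1 - s["quality"] / 100,
     "speed": lambda s: 0.5 + s["quality"] / 200},
    {"size": 2.0, "speed": 1.0},
)
```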