9 research outputs found

    Quality of Experience in Immersive Video Technologies

    Over the last decades, several technological revolutions have impacted the television industry, such as the shifts from black & white to color and from standard to high-definition. Nevertheless, further considerable improvements can still be achieved to provide a better multimedia experience, for example with ultra-high-definition, high dynamic range & wide color gamut, or 3D. These so-called immersive technologies aim at providing better, more realistic, and emotionally stronger experiences. To measure quality of experience (QoE), subjective evaluation is the ultimate means since it relies on a pool of human subjects. However, reliable and meaningful results can only be obtained if experiments are properly designed and conducted following a strict methodology. In this thesis, we build a rigorous framework for subjective evaluation of new types of image and video content. We propose different procedures and analysis tools for measuring QoE in immersive technologies. As immersive technologies capture more information than conventional technologies, they have the ability to provide more details, enhanced depth perception, as well as better color, contrast, and brightness. To measure the impact of immersive technologies on the viewers' QoE, we apply the proposed framework for designing experiments and analyzing collected subjects' ratings. We also analyze eye movements to study human visual attention during immersive content playback. Since immersive content carries more information than conventional content, efficient compression algorithms are needed for storage and transmission using existing infrastructures. To determine the required bandwidth for high-quality transmission of immersive content, we use the proposed framework to conduct meticulous evaluations of recent image and video codecs in the context of immersive technologies. Subjective evaluation is time-consuming, expensive, and not always feasible. Consequently, researchers have developed objective metrics to automatically predict quality. To measure the performance of objective metrics in assessing immersive content quality, we perform several in-depth benchmarks of state-of-the-art and commonly used objective metrics. To this end, we use ground truth quality scores, which are collected under our subjective evaluation framework. To improve QoE, we propose different systems for stereoscopic and autostereoscopic 3D displays in particular. The proposed systems can help reduce the artifacts generated at the visualization stage, which impact picture quality, depth quality, and visual comfort. To demonstrate the effectiveness of these systems, we use the proposed framework to measure viewers' preference between these systems and standard 2D & 3D modes. In summary, this thesis tackles the problems of measuring, predicting, and improving QoE in immersive technologies. To address these problems, we build a rigorous framework and we apply it through several in-depth investigations. We put essential concepts of multimedia QoE under this framework. These concepts are not only fundamental in nature, but have also shown their impact in very practical applications. In particular, the JPEG, MPEG, and VCEG standardization bodies have adopted these concepts to select technologies that were proposed for standardization and to validate the resulting standards in terms of compression efficiency.

    Low-Complexity Deep Learning-Based Light Field Image Quality Assessment

    Light field image quality assessment (LF-IQA) has attracted increasing research interest due to the fast-growing demand for immersive media experiences. The majority of existing LF-IQA metrics, however, rely heavily on high-complexity statistics-based feature extraction for the quality assessment task, which will hardly be sustainable in real-time applications or on power-constrained consumer electronic devices. In this research, a low-complexity Deep learning-based Light Field Image Quality Evaluator (DeLFIQE) is proposed to automatically and efficiently extract features for LF-IQA. To the best of my knowledge, this is the first attempt in LF-IQA with a specifically designed convolutional neural network (CNN) based deep learning model. First, to significantly accelerate the training process, discriminative Epipolar Plane Image (EPI) patches, instead of the full light field images (LFIs) or full EPIs, are obtained and used as input for training and testing in DeLFIQE. By utilizing the EPI patches as input, the quality evaluation of 4-D LFIs is converted to the evaluation of 2-D EPI patches, thus significantly reducing the computational complexity. Furthermore, discriminative EPI patches are selected in such a way that they contain most of the distortion information, thus further improving the training efficiency. Second, to improve the quality assessment accuracy and robustness, a multi-task learning mechanism is designed and employed in DeLFIQE. Specifically, alongside the main task that predicts the final quality score, an auxiliary classification task is designed to classify LFIs based on their distortion types and severity levels. That way, the features are extracted to reflect the distortion types and severity levels, which in turn helps the main task improve the accuracy and robustness of the prediction. The extensive experiments show that DeLFIQE outperforms state-of-the-art metrics in terms of both accuracy and correlation, especially on benchmark LF datasets of high angular resolutions. When tested on LF datasets of low angular resolutions, however, the performance of DeLFIQE declines slightly, although it remains competitive. This is believed to be because the distortion feature information contained in the EPI patches decreases as the LFIs' angular resolution decreases, thus reducing the training efficiency and the overall performance of DeLFIQE. Therefore, a General-purpose deep learning-based Light Field Image Quality Evaluator (GeLFIQE) is proposed to perform accurately and efficiently on LF datasets of both high and low angular resolutions. First, a deep CNN model is pre-trained on one of the most comprehensive benchmark LF datasets of high angular resolutions containing abundant distortion features. Next, the features learned from the pre-trained model are transferred to the target LF dataset-specific CNN model to help improve the generalisation and overall performance on low-resolution LFIs containing fewer distortion features. The experimental results show that GeLFIQE substantially improves on the performance of DeLFIQE on low-resolution LF datasets, which makes it a truly general-purpose LF-IQA metric for LF datasets of various resolutions.
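    The reduction from a 4-D light field to 2-D EPI patches described in this abstract can be illustrated with a minimal sketch. It is not the DeLFIQE implementation: the array layout `(V, U, Y, X)` (vertical view, horizontal view, image row, image column), the grayscale input, the function names `horizontal_epi` and `epi_patches`, and the non-overlapping patch layout are all assumptions made for illustration. The abstract does not say how discriminative patches are selected, so that step is left out.

```python
import numpy as np


def horizontal_epi(lf: np.ndarray, v: int, y: int) -> np.ndarray:
    """Slice a horizontal EPI out of a 4-D grayscale light field.

    Assumed layout: lf[v, u, y, x]. Fixing the vertical view index v and the
    image row y leaves a 2-D image of shape (U, X).
    """
    return lf[v, :, y, :]


def epi_patches(epi: np.ndarray, width: int) -> np.ndarray:
    """Cut an EPI into non-overlapping patches of shape (U, width).

    Columns that do not fill a whole patch are dropped.
    """
    count = epi.shape[1] // width
    return np.stack([epi[:, i * width:(i + 1) * width] for i in range(count)])


# Synthetic light field: 3 x 5 views, each view 4 x 8 pixels (made-up sizes).
lf = np.arange(3 * 5 * 4 * 8, dtype=np.float32).reshape(3, 5, 4, 8)
epi = horizontal_epi(lf, v=1, y=2)
patches = epi_patches(epi, width=4)
print(epi.shape, patches.shape)
```

    Each patch is a small 2-D array, so a model that consumes patches works on far less data per sample than one that consumes the full 4-D light field, which is the complexity argument the abstract makes.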

    Compression and visual quality assessment for light field contents

    Since its invention in the 19th century, photography has made it possible to create durable images of the world around us by capturing the intensity of light that flows through a scene, first analogically by using light-sensitive material, and then, with the advent of electronic image sensors, digitally. However, one main limitation of both analog and digital photography lies in its inability to capture any information about the direction of light rays. Through traditional photography, each three-dimensional scene is projected onto a 2D plane; consequently, no information about the position of the 3D objects in space is retained. Light field photography aims at overcoming these limitations by recording the direction of light along with its intensity. In the past, several acquisition technologies have been presented to properly capture light field information, and portable devices have been made commercially available to the general public. However, a considerably larger volume of data is generated when compared to traditional photography. Thus, new solutions must be designed to face the challenges light field photography poses in terms of storage, representation, and visualization of the acquired data. In particular, new and efficient compression algorithms are needed to significantly reduce the amount of data that needs to be stored and transmitted, while maintaining an adequate level of perceptual quality. In designing new solutions to address the unique challenges posed by light field photography, one cannot overlook the importance of having reliable, reproducible means of evaluating their performance, especially in relation to the scenario in which they will be consumed. To that end, subjective assessment of visual quality is of paramount importance to evaluate the impact of compression, representation, and rendering models on user experience. Yet, the standardized methodologies that are commonly used to evaluate the visual quality of traditional media content, such as images and videos, are not equipped to tackle the challenges posed by light field photography. New subjective methodologies must be tailored to the new possibilities this type of imaging offers in terms of rendering and visual experience. In this work, we address the aforementioned problems by both designing new methodologies for visual quality evaluation of light field contents, and outlining a new compression solution to efficiently reduce the amount of data that needs to be transmitted and stored. We first analyse how traditional methodologies for subjective evaluation of multimedia contents can be adapted to suit light field data, and we propose new methodologies to reliably assess the visual quality while maintaining user engagement. Furthermore, we study how user behavior is affected by the visual quality of the data. We employ subjective quality assessment to compare several state-of-the-art solutions in light field coding, in order to find the most promising approaches to minimize the volume of data without compromising on the perceptual quality. To that end, we define and inspect several coding approaches for light field compression, and we investigate the impact of color subsampling on the final rendered content. Lastly, we propose a new coding approach to perform light field compression, showing significant improvement with respect to the state of the art.

    Novel Methods and Algorithms for Presenting 3D Scenes

    In recent years, improvements in the acquisition and creation of 3D models gave rise to an increasing availability of 3D content and to a widening of the audience such content is created for, which brought into focus the need for effective ways to visualize and interact with it. Until recently, the task of virtual inspection of a 3D object or navigation inside a 3D scene was carried out by using human-machine interaction (HMI) metaphors controlled through mouse and keyboard events. However, this interaction approach may be cumbersome for the general audience. Furthermore, the inception and spread of touch-based mobile devices, such as smartphones and tablets, redefined the interaction problem entirely, since neither mouse nor keyboard is available anymore. The problem is made even worse by the fact that these devices are typically less powerful than desktop machines, while high-quality rendering is a computationally intensive task. In this thesis, we present a series of novel methods for the easy presentation of 3D content both when it is already available in a digitized form and when it must be acquired from the real world by image-based techniques. In the first case, we propose a method that takes as input the 3D scene of interest and an example video, and automatically produces a video of the input scene that resembles the given video example. In other words, our algorithm allows the user to replicate an existing video, for example, a video created by a professional animator, on a different 3D scene. In the context of image-based techniques, exploiting the inherent spatial organization of photographs taken for the 3D reconstruction of a scene, we propose an intuitive interface for the smooth stereoscopic navigation of the acquired scene, providing an immersive experience without the need for a complete 3D reconstruction. Finally, we propose an interactive framework for improving low-quality 3D reconstructions obtained through image-based reconstruction algorithms. Using a few strokes on the input images, the user can specify high-level geometric hints to improve incomplete or noisy reconstructions, which are caused by various common conditions that often arise for objects such as buildings, streets, and numerous other human-made functional elements.

    A benchmark of DIBR Synthesized View Quality Assessment Metrics on a new database for Immersive Media Applications

    Depth-Image-Based Rendering (DIBR) is a fundamental technology in several 3D-related applications, such as Free viewpoint video (FVV), Virtual Reality (VR) and Augmented Reality (AR). However, assessing the quality of DIBR-synthesized views brings new challenges, since this process induces new types of distortions that are inherently different from the distortions caused by video coding. In this paper, we present a new DIBR-synthesized image database with the associated subjective scores. We also test the performance of the state-of-the-art objective quality metrics on this database. This work focuses on the distortions induced only by different DIBR synthesis methods. Seven state-of-the-art DIBR algorithms, including inter-view synthesis and single-view-based synthesis methods, are considered in this database. The quality of synthesized views was assessed subjectively by 41 observers and objectively using 14 state-of-the-art objective metrics. Subjective test results show that the inter-view synthesis methods, having more input information, significantly outperform the single-view-based ones. Correlation results between the tested objective metrics and the subjective scores on this database reveal that further studies are still needed for a better objective quality metric dedicated to DIBR-synthesized views.
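    As a rough illustration of how an objective metric can be compared against subjective scores, the sketch below computes two correlation coefficients that are commonly used for this purpose: Pearson (linear agreement) and Spearman (rank agreement). The abstract does not state which coefficients the paper reports, so this choice is an assumption. The scores below are made-up numbers and the function names are hypothetical. The simple ranking assumes there are no tied values, and the nonlinear fitting step that often precedes the Pearson computation in such benchmarks is omitted.

```python
import numpy as np


def plcc(x, y) -> float:
    """Pearson linear correlation coefficient between two score vectors."""
    return float(np.corrcoef(x, y)[0, 1])


def ranks(values) -> np.ndarray:
    """Ranks 1..n of the values (assumes no ties, which get no special handling)."""
    values = np.asarray(values)
    order = np.argsort(values)
    result = np.empty(len(values), dtype=float)
    result[order] = np.arange(1, len(values) + 1)
    return result


def srocc(x, y) -> float:
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(ranks(x), ranks(y))


# Made-up data: subjective scores and one metric's outputs for five images.
subjective = np.array([1.2, 2.5, 3.1, 4.0, 4.6])
objective = np.array([0.30, 0.50, 0.55, 0.80, 0.95])

print("PLCC:", plcc(subjective, objective))
print("SROCC:", srocc(subjective, objective))
```

    A metric whose outputs order the images in the same way as the observers did reaches a Spearman coefficient of 1 even if the relationship is not linear, which is why the two coefficients are usually reported together.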