11 research outputs found

    No-reference Stereoscopic Image Quality Assessment Using Natural Scene Statistics

    Get PDF
    We present two contributions in this work: (i) a bivariate generalized Gaussian distribution (BGGD) model for the joint distribution of luminance and disparity subband coefficients of natural stereoscopic scenes and (ii) a no-reference (NR) stereo image quality assessment algorithm based on the BGGD model. We first show empirically that a BGGD accurately models the joint distribution of luminance and disparity subband coefficients. We then show that the model parameters form good discriminatory features for NR quality assessment. Additionally, we rely on the previously established result that luminance and disparity subband coefficients of natural stereo scenes are correlated, and show that this correlation also forms a good feature for NR quality assessment. These features are computed for both the left and right luminance-disparity pairs of the stereo image and consolidated into one feature vector per stereo pair. This feature set and the stereo pair's difference mean opinion score (DMOS) labels are used for supervised learning with a support vector machine (SVM). Support vector regression is used to estimate the perceptual quality of a test stereo image pair. The performance of the algorithm is evaluated on popular databases and shown to be competitive with state-of-the-art no-reference quality assessment algorithms. Further, the strength of the proposed algorithm is demonstrated by its consistently good performance on both symmetric and asymmetric distortion types. Our algorithm is called Stereo QUality Evaluator (StereoQUE).
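    As a rough illustration of the feature-extraction idea (not the paper's actual implementation), the sketch below estimates a univariate generalized Gaussian shape parameter by moment matching and pairs it with the luminance-disparity correlation; the paper fits a bivariate GGD, and all function names here are hypothetical.

    ```python
    import math
    import numpy as np

    def estimate_ggd_shape(coeffs, betas=None):
        """Estimate the GGD shape parameter via the classic moment-matching ratio
        E|x| / sqrt(E[x^2]), inverted over a grid of candidate shapes."""
        if betas is None:
            betas = np.linspace(0.2, 4.0, 381)
        x = np.asarray(coeffs, dtype=float).ravel()
        r_hat = np.mean(np.abs(x)) / math.sqrt(np.mean(x ** 2) + 1e-12)
        # theoretical ratio rho(b) = Gamma(2/b) / sqrt(Gamma(1/b) * Gamma(3/b))
        rho = np.array([math.gamma(2.0 / b) / math.sqrt(math.gamma(1.0 / b) * math.gamma(3.0 / b))
                        for b in betas])
        return betas[np.argmin(np.abs(rho - r_hat))]

    def stereo_features(lum_band, disp_band):
        """Per-subband features: GGD shapes, scales, and luminance-disparity correlation."""
        l, d = np.ravel(lum_band), np.ravel(disp_band)
        corr = np.corrcoef(l, d)[0, 1]
        return [estimate_ggd_shape(l), l.std(), estimate_ggd_shape(d), d.std(), corr]
    ```

    Such per-subband feature vectors would then be concatenated across scales and regressed against DMOS with an SVR, as the abstract describes.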

    No-Reference Quality Assessment for 360-degree Images by Analysis of Multi-frequency Information and Local-global Naturalness

    Full text link
    360-degree/omnidirectional images (OIs) have attracted remarkable attention due to the increasing number of virtual reality (VR) applications. Compared to conventional 2D images, OIs can provide a more immersive experience to consumers, benefiting from their higher resolution and plentiful fields of view (FoVs). Moreover, OIs are usually observed in a head-mounted display (HMD) without references. Therefore, an efficient blind quality assessment method specifically designed for 360-degree images is urgently needed. In this paper, motivated by the characteristics of the human visual system (HVS) and the viewing process of VR visual content, we propose a novel and effective no-reference omnidirectional image quality assessment (NR OIQA) algorithm based on Multi-Frequency Information and Local-Global Naturalness (MFILGN). Specifically, inspired by the frequency-dependent property of the visual cortex, we first decompose equirectangular projection (ERP) maps into wavelet subbands. Then, the entropy intensities of the low- and high-frequency subbands are exploited to measure the multi-frequency information of OIs. In addition to the global naturalness of ERP maps, owing to the browsed FoVs, we extract natural scene statistics features from each viewport image as a measure of local naturalness. With the proposed multi-frequency information measurement and local-global naturalness measurement, we use support vector regression as the final image quality regressor to train the quality evaluation model that maps visual quality-related features to human ratings. To our knowledge, the proposed model is the first no-reference quality assessment method for 360-degree images that combines multi-frequency information and image naturalness. Experimental results on two publicly available OIQA databases demonstrate that our proposed MFILGN outperforms state-of-the-art approaches.
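    The multi-frequency step can be sketched as follows, with a single-level Haar transform standing in for the paper's wavelet decomposition (the actual wavelet family and number of levels are not specified here, so this is only an assumed minimal example):

    ```python
    import numpy as np

    def haar_subbands(img):
        """Single-level 2D Haar transform: returns approximation (LL) and
        horizontal/vertical/diagonal detail subbands (LH, HL, HH)."""
        img = np.asarray(img, dtype=float)
        h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
        a = img[:h:2, :w:2]; b = img[:h:2, 1:w:2]
        c = img[1:h:2, :w:2]; d = img[1:h:2, 1:w:2]
        ll = (a + b + c + d) / 4   # low-frequency approximation
        lh = (a - b + c - d) / 4   # horizontal detail
        hl = (a + b - c - d) / 4   # vertical detail
        hh = (a - b - c + d) / 4   # diagonal detail
        return ll, lh, hl, hh

    def subband_entropy(band, bins=64):
        """Shannon entropy (bits) of a subband's coefficient histogram."""
        hist, _ = np.histogram(band, bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())
    ```

    The entropies of the low- and high-frequency subbands, computed this way per ERP map, would form the multi-frequency part of the feature vector fed to the SVR.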

    Perceptual Quality-of-Experience of Stereoscopic 3D Images and Videos

    Get PDF
    With the fast development of 3D acquisition, communication, processing and display technologies, automatic quality assessment of 3D images and videos has become ever more important. Nevertheless, recent progress on 3D image quality assessment (IQA) and video quality assessment (VQA) remains limited. The purpose of this research is to investigate various aspects of human visual quality-of-experience (QoE) when viewing stereoscopic 3D images/videos and to develop objective quality assessment models that automatically predict the visual QoE of 3D images/videos. Firstly, we create a new subjective 3D-IQA database with two features that are lacking in the literature, i.e., the inclusion of both 2D and 3D images, and the inclusion of mixed distortion types. We observe a strong distortion-type-dependent bias when using the direct average of 2D image quality to predict 3D image quality. We propose a binocular-rivalry-inspired multi-scale model to predict the quality of stereoscopic images, and the results show that the proposed model eliminates the prediction bias, leading to significantly improved quality predictions. Secondly, we carry out two subjective studies on depth perception of stereoscopic 3D images. The first follows a traditional framework where subjects are asked to rate depth quality directly on distorted stereopairs. The second uses a novel approach, where the stimuli are synthesized independently of the background image content and the subjects are asked to identify depth changes and label the polarities of depth. Our analysis shows that the second approach is much more effective at singling out the contributions of stereo cues in depth perception. We introduce the notion of a depth perception difficulty index (DPDI) and propose a novel computational model for DPDI prediction. The results show that the proposed model leads to highly promising DPDI prediction performance.
Thirdly, we carry out subjective 3D-VQA experiments on two databases that contain various asymmetrically compressed stereoscopic 3D videos. We then compare different mixed-distortion asymmetric stereoscopic video coding schemes with symmetric coding methods and verify their potential coding gains. We propose a model to account for the prediction bias that arises from using the direct average of 2D video quality to predict 3D video quality. The results show that the proposed model leads to significantly improved quality predictions and can help us predict the coding gain of mixed-distortion asymmetric video compression. Fourthly, we investigate the problem of objective quality assessment of multi-view-plus-depth (MVD) images, with a main focus on the pre-depth-image-based-rendering (pre-DIBR) case. We find that existing IQA methods are difficult to employ as a guiding criterion in the optimization of MVD video coding and transmission systems when applied post-DIBR. We propose a novel pre-DIBR method based on information content weighting of both texture and depth images, which demonstrates competitive performance against state-of-the-art IQA models applied post-DIBR.
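The binocular-rivalry-inspired pooling mentioned above can be illustrated, very loosely, as an energy-weighted combination of the two views' 2D quality scores. The weighting scheme below (global variance as a contrast-energy proxy) is a hypothetical stand-in, not the thesis's actual multi-scale model:

```python
import numpy as np

def rivalry_weighted_quality(q_left, q_right, left_img, right_img):
    """Combine per-view 2D quality scores with energy-based weights, so the
    view with stronger contrast energy dominates (rivalry-inspired pooling).
    Hypothetical sketch: the real model operates per scale and locally."""
    w_l = float(np.var(left_img))    # contrast-energy proxy for the left view
    w_r = float(np.var(right_img))   # contrast-energy proxy for the right view
    return (w_l * q_left + w_r * q_right) / (w_l + w_r + 1e-12)
```

When both views carry equal energy this reduces to the direct average; under asymmetric distortion the weights shift toward the dominant view, which is precisely the bias the direct average fails to capture.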

    Dense light field coding: a survey

    Get PDF
    Light Field (LF) imaging is a promising solution for providing more immersive, closer-to-reality multimedia experiences to end-users, with unprecedented creative freedom and flexibility for applications in different areas, such as virtual and augmented reality. Due to recent technological advances in optics, sensor manufacturing and available transmission bandwidth, as well as the investment of many tech giants in this area, it is expected that many LF transmission systems will soon be available to both consumers and professionals. Recognizing this, novel standardization initiatives have recently emerged in both the Joint Photographic Experts Group (JPEG) and the Moving Picture Experts Group (MPEG), triggering the discussion on the deployment of LF coding solutions to efficiently handle the massive amount of data involved in such systems. Since then, the topic of LF content coding has become a booming research area, attracting the attention of many researchers worldwide. In this context, this paper provides a comprehensive survey of the most relevant LF coding solutions proposed in the literature, focusing on angularly dense LFs. Special attention is placed on a thorough description of the different LF coding methods and on the main concepts related to this relevant area. Moreover, comprehensive insights are presented into open research challenges and future research directions for LF coding.
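    To make the data volume concrete: an angularly dense LF is commonly handled as a 4D array L[u, v, s, t] of sub-aperture views (u, v angular, s, t spatial). The sketch below shows the two slicings that most coding schemes exploit, sub-aperture views and epipolar-plane images (EPIs); the layout is a common convention, assumed here rather than taken from the survey:

    ```python
    import numpy as np

    def subaperture_view(lf, u, v):
        """Fix the angular indices (u, v) of a 4D light field L[u, v, s, t]
        to obtain one sub-aperture view (an ordinary 2D image)."""
        return lf[u, v]

    def horizontal_epi(lf, v, t):
        """Horizontal epipolar-plane image: slice over u and s at fixed (v, t).
        Scene depth appears as line slopes in this 2D slice."""
        return lf[:, v, :, t]
    ```

    A 9x9 angular grid of full-HD views already holds 81 images per frame, which is why the redundancy across these slicings is the central target of LF coding.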

    Visibility and distortion measurement for no-reference dehazed image quality assessment via complex contourlet transform

    Get PDF
    Most recent dehazed image quality assessment (DQA) methods focus mainly on estimating the remaining haze, omitting the impact of distortions introduced as side effects of dehazing algorithms, which limits their performance. To address this problem, we propose VDA-DQA, a no-reference (NR) dehazed image quality assessment method that learns both visibility-aware and distortion-aware features. Visibility-aware features are exploited to characterize the clarity optimization after dehazing, including brightness, contrast, and sharpness-aware features extracted by the complex contourlet transform (CCT). Then, distortion-aware features are employed to measure the distortion artifacts of images, including the normalized histogram of local binary patterns (LBP) from the reconstructed dehazed image and the statistics of the CCT subbands corresponding to the chroma and saturation maps. Finally, all the above features are mapped to quality scores by support vector regression (SVR). Extensive experimental results on six public DQA datasets verify the superiority of the proposed VDA-DQA in terms of consistency with subjective visual perception, showing that it outperforms state-of-the-art methods. The source code of VDA-DQA is available at https://github.com/li181119/VDA-DQA.
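    For reference, the normalized LBP histogram mentioned above can be computed as follows; this is the basic 8-neighbour, 256-bin variant (the paper may use a rotation-invariant or uniform variant, which this sketch does not assume):

    ```python
    import numpy as np

    def lbp_histogram(img):
        """Basic 8-neighbour local binary pattern: each interior pixel gets an
        8-bit code (one bit per neighbour >= centre); returns the normalized
        256-bin histogram of those codes."""
        img = np.asarray(img, dtype=float)
        c = img[1:-1, 1:-1]                      # centre pixels
        code = np.zeros_like(c, dtype=np.uint8)
        offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                   (1, 1), (1, 0), (1, -1), (0, -1)]
        for bit, (dy, dx) in enumerate(offsets):
            nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
            code |= (nb >= c).astype(np.uint8) << bit
        hist = np.bincount(code.ravel(), minlength=256)
        return hist / hist.sum()
    ```

    The resulting 256-dimensional histogram is the kind of distortion-aware descriptor that, together with the CCT subband statistics, would be concatenated and passed to the SVR.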

    Quality of Experience in Immersive Video Technologies

    Get PDF
    Over the last decades, several technological revolutions have impacted the television industry, such as the shifts from black & white to color and from standard to high definition. Nevertheless, considerable further improvements can still be achieved to provide a better multimedia experience, for example with ultra-high definition, high dynamic range & wide color gamut, or 3D. These so-called immersive technologies aim at providing better, more realistic, and emotionally stronger experiences. To measure quality of experience (QoE), subjective evaluation is the ultimate means, since it relies on a pool of human subjects. However, reliable and meaningful results can only be obtained if experiments are properly designed and conducted following a strict methodology. In this thesis, we build a rigorous framework for subjective evaluation of new types of image and video content. We propose different procedures and analysis tools for measuring QoE in immersive technologies. As immersive technologies capture more information than conventional technologies, they have the ability to provide more details, enhanced depth perception, as well as better color, contrast, and brightness. To measure the impact of immersive technologies on the viewers' QoE, we apply the proposed framework for designing experiments and analyzing collected subjects' ratings. We also analyze eye movements to study human visual attention during immersive content playback. Since immersive content carries more information than conventional content, efficient compression algorithms are needed for storage and transmission using existing infrastructures. To determine the bandwidth required for high-quality transmission of immersive content, we use the proposed framework to conduct meticulous evaluations of recent image and video codecs in the context of immersive technologies. Subjective evaluation is time consuming, expensive, and not always feasible.
Consequently, researchers have developed objective metrics to automatically predict quality. To measure the performance of objective metrics in assessing immersive content quality, we perform several in-depth benchmarks of state-of-the-art and commonly used objective metrics. For this aim, we use ground-truth quality scores collected under our subjective evaluation framework. To improve QoE, we propose different systems, in particular for stereoscopic and autostereoscopic 3D displays. The proposed systems can help reduce the artifacts generated at the visualization stage, which impact picture quality, depth quality, and visual comfort. To demonstrate the effectiveness of these systems, we use the proposed framework to measure viewers' preference between these systems and standard 2D & 3D modes. In summary, this thesis tackles the problems of measuring, predicting, and improving QoE in immersive technologies. To address these problems, we build a rigorous framework and apply it through several in-depth investigations. We put essential concepts of multimedia QoE under this framework. These concepts are not only of a fundamental nature but have also shown their impact in very practical applications. In particular, the JPEG, MPEG, and VCEG standardization bodies have adopted these concepts to select technologies proposed for standardization and to validate the resulting standards in terms of compression efficiency.
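Benchmarking an objective metric against ground-truth scores typically reports rank correlations such as SROCC. A minimal self-contained version (ignoring tied ranks, which `scipy.stats.spearmanr` would handle) looks like this:

```python
import numpy as np

def srocc(pred, mos):
    """Spearman rank-order correlation between objective predictions and mean
    opinion scores. Simplified: ties get arbitrary distinct ranks."""
    def ranks(x):
        order = np.argsort(x)
        r = np.empty(len(x))
        r[order] = np.arange(len(x))
        return r
    rp, rm = ranks(np.asarray(pred)), ranks(np.asarray(mos))
    rp = rp - rp.mean()
    rm = rm - rm.mean()
    return float((rp @ rm) / np.sqrt((rp @ rp) * (rm @ rm)))
```

Because it is rank-based, SROCC rewards any monotonic agreement with subjective scores, which is why it is preferred over plain linear correlation when a metric's response is nonlinear in perceived quality.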

    Deep learning based objective quality assessment of multidimensional visual content

    Get PDF
    Doctoral thesis (Tese de doutorado) — Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2022. Funded by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). In the last decade, there has been a tremendous increase in the popularity of multimedia applications, and hence in multimedia content. When these contents are generated, transmitted, reconstructed and shared, their original pixel values are transformed. In this scenario, it becomes ever more crucial and demanding to assess the visual quality of the affected visual content so that the requirements of end-users are satisfied. In this work, we investigate effective spatial, temporal, and angular features by developing no-reference algorithms that assess the visual quality of distorted multi-dimensional visual content. We use machine learning and deep learning algorithms to obtain prediction accuracy. For two-dimensional (2D) image quality assessment, we use multiscale local binary patterns and saliency information, and train/test these features using a Random Forest regressor. For 2D video quality assessment, we introduce a novel concept of spatial and temporal saliency and custom objective quality scores. We use a lightweight Convolutional Neural Network (CNN) based model for training and testing on selected patches of video frames. For objective quality assessment of four-dimensional (4D) light field images (LFI), we propose seven LFI quality assessment (LF-IQA) methods in total.
Considering that an LFI is composed of dense multi-views, and inspired by the Human Visual System (HVS), we propose our first LF-IQA method, which is based on a two-stream CNN architecture. The second and third LF-IQA methods are also based on a two-stream architecture, which incorporates a CNN, Long Short-Term Memory (LSTM), and diverse bottleneck features. The fourth LF-IQA method is based on CNN and Atrous Convolution layers (ACL), while the fifth method uses CNN, ACL, and LSTM layers. The sixth LF-IQA method is also based on a two-stream architecture, in which horizontal and vertical EPIs are processed in the frequency domain. Last but not least, the seventh LF-IQA method is based on a Graph Convolutional Neural Network. For all of the methods mentioned above, we performed intensive experiments, and the results show that these methods outperform state-of-the-art methods on popular quality datasets.
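The two-stream idea, one stream over spatial content and one over angular consistency across views, can be caricatured without any deep learning framework. The stream functions below are deliberately trivial placeholders (the thesis uses CNN/LSTM streams); only the fuse-two-complementary-views structure is the point:

```python
import numpy as np

def spatial_stream(view):
    """Placeholder spatial stream: simple statistics of one sub-aperture view
    (the real method uses a CNN here)."""
    return np.array([view.mean(), view.std()])

def angular_stream(lf):
    """Placeholder angular stream: per-pixel variation across the (u, v) views
    of a 4D light field L[u, v, s, t], summarizing angular consistency."""
    per_pixel_std = lf.std(axis=(0, 1))
    return np.array([per_pixel_std.mean(), per_pixel_std.std()])

def two_stream_features(lf):
    """Concatenate the two streams' outputs, mirroring late fusion in a
    two-stream architecture."""
    center = lf[lf.shape[0] // 2, lf.shape[1] // 2]
    return np.concatenate([spatial_stream(center), angular_stream(lf)])
```

Distortions that damage individual views (blur, compression) mostly move the spatial features, while distortions that break view-to-view consistency (reconstruction artifacts) mostly move the angular ones, which is the motivation for keeping the streams separate before fusion.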