6 research outputs found
A no-reference optical flow-based quality evaluator for stereoscopic videos in curvelet domain
Most existing 3D video quality assessment (3D-VQA/SVQA) methods consider only spatial information, directly applying an image quality evaluation method, and only a few take the motion information of adjacent frames into account. In practice, a single view of the data is unlikely to be sufficient for effectively learning video quality, so integrating multi-view information is both valuable and necessary. In this paper, we propose an effective multi-view feature learning metric for blind stereoscopic video quality assessment (BSVQA), which jointly considers spatial information, temporal information and inter-frame spatio-temporal information. In our study, a set of local binary pattern (LBP) statistical features extracted from the curvelet representation computed for each frame serves as the spatial and spatio-temporal description, and local flow statistical features based on optical flow estimation describe the temporal distortion. Subsequently, support vector regression (SVR) maps the feature vectors of each single view to subjective quality scores. Finally, the scores of the multiple views are pooled into the final score according to their contribution rate. Experimental results demonstrate that the proposed metric significantly outperforms existing metrics and achieves higher consistency with subjective quality assessment.
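As a rough illustration of two steps named in this abstract, the sketch below computes a basic 8-neighbour LBP histogram and pools per-view scores by weight. It is a minimal sketch under stated assumptions, not the authors' implementation: the LBP is applied to a plain 2D array rather than a curvelet representation, the optical flow features and the SVR stage are omitted, and the function names `lbp_histogram` and `pool_scores` as well as the weights standing in for "contribution rates" are hypothetical.

```python
from typing import List, Sequence


def lbp_histogram(image: Sequence[Sequence[float]]) -> List[float]:
    """Normalized 256-bin histogram of basic 8-neighbour LBP codes.

    Assumption: `image` is any 2D array of intensities or coefficients.
    Border pixels are skipped because they lack a full neighbourhood.
    """
    # Neighbour offsets (row, col), clockwise starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    hist = [0.0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least the center value.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def pool_scores(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-view quality scores.

    Assumption: `weights` play the role of the per-view contribution
    rates; how the paper derives them is not stated in the abstract.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total
```

In a fuller pipeline, one such histogram per view would be fed to a regressor, and `pool_scores` would combine the regressor outputs into a single prediction.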
Stereoscopic video quality assessment based on 3D convolutional neural networks
Research on stereoscopic video quality assessment (SVQA) plays an important role in promoting the development of stereoscopic video systems. Existing SVQA metrics rely on hand-crafted features, which are inaccurate and time-consuming to design because of the diversity and complexity of stereoscopic video distortions. This paper introduces an SVQA framework based on 3D convolutional neural networks (CNN) that takes cubic difference video patches as input and models both local spatio-temporal information and global temporal information. First, instead of using hand-crafted features, we design a 3D CNN architecture to automatically and effectively capture local spatio-temporal features. Then we employ a quality score fusion strategy that considers global temporal clues to obtain the final video-level predicted score. Extensive experiments conducted on two public stereoscopic video quality datasets show that the proposed method correlates highly with human perception and outperforms state-of-the-art methods by a large margin. We also show that our 3D CNN features have more desirable properties for SVQA than the hand-crafted features of previous methods, and that combining our 3D CNN features with support vector regression (SVR) further boosts performance. In addition, with GPU acceleration and no complex preprocessing, the proposed method is shown to be computationally efficient and easy to use.
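The "cubic difference video patches" used as network input can be pictured with the sketch below. It rests on assumptions the abstract does not confirm: the difference video is taken as the per-pixel left view minus right view, the cubes are cut without overlap, and the names `difference_video` and `cut_cubes` and the cube sizes are hypothetical.

```python
from typing import List

# A grayscale video indexed as video[frame][row][col].
Video = List[List[List[float]]]


def difference_video(left: Video, right: Video) -> Video:
    """Per-pixel difference between left and right views (assumed: left - right)."""
    return [
        [[l - r for l, r in zip(l_row, r_row)]
         for l_row, r_row in zip(l_frame, r_frame)]
        for l_frame, r_frame in zip(left, right)
    ]


def cut_cubes(video: Video, depth: int, height: int, width: int) -> List[Video]:
    """Cut a video into non-overlapping cubes of depth x height x width.

    Frames, rows, or columns that do not fill a whole cube are dropped.
    """
    frames, rows, cols = len(video), len(video[0]), len(video[0][0])
    cubes = []
    for t in range(0, frames - depth + 1, depth):
        for r in range(0, rows - height + 1, height):
            for c in range(0, cols - width + 1, width):
                cubes.append([
                    [row[c:c + width] for row in frame[r:r + height]]
                    for frame in video[t:t + depth]
                ])
    return cubes
```

Each cube would then be scored by the 3D CNN, and the per-cube scores fused into one video-level score; the fusion rule itself is not described in the abstract and is not sketched here.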
No-reference depth map quality evaluation model based on depth map edge confidence measurement in immersive video applications
When it comes to evaluating the perceptual quality of digital media for overall quality of experience assessment in immersive video applications, two main approaches typically stand out: subjective and objective quality evaluation. On one hand, subjective quality evaluation offers the best representation of perceived video quality as assessed by real viewers. On the other hand, it consumes a significant amount of time and effort, because it involves real users in lengthy and laborious assessment procedures. Thus, it is essential that an objective quality evaluation model is developed. The speed-up offered by an objective quality evaluation model, which can predict the quality of rendered virtual views based on the depth maps used in the rendering process, allows for faster quality assessments in immersive video applications. This is particularly important given the lack of a suitable reference or ground truth for comparing the available depth maps, especially when live content services are offered in those applications. This paper presents a no-reference depth map quality evaluation model based on a proposed depth map edge confidence measurement technique, to assist with accurately estimating the quality of rendered (virtual) views in immersive multi-view video content. The model is applied to depth image-based rendering in multi-view video format, providing evaluation results comparable to those in the literature and often exceeding their performance.
Deep learning based objective quality assessment of multidimensional visual content
Thesis (doctorate), Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2022. Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).
In the last decade, there has been a tremendous increase in the popularity of multimedia applications, and hence in the amount of multimedia content. When this content is generated, transmitted, reconstructed and shared, its original pixel values are transformed. In this scenario, it becomes more crucial and demanding to assess the visual quality of the affected content so that the requirements of end users are satisfied. In this work, we investigate effective spatial, temporal, and angular features by developing no-reference algorithms that assess the visual quality of distorted multi-dimensional visual content. We use machine learning and deep learning algorithms to improve prediction accuracy.
For two-dimensional (2D) image quality assessment, we use multiscale local binary patterns and saliency information, and train and test these features using a Random Forest Regressor. For 2D video quality assessment, we introduce a novel concept of spatial and temporal saliency and custom objective quality scores. We use a light-weight model based on a Convolutional Neural Network (CNN) for training and testing on selected patches of video frames.
For objective quality assessment of four-dimensional (4D) light field images (LFI), we propose seven LFI quality assessment (LF-IQA) methods in total. Considering that an LFI is composed of dense multiple views, and inspired by the human visual system (HVS), we propose our first LF-IQA method, which is based on a two-stream CNN architecture. The second and third LF-IQA methods are also based on a two-stream architecture, which incorporates CNN, Long Short-Term Memory (LSTM), and diverse bottleneck features. The fourth LF-IQA method is based on CNN and Atrous Convolution layers (ACL), while the fifth method uses CNN, ACL, and LSTM layers. The sixth LF-IQA method is also based on a two-stream architecture, in which horizontal and vertical EPIs are processed in the frequency domain. Last but not least, the seventh LF-IQA method is based on a Graph Convolutional Neural Network. For all of the methods mentioned above, we performed intensive experiments, and the results show that these methods outperformed state-of-the-art methods on popular quality datasets.
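The horizontal and vertical EPIs (epipolar plane images) mentioned for the sixth method are 2D slices of the 4D light field. The sketch below shows how such slices can be taken under an assumed indexing `lf[u][v][y][x]` (angular row, angular column, spatial row, spatial column). The indexing convention and the function names are hypothetical, and the frequency-domain processing described in the abstract is not shown.

```python
from typing import List

# A grayscale light field indexed as lf[u][v][y][x] (assumed convention):
# (u, v) select the view, (y, x) select the pixel inside that view.
LightField = List[List[List[List[float]]]]


def horizontal_epi(lf: LightField, u: int, y: int) -> List[List[float]]:
    """Fix angular row u and spatial row y; result is indexed [v][x]."""
    return [list(lf[u][v][y]) for v in range(len(lf[u]))]


def vertical_epi(lf: LightField, v: int, x: int) -> List[List[float]]:
    """Fix angular column v and spatial column x; result is indexed [u][y]."""
    return [
        [lf[u][v][y][x] for y in range(len(lf[u][v]))]
        for u in range(len(lf))
    ]
```

A two-stream model of the kind described would receive one stream of horizontal slices and one of vertical slices; how the thesis samples and transforms them is not specified in the abstract.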