Perceptual Quality Assessment of Omnidirectional Audio-visual Signals
Omnidirectional videos (ODVs) play an increasingly important role in the
application fields such as medicine, education, advertising, and tourism.
Assessing the quality of ODVs is significant for service providers to improve the user's
Quality of Experience (QoE). However, most existing quality assessment studies
for ODVs only focus on the visual distortions of videos, while ignoring that
the overall QoE also depends on the accompanying audio signals. In this paper,
we first establish a large-scale audio-visual quality assessment dataset for
omnidirectional videos, which includes 375 distorted omnidirectional
audio-visual (A/V) sequences generated from 15 high-quality pristine
omnidirectional A/V contents, and the corresponding perceptual audio-visual
quality scores. Then, we design three baseline methods for full-reference
omnidirectional audio-visual quality assessment (OAVQA), which combine existing
state-of-the-art single-mode audio and video QA models via multimodal fusion
strategies. We validate the effectiveness of the A/V multimodal fusion method
for OAVQA on our dataset, which provides a new benchmark for omnidirectional
QoE evaluation. Our dataset is available at https://github.com/iamazxl/OAVQA.

Comment: 12 pages, 5 figures, to be published in CICAI202
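As a rough illustration of score-level multimodal fusion (one of several possible strategies; the paper's exact fusion models are not reproduced here, so the weighting below is a hypothetical placeholder):

```python
def fuse_av_quality(video_score, audio_score, w_video=0.5):
    """Score-level A/V fusion via a weighted average.

    w_video is a hypothetical weight; in practice it would be fitted to
    the subjective quality scores collected for the dataset.
    """
    return w_video * video_score + (1.0 - w_video) * audio_score
```

For example, `fuse_av_quality(80.0, 60.0)` returns `70.0`; setting `w_video=1.0` recovers the video-only baseline that the paper argues is insufficient.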
On Designing Deep Learning Approaches for Classification of Football Jersey Images in the Wild
Internet shopping has spread widely, including into social networking. Someone may want to buy a shirt, accessories, etc., seen in a random picture or a streaming video. In this thesis, the problem of automatic classification was taken up, constraining the target to jerseys in the wild and assuming the object is detected.

A dataset of 7,840 jersey images, namely JerseyXIV, is created, containing images of 14 categories of various football jersey types (Home and Alternate) belonging to 10 teams of the 2015 Big 12 Conference football season. The quality of images varies in terms of pose, standoff distance, level of occlusion and illumination. Due to copyright restrictions on certain images, unaltered original images with appropriate credits can be provided upon request.

While various conventional and deep learning based classification approaches were empirically designed, optimized and tested, the solution with the highest classification accuracy was a train-time fused Convolutional Neural Network (CNN) architecture, namely CNN-F, with 92.61% accuracy. The final solution combines three different CNNs through score-level average fusion, achieving 96.90% test accuracy. To test these trained CNN models on a larger, application-oriented scale, a video dataset is created, which may present a higher rate of occlusion and elements of transmission noise. It consists of 14 videos, one for each class, totaling 3,584 frames, with 2,188 frames containing the object of interest. With manual detection, the score-level average fusion achieved the highest classification accuracy of 81.31%.

In addition, three Image Quality Assessment techniques were tested to assess the drop in accuracy of the average-fusion method on the video dataset. The Natural Image Quality Evaluator (NIQE) index by Bovik et al., with a threshold of 0.40 on input images, improved the test accuracy of the average-fusion model on the video dataset to 86.36% by removing low-quality input images before they reach the CNNs.

The thesis concludes that the recommended solution for classification is composed of data augmentation and fusion of networks, while for applying trained models to videos, an image quality metric aids performance with a trade-off in loss of input data.
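The score-level average fusion and NIQE-based frame filtering described above can be sketched as follows (a minimal illustration; `niqe_fn` is a placeholder for a real NIQE implementation, and the 0.40 threshold is the one reported in the thesis):

```python
def average_fusion(score_lists):
    """Element-wise mean of per-class score vectors from several models."""
    n = len(score_lists)
    return [sum(scores) / n for scores in zip(*score_lists)]

def classify(score_lists):
    """Predicted class index under score-level average fusion."""
    fused = average_fusion(score_lists)
    return max(range(len(fused)), key=fused.__getitem__)

def filter_frames(frames, niqe_fn, threshold=0.40):
    """Drop frames whose NIQE score exceeds the threshold before
    classification (lower NIQE = better perceptual quality)."""
    return [f for f in frames if niqe_fn(f) <= threshold]
```

With three models emitting per-class scores `[1, 3]`, `[3, 1]`, `[2, 5]`, the fused vector is `[2.0, 3.0]` and the predicted class index is `1`.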
Pansharpening via Frequency-Aware Fusion Network with Explicit Similarity Constraints
The process of fusing a high spatial resolution (HR) panchromatic (PAN) image
and a low spatial resolution (LR) multispectral (MS) image to obtain an HRMS
image is known as pansharpening. With the development of convolutional neural
networks, the performance of pansharpening methods has improved; however,
blurry effects and spectral distortion still exist in their fusion
results due to insufficient detail learning and the frequency mismatch
between MS and PAN. Therefore, improving spatial details while reducing
spectral distortion remains a challenge. In this paper, we propose
a frequency-aware fusion network (FAFNet) together with a novel high-frequency
feature similarity loss to address the above-mentioned problems. FAFNet is mainly
composed of two kinds of blocks: the frequency-aware blocks aim to
extract features in the frequency domain with the help of discrete wavelet
transform (DWT) layers, and the frequency fusion blocks reconstruct and
transform the features from the frequency domain back to the spatial domain with
the assistance of inverse DWT (IDWT) layers. Finally, the fusion results are
obtained through a convolutional block. In order to learn the correspondence,
we also propose a high-frequency feature similarity loss to constrain the HF
features derived from the PAN and MS branches, so that the HF features of PAN can
reasonably be used to supplement those of MS. Experimental results on three
datasets at both reduced- and full-resolution demonstrate the superiority of
the proposed method compared with several state-of-the-art pansharpening
models.

Comment: 14 pages
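To make the frequency-domain idea concrete, here is a minimal, hypothetical sketch of a one-level 1-D Haar DWT and a high-frequency feature similarity loss computed as the mean absolute difference between HF subbands. The paper operates on 2-D CNN feature maps with learned layers, so this only illustrates the principle, not the actual FAFNet implementation:

```python
def haar_dwt_1d(x):
    """One-level Haar transform: returns (low-pass, high-pass) subbands."""
    lo = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    hi = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return lo, hi

def hf_similarity_loss(pan_feat, ms_feat):
    """Mean absolute difference between the HF subbands of two features,
    constraining the MS branch's high frequencies toward the PAN branch's."""
    _, hf_pan = haar_dwt_1d(pan_feat)
    _, hf_ms = haar_dwt_1d(ms_feat)
    return sum(abs(a - b) for a, b in zip(hf_pan, hf_ms)) / len(hf_pan)
```

When the two inputs carry identical high-frequency content, the loss is zero, so minimizing it encourages the MS branch to borrow spatial detail from PAN.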
Blind Quality Assessment for in-the-Wild Images via Hierarchical Feature Fusion and Iterative Mixed Database Training
Image quality assessment (IQA) is very important for both end-users and
service-providers since a high-quality image can significantly improve the
user's quality of experience (QoE) and also benefit many computer vision
algorithms. Most existing blind image quality assessment (BIQA) models were
developed for synthetically distorted images; however, they perform poorly on
in-the-wild images, which exist widely in various practical applications.
In this paper, we propose a novel BIQA model for in-the-wild images by
addressing two critical problems in this field: how to learn better
quality-aware feature representation, and how to solve the problem of
insufficient training samples in terms of their content and distortion
diversity. Considering that perceptual visual quality is affected by both
low-level visual features (e.g. distortions) and high-level semantic
information (e.g. content), we first propose a staircase structure to
hierarchically integrate the features from intermediate layers into the final
feature representation, which enables the model to make full use of visual
information from low-level to high-level. Then an iterative mixed database
training (IMDT) strategy is proposed to train the BIQA model on multiple
databases simultaneously, so the model can benefit from the larger number of
training samples and the greater diversity of image content and distortions, and can learn a
more general feature representation. Experimental results show that the
proposed model outperforms other state-of-the-art BIQA models on six
in-the-wild IQA databases by a large margin. Moreover, the proposed model shows
an excellent performance in the cross-database evaluation experiments, which
further demonstrates that the learned feature representation is robust to
images with diverse distortions and content. The code will be released publicly
for reproducible research.
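A minimal sketch of the staircase idea, assuming simple pooling and concatenation of intermediate-layer features into the final representation (the paper's exact merging operations are not reproduced here):

```python
def global_avg_pool(feature_map):
    """Collapse a feature vector to its mean (stand-in for spatial GAP)."""
    return sum(feature_map) / len(feature_map)

def staircase_fusion(intermediate_feats, final_feat):
    """Concatenate pooled low- to high-level intermediate features with the
    final-layer feature, so the quality predictor sees both distortion cues
    (low-level) and content semantics (high-level)."""
    pooled = [global_avg_pool(f) for f in intermediate_feats]
    return pooled + list(final_feat)
```

For two intermediate layers pooling to `2.0` and `4.0` and a final feature `[5.0, 6.0]`, the fused representation is `[2.0, 4.0, 5.0, 6.0]`.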