A domain adaptive deep learning solution for scanpath prediction of paintings
Cultural heritage understanding and preservation is an important issue for
society as it represents a fundamental aspect of its identity. Paintings
represent a significant part of cultural heritage and are studied
continuously. However, the way viewers perceive paintings is closely tied
to the behaviour of the human visual system (HVS). This paper focuses on the
eye-movement analysis of viewers as they look at a set of
paintings. In more detail, we introduce a new approach to
predicting human visual attention, which impacts several cognitive functions
for humans, including the fundamental understanding of a scene, and then extend
it to painting images. The proposed new architecture ingests images and returns
scanpaths, a sequence of points featuring a high likelihood of catching
viewers' attention. We use an FCNN (Fully Convolutional Neural Network), in
which we exploit a differentiable channel-wise selection and Soft-Argmax
modules. We also incorporate learnable Gaussian distributions onto the network
bottleneck to simulate visual attention process bias in natural scene images.
Furthermore, to reduce the effect of shifts between different domains (i.e.
natural images, paintings), we encourage the model to learn general features
from other domains in an unsupervised manner using a gradient reversal
classifier. The results obtained by our model outperform existing
state-of-the-art ones in terms of accuracy and efficiency.
Comment: Accepted at CBMI 2022, Graz, Austria
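The abstract above mentions a Soft-Argmax module, which turns a heatmap into a fixation coordinate while staying differentiable. The following is a minimal pure-Python sketch of the general soft-argmax idea, not the authors' implementation; the function name `soft_argmax_2d` and the sharpness parameter `beta` are assumptions introduced for illustration.

```python
import math


def soft_argmax_2d(heatmap, beta=10.0):
    """Return the expected (row, col) under softmax(beta * heatmap).

    Hypothetical helper: a hard argmax is not differentiable, so the
    coordinate is computed as a probability-weighted average instead.
    """
    peak = max(v for row in heatmap for v in row)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    exp_row = sum(r * w for r, row in enumerate(weights) for w in row) / total
    exp_col = sum(c * w for row in weights for c, w in enumerate(row)) / total
    return exp_row, exp_col


# A sharp peak at row 1, column 2 yields a coordinate very close to (1, 2).
peaked = [[0, 0, 0, 0], [0, 0, 5, 0], [0, 0, 0, 0]]
print(soft_argmax_2d(peaked))
```

A larger `beta` pushes the result toward the hard argmax; a flat heatmap returns the centre of the grid.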
Towards Top-Down Stereoscopic Image Quality Assessment via Stereo Attention
Stereoscopic image quality assessment (SIQA) plays a crucial role in
evaluating and improving the visual experience of 3D content. Existing
SIQA methods based on binocular properties and attention have achieved
promising performance. However, these bottom-up approaches are inadequate in
exploiting the inherent characteristics of the human visual system (HVS). This
paper presents a novel network for SIQA via stereo attention, employing a
top-down perspective to guide the quality assessment process. Our proposed
method realizes the guidance from high-level binocular signals down to
low-level monocular signals, while the binocular and monocular information can
be calibrated progressively throughout the processing pipeline. We design a
generalized Stereo AttenTion (SAT) block to implement the top-down philosophy
in stereo perception. This block utilizes the fusion-generated attention map as
a high-level binocular modulator, influencing the representation of two
low-level monocular features. Additionally, we introduce an Energy Coefficient
(EC) to account for recent findings indicating that binocular responses in the
primate primary visual cortex are less than the sum of monocular responses. The
adaptive EC can tune the magnitude of binocular response flexibly, thus
enhancing the formation of robust binocular features within our framework. To
extract the most discriminative quality information from the summation and
subtraction of the two branches of monocular features, we utilize a
dual-pooling strategy that applies min-pooling and max-pooling operations to
the respective branches. Experimental results highlight the superiority of our
top-down method in simulating the property of visual perception and advancing
the state-of-the-art in the SIQA field. The code of this work is available at
https://github.com/Fanning-Zhang/SATNet.
Comment: 13 pages, 4 figures
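The dual-pooling step described in the abstract above can be illustrated with a toy sketch. It assumes, without confirmation from the source, that min-pooling is applied to the summation branch and max-pooling to the subtraction branch, and it pools globally over a small 2D feature map. The function name `dual_pool` is hypothetical.

```python
def dual_pool(left, right):
    """Toy dual pooling over two monocular feature maps (lists of rows).

    Assumption: global min-pooling on the summation branch and global
    max-pooling on the subtraction branch. The actual network may pool
    differently or per channel.
    """
    summed = [[l + r for l, r in zip(lrow, rrow)]
              for lrow, rrow in zip(left, right)]
    diff = [[l - r for l, r in zip(lrow, rrow)]
            for lrow, rrow in zip(left, right)]
    min_of_sum = min(v for row in summed for v in row)
    max_of_diff = max(v for row in diff for v in row)
    return min_of_sum, max_of_diff


left_features = [[1, 2], [3, 4]]
right_features = [[0, 2], [5, 1]]
print(dual_pool(left_features, right_features))
```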
Quality Assessment of In-the-Wild Videos
Quality assessment of in-the-wild videos is a challenging problem because of
the absence of reference videos and shooting distortions. Knowledge of the
human visual system can help establish methods for objective quality assessment
of in-the-wild videos. In this work, we show that two prominent effects of the
human visual system, namely content-dependency and temporal-memory effects, can
be used for this purpose. We propose an objective no-reference video quality
assessment method by integrating both effects into a deep neural network. For
content-dependency, we extract features from a pre-trained image classification
neural network for its inherent content-aware property. For temporal-memory
effects, long-term dependencies, especially the temporal hysteresis, are
integrated into the network with a gated recurrent unit and a
subjectively-inspired temporal pooling layer. To validate the performance of
our method, experiments are conducted on three publicly available in-the-wild
video quality assessment databases: KoNViD-1k, CVD2014, and LIVE-Qualcomm.
Experimental results demonstrate that our proposed method
outperforms five state-of-the-art methods by a large margin, specifically,
12.39%, 15.71%, 15.45%, and 18.09% overall performance improvements over the
second-best method VBLIINDS, in terms of SROCC, KROCC, PLCC and RMSE,
respectively. Moreover, the ablation study verifies the crucial role of both
the content-aware features and the modeling of temporal-memory effects. The
PyTorch implementation of our method is released at
https://github.com/lidq92/VSFA.
Comment: 9 pages, 7 figures, 4 tables. ACM Multimedia 2019 camera ready. ->
Update alignment formatting of Table
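The four criteria named in the abstract above (SROCC, KROCC, PLCC, RMSE) compare predicted quality scores with subjective scores. The sketch below shows their textbook definitions in pure Python. It is not the evaluation code of the paper, the function names are chosen for illustration, the Kendall variant is the simple tau-a that ignores ties, and any nonlinear score mapping that a benchmark may apply before PLCC and RMSE is omitted.

```python
import math


def _average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def plcc(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(_average_ranks(x), _average_ranks(y))


def krocc(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n, score = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


def rmse(x, y):
    """Root mean squared error between predictions and targets."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


predicted = [1.0, 2.0, 3.0, 4.0]
subjective = [10.0, 20.0, 30.0, 45.0]
print(srocc(predicted, subjective), krocc(predicted, subjective))
```

SROCC and KROCC measure monotonicity only, so they are unaffected by the scale of the predictions, whereas PLCC and RMSE depend on how linearly and how closely the predictions track the subjective scores.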
Aesthetic-Driven Image Enhancement by Adversarial Learning
We introduce EnhanceGAN, an adversarial learning based model that performs
automatic image enhancement. Traditional image enhancement frameworks typically
involve training models in a fully-supervised manner, which requires expensive
annotations in the form of aligned image pairs. In contrast to these
approaches, our proposed EnhanceGAN only requires weak supervision (binary
labels on image aesthetic quality) and is able to learn enhancement operators
for the task of aesthetic-based image enhancement. In particular, we show the
effectiveness of a piecewise color enhancement module trained with weak
supervision, and extend the proposed EnhanceGAN framework to learning a deep
filtering-based aesthetic enhancer. The full differentiability of our image
enhancement operators enables the training of EnhanceGAN in an end-to-end
manner. We further demonstrate the capability of EnhanceGAN in learning
aesthetic-based image cropping without any groundtruth cropping pairs. Our
weakly-supervised EnhanceGAN reports competitive quantitative results on
aesthetic-based color enhancement as well as automatic image cropping, and a
user study confirms that our image enhancement results are on par with or even
preferred over professional enhancement.
End-to-end Alternating Optimization for Real-World Blind Super Resolution
Blind Super-Resolution (SR) usually involves two sub-problems: 1) estimating
the degradation of the given low-resolution (LR) image; 2) super-resolving the
LR image to its high-resolution (HR) counterpart. Both problems are ill-posed
due to the information loss in the degrading process. Most previous methods try
to solve the two problems independently, but often fall into a dilemma: a good
super-resolved HR result requires an accurate degradation estimate, which,
however, is difficult to obtain without the help of the original HR
information. To address this issue, instead of considering these two problems
independently, we adopt an alternating optimization algorithm, which can
estimate the degradation and restore the SR image in a single model.
Specifically, we design two convolutional neural modules, namely
Restorer and Estimator. Restorer restores the SR
image based on the estimated degradation, and Estimator estimates the
degradation with the help of the restored SR image. We alternate these two
modules repeatedly and unfold this process to form an end-to-end trainable
network. In this way, both Restorer and Estimator can
benefit from each other's intermediate results, which makes each
sub-problem easier. Moreover, Restorer and Estimator are
optimized in an end-to-end manner, so they become more tolerant of
each other's estimation deviations and cooperate better to achieve more robust
and accurate final results. Extensive experiments on both synthetic datasets
and real-world images show that the proposed method can largely outperform
state-of-the-art methods and produce more visually favorable results. The code
is released at https://github.com/greatlog/RealDAN.git.
Comment: Extension of our previous NeurIPS paper. Accepted to IJC
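The alternating scheme described in the abstract above can be illustrated on a toy problem. This is only a sketch under strong assumptions: the degradation is a single unknown scalar gain `k`, the two learned CNN modules are replaced by closed-form least-squares updates, and a hypothetical rough `prior` on the signal stands in for learned image priors. None of these names or choices come from the paper.

```python
def objective(x, k, y, prior, lam):
    """Data fidelity (y ~ k * x) plus a pull of x toward the rough prior."""
    data = sum((yi - k * xi) ** 2 for xi, yi in zip(x, y))
    reg = lam * sum((xi - pi) ** 2 for xi, pi in zip(x, prior))
    return data + reg


def restorer_step(k, y, prior, lam):
    """Stand-in for Restorer: best x for a fixed degradation estimate k."""
    return [(k * yi + lam * pi) / (k * k + lam) for yi, pi in zip(y, prior)]


def estimator_step(x, y):
    """Stand-in for Estimator: best k for the current restored signal x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def alternate(y, prior, lam=0.1, k_init=1.0, steps=10):
    """Unrolled loop: restore, then re-estimate, a fixed number of times."""
    k = k_init
    history = []
    for _ in range(steps):
        x = restorer_step(k, y, prior, lam)
        k = estimator_step(x, y)
        history.append(objective(x, k, y, prior, lam))
    return x, k, history


observed = [0.5, 1.0, 1.5, 2.0]      # toy low-quality observation
rough_prior = [1.1, 1.9, 3.2, 3.8]   # hypothetical rough guess of the signal
restored, gain, losses = alternate(observed, rough_prior)
print(gain, losses[0], losses[-1])
```

Because each step exactly minimizes the objective over one variable while the other is held fixed, the recorded objective never increases from one iteration to the next. In the learned setting the two updates are networks trained jointly rather than closed-form solutions.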