Hallucinated-IQA: No-Reference Image Quality Assessment via Adversarial Learning
No-reference image quality assessment (NR-IQA) is a fundamental yet
challenging task in the low-level computer vision community. The difficulty is
particularly pronounced because only limited information is available: the
corresponding reference image for comparison is typically absent. Although various
feature extraction mechanisms have been leveraged from natural scene statistics
to deep neural networks in previous methods, the performance bottleneck still
exists. In this work, we propose a hallucination-guided quality regression
network to address this issue. We first generate a hallucinated reference
conditioned on the distorted image to compensate for the absence of the true
reference. Then, we pair the hallucinated reference with the distorted image
and forward the pair to the regressor, which learns the perceptual discrepancy
under the guidance of an implicit ranking relationship within the generator
and thereby produces a precise quality prediction. To demonstrate the
effectiveness of our approach, we conduct comprehensive experiments on
four popular image quality assessment benchmarks. Our method significantly
outperforms all the previous state-of-the-art methods by large margins. The
code and model will be publicly available on the project page
https://kwanyeelin.github.io/projects/HIQA/HIQA.html. Comment: Accepted to CVPR 2018
Adaptive Quantile Sparse Image (AQuaSI) Prior for Inverse Imaging Problems
Inverse problems play a central role for many classical computer vision and
image processing tasks. Many inverse problems are ill-posed, and hence require
a prior to regularize the solution space. However, many of the existing priors,
like total variation, are based on ad-hoc assumptions and have difficulty
representing the actual distribution of natural images. Thus, a key challenge in
research on image processing is to find better suited priors to represent
natural images.
In this work, we propose the Adaptive Quantile Sparse Image (AQuaSI) prior.
It is based on a quantile filter, can be used as a joint filter on guidance
data, and can be readily plugged into a wide range of numerical optimization
algorithms. We demonstrate the efficacy of the proposed prior in joint
RGB/depth upsampling, in RGB/NIR image restoration, and in a comparison with
related regularization-by-denoising approaches.
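The abstract describes the prior only at a high level. As a rough illustration, a sliding-window quantile filter and the sparsity penalty it induces could look like the following minimal NumPy sketch; the function names and the L1 penalty choice are our assumptions, not the authors' implementation:

```python
import numpy as np

def quantile_filter(img, q=0.5, size=3):
    """Sliding-window quantile filter Q_q over a 2-D array.

    AQuaSI penalises the residual between an image and its own
    quantile-filtered version; this sketch uses edge-replicate
    padding and a brute-force window scan for clarity.
    """
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            window = padded[i:i + size, j:j + size]
            out[i, j] = np.quantile(window, q)
    return out

def aquasi_prior(img, q=0.5, size=3):
    # Illustrative sparsity-promoting prior term ||x - Q_q(x)||_1.
    return np.abs(img - quantile_filter(img, q, size)).sum()
```

For a smooth image the residual is small, while isolated outliers (e.g., impulse noise) are fully penalized, which is what makes the residual a useful sparsity prior inside an optimization loop.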
Towards Fine-grained Human Pose Transfer with Detail Replenishing Network
Human pose transfer (HPT) is an emerging research topic with huge potential
in fashion design, media production, online advertising and virtual reality.
For these applications, the visual realism of fine-grained appearance details
is crucial for production quality and user engagement. However, existing HPT
methods often suffer from three fundamental issues: detail deficiency, content
ambiguity and style inconsistency, which severely degrade the visual quality
and realism of generated images. Aiming towards real-world applications, we
develop a more challenging yet practical HPT setting, termed Fine-grained
Human Pose Transfer (FHPT), with a higher focus on semantic fidelity and detail
replenishment. Concretely, we analyze the potential design flaws of existing
methods via an illustrative example, and establish the core FHPT methodology by
combining the ideas of content synthesis and feature transfer in a
mutually-guided fashion. Thereafter, we substantiate the proposed methodology
with a Detail Replenishing Network (DRN) and a corresponding coarse-to-fine
model training scheme. Moreover, we build up a complete suite of fine-grained
evaluation protocols to address the challenges of FHPT in a comprehensive
manner, including semantic analysis, structural detection and perceptual
quality assessment. Extensive experiments on the DeepFashion benchmark dataset
have verified the power of the proposed method against state-of-the-art works,
with a 12%-14% gain in top-10 retrieval recall, 5% higher joint localization
accuracy, and a nearly 40% gain in face identity preservation. Moreover, the
evaluation results offer further insights into the subject matter, which could
inspire many promising future works along this direction. Comment: IEEE TIP submission
Learning to Calibrate Straight Lines for Fisheye Image Rectification
This paper presents a new deep-learning based method to simultaneously
calibrate the intrinsic parameters of fisheye lens and rectify the distorted
images. Assuming that the distorted lines generated by fisheye projection
should be straight after rectification, we propose a novel deep neural network
to impose explicit geometric constraints on the processes of fisheye lens
calibration and distorted image rectification. In addition, considering the
nonlinearity of distortion distribution in fisheye images, the proposed network
fully exploits multi-scale perception to equalize the rectification effects on
the whole image. To train and evaluate the proposed model, we also create a new
large-scale dataset labeled with corresponding distortion parameters and
well-annotated distorted lines. Compared with the state-of-the-art methods, our
model achieves the best published rectification quality and the most accurate
estimation of distortion parameters on a large set of synthetic and real
fisheye images.
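The straight-line constraint above can be made concrete with a small geometric residual: points sampled along a projected line should be collinear after rectification, so the variance of the point set along its minor axis should vanish. A hedged NumPy sketch (the function name and the least-squares formulation are our own illustration, not the paper's loss):

```python
import numpy as np

def straightness_loss(points):
    """Mean squared distance of 2-D points to their best-fit line.

    Sketches the geometric constraint used for fisheye rectification:
    after undistortion, points sampled along a projected world line
    should be collinear, driving this residual toward zero.
    """
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    # Principal directions via SVD; the smallest singular value
    # captures the residual spread orthogonal to the fitted line.
    _, s, _ = np.linalg.svd(centered, full_matrices=False)
    return float(s[-1] ** 2 / len(pts))
```

A network can minimize such a residual over annotated distorted lines, which is one way to turn the "lines should be straight" assumption into a differentiable training signal.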
Perception, Attention, and Resources: A Decision-Theoretic Approach to Graphics Rendering
We describe work to control graphics rendering under limited computational
resources by taking a decision-theoretic perspective on perceptual costs and
computational savings of approximations. The work extends earlier work on the
control of rendering by introducing methods and models for computing the
expected cost associated with degradations of scene components. The expected
cost is computed by considering the perceptual cost of degradations and a
probability distribution over the attentional focus of viewers. We review the
critical literature describing findings on visual search and attention, discuss
the implications of the findings, and introduce models of expected perceptual
cost. Finally, we discuss policies that harness information about the expected
cost of scene components. Comment: Appears in Proceedings of the Thirteenth
Conference on Uncertainty in Artificial Intelligence (UAI 1997)
Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos
Automatic saliency prediction in 360° videos is critical for viewpoint
guidance applications (e.g., Facebook 360 Guide). We propose a spatial-temporal
network which is (1) trained with weak supervision and (2) tailor-made for
the 360° viewing sphere. Note that most existing methods are less scalable
since they rely on annotated saliency maps for training. Most importantly, they
convert the 360° sphere to 2D images (e.g., a single equirectangular image or
multiple separate Normal Field-of-View (NFoV) images) which introduces
distortion and image boundaries. In contrast, we propose a simple and effective
Cube Padding (CP) technique as follows. First, we render the 360° view
on six faces of a cube using perspective projection, which introduces very
little distortion. Then, we concatenate all six faces while utilizing the
connectivity between faces on the cube for image padding (i.e., Cube Padding)
in convolution, pooling, and convolutional LSTM layers. In this way, CP introduces
no image boundary while being applicable to almost all Convolutional Neural
Network (CNN) structures. To evaluate our method, we propose Wild-360, a new
360° video saliency dataset containing challenging videos with saliency
heatmap annotations. In experiments, our method outperforms baseline methods in
both speed and quality. Comment: CVPR 2018
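The key trick, replacing zero padding with features copied from adjacent cube faces, can be sketched for the ring of four side faces, whose shared edges align without rotation. The full method also pads across the top and bottom faces (which requires per-edge rotations), omitted here for brevity; the function name and layout are our assumptions:

```python
import numpy as np

def ring_cube_pad(faces, pad=1):
    """Pad the four side faces of a cubemap with columns copied from
    horizontally adjacent faces instead of zeros.

    `faces` has shape (4, H, W) in ring order: front, right, back,
    left. Only width padding across the ring is shown; top/bottom
    connectivity is omitted for brevity.
    """
    n, h, w = faces.shape
    out = np.zeros((n, h, w + 2 * pad), dtype=faces.dtype)
    for f in range(n):
        left_neighbor = faces[(f - 1) % n]   # face to the left in the ring
        right_neighbor = faces[(f + 1) % n]  # face to the right in the ring
        out[f, :, :pad] = left_neighbor[:, -pad:]   # borrow rightmost columns
        out[f, :, pad:pad + w] = faces[f]           # original face content
        out[f, :, pad + w:] = right_neighbor[:, :pad]  # borrow leftmost columns
    return out
```

Because the padded border comes from the true neighboring view rather than zeros, a convolution applied to each padded face sees no artificial image boundary along the ring, which is the effect the abstract describes.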
Blind Predicting Similar Quality Map for Image Quality Assessment
A key problem in blind image quality assessment (BIQA) is how to effectively
model the properties of human visual system in a data-driven manner. In this
paper, we propose a simple and efficient BIQA model based on a novel framework
which consists of a fully convolutional neural network (FCNN) and a pooling
network to solve this problem. In principle, the FCNN is capable of predicting a
pixel-by-pixel similar quality map from a distorted image alone by learning from
the intermediate similarity maps derived from conventional full-reference image
quality assessment methods. The predicted pixel-by-pixel quality maps have good
consistency with the distortion correlations between the reference and
distorted images. Finally, a deep pooling network regresses the quality map
into a score. Experiments have demonstrated that our predictions outperform
many state-of-the-art BIQA methods.
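To make the pipeline concrete: a full-reference similarity map serves as the FCNN's training target, and the predicted map is then pooled into a scalar score. The gradient-magnitude similarity below is our illustrative choice of FR-IQA map (not necessarily the one the paper uses), and plain averaging stands in for the paper's learned deep pooling network:

```python
import numpy as np

def gradient_magnitude(img):
    # Per-pixel gradient magnitude via central differences.
    gy, gx = np.gradient(np.asarray(img, dtype=float))
    return np.hypot(gx, gy)

def similarity_map(ref, dist, c=0.01):
    # Pixel-wise gradient-magnitude similarity: 1 where local
    # structure matches, lower where distortion alters gradients.
    gr, gd = gradient_magnitude(ref), gradient_magnitude(dist)
    return (2 * gr * gd + c) / (gr ** 2 + gd ** 2 + c)

def pooled_score(qmap):
    # Average pooling of the quality map into a scalar score;
    # the paper regresses the map with a learned pooling network.
    return float(qmap.mean())
```

During training such maps are computed between reference and distorted images; at test time the network predicts the map from the distorted image alone, so no reference is needed.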
Cycle-IR: Deep Cyclic Image Retargeting
Supervised deep learning techniques have achieved great success in various
fields by removing the limitations of handcrafted representations.
However, most previous image retargeting algorithms still employ fixed design
principles, such as using gradient maps or handcrafted features to compute
saliency maps, which inevitably restricts their generality. Deep learning
techniques may help to address this issue, but the challenging problem is that
we need to build a large-scale image retargeting dataset for the training of
deep retargeting models. However, building such a dataset requires huge human
efforts.
In this paper, we propose a novel deep cyclic image retargeting approach,
called Cycle-IR, which is the first to implement image retargeting with a
single deep model without relying on any explicit user annotations. Our idea is built on
the reverse mapping from the retargeted images to the given input images. If
the retargeted image has serious distortion or excessive loss of important
visual information, the reverse mapping is unlikely to restore the input image
well. We constrain this forward-reverse consistency by introducing a cyclic
perception coherence loss. In addition, we propose a simple yet effective image
retargeting network (IRNet) to implement the image retargeting process. Our
IRNet contains a spatial and channel attention layer, which is able to
discriminate visually important regions of input images effectively, especially
in cluttered images. Given arbitrary sizes of input images and desired aspect
ratios, our Cycle-IR can produce visually pleasing target images directly.
Extensive experiments on the standard RetargetMe dataset show the superiority
of our Cycle-IR. In addition, our Cycle-IR outperforms the Multiop method and
obtains the best result in the user study. Code is available at
https://github.com/mintanwei/Cycle-IR. Comment: 12 pages
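The cyclic perception coherence idea reduces to a round-trip penalty: retarget, map back to the original size, and compare with the input. In the paper both mappings are learned (IRNet) and the comparison is perceptual; the nearest-neighbour resize and L1 distance below are stand-ins that only show the structure of the loss:

```python
import numpy as np

def resize_width(img, new_w):
    # Nearest-neighbour width resize as a stand-in retargeting operator.
    h, w = img.shape
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return img[:, cols]

def cycle_coherence_loss(img, target_w):
    # Forward retarget, reverse-map back to the input size, and
    # penalise the round-trip reconstruction error (L1).
    retargeted = resize_width(img, target_w)
    restored = resize_width(retargeted, img.shape[1])
    return float(np.abs(restored - img).mean())
```

When the forward mapping discards important content, the reverse mapping cannot restore it and the loss grows, which is exactly the supervision signal the abstract describes obtaining without any annotated retargeting data.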
Attention-aware Multi-stroke Style Transfer
Neural style transfer has drawn considerable attention from both academic and
industrial fields. Although visual quality and efficiency have been significantly
improved, existing methods are unable to coordinate spatial distribution of
visual attention between the content image and stylized image, or render
diverse levels of detail via different brush strokes. In this paper, we tackle
these limitations by developing an attention-aware multi-stroke style transfer
model. We first propose to assemble a self-attention mechanism into a
style-agnostic reconstruction autoencoder framework, from which the attention
map of a content image can be derived. By performing multi-scale style swap on
content features and style features, we produce multiple feature maps
reflecting different stroke patterns. A flexible fusion strategy is further
presented to incorporate the salient characteristics from the attention map,
which allows integrating multiple stroke patterns into different spatial
regions of the output image harmoniously. We demonstrate the effectiveness of
our method and show that it generates stylized images with multiple stroke
patterns that are comparable to state-of-the-art methods.
Face Image Reflection Removal
Face images captured through glass are usually contaminated by
reflections. The non-transmitted reflections make reflection removal more
challenging than in general scenes, because important facial features are
completely occluded. In this paper, we propose and solve the face image
reflection removal problem. We remove non-transmitted reflections by
incorporating inpainting ideas into a guided reflection removal framework and
recover facial features by considering various face-specific priors. We use a
newly collected face reflection image dataset to train our model and compare
with state-of-the-art methods. The proposed method shows advantages in
estimating reflection-free face images for improving face recognition.