1,455 research outputs found
Pixel-Level Self-Paced Learning for Super-Resolution
Recently, many deep networks have been proposed to improve the quality of
predicted super-resolution (SR) images, owing to their widespread use in several
image-based fields. However, as these networks are built deeper and deeper,
they also require much longer training, which may trap the learner in a poor
local optimum. To tackle this problem, this paper designs a training strategy
named Pixel-level Self-Paced Learning (PSPL) to accelerate the convergence of
single image super-resolution (SISR) models. Imitating self-paced learning,
PSPL assigns an attention weight to each pixel in the predicted SR image and to
its corresponding pixel in the ground truth, guiding the model toward a better
region of the parameter space. Extensive experiments show that PSPL speeds up
the training of SISR models and enables several existing models to obtain new,
better results. The source code is available at https://github.com/Elin24/PSPL.
Comment: 5 pages, 5 figures. Accepted by ICASSP 2020. Source code: https://github.com/Elin24/PSPL
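The abstract describes the weighting only at a high level. Below is a minimal
PyTorch sketch of one plausible reading, in which easy (low-error) pixels
dominate early and a growing pace parameter lam phases in harder pixels; the
weighting function, schedule, and all names here are illustrative assumptions
rather than the authors' implementation (see the repository above for the real
code).

    import torch

    def pspl_weighted_l1(sr, hr, lam):
        """Self-paced per-pixel L1 loss (illustrative sketch, not the
        authors' code). sr, hr: (N, C, H, W); lam: pace parameter."""
        err = torch.abs(sr - hr)            # per-pixel reconstruction error
        w = torch.exp(-err.detach() / lam)  # easy pixels ~1, hard pixels damped
        return (w * err).mean()

    # Raising lam over training lets hard pixels contribute more and more:
    sr = torch.rand(2, 3, 32, 32)
    hr = torch.rand(2, 3, 32, 32)
    for step in range(3):
        lam = 0.1 * (1 + step)              # assumed linear pace schedule
        loss = pspl_weighted_l1(sr, hr, lam)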
Adaptive Semantic Segmentation with a Strategic Curriculum of Proxy Labels
Training deep networks for semantic segmentation requires annotation of large
amounts of data, which can be time-consuming and expensive. Unfortunately,
these trained networks still generalize poorly when tested in domains that
differ from the training data. In this paper, we show that by carefully
presenting a mixture of labeled source domain and proxy-labeled target domain
data to a network, we can achieve state-of-the-art unsupervised domain
adaptation results. With our design, the network progressively learns features
specific to the target domain using annotation from only the source domain. We
generate proxy labels for the target domain using the network's own
predictions. Our architecture then allows selective mining of easy samples from
this set of proxy labels, and hard samples from the annotated source domain. We
conduct a series of experiments with the GTA5, Cityscapes and BDD100k datasets
on synthetic-to-real domain adaptation and geographic domain adaptation,
showing the advantages of our method over baselines and existing approaches.
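The abstract leaves the mining rule implicit. A common realization, sketched
below in PyTorch, is confidence thresholding: pixels the network predicts
confidently on target-domain images become proxy labels, and the rest are
ignored. The threshold value and names are illustrative assumptions, not the
paper's exact procedure.

    import torch
    import torch.nn.functional as F

    def mine_proxy_labels(logits, conf_thresh=0.9, ignore_index=255):
        """Keep only 'easy' target-domain pixels as proxy labels
        (confidence thresholding; an assumed stand-in for the paper's
        selective mining rule)."""
        probs = F.softmax(logits, dim=1)           # (N, C, H, W)
        conf, labels = probs.max(dim=1)            # per-pixel confidence / class
        labels[conf < conf_thresh] = ignore_index  # drop uncertain pixels
        return labels

    # Target-domain loss then uses confident pixels only, while the
    # source-domain loss keeps its full ground-truth annotation:
    logits_t = torch.randn(1, 19, 64, 64)          # e.g. 19 Cityscapes classes
    proxy = mine_proxy_labels(logits_t.detach(), conf_thresh=0.2)  # low threshold for the toy example
    loss_t = F.cross_entropy(logits_t, proxy, ignore_index=255)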
Bi-Skip: A Motion Deblurring Network Using Self-paced Learning
A fast and effective motion deblurring method has great practical value. This
work presents an approach in which self-paced learning is combined with a GAN
to deblur images. First, we explain that a proper generator can be used as a
deep prior, and point out that the solution for a pixel-based loss is not the
same as the one for a perception-based loss. Using these ideas as starting
points, a Bi-Skip network is proposed to improve the generating ability, and a
bi-level loss is adopted to solve the problem that the common conditions are
non-identical. Second, considering that complex motion blur will perturb the
network during training, a self-paced mechanism is adopted to enhance its
robustness. Extensive evaluations on both qualitative and quantitative
criteria demonstrate that our approach has a competitive advantage over
state-of-the-art methods.
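The abstract does not detail the self-paced mechanism. The standard hard
self-paced rule it presumably builds on keeps only samples whose current loss
falls below a pace threshold, which is then raised over training; a minimal
PyTorch sketch follows (the schedule and all names are assumptions).

    import torch

    def self_paced_mask(per_sample_loss, lam):
        """Hard self-paced regime: samples with loss below the pace
        threshold lam get weight 1, the rest weight 0."""
        return (per_sample_loss.detach() < lam).float()

    per_sample_loss = torch.rand(8)        # e.g. per-image deblurring losses
    for step in range(3):
        lam = 0.5 + 0.25 * step            # raise the pace: admit harder blurs
        v = self_paced_mask(per_sample_loss, lam)
        loss = (v * per_sample_loss).sum() / v.sum().clamp(min=1)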
Review of Visual Saliency Detection with Comprehensive Information
Visual saliency detection models simulate the human visual system to perceive
the scene, and have been widely used in many vision tasks. With the development
of acquisition technology, more comprehensive information, such as depth cues,
inter-image correspondence, or temporal relationships, has become available,
extending image saliency detection to RGBD saliency detection, co-saliency
detection, and video saliency detection. RGBD saliency detection models focus
on extracting salient regions from RGBD images by incorporating depth
information. Co-saliency detection models introduce an inter-image
correspondence constraint to discover the common salient object in an image
group. The goal of video saliency detection is to locate motion-related salient
objects in video sequences, considering motion cues and spatiotemporal
constraints jointly. In this paper, we review different types of saliency
detection algorithms, summarize the important issues of existing methods, and
discuss open problems and future work. Moreover, the evaluation datasets and
quantitative measurements are briefly introduced, and an experimental analysis
and discussion are conducted to provide a holistic overview of different
saliency detection methods.
Comment: 18 pages, 11 figures, 7 tables. Accepted by IEEE Transactions on
Circuits and Systems for Video Technology 2018, https://rmcong.github.io
A Deep Journey into Super-Resolution: A Survey
Super-resolution based on deep convolutional networks is a fast-growing field
with numerous practical applications. In this exposition, we extensively
compare more than 30 state-of-the-art super-resolution convolutional neural
networks (CNNs) over three classical and three recently introduced challenging
datasets to benchmark single image super-resolution. We introduce a taxonomy
for deep learning-based super-resolution networks that groups existing methods
into nine categories, including linear, residual, multi-branch, recursive,
progressive, attention-based, and adversarial designs. We also compare the
models in terms of network complexity, memory footprint, model input and
output, learning details, the type of network losses, and important
architectural differences (e.g., depth, skip connections, filters). The
extensive evaluation shows consistent and rapid growth in accuracy over the
past few years, along with a corresponding increase in model complexity and in
the availability of large-scale datasets. We also observe that the pioneering
methods once regarded as benchmarks have been significantly outperformed by the
current contenders. Despite the progress of recent years, we identify several
shortcomings of existing techniques and provide future research directions
toward the solution of these open problems.
Comment: Accepted in ACM Computing Surveys
Supervised COSMOS Autoencoder: Learning Beyond the Euclidean Loss!
Autoencoders are unsupervised deep learning models used for learning
representations. In the literature, autoencoders have been shown to perform
well on a variety of tasks across multiple domains, establishing their
widespread applicability. Typically, an autoencoder is trained to minimize the
reconstruction error between the input and the reconstructed output, computed
as the Euclidean distance. While this can be useful for applications related to
unsupervised reconstruction, it may not be optimal for classification. In this
paper, we propose a novel Supervised COSMOS Autoencoder, which utilizes a
multi-objective loss function to learn representations that simultaneously
encode (i) the "similarity" between the input and reconstructed vectors in
terms of their direction, and (ii) the "distribution" of pixel values of the
reconstruction with respect to the input sample, while also incorporating (iii)
"discriminability" into the feature learning pipeline. The proposed autoencoder
model incorporates a cosine similarity and Mahalanobis distance based loss
function, along with supervision via a mutual information based loss. A
detailed analysis of each component of the proposed model motivates its
applicability for feature learning in different classification tasks. The
efficacy of the Supervised COSMOS autoencoder is demonstrated via extensive
experimental evaluations on different image datasets. The proposed model
outperforms existing algorithms on the MNIST, CIFAR-10, and SVHN databases. It
also yields state-of-the-art results on the CelebA, LFWA, Adience, and IJB-A
databases for attribute prediction and face recognition.
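To make the three objectives concrete, here is a rough PyTorch sketch of a
composite loss in the same spirit. The weighting scheme, the cross-entropy
stand-in for the paper's mutual-information supervision, and all names are
assumptions for illustration, not the authors' formulation.

    import torch
    import torch.nn.functional as F

    def cosmos_style_loss(x, x_hat, feats, labels, cov_inv, classifier,
                          a=1.0, b=1.0, c=1.0):
        """Illustrative composite loss: (i) direction via cosine distance,
        (ii) distribution via a Mahalanobis-style term with an assumed
        precomputed precision matrix cov_inv, (iii) discriminability via a
        supervised term (cross-entropy here, standing in for the paper's
        mutual-information loss)."""
        direction = 1 - F.cosine_similarity(x, x_hat, dim=1).mean()
        d = x - x_hat
        distribution = torch.einsum('ni,ij,nj->n', d, cov_inv, d).mean()
        discriminability = F.cross_entropy(classifier(feats), labels)
        return a * direction + b * distribution + c * discriminability

    x, x_hat = torch.rand(4, 64), torch.rand(4, 64)  # flattened input / output
    feats = torch.rand(4, 16)                        # encoder features
    labels = torch.randint(0, 10, (4,))
    cov_inv = torch.eye(64)                          # placeholder precision matrix
    loss = cosmos_style_loss(x, x_hat, feats, labels, cov_inv,
                             classifier=torch.nn.Linear(16, 10))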
Deep Likelihood Network for Image Restoration with Multiple Degradation Levels
Convolutional neural networks have been proven effective in a variety of
image restoration tasks. Most state-of-the-art solutions, however, are trained
using images with a single particular degradation level, and their performance
deteriorates drastically when applied to other degradation settings. In this
paper, we propose the deep likelihood network (DL-Net), aiming at generalizing
off-the-shelf image restoration networks to succeed over a spectrum of
degradation levels. We slightly modify an off-the-shelf network by appending a
simple recursive module, derived from a fidelity term, to disentangle the
computation for multiple degradation levels. Extensive experimental results on
image inpainting, interpolation, and super-resolution show the effectiveness of
DL-Net.
Comment: Accepted by IEEE Transactions on Image Processing; 13 pages, 6 figures
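A fidelity term typically has the form ||A(x) - y||^2 for a degradation
operator A. One natural recursive module, sketched below in PyTorch, alternates
a gradient step on that term with a pass through the base network; the
operator, its adjoint surrogate, the step size, and the unrolling depth are all
assumptions for illustration, not the paper's derivation.

    import torch

    def fidelity_recursion(x, y, A, At, net, steps=3, step_size=0.1):
        """Alternate a data-fidelity gradient step on ||A(x) - y||^2 with a
        refinement pass through an off-the-shelf restoration network `net`
        (illustrative unrolling; operators and constants are assumed)."""
        for _ in range(steps):
            x = x - step_size * At(A(x) - y)  # pull x toward data consistency
            x = net(x)                        # learned refinement
        return x

    # Toy usage: 2x subsampling degradation, with nearest-neighbor
    # upsampling as a crude stand-in for the adjoint.
    A = lambda x: x[..., ::2, ::2]
    At = lambda r: r.repeat_interleave(2, -1).repeat_interleave(2, -2)
    net = torch.nn.Identity()
    y = torch.rand(1, 3, 16, 16)
    x = fidelity_recursion(At(y), y, A, At, net)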
Soft Proposal Networks for Weakly Supervised Object Localization
Weakly supervised object localization remains challenging when only image
labels, instead of bounding boxes, are available during training. Object
proposals are an effective component in localization, but they are often
computationally expensive and incapable of joint optimization with some of the
remaining modules. In this paper, to the best of our knowledge, we are the
first to integrate weakly supervised object proposal into convolutional neural
networks (CNNs) in an end-to-end learning manner. We design a network
component, Soft Proposal (SP), that can be plugged into any standard
convolutional architecture to introduce nearly cost-free object proposals,
orders of magnitude faster than state-of-the-art methods. In the SP-augmented
CNNs, referred to as Soft Proposal Networks (SPNs), iteratively evolved object
proposals are generated from the deep feature maps and projected back, and are
further jointly optimized with the network parameters under image-level
supervision only. Through this unified learning process, SPNs learn better
object-centric filters, discover more discriminative visual evidence, and
suppress background interference, significantly boosting both weakly supervised
object localization and classification performance. We report the best results
on popular benchmarks, including PASCAL VOC, MS COCO, and ImageNet.
Comment: ICCV 2017
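The abstract gives only the high-level mechanism. One way to realize such an
iteratively evolved proposal map, sketched below in PyTorch, is a random walk
over spatial positions with transitions weighted by feature similarity, whose
near-stationary distribution highlights object-like regions before being
multiplied back into the features. The similarity kernel, normalization, and
iteration count here are illustrative assumptions rather than the SP module's
exact definition.

    import torch
    import torch.nn.functional as F

    def soft_proposal(feat, iters=10, eps=1e-8):
        """Random walk over spatial positions with feature-similarity
        transitions; the resulting map is coupled back into the features
        (illustrative sketch of a Soft Proposal-style component)."""
        n, c, h, w = feat.shape
        f = F.normalize(feat.flatten(2).transpose(1, 2), dim=2)  # (N, HW, C)
        T = (f @ f.transpose(1, 2)).clamp(min=0)       # (N, HW, HW) similarities
        T = T / (T.sum(dim=2, keepdim=True) + eps)     # row-normalized transitions
        m = feat.new_full((n, h * w, 1), 1.0 / (h * w))
        for _ in range(iters):
            m = T.transpose(1, 2) @ m                  # one random-walk step
        return feat * m.view(n, 1, h, w)               # couple map into features

    feat = torch.rand(1, 8, 14, 14)
    out = soft_proposal(feat)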
A Review of Co-saliency Detection Technique: Fundamentals, Applications, and Challenges
Co-saliency detection is a newly emerging and rapidly growing research area in
the computer vision community. As a novel branch of visual saliency,
co-saliency detection refers to the discovery of common and salient foregrounds
in two or more relevant images, and it can be widely used in many computer
vision tasks. Existing co-saliency detection algorithms mainly consist of three
components: extracting effective features to represent image regions, exploring
informative cues or factors to characterize co-saliency, and designing
effective computational frameworks to formulate co-saliency. Although numerous
methods have been developed, the literature still lacks a deep review and
evaluation of co-saliency detection techniques. In this paper, we aim to
provide a comprehensive review of the fundamentals, challenges, and
applications of co-saliency detection. Specifically, we provide an overview of
related computer vision work, review the history of co-saliency detection,
summarize and categorize the major algorithms in this research area, discuss
open issues, present potential applications of co-saliency detection, and
finally point out some unsolved challenges and promising future work. We expect
this review to benefit both new and senior researchers in this field, and to
give researchers in other related areas insights into the utility of
co-saliency detection algorithms.
Comment: 28 pages, 12 figures, 3 tables
Salient Object Detection in the Deep Learning Era: An In-Depth Survey
As an essential problem in computer vision, salient object detection (SOD)
has attracted an increasing amount of research attention over the years. Recent
advances in SOD are predominantly led by deep learning-based solutions (named
deep SOD). To enable in-depth understanding of deep SOD, in this paper, we
provide a comprehensive survey covering various aspects, ranging from algorithm
taxonomy to unsolved issues. In particular, we first review deep SOD algorithms
from different perspectives, including network architecture, level of
supervision, learning paradigm, and object-/instance-level detection. Following
that, we summarize and analyze existing SOD datasets and evaluation metrics.
Then, we benchmark a large group of representative SOD models, and provide
detailed analyses of the comparison results. Moreover, we study the performance
of SOD algorithms under different attribute settings, which has not been
thoroughly explored previously, by constructing a novel SOD dataset with rich
attribute annotations covering various salient object types, challenging
factors, and scene categories. We further analyze, for the first time in the
field, the robustness of SOD models to random input perturbations and
adversarial attacks. We also look into the generalization and difficulty of
existing SOD datasets. Finally, we discuss several open issues of SOD and
outline future research directions.
Comment: Published in IEEE TPAMI. All the saliency prediction maps, our
constructed dataset with annotations, and codes for evaluation are publicly
available at https://github.com/wenguanwang/SODsurvey