2,989 research outputs found
A Review of Co-saliency Detection Technique: Fundamentals, Applications, and Challenges
Co-saliency detection is a newly emerging and rapidly growing research area
in computer vision community. As a novel branch of visual saliency, co-saliency
detection refers to the discovery of common and salient foregrounds from two or
more relevant images, and can be widely used in many computer vision tasks. The
existing co-saliency detection algorithms mainly consist of three components:
extracting effective features to represent the image regions, exploring the
informative cues or factors to characterize co-saliency, and designing
effective computational frameworks to formulate co-saliency. Although numerous
methods have been developed, the literature still lacks a deep review and
evaluation of co-saliency detection techniques. In this paper, we aim to
provide a comprehensive review of the fundamentals, challenges, and
applications of co-saliency detection. Specifically, we provide an overview of
some related computer vision works, review the history of co-saliency
detection, summarize and categorize the major algorithms in this research area,
discuss some open issues in this area, present the potential applications of
co-saliency detection, and finally point out some unsolved challenges and
promising future works. We expect this review to benefit both new
and senior researchers in this field, and give insights to researchers in other
related areas regarding the utility of co-saliency detection algorithms.
Comment: 28 pages, 12 figures, 3 tables
Learning Uncertain Convolutional Features for Accurate Saliency Detection
Deep convolutional neural networks (CNNs) have delivered superior performance
in many computer vision tasks. In this paper, we propose a novel deep fully
convolutional network model for accurate salient object detection. The key
contribution of this work is to learn deep uncertain convolutional features
(UCF), which improve the robustness and accuracy of saliency detection. We
achieve this by introducing a reformulated dropout (R-dropout) after specific
convolutional layers to construct an uncertain ensemble of internal feature
units. In addition, we propose an effective hybrid upsampling method to reduce
the checkerboard artifacts of deconvolution operators in our decoder network.
The proposed methods can also be applied to other deep convolutional networks.
Compared with existing saliency detection methods, the proposed UCF model is
able to incorporate uncertainties for more accurate object boundary inference.
Extensive experiments demonstrate that our proposed saliency model performs
favorably against state-of-the-art approaches. The uncertain feature learning
mechanism as well as the upsampling method can significantly improve
performance on other pixel-wise vision tasks.
Comment: Accepted as a poster at ICCV 2017; 10 pages, 7 figures, and 3 tables
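The R-dropout formulation is not spelled out in this abstract, so the sketch below illustrates only the general idea (an uncertain ensemble of feature units obtained by random masking); the averaging and the use of per-unit variance as an uncertainty signal are assumptions for illustration, not the paper's method.

```python
import numpy as np

def dropout_ensemble(features, p=0.5, n_samples=8, seed=0):
    """Apply independent dropout masks to a feature map and return the
    mean response plus its per-unit variance across the ensemble.
    The variance serves here as a crude uncertainty estimate."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_samples):
        mask = rng.random(features.shape) > p
        # Inverted dropout: rescale kept units so the expectation matches.
        samples.append(features * mask / (1.0 - p))
    samples = np.stack(samples)
    return samples.mean(axis=0), samples.var(axis=0)

feat = np.ones((4, 4))              # toy convolutional feature map
mean, var = dropout_ensemble(feat)  # high-variance units are "uncertain"
```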
Deep Edge-Aware Saliency Detection
There has been profound progress in visual saliency thanks to deep learning
architectures; however, three major challenges still hinder detection
performance for scenes with complex compositions, multiple salient objects,
and salient objects of diverse scales. In particular, the output maps of
existing methods remain low in spatial resolution, causing blurred edges due
to stride and pooling operations; networks often neglect descriptive
statistical and handcrafted priors that could complement saliency detection
results; and deep features at different layers remain largely unexploited,
waiting to be effectively fused to handle multi-scale salient objects. In
this paper, we tackle these issues with a new fully
convolutional neural network that jointly learns salient edges and saliency
labels in an end-to-end fashion. Our framework first employs convolutional
layers that reformulate the detection task as a dense labeling problem, then
integrates handcrafted saliency features in a hierarchical manner into lower
and higher levels of the deep network to leverage available information for
multi-scale response, and finally refines the saliency map through dilated
convolutions by imposing context. In this way, the salient edge priors are
efficiently incorporated and the output resolution is significantly improved
while keeping the memory requirements low, leading to cleaner and sharper
object boundaries. Extensive experimental analyses on ten benchmarks
demonstrate that our framework achieves consistently superior performance and
attains robustness for complex scenes in comparison to the very recent
state-of-the-art approaches.
Comment: 13 pages, 11 figures
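The refinement stage above relies on dilated convolutions to impose context without further downsampling. As a toy illustration (a 1-D NumPy sketch, not the paper's network), dilation spaces the kernel taps apart, enlarging the receptive field while keeping resolution:

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """1-D dilated convolution with 'valid' padding: kernel taps are
    spaced `dilation` samples apart, so the receptive field grows
    without any pooling or striding."""
    k = len(kernel)
    span = (k - 1) * dilation + 1          # receptive field size
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * dilation] for j in range(k))
    return out

x = np.arange(8, dtype=float)
out = dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=2)  # taps at i, i+2, i+4
```

With dilation 2, a 3-tap kernel covers a span of 5 input samples.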
SCOPS: Self-Supervised Co-Part Segmentation
Parts provide a good intermediate representation of objects that is robust
with respect to camera, pose, and appearance variations. Existing work on
part segmentation is dominated by supervised approaches that rely on large
amounts of manual annotations and cannot generalize to unseen object
categories. We propose a self-supervised deep learning approach for part
segmentation, in which we devise several loss functions that aid in predicting
part segments that are geometrically concentrated, robust to object variations,
and semantically consistent across different object instances.
Extensive experiments on different types of image collections demonstrate that
our approach can produce part segments that adhere to object boundaries and
are also more semantically consistent across object instances than existing
self-supervised techniques.
Comment: Accepted at CVPR 2019. Project page:
http://varunjampani.github.io/scop
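The geometric-concentration property can be made concrete with a small sketch; the actual SCOPS losses differ in detail, so treat this NumPy example as a hypothetical stand-in that penalizes the response-weighted spatial variance of a part map around its centroid.

```python
import numpy as np

def concentration_loss(part_map):
    """Response-weighted spatial variance of a normalized part map.

    part_map: (H, W) non-negative responses for one part, summing to 1.
    The loss is small when the mass clusters tightly around the part
    centroid, i.e. when the part is geometrically concentrated."""
    h, w = part_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy = (part_map * ys).sum()             # centroid row
    cx = (part_map * xs).sum()             # centroid column
    return (part_map * ((ys - cy) ** 2 + (xs - cx) ** 2)).sum()

tight = np.zeros((8, 8)); tight[3:5, 3:5] = 0.25   # compact blob
spread = np.full((8, 8), 1 / 64)                   # uniform mass
```

A tight blob scores much lower than a uniform map of the same mass, which is exactly the behaviour such a loss rewards.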
Segmentation of Skin Lesions and their Attributes Using Multi-Scale Convolutional Neural Networks and Domain Specific Augmentations
Computer-aided diagnosis systems for the classification of different types of
skin lesions have been an active field of research in recent decades. It has
been shown that introducing lesion and attribute masks into the lesion
classification pipeline can greatly improve performance. In this paper, we
propose a framework that incorporates transfer learning for segmenting lesions
and their attributes using convolutional neural networks. The proposed
framework is based on an encoder-decoder architecture which utilizes a variety
of pre-trained networks in the encoding path and generates the prediction map
by combining multi-scale information in the decoding path in a pyramid pooling
manner. To address the lack of training data and improve the generalization of
the proposed model, an extensive set of novel domain-specific augmentation
routines has been applied to simulate real variations in dermoscopy images.
Finally, through broad experiments on three different data sets from the
International Skin Imaging Collaboration archive (the ISIC2016, ISIC2017, and
ISIC2018 challenge data sets), we show that the proposed method outperforms
other state-of-the-art approaches on the ISIC2016 and ISIC2017 segmentation
tasks and achieves first rank on the leaderboard of the ISIC2018 attribute
detection task.
Comment: 18 pages
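The specific augmentation routines are not listed in this abstract; the sketch below is a hypothetical example of the kind of domain-specific augmentation described (random flips plus per-channel colour jitter to mimic dermoscopy lighting variation), not the paper's actual pipeline.

```python
import numpy as np

def augment(image, rng):
    """Hypothetical dermoscopy-style augmentation.

    image: (H, W, 3) float array in [0, 1].  Random flips exploit the
    orientation-invariance of lesions; a per-channel shift mimics the
    colour/lighting variation of different dermoscopes."""
    if rng.random() < 0.5:
        image = image[:, ::-1]                    # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :]                    # vertical flip
    shift = rng.uniform(-0.1, 0.1, size=3)        # colour jitter
    return np.clip(image + shift, 0.0, 1.0)

rng = np.random.default_rng(0)
out = augment(np.full((16, 16, 3), 0.5), rng)
```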
Computational Parquetry: Fabricated Style Transfer with Wood Pixels
Parquetry is the art and craft of decorating a surface with a pattern of
differently colored veneers of wood, stone or other materials. Traditionally,
the process of designing and making parquetry has been driven by color, using
the texture found in real wood only for stylization or as a decorative effect.
Here, we introduce a computational pipeline that draws from the rich natural
structure of strongly textured real-world veneers as a source of detail in
order to approximate a target image as faithfully as possible using a
manageable number of parts. This challenge is closely related to the
established problems of patch-based image synthesis and stylization in some
ways, but fundamentally different in others. Most importantly, the limited
availability of resources (any piece of wood can only be used once) turns the
relatively simple problem of finding the right piece for the target location
into the combinatorial problem of finding optimal parts while avoiding resource
collisions. We introduce an algorithm that efficiently solves an
approximation to the problem and further addresses challenges such as gamut
mapping, feature characterization, and the search for fabricable cuts. We
demonstrate the effectiveness of the system by fabricating a selection of
"photo-realistic" pieces of parquetry from different kinds of unstained wood
veneer.
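The resource constraint (each veneer patch can be used only once) is what turns patch lookup into a combinatorial assignment problem. The paper's actual optimizer is not described here; a naive greedy sketch makes the constraint concrete:

```python
import numpy as np

def assign_patches(cost):
    """Greedy one-to-one assignment.  cost[i, j] is the visual mismatch
    between target location i and wood patch j; once a patch is taken
    it is excluded from all later choices (no resource collisions)."""
    n_loc, n_patch = cost.shape
    taken = np.zeros(n_patch, dtype=bool)
    choice = []
    for i in range(n_loc):
        j = int(np.argmin(np.where(taken, np.inf, cost[i])))
        taken[j] = True
        choice.append(j)
    return choice

cost = np.array([[1.0, 2.0],
                 [1.0, 5.0]])
picks = assign_patches(cost)  # location 0 takes patch 0, so location 1
                              # must settle for patch 1 despite its cost
```

A globally optimal variant would solve the same cost matrix with an assignment solver (e.g. the Hungarian algorithm) instead of greedy selection.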
Contrast-weighted Dictionary Learning Based Saliency Detection for Remote Sensing Images
Object detection is an important task in remote sensing image analysis. To
reduce the computational complexity of redundant information and improve the
efficiency of image processing, visual saliency models have been widely applied
in this field. In this paper, a novel saliency detection model based on
Contrast-weighted Dictionary Learning (CDL) is proposed for remote sensing
images. Specifically, the proposed CDL learns salient and non-salient atoms
from positive and negative samples to construct a discriminant dictionary, in
which a contrast-weighted term is proposed to encourage the contrast-weighted
patterns to be present in the learned salient dictionary while discouraging
them from being present in the non-salient dictionary. Then, we measure the
saliency by combining the coefficients of the sparse representation (SR) and
reconstruction errors. Furthermore, by using the proposed joint saliency
measure, a variety of saliency maps are generated based on the discriminant
dictionary. Finally, a fusion method based on global gradient optimization is
proposed to integrate multiple saliency maps. Experimental results on four
datasets demonstrate that the proposed model outperforms other
state-of-the-art methods.
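The reconstruction-error half of the saliency measure can be sketched in a few lines; the sparse-coefficient term and the contrast-weighted dictionary learning itself are beyond this toy NumPy example, whose dictionaries are invented for illustration.

```python
import numpy as np

def reconstruction_error(patch, dictionary):
    """Code a patch over a dictionary by least squares and return the
    residual norm.  A patch that the non-salient (background) atoms
    reconstruct poorly is a candidate salient region."""
    code, *_ = np.linalg.lstsq(dictionary, patch, rcond=None)
    return np.linalg.norm(dictionary @ code - patch)

# Toy background dictionary spanning the first two feature axes only.
background_atoms = np.array([[1.0, 0.0],
                             [0.0, 1.0],
                             [0.0, 0.0]])
patch = np.array([0.0, 0.0, 1.0])        # lies outside that span
salient_err = reconstruction_error(patch, background_atoms)
```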
Query-Aware Sparse Coding for Multi-Video Summarization
Given the explosive growth of online videos, it is becoming increasingly
important to relieve the tedious work of browsing and managing the video
content of interest. Video summarization aims at providing such a technique by
transforming one or multiple videos into a compact one. However, conventional
multi-video summarization methods often fail to produce satisfying results as
they ignore the user's search intent. To this end, this paper proposes a novel
query-aware approach by formulating the multi-video summarization in a sparse
coding framework, where the web images searched by the query are taken as the
important preference information to reveal the query intent. To provide a
user-friendly summarization, this paper also develops an event-keyframe
presentation structure to present keyframes in groups of specific events
related to the query by using an unsupervised multi-graph fusion method. We
release a new public dataset named MVS1K, which contains about 1,000 videos
from 10 queries together with their video tags, manual annotations, and
associated web images. Extensive experiments on the MVS1K dataset validate
that our approach produces superior objective and subjective results compared
with several recently proposed approaches.
Comment: 10 pages, 8 figures
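The sparse-coding formulation itself is in the paper; as a hedged stand-in, the greedy sketch below selects keyframes whose span best reconstructs all frame features, which captures the reconstruction flavour of such objectives (the query-image preference term is omitted).

```python
import numpy as np

def greedy_summary(frames, k):
    """Pick k frames whose span best reconstructs the full frame set,
    a greedy surrogate for a sparse-coding summarization objective.
    frames: (n, d) array of per-frame feature vectors."""
    chosen = []
    for _ in range(k):
        best, best_err = None, np.inf
        for i in range(len(frames)):
            if i in chosen:
                continue
            basis = frames[chosen + [i]].T                     # (d, m)
            codes, *_ = np.linalg.lstsq(basis, frames.T, rcond=None)
            err = np.linalg.norm(basis @ codes - frames.T)     # residual
            if err < best_err:
                best, best_err = i, err
        chosen.append(best)
    return chosen

frames = np.array([[1.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 1.0]])   # two near-duplicates plus one outlier
picked = greedy_summary(frames, 2)
```

The outlier frame is always selected, since skipping it leaves part of the collection unreconstructable.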
Efficient and Interpretable Infrared and Visible Image Fusion Via Algorithm Unrolling
Infrared and visible image fusion aims to obtain images that highlight
thermal radiation information from infrared images and texture details from
visible images. In this paper, an interpretable deep network fusion model is
proposed. Initially, two optimization models are established to accomplish
two-scale decomposition, separating low-frequency base information and
high-frequency detail information from source images. Algorithm unrolling, in
which each iteration of the optimization is mapped to a convolutional neural
network layer so that the optimization steps become trainable, is used to
solve the optimization models. In the test phase, the two
decomposition feature maps of base and detail are merged respectively by the
fusion layer, and the decoder then outputs the fused image. Qualitative and
quantitative comparisons demonstrate the superiority of our model, which is
interpretable and robustly generates fused images containing highlighted
targets and legible details, exceeding state-of-the-art methods.
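Algorithm unrolling is easiest to see on a generic sparse-coding problem; the sketch below unrolls plain ISTA (not the paper's two-scale decomposition models), where each loop iteration would become one trainable network layer.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm (the shrinkage step)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def unrolled_ista(y, D, n_layers=10, lam=0.1):
    """Unroll ISTA for min_x 0.5*||D x - y||^2 + lam*||x||_1.
    Each iteration corresponds to one 'layer'; in a learned
    unrolling, the step matrix and threshold would be trained."""
    L = np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_layers):
        x = soft_threshold(x - D.T @ (D @ x - y) / L, lam / L)
    return x

y = np.array([1.0, 0.05])
x_hat = unrolled_ista(y, np.eye(2))    # small entries shrink to zero
```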
IROS 2019 Lifelong Robotic Vision Challenge -- Lifelong Object Recognition Report
This report summarizes the IROS 2019 Lifelong Robotic Vision Competition
(Lifelong Object Recognition Challenge) with methods and results from the top
finalists (out of over … teams). The competition dataset, (L)ifel(O)ng
(R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object), is
designed to drive lifelong/continual learning research and applications in
the robotic vision domain, with everyday objects in home, office, campus, and
mall scenarios. The dataset explicitly quantifies variations in illumination,
object occlusion, object size, camera-object distance/angle, and clutter.
The contest rules are designed to quantify the learning capability of a
robotic vision system when faced with objects appearing in dynamic
environments. Individual reports, dataset information, rules,
and released source code can be found at the project homepage:
"https://lifelong-robotic-vision.github.io/competition/".
Comment: 9 pages, 11 figures, 3 tables, accepted into IEEE Robotics and
Automation Magazine. arXiv admin note: text overlap with arXiv:1911.0648