Review of Visual Saliency Detection with Comprehensive Information
Visual saliency detection model simulates the human visual system to perceive
the scene, and has been widely used in many vision tasks. With the acquisition
technology development, more comprehensive information, such as depth cue,
inter-image correspondence, or temporal relationship, is available to extend
image saliency detection to RGBD saliency detection, co-saliency detection, or
video saliency detection. RGBD saliency detection model focuses on extracting
the salient regions from RGBD images by combining the depth information.
Co-saliency detection model introduces the inter-image correspondence
constraint to discover the common salient object in an image group. The goal of
video saliency detection model is to locate the motion-related salient object
in video sequences, which considers the motion cue and spatiotemporal
constraint jointly. In this paper, we review different types of saliency
detection algorithms, summarize the important issues of the existing methods,
and discuss open problems and future work. Moreover, the evaluation
datasets and quantitative measurements are briefly introduced, and the
experimental analysis and discussion are conducted to provide a holistic
overview of different saliency detection methods.
Comment: 18 pages, 11 figures, 7 tables, Accepted by IEEE Transactions on
Circuits and Systems for Video Technology 2018, https://rmcong.github.io
HSCS: Hierarchical Sparsity Based Co-saliency Detection for RGBD Images
Co-saliency detection aims to discover common and salient objects in an image
group containing more than two relevant images. Moreover, depth information has
been demonstrated to be effective for many computer vision tasks. In this
paper, we propose a novel co-saliency detection method for RGBD images based on
hierarchical sparsity reconstruction and energy function refinement. With the
assistance of the intra saliency map, the inter-image correspondence is
formulated as a hierarchical sparsity reconstruction framework. The global
sparsity reconstruction model with a ranking scheme focuses on capturing the
global characteristics among the whole image group through a common foreground
dictionary. The pairwise sparsity reconstruction model aims to explore the
corresponding relationship between pairwise images through a set of pairwise
dictionaries. In order to improve the intra-image smoothness and inter-image
consistency, an energy function refinement model is proposed, which includes
the unary data term, spatial smooth term, and holistic consistency term.
Experiments on two RGBD co-saliency detection benchmarks demonstrate that the
proposed method outperforms the state-of-the-art algorithms both qualitatively
and quantitatively.
Comment: 11 pages, 5 figures, Accepted by IEEE Transactions on Multimedia,
https://rmcong.github.io
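The idea of scoring regions by how well a common foreground dictionary reconstructs them can be illustrated with a minimal sketch. This is not the HSCS method itself: it assumes a 1-sparse code (each region is reconstructed from a single unit-norm atom), and the names `one_sparse_error`, `co_saliency_scores`, and the parameter `sigma` are hypothetical. Regions with a low reconstruction error resemble the shared foreground and receive a high score.

```python
import math

def normalize(vector):
    """Scale a feature vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def one_sparse_error(feature, dictionary):
    """Smallest squared residual when the feature is reconstructed
    from exactly one unit-norm dictionary atom (a 1-sparse code)."""
    best = None
    for atom in dictionary:
        coef = sum(f * a for f, a in zip(feature, atom))
        residual = [f - coef * a for f, a in zip(feature, atom)]
        error = sum(r * r for r in residual)
        if best is None or error < best:
            best = error
    return best

def co_saliency_scores(features, dictionary, sigma=0.1):
    """Map reconstruction errors to (0, 1]; sigma is an assumed bandwidth."""
    return [math.exp(-one_sparse_error(f, dictionary) / sigma) for f in features]

# Hypothetical foreground dictionary and three region features.
dictionary = [normalize([1.0, 0.0, 0.0]), normalize([0.0, 1.0, 1.0])]
regions = [[2.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 1.0, -1.0]]
scores = co_saliency_scores(regions, dictionary)
```

The first two regions lie along dictionary atoms and score close to 1, while the third is orthogonal to both atoms and scores close to 0.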
Video Smoke Detection Based on Deep Saliency Network
Video smoke detection is a promising fire detection method especially in open
or large spaces and outdoor environments. Traditional video smoke detection
methods usually consist of candidate region extraction and classification, but
lack powerful characterization for smoke. In this paper, we propose a novel
video smoke detection method based on deep saliency network. Visual saliency
detection aims to highlight the most important object regions in an image. The
pixel-level and object-level salient convolutional neural networks are combined
to extract the informative smoke saliency map. An end-to-end framework for
salient smoke detection and existence prediction of smoke is proposed for
application in video smoke detection. The deep feature map is combined with the
saliency map to predict the existence of smoke in an image. Initial and
augmented datasets are built to measure the performance of frameworks with
different design strategies. Qualitative and quantitative analyses at the
frame level and pixel level demonstrate the excellent performance of the
ultimate framework.
Comment: 21 pages, 12 figure
A dense subgraph based algorithm for compact salient image region detection
We present an algorithm for graph based saliency computation that utilizes
the underlying dense subgraphs in finding visually salient regions in an image.
To compute the salient regions, the model first obtains a saliency map using
random walks on a Markov chain. Next, k-dense subgraphs are detected to further
enhance the salient regions in the image. Dense subgraphs convey more
information about local graph structure than simple centrality measures. To
generate the Markov chain, intensity and color features of an image, in addition
to region compactness, are used. To evaluate the proposed model, we conduct
extensive experiments on benchmark image data sets. The proposed method
performs comparably to well-known algorithms in salient region detection.
Comment: 33 pages, 18 figures, Single column manuscript pre-print, Accepted at
Computer Vision and Image Understanding, Elsevie
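The first stage described above, a saliency map from random walks on a Markov chain, can be sketched in a few lines. This is a generic illustration under stated assumptions, not the authors' construction: edge weights are taken to be absolute differences of a single scalar feature per region, and the function names are hypothetical. The stationary distribution of the walk concentrates on regions that differ most from the rest.

```python
def dissimilarity_graph(features, eps=1e-6):
    """Fully connected graph; edge weight = feature dissimilarity.
    eps (an assumption) keeps every row sum strictly positive."""
    n = len(features)
    return [[abs(features[i] - features[j]) + eps if i != j else 0.0
             for j in range(n)] for i in range(n)]

def stationary_distribution(weights, iters=1000):
    """Power iteration on the row-normalized transition matrix."""
    n = len(weights)
    transition = [[w / sum(row) for w in row] for row in weights]
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * transition[i][j] for i in range(n))
              for j in range(n)]
    return pi

# Hypothetical region features: the last region is the outlier.
features = [0.10, 0.12, 0.11, 0.90]
saliency = stationary_distribution(dissimilarity_graph(features))
most_salient = max(range(len(saliency)), key=saliency.__getitem__)
```

Because the weights are symmetric, the stationary mass of a node is proportional to its total edge weight, so the outlier region receives the largest value. The dense-subgraph enhancement step is not shown.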
Saliency guided deep network for weakly-supervised image segmentation
Weakly-supervised image segmentation is an important task in computer vision.
A key problem is how to obtain high-quality object locations from image-level
category labels. Classification activation mapping is a common method which can be
used to generate high-precision object location cues. However, these location cues
are generally very sparse and small, such that they cannot provide effective
information for image segmentation. In this paper, we propose a saliency guided
image segmentation network to resolve this problem. We employ a self-attention
saliency method to generate subtle saliency maps, and grow the location cues
as seeds with the seeded region growing method to expand the extent of the
pixel-level labels. While growing the seeds, we use the saliency values to weight
the similarity between pixels to control the growing. Therefore, saliency
information can help generate discriminative object regions, and the effects
of wrong salient pixels can be suppressed efficiently. Experimental results on
a common segmentation dataset, PASCAL VOC2012, demonstrate the effectiveness of
our method.
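A minimal sketch of saliency-weighted seeded region growing follows. The abstract does not give the exact weighting, so the rule used here is an assumption: a neighbor joins a region when its saliency multiplied by its intensity similarity to the current pixel reaches a threshold. The function `grow_seeds` and its parameters are hypothetical.

```python
from collections import deque

def grow_seeds(image, saliency, seeds, threshold=0.5):
    """Grow labeled seeds over a 2D grid (4-connectivity).
    image, saliency: lists of rows with values in [0, 1].
    seeds: iterable of (row, col, label) with label > 0.
    Returns a label grid; 0 means unlabeled."""
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    queue = deque()
    for row, col, label in seeds:
        labels[row][col] = label
        queue.append((row, col))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < height and 0 <= nc < width and labels[nr][nc] == 0:
                similarity = 1.0 - abs(image[r][c] - image[nr][nc])
                # Assumed rule: saliency scales the pixel similarity.
                if saliency[nr][nc] * similarity >= threshold:
                    labels[nr][nc] = labels[r][c]
                    queue.append((nr, nc))
    return labels

image = [[0.9, 0.9, 0.1, 0.1],
         [0.9, 0.8, 0.1, 0.1],
         [0.9, 0.9, 0.1, 0.1]]
saliency = [[0.9, 0.9, 0.2, 0.1],
            [0.9, 0.9, 0.2, 0.1],
            [0.9, 0.9, 0.9, 0.1]]   # pixel (2, 2) is wrongly salient
labels = grow_seeds(image, saliency, seeds=[(0, 0, 1)])
```

In this toy input the wrongly salient pixel at (2, 2) is not absorbed, because its low similarity to the neighboring object pixel keeps the weighted score below the threshold.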
Bootstrapping Robotic Ecological Perception from a Limited Set of Hypotheses Through Interactive Perception
To solve its task, a robot needs to have the ability to interpret its
perceptions. In vision, this interpretation is particularly difficult and
relies on the understanding of the structure of the scene, at least to the
extent of its task and sensorimotor abilities. A robot with the ability to
build and adapt this interpretation process according to its own tasks and
capabilities would push back the limits of what robots can achieve in an
uncontrolled environment. A solution is to provide the robot with processes to
build such representations that are not specific to an environment or a
situation. Much existing work focuses on object segmentation, recognition and
manipulation. Defining an object solely on the basis of its visual appearance
is challenging given the wide range of possible objects and environments.
Therefore, current approaches make simplifying assumptions about the structure of a
scene. Such assumptions reduce the adaptivity of the object extraction process
to the environments in which the assumption holds. To limit such assumptions,
we introduce an exploration method aimed at identifying moveable elements in a
scene without considering the concept of object. By using the interactive
perception framework, we aim at bootstrapping the acquisition process of a
representation of the environment with a minimum of context specific
assumptions. The robotic system builds a perceptual map called relevance map
which indicates the moveable parts of the current scene. A classifier is
trained online to predict the category of each region (moveable or
non-moveable). It is also used to select a region with which to interact, with
the goal of minimizing the uncertainty of the classification. A specific
classifier is introduced to fit these needs: the collaborative mixture models
classifier. The method is tested on a set of scenarios of increasing
complexity, using both simulations and a PR2 robot.
Comment: 21 pages, 21 figure
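Choosing which region to interact with can be illustrated with plain uncertainty sampling. This sketch assumes the classifier outputs a probability of "moveable" per region and picks the region with the highest binary entropy; it is a common heuristic, not necessarily the selection rule of the collaborative mixture models classifier, and the function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) prediction."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def select_region(moveable_probabilities):
    """Index of the region whose predicted class is most uncertain."""
    return max(range(len(moveable_probabilities)),
               key=lambda i: binary_entropy(moveable_probabilities[i]))

# Hypothetical classifier outputs for four regions.
probabilities = [0.95, 0.52, 0.10, 0.70]
target = select_region(probabilities)  # region with p closest to 0.5
```

Interacting with the selected region yields a new labeled sample exactly where the classifier is least sure, which is how each interaction can reduce the overall uncertainty.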
NeRD: a Neural Response Divergence Approach to Visual Salience Detection
In this paper, a novel approach to visual salience detection via Neural
Response Divergence (NeRD) is proposed, where synaptic portions of deep neural
networks, previously trained for complex object recognition, are leveraged to
compute low level cues that can be used to compute image region
distinctiveness. Based on this concept, an efficient visual salience detection
framework is proposed using deep convolutional StochasticNets. Experimental
results using CSSD and MSRA10k natural image datasets show that the proposed
NeRD approach can achieve improved performance when compared to
state-of-the-art image saliency approaches, while attaining the low
computational complexity necessary for near-real-time computer vision
applications.
Comment: 5 page
PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Edge-Preserving Coherence
Driven by recent vision and graphics applications such as image segmentation
and object recognition, computing pixel-accurate saliency values to uniformly
highlight foreground objects becomes increasingly important. In this paper, we
propose a unified framework called PISA, which stands for Pixelwise Image
Saliency Aggregating various bottom-up cues and priors. It generates spatially
coherent yet detail-preserving, pixel-accurate and fine-grained saliency, and
overcomes the limitations of previous methods which use homogeneous
superpixel-based and color only treatment. PISA aggregates multiple saliency
cues in a global context such as complementary color and structure contrast
measures with their spatial priors in the image domain. The saliency confidence
is further jointly modeled with a neighborhood consistency constraint into an
energy minimization formulation, in which each pixel will be evaluated with
multiple hypothetical saliency levels. Instead of using global discrete
optimization methods, we employ the cost-volume filtering technique to solve
our formulation, assigning the saliency levels smoothly while preserving the
edge-aware structure details. In addition, a faster version of PISA is
developed using a gradient-driven image sub-sampling strategy to greatly
improve the runtime efficiency while keeping comparable detection accuracy.
Extensive experiments on a number of public datasets suggest that PISA
convincingly outperforms other state-of-the-art approaches. In addition, with
this work we also create a new dataset containing commodity images for
evaluating saliency detection. The dataset and source code of PISA can be
downloaded at http://vision.sysu.edu.cn/project/PISA/
Comment: 14 pages, 14 figures, 1 table, to appear in IEEE Transactions on
Image Processin
Coarse-to-Fine Salient Object Detection with Low-Rank Matrix Recovery
Low-Rank Matrix Recovery (LRMR) has recently been applied to saliency
detection by decomposing image features into a low-rank component associated
with background and a sparse component associated with visual salient regions.
Despite its great potential, existing LRMR-based saliency detection methods
seldom consider the inter-relationship among elements within these two
components, thus are prone to generating scattered or incomplete saliency maps.
In this paper, we introduce a novel and efficient LRMR-based saliency detection
model under a coarse-to-fine framework to circumvent this limitation. First, we
roughly measure the saliency of image regions with a baseline LRMR model that
integrates an $\ell_1$-norm sparsity constraint and a Laplacian regularization
smooth term. Given samples from the coarse saliency map, we then learn a
projection that maps image features to refined saliency values, to
significantly sharpen the object boundaries and to preserve the object
entirety. We evaluate our framework against existing LRMR-based methods on
three benchmark datasets. Experimental results validate the superiority of our
method as well as the effectiveness of our suggested coarse-to-fine framework,
especially for images containing multiple objects.
Comment: Manuscript accepted by Neurocomputing, matlab code is available from
https://github.com/qizhust/HLRSalienc
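For orientation, the generic low-rank plus sparse decomposition that LRMR-based saliency builds on can be written as below. This is the standard robust PCA form, not the paper's full model (the Laplacian regularization term is omitted), and the symbols are assumptions: $F$ is the feature matrix with one column per image region, $L$ the low-rank background component, $S$ the sparse salient component, and $\lambda > 0$ a trade-off weight.

```latex
\min_{L,\,S} \; \|L\|_{*} + \lambda \,\|S\|_{1}
\quad \text{s.t.} \quad F = L + S
```

Here $\|L\|_{*}$ is the nuclear norm (sum of singular values) and $\|S\|_{1}$ the entrywise $\ell_1$ norm. Under this formulation, a coarse saliency value for region $i$ is commonly taken from the magnitude of the $i$-th column of $S$, for example $\|S_{:,i}\|_{1}$.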
Image Co-segmentation via Multi-scale Local Shape Transfer
Image co-segmentation is a challenging task in computer vision that aims to
segment all pixels of the objects from a predefined semantic category. In
real-world cases, however, common foreground objects often vary greatly in
appearance, making their global shapes highly inconsistent across images and
difficult to segment. To address this problem, this paper proposes a novel
co-segmentation approach that transfers patch-level local object shapes which
appear more consistent across different images. In our framework, a multi-scale
patch neighbourhood system is first generated using proposal flow on arbitrary
image pairs, which is further refined by Locally Linear Embedding. Based on the
patch relationships, we propose an efficient algorithm to jointly segment the
objects in each image while transferring their local shapes across different
images. Extensive experiments demonstrate that the proposed method can robustly
and effectively segment common objects from an image set. On the iCoseg, MSRC and
Coseg-Rep datasets, the proposed approach performs comparably to or better than the
state of the art, while on the more challenging Fashionista benchmark dataset,
our method achieves significant improvements.
Comment: An extension of our previous stud