Review of Visual Saliency Detection with Comprehensive Information
A visual saliency detection model simulates the human visual system to perceive
the scene, and such models have been widely used in many vision tasks. With the
development of acquisition technology, more comprehensive information, such as
depth cues, inter-image correspondence, or temporal relationships, is available
to extend image saliency detection to RGBD saliency detection, co-saliency
detection, or video saliency detection. An RGBD saliency detection model focuses
on extracting the salient regions from RGBD images by combining the depth
information. A co-saliency detection model introduces the inter-image
correspondence constraint to discover the common salient object in an image
group. The goal of a video saliency detection model is to locate the
motion-related salient object in video sequences, jointly considering the
motion cue and the spatiotemporal constraint. In this paper, we review different
types of saliency detection algorithms, summarize the important issues of the
existing methods, and discuss open problems and future work. Moreover, the
evaluation datasets and quantitative measurements are briefly introduced, and
experimental analysis and discussion are conducted to provide a holistic
overview of different saliency detection methods. Comment: 18 pages, 11 figures, 7 tables, Accepted by IEEE Transactions on
Circuits and Systems for Video Technology 2018, https://rmcong.github.io
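The abstract mentions quantitative measurements without naming them. As a hedged illustration only, two measures commonly used in salient object detection are mean absolute error (MAE) and the F-measure with beta^2 = 0.3. The sketch below assumes flattened maps scaled to [0, 1] and a fixed binarization threshold of 0.5; the function names and the threshold are assumptions for illustration, not the survey's evaluation protocol.

```python
from typing import Sequence


def mae(saliency: Sequence[float], ground_truth: Sequence[float]) -> float:
    """Mean absolute error between a saliency map and a binary ground truth.

    Both maps are assumed to be flattened to 1-D and scaled to [0, 1].
    """
    if len(saliency) != len(ground_truth):
        raise ValueError("maps must have the same number of pixels")
    return sum(abs(s - g) for s, g in zip(saliency, ground_truth)) / len(saliency)


def f_measure(saliency: Sequence[float], ground_truth: Sequence[float],
              threshold: float = 0.5, beta_sq: float = 0.3) -> float:
    """F-measure of the saliency map binarized at `threshold` (assumed fixed).

    beta_sq = 0.3 weights precision more than recall, a common convention
    in salient object detection (assumed here, not taken from the survey).
    """
    predicted = [s >= threshold for s in saliency]
    actual = [g >= 0.5 for g in ground_truth]
    tp = sum(p and a for p, a in zip(predicted, actual))
    n_pred = sum(predicted)
    n_true = sum(actual)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    if precision + recall == 0:
        return 0.0
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```

For example, a map [0.9, 0.8, 0.2, 0.1] against ground truth [1, 1, 0, 0] gives an MAE of about 0.15 and an F-measure of 1.0. Many benchmarks instead sweep all thresholds or use an adaptive one; the fixed threshold only keeps the sketch short.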
A dense subgraph based algorithm for compact salient image region detection
We present an algorithm for graph-based saliency computation that utilizes
the underlying dense subgraphs in finding visually salient regions in an image.
To compute the salient regions, the model first obtains a saliency map using
random walks on a Markov chain. Next, k-dense subgraphs are detected to further
enhance the salient regions in the image. Dense subgraphs convey more
information about local graph structure than simple centrality measures. To
generate the Markov chain, intensity and color features of an image, in
addition to region compactness, are used. For evaluating the proposed model, we
conduct extensive experiments on benchmark image data sets. The proposed method
performs comparably to well-known algorithms in salient region detection. Comment: 33 pages, 18 figures, Single column manuscript pre-print, Accepted at
Computer Vision and Image Understanding, Elsevier
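The first stage above, a saliency map from random walks on a Markov chain, can be illustrated with a minimal sketch in the spirit of graph-based visual saliency. It assumes one scalar feature per region (standing in for the intensity, color, and compactness features), an edge weight equal to feature dissimilarity times a Gaussian of spatial distance, and saliency equal to the stationary probability of the walk. The function name, the weighting, and sigma are assumptions; the k-dense subgraph enhancement is not shown.

```python
import math
from typing import List, Sequence, Tuple


def stationary_saliency(features: Sequence[float],
                        positions: Sequence[Tuple[float, float]],
                        sigma: float = 1.0,
                        iterations: int = 200) -> List[float]:
    """Saliency of each region = stationary probability of a lazy random walk.

    features[i]: scalar feature of region i (assumed, e.g. mean intensity).
    positions[i]: (x, y) centre of region i (assumed).
    """
    n = len(features)
    # Edge weight i -> j: feature dissimilarity scaled by a Gaussian of spatial distance.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dx = positions[i][0] - positions[j][0]
                dy = positions[i][1] - positions[j][1]
                proximity = math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
                weights[i][j] = abs(features[i] - features[j]) * proximity
    # Row-normalize into transition probabilities; an isolated row jumps uniformly.
    transition = []
    for row in weights:
        total = sum(row)
        transition.append([w / total for w in row] if total > 0 else [1.0 / n] * n)
    # Power iteration on a lazy walk (stay put with probability 1/2): this avoids
    # oscillation on bipartite graphs and leaves the stationary distribution unchanged.
    pi = [1.0 / n] * n
    for _ in range(iterations):
        step = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
        pi = [0.5 * p + 0.5 * s for p, s in zip(pi, step)]
    return pi
```

With three similar regions and one distinct region, every edge out of the similar regions points to the distinct one, so the walker spends half its time there; that region therefore receives the highest stationary probability.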
Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM
Over the past few years, deep neural networks (DNNs) have exhibited great
success in predicting the saliency of images. However, there are few works that
apply DNNs to predict the saliency of generic videos. In this paper, we propose
a novel DNN-based video saliency prediction method. Specifically, we establish
a large-scale eye-tracking database of videos (LEDOV), which provides
sufficient data to train the DNN models for predicting video saliency. Through
the statistical analysis of our LEDOV database, we find that human attention is
normally attracted by objects, particularly moving objects or the moving parts
of objects. Accordingly, we propose an object-to-motion convolutional neural
network (OM-CNN) to learn spatio-temporal features for predicting the
intra-frame saliency by exploring the information of both objectness and
object motion. We further find from our database that there exists a temporal
correlation of human attention with a smooth saliency transition across video
frames. Therefore, we develop a two-layer convolutional long short-term memory
(2C-LSTM) network in our DNN-based method, using the extracted features of
OM-CNN as the input. Consequently, the inter-frame saliency maps of videos can
be generated, which consider the transition of attention across video frames.
Finally, the experimental results show that our method advances the
state of the art in video saliency prediction. Comment: Jiang, Lai and Xu, Mai and Liu, Tie and Qiao, Minglang and Wang,
Zulin; DeepVS: A Deep Learning Based Video Saliency Prediction Approach; The
European Conference on Computer Vision (ECCV); September 201
Hierarchical Cellular Automata for Visual Saliency
Saliency detection, finding the most important parts of an image, has become
increasingly popular in computer vision. In this paper, we introduce
Hierarchical Cellular Automata (HCA) -- a temporally evolving model to
intelligently detect salient objects. HCA consists of two main components:
Single-layer Cellular Automata (SCA) and Cuboid Cellular Automata (CCA). As an
unsupervised propagation mechanism, Single-layer Cellular Automata can exploit
the intrinsic relevance of similar regions through interactions with neighbors.
Low-level image features as well as high-level semantic information extracted
from deep neural networks are incorporated into the SCA to measure the
correlation between different image patches. With these hierarchical deep
features, an impact factor matrix and a coherence matrix are constructed to
balance the influences on each cell's next state. The saliency values of all
cells are iteratively updated according to a well-defined update rule.
Furthermore, we propose CCA to integrate multiple saliency maps generated by
SCA at different scales in a Bayesian framework. Therefore, single-layer
propagation and multi-layer integration are jointly modeled in our unified HCA.
Surprisingly, we find that the SCA can improve all existing methods that we
applied it to, resulting in a similar precision level regardless of the
original results. The CCA can act as an efficient pixel-wise aggregation
algorithm that can integrate state-of-the-art methods, resulting in even better
results. Extensive experiments on four challenging datasets demonstrate that
the proposed algorithm outperforms state-of-the-art conventional methods and is
competitive with deep learning based approaches.
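The abstract says each cell's next state is balanced by an impact factor matrix and a coherence matrix under a "well-defined update rule" but does not state the rule. A minimal sketch, assuming a synchronous rule of the form S(t+1) = C S(t) + (I - C) F S(t), where F is the row-normalized impact factor matrix and C is a diagonal coherence matrix; this form, the function name, and the parameter names are assumptions, not quoted from the paper. The CCA integration step is not shown.

```python
from typing import List, Sequence


def sca_update(saliency: Sequence[float],
               impact: Sequence[Sequence[float]],
               coherence: Sequence[float],
               steps: int = 20) -> List[float]:
    """Synchronously iterate S <- C S + (I - C) F S for `steps` rounds (assumed rule).

    saliency[i]: initial saliency of cell (superpixel) i, e.g. from any prior method.
    impact[i][j]: assumed non-negative influence of neighbour j on cell i.
    coherence[i]: assumed in [0, 1]; how strongly cell i keeps its current state.
    """
    n = len(saliency)
    # Row-normalize the impact factor matrix F; a cell with no neighbours keeps its own state.
    f_norm = []
    for i, row in enumerate(impact):
        total = sum(row)
        if total > 0:
            f_norm.append([w / total for w in row])
        else:
            f_norm.append([1.0 if j == i else 0.0 for j in range(n)])
    s = list(saliency)
    for _ in range(steps):
        # Every cell reads the previous states, so all cells update at once.
        vote = [sum(f_norm[i][j] * s[j] for j in range(n)) for i in range(n)]
        s = [coherence[i] * s[i] + (1 - coherence[i]) * vote[i] for i in range(n)]
    return s
```

Because the input map can come from any method, a rule of this kind acts as a refinement step: strongly connected, similar cells are pulled toward a common value, which is consistent with the observation that SCA brings different initial maps to a similar precision level.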
Bootstrapping Robotic Ecological Perception from a Limited Set of Hypotheses Through Interactive Perception
To solve its task, a robot needs to have the ability to interpret its
perceptions. In vision, this interpretation is particularly difficult and
relies on the understanding of the structure of the scene, at least to the
extent of its task and sensorimotor abilities. A robot with the ability to
build and adapt this interpretation process according to its own tasks and
capabilities would push back the limits of what robots can achieve in an
uncontrolled environment. A solution is to provide the robot with processes to
build such representations that are not specific to an environment or a
situation. Many works focus on object segmentation, recognition and
manipulation. Defining an object solely on the basis of its visual appearance
is challenging given the wide range of possible objects and environments.
Therefore, current works make simplifying assumptions about the structure of a
scene. Such assumptions restrict the object extraction process to the
environments in which the assumptions hold. To limit such assumptions,
we introduce an exploration method aimed at identifying moveable elements in a
scene without considering the concept of object. By using the interactive
perception framework, we aim at bootstrapping the acquisition process of a
representation of the environment with a minimum of context-specific
assumptions. The robotic system builds a perceptual map called relevance map
which indicates the moveable parts of the current scene. A classifier is
trained online to predict the category of each region (moveable or
non-moveable). It is also used to select a region with which to interact, with
the goal of minimizing the uncertainty of the classification. A specific
classifier is introduced to fit these needs: the collaborative mixture models
classifier. The method is tested on a set of scenarios of increasing
complexity, using both simulations and a PR2 robot. Comment: 21 pages, 21 figures
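The abstract states that the classifier also selects the region to interact with so as to minimize classification uncertainty. A generic sketch of that selection step using uncertainty sampling: given a hypothetical per-region probability of being moveable, pick the region whose prediction has maximum binary entropy. This is a stand-in heuristic, not the collaborative mixture models classifier, and the region ids and probabilities are hypothetical.

```python
import math
from typing import Dict


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a moveable / non-moveable prediction with P(moveable) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_region(p_moveable: Dict[str, float]) -> str:
    """Return the region whose predicted label is most uncertain (max entropy)."""
    if not p_moveable:
        raise ValueError("no candidate regions")
    return max(p_moveable, key=lambda r: binary_entropy(p_moveable[r]))
```

For instance, with probabilities {"r1": 0.95, "r2": 0.55, "r3": 0.1}, the robot would push region "r2", since observing whether it moves is the most informative of the three under this heuristic.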
Saliency Detection combining Multi-layer Integration algorithm with background prior and energy function
In this paper, we propose an improved mechanism for saliency detection.
Firstly, based on a novel background prior that selects the four corners of an
image as background, we use color and spatial contrast with each superpixel to
obtain a saliency map (CBP). Inspired by reverse-measurement methods used to
improve measurement accuracy in engineering, we employ Objectness labels as a
foreground prior, based on part of the information in CBP, to construct a map
(OFP). Further, an original energy function is applied to optimize each of
them, and a single-layer saliency map (SLP) is formed by merging the two.
Finally, to deal with the scale problem, we obtain our multi-layer map (MLP) by
presenting an integration algorithm that takes advantage of multiple saliency
maps. Quantitative and qualitative experiments on three datasets demonstrate
that our method performs favorably against state-of-the-art algorithms. Comment: 25 pages, 8 figures. arXiv admin note: text overlap with
arXiv:1505.07192 by other authors
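The first step of this pipeline (treating the corner superpixels as background and scoring each superpixel by its contrast with them) can be sketched as follows. Assumptions, all hypothetical: colors are RGB triples, positions are normalized to [0, 1], the background indices are supplied by the caller, the color term is the mean Euclidean distance to the background superpixels, and the "spatial" term is read as a Gaussian center prior. The Objectness foreground prior, the energy optimization, and the multi-layer integration are not shown.

```python
import math
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]
Point = Tuple[float, float]


def background_contrast(colors: Sequence[Color],
                        positions: Sequence[Point],
                        background: Sequence[int],
                        sigma: float = 0.25) -> List[float]:
    """Score each superpixel by color contrast with the corner (background) superpixels."""
    if not background:
        raise ValueError("at least one background superpixel is required")
    raw = []
    for c, (x, y) in zip(colors, positions):
        # Color term: mean Euclidean distance to the background superpixels.
        color_contrast = sum(math.dist(c, colors[b]) for b in background) / len(background)
        # Spatial term (assumed): Gaussian center prior on normalized coordinates.
        center_prior = math.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2) / (2 * sigma ** 2))
        raw.append(color_contrast * center_prior)
    # Normalize to [0, 1] so the map can later be merged with other maps.
    peak = max(raw)
    return [r / peak for r in raw] if peak > 0 else raw
```

A centered red superpixel surrounded by four gray corner superpixels scores 1.0, while the corners, which have zero contrast with themselves, score 0.0.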
Computational models: Bottom-up and top-down aspects
Computational models of visual attention have become popular over the past
decade, primarily, we believe, for two reasons: first, models make testable
predictions that can be explored by experimentalists as well as theoreticians;
second, models have practical and technological applications of interest to the
applied science and engineering communities. In this chapter, we take a
critical look at recent attention modeling efforts. We focus on computational
models of attention as defined by Tsotsos & Rothenstein
[Tsotsos_Rothenstein11]: models which can process any visual stimulus
(typically, an image or video clip), which can possibly also be given some task
definition, and which make predictions that can be compared to human or animal
behavioral or physiological responses elicited by the same stimulus and task.
Thus, we place less emphasis here on abstract models, phenomenological models,
purely data-driven fitting or extrapolation models, or models specifically
designed for a single task or for a restricted class of stimuli. For
theoretical models, we refer the reader to a number of previous reviews that
address attention theories and models more generally
[Itti_Koch01nrn, Paletta_etal05, Frintrop_etal10, Rothenstein_Tsotsos08, Gottlieb_Balan10, Toet11, Borji_Itti12pami].
A Review of Co-saliency Detection Technique: Fundamentals, Applications, and Challenges
Co-saliency detection is a newly emerging and rapidly growing research area
in the computer vision community. As a novel branch of visual saliency, co-saliency
detection refers to the discovery of common and salient foregrounds from two or
more relevant images, and can be widely used in many computer vision tasks. The
existing co-saliency detection algorithms mainly consist of three components:
extracting effective features to represent the image regions, exploring the
informative cues or factors to characterize co-saliency, and designing
effective computational frameworks to formulate co-saliency. Although numerous
methods have been developed, the literature still lacks a deep review and
evaluation of co-saliency detection techniques. In this paper, we aim at
providing a comprehensive review of the fundamentals, challenges, and
applications of co-saliency detection. Specifically, we provide an overview of
some related computer vision works, review the history of co-saliency
detection, summarize and categorize the major algorithms in this research area,
discuss some open issues in this area, present the potential applications of
co-saliency detection, and finally point out some unsolved challenges and
promising future work. We expect this review to be beneficial to both new
and senior researchers in this field, and to give insights to researchers in other
related areas regarding the utility of co-saliency detection algorithms. Comment: 28 pages, 12 figures, 3 tables
Crowded Scene Analysis: A Survey
Automated scene analysis has been a topic of great interest in computer
vision and cognitive science. Recently, with the growth of crowd phenomena in
the real world, crowded scene analysis has attracted much attention. However,
the visual occlusions and ambiguities in crowded scenes, as well as the complex
behaviors and scene semantics, make the analysis a challenging task. In the
past few years, an increasing number of works on crowded scene analysis have
been reported, covering different aspects including crowd motion pattern
learning, crowd behavior and activity analysis, and anomaly detection in
crowds. This paper surveys the state-of-the-art techniques on this topic. We
first provide the background knowledge and the available features related to
crowded scenes. Then, existing models, popular algorithms, evaluation
protocols, and system performance are presented for the
different aspects of crowded scene analysis. We also outline the available
datasets for performance evaluation. Finally, some research problems and
promising future directions are presented and discussed. Comment: 20 pages in IEEE Transactions on Circuits and Systems for Video
Technology, 201
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we read all 602 conference papers presented at
CVPR2015, the premier annual computer vision event held in June 2015, in
order to grasp the trends in the field. Further, we propose "DeepSurvey", a
mechanism embodying the entire process, from reading through all the papers,
to generating ideas, to writing a paper. Comment: Survey Paper