Review of Visual Saliency Detection with Comprehensive Information
Visual saliency detection models simulate how the human visual system perceives
a scene and have been widely used in many vision tasks. With the development of
acquisition technology, more comprehensive information, such as depth cues,
inter-image correspondence, or temporal relationships, is available to extend
image saliency detection to RGBD saliency detection, co-saliency detection, or
video saliency detection. RGBD saliency detection models focus on extracting
the salient regions from RGBD images by incorporating depth information.
Co-saliency detection models introduce an inter-image correspondence
constraint to discover the common salient object in an image group. The goal of
video saliency detection models is to locate the motion-related salient object
in video sequences by jointly considering motion cues and spatiotemporal
constraints. In this paper, we review different types of saliency
detection algorithms, summarize the important issues of the existing methods,
and discuss open problems and future work. Moreover, the evaluation
datasets and quantitative measurements are briefly introduced, and
experimental analysis and discussion are conducted to provide a holistic
overview of different saliency detection methods.
Comment: 18 pages, 11 figures, 7 tables, Accepted by IEEE Transactions on
Circuits and Systems for Video Technology 2018, https://rmcong.github.io
Salient Object Detection in Video using Deep Non-Local Neural Networks
Detection of salient objects in images and videos is of great importance in
many computer vision applications. Although the state of the
art in saliency detection for still images has changed substantially over
the last few years, there have been few improvements in video saliency
detection. This paper investigates the use of recently introduced non-local
neural networks in video salient object detection. Non-local neural networks
are applied to capture global dependencies and hence determine the salient
objects. The effect of non-local operations is studied separately on static and
dynamic saliency detection in order to exploit both appearance and motion
features. A novel deep non-local neural network architecture is introduced for
video salient object detection and tested on two well-known datasets, DAVIS and
FBMS. The experimental results show that the proposed algorithm outperforms
state-of-the-art video saliency detection methods.
Comment: Submitted to Journal of Visual Communication and Image Representatio
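As a rough illustration of the non-local operation mentioned in this abstract, the sketch below computes, for every position, a softmax-weighted sum over all positions and adds it back to the input. It is not the architecture proposed in the paper: the function name `non_local` is hypothetical, and the learned projections used in practice are replaced by identity mappings as a simplifying assumption.

```python
import math


def non_local(features):
    """Embedded-Gaussian style non-local operation (identity projections assumed).

    features: list of N feature vectors (lists of floats), one per position.
    Returns N vectors: each input plus a similarity-weighted sum over all positions.
    """
    out = []
    for xi in features:
        # Similarity of position i to every position j (dot product).
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in features]
        # Softmax over j, shifted by the maximum for numerical stability.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Aggregate information from all positions (the global dependency).
        agg = [sum(w * xj[c] for w, xj in zip(weights, features))
               for c in range(len(xi))]
        # Residual connection: add the aggregated response to the input.
        out.append([a + b for a, b in zip(xi, agg)])
    return out


# Three positions with two channels each; the output keeps the same shape.
response = non_local([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Because every position attends to every other one, the cost grows quadratically with the number of positions, which is why such blocks are usually applied to downsampled feature maps.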
Region-Based Multiscale Spatiotemporal Saliency for Video
Detecting salient objects from a video requires exploiting both spatial and
temporal knowledge included in the video. We propose a novel region-based
multiscale spatiotemporal saliency detection method for videos, where static
features and dynamic features computed from the low and middle levels are
combined. Our method utilizes such combined features spatially over
each frame and, at the same time, temporally across frames using consistency
between consecutive frames. Saliency cues in our method are analyzed through a
multiscale segmentation model and fused across scale levels, which allows
regions to be explored efficiently. An adaptive temporal window using motion
information is also developed to combine saliency values of consecutive frames
in order to keep temporal consistency across frames. Performance evaluation on
several popular benchmark datasets validates that our method outperforms
existing state-of-the-art methods.
Unsupervised Video Analysis Based on a Spatiotemporal Saliency Detector
Visual saliency, which predicts regions in the field of view that draw the
most visual attention, has attracted a lot of interest from researchers. It has
already been used in several vision tasks, e.g., image classification, object
detection, and foreground segmentation. Recently, the spectrum-analysis-based
visual saliency approach has attracted a lot of interest due to its simplicity
and good performance, where the phase information of the image is used to
construct the saliency map. In this paper, we propose a new approach for
detecting spatiotemporal visual saliency based on the phase spectrum of the
videos, which is easy to implement and computationally efficient. With the
proposed algorithm, we also study how the spatiotemporal saliency can be used
in two important vision tasks, abnormality detection and spatiotemporal interest
point detection. The proposed algorithm is evaluated on several commonly used
datasets with comparison to the state-of-the-art methods from the literature. The
experiments demonstrate the effectiveness of the proposed approach to
spatiotemporal visual saliency detection and its application to the above
vision tasks.
Comment: 21 page
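The phase-spectrum idea in this abstract can be sketched as follows, assuming NumPy is available. This is one common form of phase-only reconstruction applied to a whole video volume, not necessarily the formulation used in the paper: the function name `phase_spectrum_saliency` and the toy input are hypothetical, and the smoothing step that such methods typically apply afterwards is omitted.

```python
import numpy as np


def phase_spectrum_saliency(volume):
    """Phase-only reconstruction of a grayscale video volume of shape (T, H, W)."""
    spectrum = np.fft.fftn(volume)
    # Discard the amplitude and keep only the phase of each frequency component.
    phase_only = np.exp(1j * np.angle(spectrum))
    reconstruction = np.fft.ifftn(phase_only)
    saliency = np.abs(reconstruction) ** 2
    # Scale to [0, 1] so maps from different videos are comparable.
    peak = saliency.max()
    return saliency / peak if peak > 0 else saliency


# Toy volume: uniform background with a single bright spot in frame 2.
video = np.full((4, 8, 8), 0.5)
video[2, 3, 5] += 1.0
saliency_map = phase_spectrum_saliency(video)
peak_index = np.unravel_index(np.argmax(saliency_map), saliency_map.shape)
```

In this toy case the uniform background only affects the zero-frequency term, so the phase-only reconstruction concentrates on the isolated spot.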
Video Salient Object Detection Using Spatiotemporal Deep Features
This paper presents a method for detecting salient objects in videos where
temporal information in addition to spatial information is fully taken into
account. Following recent reports on the advantage of deep features over
conventional hand-crafted features, we propose a new set of SpatioTemporal Deep
(STD) features that utilize local and global contexts over frames. We also
propose a new SpatioTemporal Conditional Random Field (STCRF) to compute saliency
from STD features. STCRF is our extension of CRF to the temporal domain and
describes the relationships among neighboring regions both in a frame and over
frames. STCRF leads to temporally consistent saliency maps over frames,
contributing to the accurate detection of salient objects' boundaries and noise
reduction during detection. Our proposed method first segments an input video
into multiple scales and then computes a saliency map at each scale level using
STD features with STCRF. The final saliency map is computed by fusing saliency
maps at different scale levels. Our experiments, using publicly available
benchmark datasets, confirm that the proposed method significantly outperforms
state-of-the-art methods. We also applied our saliency computation to the video
object segmentation task, showing that our method outperforms existing video
object segmentation methods.
Comment: accepted at TI
Saliency-Guided Perceptual Grouping Using Motion Cues in Region-Based Artificial Visual Attention
Region-based artificial attention constitutes a framework for bio-inspired
attentional processes on an intermediate abstraction level for use in
computer vision and mobile robotics. Segmentation algorithms produce regions of
coherently colored pixels. These serve as proto-objects on which the
attentional processes determine image portions of relevance. A single
region, which does not necessarily represent a full object, constitutes the focus
of attention. For many post-attentional tasks, however, such as identifying or
tracking objects, single segments are not sufficient. Here, we present a
saliency-guided approach that groups regions that potentially belong to the
same object based on proximity and similarity of motion. We compare our results
to object selection by thresholding saliency maps and a further
attention-guided strategy.
Global and Local Sensitivity Guided Key Salient Object Re-augmentation for Video Saliency Detection
Existing deep-learning-based saliency methods designed for static images do not
weight or highlight the features extracted from different
layers, so all features contribute equally to the final saliency decision.
Such methods tend to detect all "potentially significant regions" evenly and
are unable to highlight the key salient object, resulting in detection failures in
dynamic scenes. In this paper, based on the fact that salient areas in videos
are relatively small and concentrated, we propose a key salient object
re-augmentation method (KSORA) using top-down semantic knowledge and bottom-up
feature guidance to improve detection accuracy in video scenes. KSORA includes
two sub-modules (WFE and KOS): WFE performs local salient feature selection
using a bottom-up strategy, while KOS ranks each object in a global fashion using
top-down statistical knowledge and chooses the most critical object area for
local enhancement. The proposed KSORA can not only strengthen the saliency
value of the local key salient object but also ensure global saliency
consistency. Results on three benchmark datasets suggest that our model is
capable of improving detection accuracy on complex scenes. The
significant performance of KSORA, with a speed of 17 FPS on modern GPUs, has
been verified by comparisons with ten other state-of-the-art algorithms.
Comment: 6 figures, 10 page
SG-FCN: A Motion and Memory-Based Deep Learning Model for Video Saliency Detection
Data-driven saliency detection has attracted strong interest as a result of
applying convolutional neural networks to the detection of eye fixations.
Although a number of image-based salient object and fixation detection models
have been proposed, video fixation detection still requires more exploration.
Unlike in image analysis, motion and temporal information are crucial
factors affecting human attention when viewing video sequences. Although
existing models based on local contrast and low-level features have been
extensively researched, they fail to simultaneously consider interframe
motion and temporal information across neighboring video frames, leading to
unsatisfactory performance when handling complex scenes. To this end, we
propose a novel and efficient video eye fixation detection model to improve the
saliency detection performance. By simulating the memory mechanism and visual
attention mechanism of human beings when watching a video, we propose a
step-gained fully convolutional network by combining the memory information on
the time axis with the motion information on the space axis while storing the
saliency information of the current frame. The model is obtained through
hierarchical training, which ensures the accuracy of the detection. Extensive
experiments in comparison with 11 state-of-the-art methods are carried out, and
the results show that our proposed model outperforms all 11 methods across a
number of publicly available datasets.
Graph-Theoretic Spatiotemporal Context Modeling for Video Saliency Detection
As an important and challenging problem in computer vision, video saliency
detection is typically cast as a spatiotemporal context modeling problem over
consecutive frames. As a result, a key issue in video saliency detection is how
to effectively capture the intrinsic properties of atomic video structures as
well as their associated contextual interactions along the spatial and temporal
dimensions. Motivated by this observation, we propose a graph-theoretic video
saliency detection approach based on adaptive video structure discovery, which
is carried out within a spatiotemporal atomic graph. Through graph-based
manifold propagation, the proposed approach is capable of effectively modeling
the semantically contextual interactions among atomic video structures for
saliency detection while preserving spatial smoothness and temporal
consistency. Experiments demonstrate the effectiveness of the proposed approach
over several benchmark datasets.
Comment: ICIP 201
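Graph-based manifold propagation of the kind this abstract refers to is often implemented as the classic manifold-ranking iteration, sketched below under that assumption. The paper's actual graph construction and update rule are not given here, so the function name `manifold_propagation`, the parameter `alpha`, and the toy four-node graph are all hypothetical.

```python
import math


def manifold_propagation(weights, seeds, alpha=0.9, iterations=200):
    """Propagate seed saliency over a graph (manifold-ranking style iteration).

    weights: symmetric N x N affinity matrix with non-negative entries.
    seeds:   initial saliency score per node.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    # Symmetrically normalised affinity S = D^(-1/2) W D^(-1/2).
    s = [[weights[i][j] / math.sqrt(degree[i] * degree[j]) if weights[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    f = list(seeds)
    for _ in range(iterations):
        # Each node mixes its neighbours' scores with its own seed value.
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * seeds[i]
             for i in range(n)]
    return f


# Chain of four nodes: 0-1 and 2-3 are strongly linked, 1-2 only weakly.
affinity = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
scores = manifold_propagation(affinity, seeds=[1.0, 0.0, 0.0, 0.0])
```

Saliency seeded at node 0 spreads readily to its strongly connected neighbour and only weakly across the low-affinity edge, which is how such propagation keeps scores smooth within a coherent structure.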
Computational models of attention
This chapter reviews recent computational models of visual attention. We
begin with models for the bottom-up or stimulus-driven guidance of attention to
salient visual items, which we examine in seven different broad categories. We
then examine more complex models which address the top-down or goal-oriented
guidance of attention towards items that are more relevant to the task at hand
- …