MSDNN: Multi-Scale Deep Neural Network for Salient Object Detection
Salient object detection is a fundamental problem in computer vision and has received a great deal of attention. Recently, deep learning models have become powerful tools for image feature extraction. In this paper, we propose a multi-scale deep neural network (MSDNN) for salient object detection. The proposed model first extracts global high-level features and context information over the whole source image with a recurrent convolutional neural network (RCNN). Then several stacked deconvolutional layers are adopted to obtain multi-scale feature representations and a series of saliency maps. Finally, we investigate a fusion convolution module (FCM) to build the final pixel-level saliency map. The proposed model is extensively evaluated on four salient object detection benchmark datasets. Results show that our deep model significantly outperforms 12 other state-of-the-art approaches. Comment: 10 pages, 12 figures
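As a concrete illustration of the decoder described above, here is a minimal PyTorch sketch: stacked transposed convolutions each emit a per-scale saliency map, and a 1x1 fusion convolution (standing in for the FCM) merges them. All layer names, channel counts, and shapes are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDecoder(nn.Module):
    """Illustrative sketch: stacked deconv layers emit a saliency map
    per scale; a 1x1 fusion convolution merges the upsampled maps."""
    def __init__(self, in_channels=512, num_scales=3):
        super().__init__()
        self.deconvs = nn.ModuleList()
        self.heads = nn.ModuleList()
        ch = in_channels
        for _ in range(num_scales):
            self.deconvs.append(nn.ConvTranspose2d(ch, ch // 2, 4, stride=2, padding=1))
            ch //= 2
            self.heads.append(nn.Conv2d(ch, 1, 1))  # per-scale saliency map
        self.fuse = nn.Conv2d(num_scales, 1, 1)     # fusion convolution

    def forward(self, feats):
        maps, x = [], feats
        for deconv, head in zip(self.deconvs, self.heads):
            x = F.relu(deconv(x))
            maps.append(head(x))
        size = maps[-1].shape[-2:]
        maps = [F.interpolate(m, size=size, mode='bilinear', align_corners=False)
                for m in maps]
        return torch.sigmoid(self.fuse(torch.cat(maps, dim=1)))

# Example: global features from a backbone -> fused pixel-level saliency map
sal = MultiScaleDecoder()(torch.randn(1, 512, 8, 8))  # shape (1, 1, 64, 64)
```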
Video Smoke Detection Based on Deep Saliency Network
Video smoke detection is a promising fire detection method, especially in open or large spaces and outdoor environments. Traditional video smoke detection methods usually consist of candidate region extraction and classification, but lack a powerful characterization of smoke. In this paper, we propose a novel video smoke detection method based on a deep saliency network. Visual saliency detection aims to highlight the most important object regions in an image. Pixel-level and object-level salient convolutional neural networks are combined to extract an informative smoke saliency map. An end-to-end framework for salient smoke detection and smoke existence prediction is proposed for application in video smoke detection. The deep feature map is combined with the saliency map to predict the existence of smoke in an image. Initial and augmented datasets are built to measure the performance of frameworks with different design strategies. Qualitative and quantitative analyses at the frame and pixel levels demonstrate the excellent performance of the ultimate framework. Comment: 21 pages, 12 figures
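The existence-prediction step, combining the deep feature map with the saliency map, can be sketched as below. This is a hedged approximation under assumed shapes; the module name SmokeExistenceHead and all dimensions are hypothetical, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SmokeExistenceHead(nn.Module):
    """Sketch: weight deep features by the saliency map, pool globally,
    and classify whether the frame contains smoke."""
    def __init__(self, feat_channels=256):
        super().__init__()
        self.fc = nn.Linear(feat_channels, 2)  # smoke / no-smoke logits

    def forward(self, feats, saliency):
        # feats: (B, C, H, W); saliency: (B, 1, H, W) in [0, 1]
        weighted = feats * saliency           # emphasize salient smoke regions
        pooled = weighted.mean(dim=(2, 3))    # global average pooling
        return self.fc(pooled)

logits = SmokeExistenceHead()(torch.randn(2, 256, 28, 28),
                              torch.rand(2, 1, 28, 28))
```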
Reverse Attention for Salient Object Detection
Benefiting from the rapid development of deep learning techniques, salient object detection has achieved remarkable progress recently. However, two major challenges still hinder its application on embedded devices: low-resolution output and heavy model weight. To this end, this paper presents an accurate yet compact deep network for efficient salient object detection. More specifically, given a coarse saliency prediction in the deepest layer, we first employ residual learning to learn side-output residual features for saliency refinement, which can be achieved with very limited convolutional parameters while maintaining accuracy. Secondly, we further propose reverse attention to guide such side-output residual learning in a top-down manner. By erasing the currently predicted salient regions from side-output features, the network can eventually explore the missing object parts and details, which results in high resolution and accuracy. Experiments on six benchmark datasets demonstrate that the proposed approach compares favorably against state-of-the-art methods, with advantages in terms of simplicity, efficiency (45 FPS), and model size (81 MB). Comment: ECCV 2018
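The erasing mechanism at the heart of reverse attention can be paraphrased in a few lines of PyTorch: the attention weight is one minus the sigmoid of the upsampled coarse prediction, so side-output features are steered toward regions not yet marked salient. This is a sketch of the idea, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def reverse_attention(side_feats, coarse_pred):
    """side_feats: (B, C, H, W); coarse_pred: (B, 1, h, w) logits from a
    deeper layer. Erase already-predicted salient regions so the side
    output focuses on missing object parts and details."""
    up = F.interpolate(coarse_pred, size=side_feats.shape[-2:],
                       mode='bilinear', align_corners=False)
    rev = 1.0 - torch.sigmoid(up)   # reverse attention weights
    return side_feats * rev

out = reverse_attention(torch.randn(1, 64, 56, 56), torch.randn(1, 1, 14, 14))
```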
Top-Down Saliency Detection Driven by Visual Classification
This paper presents an approach for top-down saliency detection guided by
visual classification tasks. We first learn how to compute visual saliency when
a specific visual task has to be accomplished, as opposed to most
state-of-the-art methods which assess saliency merely through bottom-up
principles. Afterwards, we investigate if and to what extent visual saliency
can support visual classification in nontrivial cases. To achieve this, we
propose SalClassNet, a CNN framework consisting of two networks jointly
trained: a) the first one computing top-down saliency maps from input images,
and b) the second one exploiting the computed saliency maps for visual
classification. To test our approach, we collected a dataset of eye-gaze maps,
using a Tobii T60 eye tracker, by asking several subjects to look at images
from the Stanford Dogs dataset, with the objective of distinguishing dog
breeds. Performance analysis on our dataset and other saliency benchmarking datasets, such as POET, showed that SalClassNet outperforms state-of-the-art saliency detectors, such as SalNet and SALICON. Finally, we analyzed the performance of SalClassNet in a fine-grained recognition task and found that it generalizes better than existing visual classifiers. The achieved results thus demonstrate that 1) conditioning saliency detectors on object classes reaches state-of-the-art performance, and 2) providing explicit top-down saliency maps to visual classifiers enhances classification accuracy.
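A minimal sketch of the two-network coupling follows, assuming the saliency map is stacked with the RGB input as a fourth channel before classification. The toy layers below are placeholders, not the real SalClassNet architecture.

```python
import torch
import torch.nn as nn

class SalClassSketch(nn.Module):
    """Sketch of saliency -> classification coupling (not the real SalClassNet)."""
    def __init__(self, num_classes=120):  # e.g., 120 Stanford Dogs breeds
        super().__init__()
        self.saliency_net = nn.Sequential(   # toy stand-in for the saliency CNN
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1), nn.Sigmoid())
        self.classifier = nn.Sequential(     # consumes RGB + saliency (4 channels)
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes))

    def forward(self, x):
        sal = self.saliency_net(x)
        logits = self.classifier(torch.cat([x, sal], dim=1))
        return sal, logits  # supervise sal with gaze maps, logits with labels

sal, logits = SalClassSketch()(torch.randn(2, 3, 224, 224))
```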
Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos
Automatic saliency prediction in 360° videos is critical for viewpoint guidance applications (e.g., Facebook 360 Guide). We propose a spatial-temporal network which is (1) trained in a weakly-supervised manner and (2) tailor-made for the 360° viewing sphere. Note that most existing methods are less scalable since they rely on annotated saliency maps for training. Most importantly, they convert the 360° sphere to 2D images (e.g., a single equirectangular image or multiple separate Normal Field-of-View (NFoV) images), which introduces distortion and image boundaries. In contrast, we propose a simple and effective Cube Padding (CP) technique as follows. First, we render the 360° view on the six faces of a cube using perspective projection, which introduces very little distortion. Then, we concatenate all six faces while utilizing the connectivity between faces on the cube for image padding (i.e., Cube Padding) in convolution, pooling, and convolutional LSTM layers. In this way, CP introduces no image boundary while being applicable to almost all Convolutional Neural Network (CNN) structures. To evaluate our method, we propose Wild-360, a new 360° video saliency dataset containing challenging videos with saliency heatmap annotations. In experiments, our method outperforms baseline methods in both speed and quality. Comment: CVPR 2018
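To make the padding idea concrete, the sketch below applies it to the four side faces of the cube only, where adjacency forms a simple horizontal ring. Real cube padding also connects the top and bottom faces, which requires per-edge rotations omitted here; the function name and shapes are illustrative.

```python
import torch

def ring_pad(faces, pad=1):
    """Simplified illustration of cube padding on the four side faces of
    a cube (a horizontal ring). Each face is padded on its left/right
    with edge columns from its neighbors, so convolutions see no image
    boundary across those seams.

    faces: (4, C, H, W), ordered consecutively around the ring."""
    left = torch.roll(faces, shifts=1, dims=0)[..., -pad:]   # right edge of left neighbor
    right = torch.roll(faces, shifts=-1, dims=0)[..., :pad]  # left edge of right neighbor
    return torch.cat([left, faces, right], dim=-1)           # (4, C, H, W + 2*pad)

padded = ring_pad(torch.randn(4, 3, 64, 64))  # shape (4, 3, 64, 66)
```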
Masking Salient Object Detection, a Mask Region-based Convolutional Neural Network Analysis for Segmentation of Salient Objects
In this paper, we present a broad comparison between Fully Convolutional Networks (FCNs) and Mask Region-based Convolutional Neural Networks (Mask-RCNNs) applied in the Salient Object Detection (SOD) context. Studies in the SOD literature usually explore architectures based on FCNs to detect salient regions and objects in visual scenes. However, despite the promising results achieved, FCNs have shown issues in some challenging scenarios. Fairly recent studies in the SOD literature have proposed the use of a Mask-RCNN approach to overcome such issues. However, there is no extensive comparison between the two networks in the SOD literature endorsing the effectiveness of Mask-RCNNs over FCNs when segmenting salient objects. Aiming to effectively show the superiority of Mask-RCNNs over FCNs in the SOD context, we compare two variations of Mask-RCNNs with two variations of FCNs on eight datasets widely used in the literature, under four metrics. Our findings show that in this context Mask-RCNNs achieve an improvement in F-measure of up to 47% over FCNs. Comment: 6 pages, 10 figures. Accepted for presentation at the SBR 2019 7th Brazilian Robotics Symposium / IEEE LARS 2019 16th Latin American Robotics Symposium
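For reference, the F-measure used to report such comparisons is the standard SOD formulation; a minimal implementation is sketched below, with beta^2 = 0.3 and a fixed threshold as the conventional choices, assumed rather than quoted from the paper.

```python
import numpy as np

def f_measure(pred, gt, beta2=0.3, thresh=0.5):
    """Standard SOD F-measure: (1 + b2) * P * R / (b2 * P + R).
    beta2=0.3 is the conventional choice emphasizing precision."""
    p = pred >= thresh
    g = gt.astype(bool)
    tp = np.logical_and(p, g).sum()
    precision = tp / max(p.sum(), 1)
    recall = tp / max(g.sum(), 1)
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom > 0 else 0.0

print(f_measure(np.random.rand(64, 64), np.random.rand(64, 64) > 0.5))
```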
Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Video
Watching a 360° sports video requires a viewer to continuously select a viewing angle, either through a sequence of mouse clicks or head movements. To relieve the viewer of this "360 piloting" task, we propose "deep 360 pilot" -- a deep learning-based agent for piloting through 360° sports videos automatically. At each frame, the agent observes a panoramic image and has knowledge of the previously selected viewing angles. The task of the agent is to shift the current viewing angle (i.e., action) to the next preferred one (i.e., goal). We propose to directly learn an online policy for the agent from data. We use the policy gradient technique to jointly train our pipeline by minimizing (1) a regression loss measuring the distance between the selected and ground-truth viewing angles and (2) a smoothness loss encouraging smooth transitions in viewing angle, while maximizing (3) an expected reward for focusing on a foreground object. To evaluate our method, we build a new 360-Sports video dataset consisting of five sports domains. We train domain-specific agents and achieve the best performance on viewing angle selection accuracy and transition smoothness compared to [51] and other baselines. Comment: 13 pages, 8 figures. To appear in CVPR 2017 as an Oral paper. The first two authors contributed equally to this work. https://aliensunmin.github.io/project/360video
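The training objective described in this abstract combines three terms; a toy version is sketched below with illustrative loss weights. Note that the actual method optimizes via policy gradients, which this snippet does not reproduce.

```python
import torch

def pilot_loss(pred_angles, gt_angles, reward, lam_smooth=1.0, lam_reward=1.0):
    """pred_angles, gt_angles: (B, T, 2) viewing angles over T frames;
    reward: (B, T) focus-on-foreground reward. Toy combination of the
    three terms in the abstract (weights are illustrative)."""
    regress = ((pred_angles - gt_angles) ** 2).mean()                  # (1) distance to GT
    smooth = ((pred_angles[:, 1:] - pred_angles[:, :-1]) ** 2).mean()  # (2) smooth transitions
    return regress + lam_smooth * smooth - lam_reward * reward.mean()  # (3) maximize reward

loss = pilot_loss(torch.randn(2, 10, 2), torch.randn(2, 10, 2), torch.rand(2, 10))
```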
Pyramid Feature Attention Network for Saliency Detection
Saliency detection is one of the basic challenges in computer vision, and how to extract effective features is a critical point for it. Recent methods mainly integrate multi-scale convolutional features indiscriminately. However, not all features are useful for saliency detection, and some even cause interference. To solve this problem, we propose a Pyramid Feature Attention network to focus on effective high-level context features and low-level spatial structural features. First, we design a Context-aware Pyramid Feature Extraction (CPFE) module for multi-scale high-level feature maps to capture rich context features. Second, we adopt channel-wise attention (CA) after the CPFE feature maps and spatial attention (SA) after the low-level feature maps, then fuse the outputs of CA and SA together. Finally, we propose an edge preservation loss to guide the network to learn more detailed information for boundary localization. Extensive evaluations on five benchmark datasets demonstrate that the proposed method outperforms state-of-the-art approaches under different evaluation metrics. Comment: Accepted by CVPR 2019
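Channel-wise and spatial attention of the kind described here are commonly implemented as below; this is a generic sketch under assumed shapes, not the paper's exact CA and SA modules.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of channel-wise attention: squeeze spatially, re-weight channels."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, ch // reduction), nn.ReLU(),
                                 nn.Linear(ch // reduction, ch), nn.Sigmoid())

    def forward(self, x):
        w = self.mlp(x.mean(dim=(2, 3)))   # (B, C) channel weights
        return x * w[:, :, None, None]

class SpatialAttention(nn.Module):
    """Sketch of spatial attention: a 1-channel gate over locations."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, x):
        return x * torch.sigmoid(self.conv(x))

high = ChannelAttention(256)(torch.randn(1, 256, 14, 14))  # CA on high-level features
low = SpatialAttention(64)(torch.randn(1, 64, 56, 56))     # SA on low-level features
```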
Superpixel-based Refinement for Object Proposal Generation
Precise segmentation of objects is an important problem in tasks like class-agnostic object proposal generation or instance segmentation. Deep learning-based systems usually generate segmentations of objects based on coarse feature maps, due to the inherent downsampling in CNNs. This leads to segmentation boundaries not adhering well to the object boundaries in the image. To tackle this problem, we introduce a new superpixel-based refinement approach on top of the state-of-the-art object proposal system AttentionMask. The refinement utilizes superpixel pooling for feature extraction and a novel superpixel classifier to determine whether a high-precision superpixel belongs to an object. Our experiments show an improvement of up to 26.0% in terms of average recall compared to the original AttentionMask. Furthermore, qualitative and quantitative analyses of the segmentations reveal significant improvements in terms of boundary adherence for the proposed refinement compared to various deep learning-based state-of-the-art object proposal generation systems. Comment: Accepted at ICPR 2020. Code is available at
https://github.com/chwilms/superpixelRefinement
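Superpixel pooling, the feature-extraction step mentioned above, reduces per-pixel features to one vector per superpixel. A minimal sketch follows, assuming integer superpixel labels and mean aggregation; the paper's own pooling may differ in detail.

```python
import torch

def superpixel_pool(feats, labels, num_superpixels):
    """Average the feature vectors of all pixels in each superpixel.
    feats: (C, H, W); labels: (H, W) superpixel ids in [0, num_superpixels)."""
    C = feats.shape[0]
    flat_feats = feats.reshape(C, -1)       # (C, H*W)
    flat_labels = labels.reshape(-1)        # (H*W,)
    sums = torch.zeros(num_superpixels, C).index_add_(
        0, flat_labels, flat_feats.t())     # per-superpixel feature sums
    counts = torch.bincount(flat_labels, minlength=num_superpixels).clamp(min=1)
    return sums / counts[:, None]           # (num_superpixels, C) mean features

pooled = superpixel_pool(torch.randn(64, 32, 32),
                         torch.randint(0, 100, (32, 32)), 100)
```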
Self-explanatory Deep Salient Object Detection
Salient object detection has seen remarkable progress driven by deep learning techniques. However, most deep learning-based salient object detection methods are black boxes in nature and lack interpretability. This paper proposes the first self-explanatory saliency detection network, which explicitly exploits low- and high-level features for salient object detection. We demonstrate that such supportive clues not only significantly enhance the performance of salient object detection but also yield better-justified detection results. More specifically, we develop a multi-stage saliency encoder to extract multi-scale features which contain both low- and high-level saliency context. Dense short- and long-range connections are introduced to reuse these features iteratively. Benefiting from direct access to low- and high-level features, the proposed saliency encoder can not only model the object context but also preserve the boundary. Furthermore, a self-explanatory generator is proposed to interpret how the proposed saliency encoder, or other deep saliency models, make decisions. The generator simulates the absence of interesting features by preventing these features from contributing to the saliency classifier and estimates the corresponding saliency prediction without them. A comparison function, the saliency explanation, is defined to measure the prediction changes between deep saliency models and the corresponding generator. By visualizing the differences, we can interpret the capability of different deep neural network-based saliency detection models and demonstrate that our proposed model indeed uses a more reasonable structure for salient object detection. Extensive experiments on five popular benchmark datasets and the visualized saliency explanations demonstrate that the proposed method provides new state-of-the-art performance.
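The saliency explanation can be read as a feature-ablation difference: predict with all features, predict again with a chosen group of features prevented from contributing, and compare. The sketch below illustrates that comparison with a toy classifier head; all names and shapes are hypothetical, not the paper's generator.

```python
import torch
import torch.nn as nn

def saliency_explanation(model, feats, keep_mask):
    """Compare the model's saliency prediction with and without a chosen
    group of feature channels contributing.
    feats: (B, C, H, W); keep_mask: (C,) with 0 for 'absent' channels."""
    with torch.no_grad():
        full = model(feats)
        ablated = model(feats * keep_mask[None, :, None, None])
    return full - ablated   # prediction change attributable to masked features

head = nn.Sequential(nn.Conv2d(32, 1, 1), nn.Sigmoid())  # toy saliency classifier
mask = torch.ones(32); mask[:8] = 0                      # "remove" first 8 channels
delta = saliency_explanation(head, torch.randn(1, 32, 16, 16), mask)
```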