Learning Uncertain Convolutional Features for Accurate Saliency Detection
Deep convolutional neural networks (CNNs) have delivered superior performance
in many computer vision tasks. In this paper, we propose a novel deep fully
convolutional network model for accurate salient object detection. The key
contribution of this work is to learn deep uncertain convolutional features
(UCF), which improve the robustness and accuracy of saliency detection. We
achieve this by introducing a reformulated dropout (R-dropout) after specific
convolutional layers to construct an uncertain ensemble of internal feature
units. In addition, we propose an effective hybrid upsampling method to reduce
the checkerboard artifacts of deconvolution operators in our decoder network.
The proposed methods can also be applied to other deep convolutional networks.
Compared with existing saliency detection methods, the proposed UCF model is
able to incorporate uncertainties for more accurate object boundary inference.
Extensive experiments demonstrate that our proposed saliency model performs
favorably against state-of-the-art approaches. The uncertain feature learning
mechanism as well as the upsampling method can significantly improve
performance on other pixel-wise vision tasks.
Comment: Accepted as a poster at ICCV 2017; 10 pages, 7 figures and 3 tables.
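To make the two mechanisms named above concrete, here is a minimal PyTorch sketch of (a) dropout applied after a convolutional layer to form an uncertain ensemble of feature units and (b) a hybrid upsampling step that averages a learned deconvolution with bilinear interpolation followed by a convolution to damp checkerboard artifacts. The module names, channel sizes, and equal mixing weight are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UncertainConvBlock(nn.Module):
    """Conv layer followed by dropout, sketching an uncertain feature ensemble."""
    def __init__(self, in_ch, out_ch, p=0.5):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.drop = nn.Dropout2d(p)  # dropout kept after the conv features

    def forward(self, x):
        return self.drop(F.relu(self.conv(x)))

class HybridUpsample(nn.Module):
    """Average a learned deconvolution with bilinear interpolation + conv."""
    def __init__(self, ch):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1)
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        a = self.deconv(x)
        b = self.conv(F.interpolate(x, scale_factor=2, mode="bilinear",
                                    align_corners=False))
        return 0.5 * (a + b)  # assumed equal mixing of the two upsampling paths

x = torch.randn(1, 64, 32, 32)
y = HybridUpsample(64)(UncertainConvBlock(64, 64)(x))
print(y.shape)  # torch.Size([1, 64, 64, 64])
```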
Boundary-guided Feature Aggregation Network for Salient Object Detection
Fully convolutional networks (FCNs) have significantly improved the performance
of many pixel-labeling tasks, such as semantic segmentation and depth
estimation. However, it still remains non-trivial to thoroughly utilize the
multi-level convolutional feature maps and boundary information for salient
object detection. In this paper, we propose a novel FCN framework to integrate
multi-level convolutional features recurrently with the guidance of object
boundary information. First, a deep convolutional network is used to extract
multi-level feature maps and separately aggregate them into multiple
resolutions, which can be used to generate coarse saliency maps. Meanwhile,
another boundary information extraction branch is proposed to generate boundary
features. Finally, an attention-based feature fusion module is designed to fuse
boundary information into salient regions to achieve accurate boundary
inference and semantic enhancement. The final saliency maps are the combination
of the predicted boundary maps and the integrated saliency maps, which are
closer to the ground truth. Experiments and analysis on four large-scale
benchmarks verify that our framework achieves new state-of-the-art results.
Comment: To appear in Signal Processing Letters (SPL); 5 pages, 5 figures and 3 tables.
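A hedged sketch of the attention-based fusion step described in the abstract: a learned spatial attention map decides where boundary features should be injected into the saliency features. The module name, channel width, and two-layer attention head are assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class BoundaryGuidedFusion(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 1), nn.Sigmoid())      # spatial attention in [0, 1]
        self.fuse = nn.Conv2d(2 * ch, ch, 3, padding=1)

    def forward(self, sal_feat, bnd_feat):
        both = torch.cat([sal_feat, bnd_feat], dim=1)
        a = self.attn(both)                          # where boundary cues matter
        return self.fuse(torch.cat([sal_feat, a * bnd_feat], dim=1))

s = torch.randn(1, 32, 64, 64)
b = torch.randn(1, 32, 64, 64)
print(BoundaryGuidedFusion(32)(s, b).shape)  # torch.Size([1, 32, 64, 64])
```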
Ro-SOS: Metric Expression Network (MEnet) for Robust Salient Object Segmentation
Although deep CNNs have brought significant improvement to image saliency
detection, most CNN-based models are sensitive to distortions such as
compression and noise. In this paper, we propose an end-to-end generic salient
object segmentation model called Metric Expression Network (MEnet) to deal with
saliency detection with the tolerance of distortion. Within MEnet, a new
topological metric space is constructed, whose implicit metric is determined by
the deep network. As a result, we manage to group all the pixels in the
observed image semantically within this latent space into two regions: a
salient region and a non-salient region. With this architecture, all feature
extractions are carried out at the pixel level, enabling fine granularity of
output boundaries of the salient objects. Moreover, we give a general analysis
of the network's noise robustness in terms of its Lipschitz constant and
Jacobian. Experiments demonstrate that the proposed metric can generate robust
saliency maps that facilitate object segmentation. Tests
on several public benchmarks show that MEnet has achieved desirable
performance. Furthermore, by direct computation and measuring the robustness,
the proposed method outperforms previous CNN-based methods on distorted inputs.
Comment: This version: 11 pages (12 with references), 12 figures, 5 tables;
version 1: 7 pages, 7 figures, 4 tables. Version 1 was accepted by the
International Joint Conference on Artificial Intelligence (IJCAI), 201
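One way to picture the metric-based grouping the abstract describes is to embed every pixel and assign it to the closer of two learned class centers (salient vs. non-salient). The sketch below is purely illustrative; the embedding head, the explicit class centers, and the softmax over negative squared distances are assumptions rather than MEnet's actual formulation.

```python
import torch
import torch.nn as nn

class MetricSegHead(nn.Module):
    def __init__(self, feat_ch, embed_dim=16):
        super().__init__()
        self.embed = nn.Conv2d(feat_ch, embed_dim, 1)            # per-pixel embedding
        self.centers = nn.Parameter(torch.randn(2, embed_dim))   # salient / background

    def forward(self, feats):
        e = self.embed(feats)                                    # B x D x H x W
        e = e.permute(0, 2, 3, 1).unsqueeze(3)                   # B x H x W x 1 x D
        d = ((e - self.centers) ** 2).sum(-1)                    # distance to each center
        return torch.softmax(-d, dim=-1)[..., 0]                 # salient probability

f = torch.randn(2, 64, 48, 48)
print(MetricSegHead(64)(f).shape)  # torch.Size([2, 48, 48])
```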
Enhancing Salient Object Segmentation Through Attention
Segmenting salient objects in an image is an important vision task with
ubiquitous applications. The problem becomes more challenging in the presence
of a cluttered and textured background, low resolution and/or low contrast
images. Even though existing algorithms perform well in segmenting most of the
object(s) of interest, they often also segment false positives caused by
background regions that resemble salient objects. In this work, we tackle this
problem by iteratively attending to image patches in a recurrent fashion and
subsequently enhancing the predicted segmentation mask. Saliency features are
estimated independently for every image patch and are then combined using
an aggregation strategy based on a Convolutional Gated Recurrent Unit (ConvGRU)
network. The proposed approach works in an end-to-end manner, removing
background noise and false positives incrementally. Through extensive
evaluation on various benchmark datasets, we show superior performance to the
existing approaches without any post-processing.
Comment: CVPRW - Deep Vision 201
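The aggregation strategy rests on a Convolutional Gated Recurrent Unit; a standard ConvGRU cell looks roughly like the following PyTorch sketch (kernel size, channel widths, and the iteration over patch features are assumptions).

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        p = k // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=p)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=p)

    def forward(self, x, h):
        # update gate z and reset gate r, computed from input and hidden state
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_tilde

cell = ConvGRUCell(32, 32)
h = torch.zeros(1, 32, 28, 28)
for _ in range(3):                  # iterate over per-patch saliency features
    h = cell(torch.randn(1, 32, 28, 28), h)
print(h.shape)  # torch.Size([1, 32, 28, 28])
```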
Self-Attention Recurrent Network for Saliency Detection
Feature maps in deep neural networks generally carry different semantics.
Existing methods often ignore these characteristics, which may lead to sub-optimal
results. In this paper, we propose a novel end-to-end deep saliency network
which could effectively utilize multi-scale feature maps according to their
characteristics. Shallow layers often contain more local information, and deep
layers have advantages in global semantics. Therefore, the network generates
elaborate saliency maps by enhancing local and global information of feature
maps in different layers. On one hand, local information of shallow layers is
enhanced by a recurrent structure that shares convolution kernels across time
steps. On the other hand, global information of deep layers is utilized by
a self-attention module, which generates different attention weights for
salient objects and backgrounds and thus achieves better performance. Experimental
results on four widely used datasets demonstrate that our method has advantages
in performance over existing algorithms.
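The self-attention module over deep features can be sketched with a standard non-local style formulation, as below; the exact attention design used in the paper may differ, and the query/key reduction factor and residual scaling are assumptions.

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    def __init__(self, ch, reduce=8):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // reduce, 1)
        self.k = nn.Conv2d(ch, ch // reduce, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual scale

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # B x HW x C'
        k = self.k(x).flatten(2)                   # B x C' x HW
        attn = torch.softmax(q @ k, dim=-1)        # B x HW x HW attention map
        v = self.v(x).flatten(2)                   # B x C x HW
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out                # residual connection

x = torch.randn(1, 64, 16, 16)
print(SelfAttention2d(64)(x).shape)  # torch.Size([1, 64, 16, 16])
```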
Global and Local Sensitivity Guided Key Salient Object Re-augmentation for Video Saliency Detection
Existing still-image deep learning based saliency methods do not consider the
weighting and highlighting of features extracted from different layers; all
features contribute equally to the final saliency decision. Such methods evenly
detect all "potentially significant regions" and are unable to highlight the key
salient object, resulting in detection failures in dynamic scenes. In this
paper, based on the fact that salient areas in videos
are relatively small and concentrated, we propose a \textbf{key salient object
re-augmentation method (KSORA) using top-down semantic knowledge and bottom-up
feature guidance} to improve detection accuracy in video scenes. KSORA includes
two sub-modules (WFE and KOS): WFE processes local salient feature selection
using bottom-up strategy, while KOS ranks each object in global fashion by
top-down statistical knowledge, and chooses the most critical object area for
local enhancement. The proposed KSORA can not only strengthen the saliency
value of the local key salient object but also ensure global saliency
consistency. Results on three benchmark datasets suggest that our model has the
capability of improving the detection accuracy on complex scenes. The strong
performance of KSORA, with a speed of 17 FPS on modern GPUs, has been verified
by comparisons with ten other state-of-the-art algorithms.
Comment: 6 figures, 10 pages.
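A loose illustration of the re-augmentation idea, not the paper's WFE/KOS design: threshold a coarse saliency map, rank the candidate regions, and boost the top-ranked one. The threshold, the ranking score (mean saliency), and the boost factor are all assumptions.

```python
import numpy as np
from scipy import ndimage

def reaugment_key_region(sal_map, thresh=0.5, boost=1.3):
    """Boost the single region with the highest mean saliency."""
    labels, n = ndimage.label(sal_map > thresh)        # candidate salient regions
    if n == 0:
        return sal_map
    means = ndimage.mean(sal_map, labels, index=range(1, n + 1))
    key = int(np.argmax(means)) + 1                     # label of the key region
    out = sal_map.copy()
    out[labels == key] = np.clip(out[labels == key] * boost, 0.0, 1.0)
    return out

sal = np.random.rand(64, 64).astype(np.float32)
print(reaugment_key_region(sal).shape)  # (64, 64)
```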
Deep Reasoning with Multi-Scale Context for Salient Object Detection
To detect salient objects accurately, existing methods usually design complex
backbone network architectures to learn and fuse powerful features. However,
the saliency inference module that performs saliency prediction from the fused
features receives much less attention on its architecture design and typically
adopts only a few fully convolutional layers. In this paper, we find the
limited capacity of the saliency inference module indeed constitutes a fundamental
performance bottleneck, and that enhancing its capacity is critical for obtaining
better saliency prediction. Correspondingly, we propose a deep yet light-weight
saliency inference module that adopts a multi-dilated depth-wise convolution
architecture. Such a deep inference module, despite its simple architecture,
can quickly reason about salient objects directly from the multi-scale
convolutional features and delivers superior salient object detection
performance at lower computational cost. To the best of our knowledge, we are the
first to reveal the importance of the inference module for salient object
detection, and present a novel architecture design with attractive efficiency
and accuracy. Extensive experimental evaluations demonstrate that our simple
framework performs favorably compared with the state-of-the-art methods with
complex backbone design.
Comment: 10 pages, 8 figures, 3 tables.
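A rough sketch of a deep yet light-weight inference head built from depth-wise convolutions with growing dilation rates, followed by a point-wise saliency prediction; the layer count, dilation schedule, and channel width are guesses rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

def dw_block(ch, dilation):
    # depth-wise dilated conv followed by point-wise channel mixing
    return nn.Sequential(
        nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation, groups=ch),
        nn.Conv2d(ch, ch, 1),
        nn.ReLU(inplace=True))

class MultiDilatedInference(nn.Module):
    def __init__(self, ch, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.body = nn.Sequential(*[dw_block(ch, d) for d in dilations])
        self.pred = nn.Conv2d(ch, 1, 1)

    def forward(self, fused_feats):
        return self.pred(self.body(fused_feats))    # saliency logits

x = torch.randn(1, 128, 40, 40)
print(MultiDilatedInference(128)(x).shape)  # torch.Size([1, 1, 40, 40])
```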
Reverse Attention for Salient Object Detection
Benefiting from the rapid development of deep learning techniques, salient
object detection has achieved remarkable progress recently. However, two major
challenges still hinder its application on embedded devices: low-resolution
output and heavy model weight. To this end,
this paper presents an accurate yet compact deep network for efficient salient
object detection. More specifically, given a coarse saliency prediction in the
deepest layer, we first employ residual learning to learn side-output residual
features for saliency refinement, which can be achieved with very limited
convolutional parameters while maintaining accuracy. Secondly, we further propose
reverse attention to guide such side-output residual learning in a top-down
manner. By erasing the current predicted salient regions from side-output
features, the network can eventually explore the missing object parts and
details, which results in higher resolution and accuracy. Experiments on six
benchmark datasets demonstrate that the proposed approach compares favorably
against state-of-the-art methods, and with advantages in terms of simplicity,
efficiency (45 FPS) and model size (81 MB).
Comment: ECCV 2018.
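The reverse-attention refinement can be sketched as follows: the current coarse prediction is upsampled, inverted through a sigmoid so that attention falls on the not-yet-detected regions, and a small residual branch predicts a correction from the erased side-output features. Channel sizes and the residual head are assumptions, not the released model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseAttentionRefine(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1))

    def forward(self, side_feat, coarse_logits):
        coarse = F.interpolate(coarse_logits, size=side_feat.shape[-2:],
                               mode="bilinear", align_corners=False)
        rev = 1.0 - torch.sigmoid(coarse)       # attend to non-salient regions
        res = self.residual(side_feat * rev)    # residual from erased features
        return coarse + res                     # refined saliency logits

feat = torch.randn(1, 64, 56, 56)
coarse = torch.randn(1, 1, 14, 14)
print(ReverseAttentionRefine(64)(feat, coarse).shape)  # torch.Size([1, 1, 56, 56])
```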
Saliency-Guided Attention Network for Image-Sentence Matching
This paper studies the task of matching image and sentence, where learning
appropriate representations across the multi-modal data appears to be the main
challenge. Unlike previous approaches that predominantly deploy symmetrical
architecture to represent both modalities, we propose Saliency-guided Attention
Network (SAN) that asymmetrically employs visual and textual attention modules
to learn the fine-grained correlation intertwined between vision and language.
The proposed SAN mainly includes three components: saliency detector,
Saliency-weighted Visual Attention (SVA) module, and Saliency-guided Textual
Attention (STA) module. Concretely, the saliency detector provides the visual
saliency information as the guidance for the two attention modules. SVA is
designed to leverage the advantage of the saliency information to improve
discrimination of visual representations. By fusing the visual information from
SVA and textual information as a multi-modal guidance, STA learns
discriminative textual representations that are highly sensitive to visual
clues. Extensive experiments demonstrate that SAN improves the state-of-the-art
results on the benchmark Flickr30K and MSCOCO datasets by a large margin.
Comment: 10 pages, 5 figures.
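A simplified, assumption-laden sketch of saliency-weighted visual attention: per-region attention logits are biased by a saliency score before pooling region features for matching against a sentence embedding. The shapes, the additive log-saliency bias, and the single-layer scoring head are illustrative only.

```python
import torch
import torch.nn as nn

class SaliencyWeightedVisualAttention(nn.Module):
    def __init__(self, vis_dim, txt_dim, hid=256):
        super().__init__()
        self.v = nn.Linear(vis_dim, hid)
        self.t = nn.Linear(txt_dim, hid)
        self.score = nn.Linear(hid, 1)

    def forward(self, regions, saliency, sentence):
        # regions: B x R x Dv, saliency: B x R, sentence: B x Dt
        joint = torch.tanh(self.v(regions) + self.t(sentence).unsqueeze(1))
        logits = self.score(joint).squeeze(-1) + torch.log(saliency + 1e-6)
        attn = torch.softmax(logits, dim=1)                  # per-region weights
        return (attn.unsqueeze(-1) * regions).sum(dim=1)     # attended visual feature

r = torch.randn(2, 36, 512)
s = torch.rand(2, 36)
q = torch.randn(2, 300)
print(SaliencyWeightedVisualAttention(512, 300)(r, s, q).shape)  # torch.Size([2, 512])
```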
Selectivity or Invariance: Boundary-aware Salient Object Detection
Typically, a salient object detection (SOD) model faces opposite requirements
in processing object interiors and boundaries. The features of interiors should
be invariant to strong appearance change so as to pop out the salient object as
a whole, while the features of boundaries should be selective to slight
appearance change to distinguish salient objects from the background. To address
this selectivity-invariance dilemma, we propose a novel boundary-aware network
with successive dilation for image-based SOD. In this network, the feature
selectivity at boundaries is enhanced by incorporating a boundary localization
stream, while the feature invariance at interiors is guaranteed with a complex
interior perception stream. Moreover, a transition compensation stream is
adopted to amend the probable failures in transitional regions between
interiors and boundaries. In particular, an integrated successive dilation
module is proposed to enhance the feature invariance at interiors and
transitional regions. Extensive experiments on six datasets show that the
proposed approach outperforms 16 state-of-the-art methods.
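An integrated successive dilation module could be sketched as dilated branches applied one after another with all intermediate outputs concatenated, as below; the dilation rates and wiring are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class SuccessiveDilation(nn.Module):
    def __init__(self, ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=r, dilation=r),
                          nn.ReLU(inplace=True)) for r in rates])
        self.merge = nn.Conv2d(ch * (len(rates) + 1), ch, 1)

    def forward(self, x):
        outs, cur = [x], x
        for branch in self.branches:       # each branch builds on the previous one
            cur = branch(cur)
            outs.append(cur)
        return self.merge(torch.cat(outs, dim=1))

x = torch.randn(1, 64, 32, 32)
print(SuccessiveDilation(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```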