SEMPART: Self-supervised Multi-resolution Partitioning of Image Semantics
Accurately determining salient regions of an image is challenging when
labeled data is scarce. DINO-based self-supervised approaches have recently
leveraged meaningful image semantics captured by patch-wise features for
locating foreground objects. Recent methods have also incorporated intuitive
priors, demonstrating their value for unsupervised object partitioning.
In this paper, we propose SEMPART, which jointly infers coarse and fine
bi-partitions over an image's DINO-based semantic graph. Furthermore, SEMPART
preserves fine boundary details using graph-driven regularization and
successfully distills the coarse mask semantics into the fine mask. Our salient
object detection and single object localization findings suggest that SEMPART
produces high-quality masks rapidly without additional post-processing and
benefits from co-optimizing the coarse and fine branches.
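SEMPART itself trains a network over the DINO semantic graph, and the abstract does not give its equations. As a generic illustration of the underlying idea, the following is a minimal sketch of spectral bi-partitioning of a patch-feature affinity graph (in the style of normalized-cut methods); the affinity threshold and Fiedler-vector split are assumptions, not SEMPART's actual procedure:

```python
import numpy as np

def bipartition_patch_graph(feats, tau=0.2):
    """Spectral bi-partition of a patch affinity graph (illustrative sketch).

    feats: (N, D) patch features (e.g., from DINO).
    Returns a boolean foreground/background mask over the N patches.
    """
    # Cosine affinities between L2-normalized patch features
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    W = f @ f.T
    # Binarize, keeping a tiny weight so the graph stays connected
    W = np.where(W > tau, 1.0, 1e-5)
    d = W.sum(axis=1)
    L = np.diag(d) - W  # unnormalized graph Laplacian
    # Fiedler vector: eigenvector of the second-smallest eigenvalue
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    return fiedler > fiedler.mean()
```

Thresholding the Fiedler vector at its mean splits the graph into two weakly connected groups of patches, which is the coarse bi-partition such methods start from.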
RRNet: Relational Reasoning Network with Parallel Multi-scale Attention for Salient Object Detection in Optical Remote Sensing Images
Salient object detection (SOD) for optical remote sensing images (RSIs) aims
at locating and extracting visually distinctive objects/regions from the
optical RSIs. Although some saliency models have been proposed to address the
intrinsic challenges of optical RSIs (such as complex backgrounds and
scale-variant objects), their accuracy and completeness remain
unsatisfactory. To this end, we propose
a relational reasoning network with parallel multi-scale attention for SOD in
optical RSIs in this paper. The relational reasoning module that integrates the
spatial and the channel dimensions is designed to infer the semantic
relationship by utilizing high-level encoder features, thereby promoting the
generation of more complete detection results. The parallel multi-scale
attention module is proposed to effectively restore the detail information and
address the scale variation of salient objects by using the low-level features
refined by multi-scale attention. Extensive experiments on two datasets
demonstrate that our proposed RRNet outperforms the existing state-of-the-art
SOD competitors both qualitatively and quantitatively.
Comment: 11 pages, 9 figures. Accepted by IEEE Transactions on Geoscience and
Remote Sensing 2021. Project: https://rmcong.github.io/proj_RRNet.htm
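The abstract does not specify the relational reasoning module's form. As a hedged illustration only, relational reasoning over the channel dimension is often written as Gram-matrix self-attention with a residual connection; the numpy sketch below shows that generic pattern (the function and its details are assumptions, not RRNet's design):

```python
import numpy as np

def channel_relation(feat):
    """Generic channel-wise relational reasoning (illustrative only).

    feat: (C, N) high-level feature map flattened over spatial positions.
    Builds a C x C channel affinity, row-softmaxes it, and uses it to
    reweight the features, with a residual connection.
    """
    aff = feat @ feat.T                                # (C, C) channel affinities
    aff = np.exp(aff - aff.max(axis=1, keepdims=True))
    aff /= aff.sum(axis=1, keepdims=True)              # row-wise softmax
    return feat + aff @ feat                           # residual reweighting
```

Reasoning over the spatial dimension is the transpose of the same pattern: an N x N affinity over positions instead of a C x C affinity over channels.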
Towards the Success Rate of One: Real-time Unconstrained Salient Object Detection
In this work, we propose an efficient and effective approach for
unconstrained salient object detection in images using deep convolutional
neural networks. Instead of generating thousands of candidate bounding boxes
and refining them, our network directly learns to generate the saliency map
containing the exact number of salient objects. During training, we convert the
ground-truth rectangular boxes to Gaussian distributions that better capture
the region of interest of each salient object. During inference, the network
predicts Gaussian distributions centered at salient objects with an appropriate
covariance, from which bounding boxes are easily inferred. Notably, our network
performs saliency map prediction without pixel-level annotations, salient
object detection without object proposals, and salient object subitizing
simultaneously, all in a single pass within a unified framework. Extensive
experiments show that our approach outperforms existing methods on various
datasets by a large margin, and achieves more than 100 fps with a VGG16
network on a single GPU during inference.
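The box-to-Gaussian conversion described above, and the inverse step used at inference, can be sketched in a few lines. The mapping of box edges to roughly +/-2 sigma is an illustrative assumption; the abstract does not state the exact covariance convention:

```python
def box_to_gaussian(x1, y1, x2, y2):
    """Convert a ground-truth box to an axis-aligned Gaussian.

    Mean = box center; std chosen so the box spans roughly +/-2 sigma
    (the 2-sigma convention here is an assumption for illustration).
    """
    mu = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    sigma = ((x2 - x1) / 4.0, (y2 - y1) / 4.0)
    return mu, sigma

def gaussian_to_box(mu, sigma, k=2.0):
    """Recover a bounding box from a predicted Gaussian as mean +/- k*sigma."""
    (mx, my), (sx, sy) = mu, sigma
    return (mx - k * sx, my - k * sy, mx + k * sx, my + k * sy)
```

With matching conventions the round trip is exact: `gaussian_to_box(*box_to_gaussian(10, 20, 50, 100))` returns `(10.0, 20.0, 50.0, 100.0)`.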
Backtracking Spatial Pyramid Pooling (SPP)-based Image Classifier for Weakly Supervised Top-down Salient Object Detection
Top-down saliency models produce a probability map that peaks at target
locations specified by a task/goal such as object detection. They are usually
trained in a fully supervised setting involving pixel-level annotations of
objects. We propose a weakly supervised top-down saliency framework using only
binary labels that indicate the presence/absence of an object in an image.
First, the probabilistic contribution of each image region to the confidence of
a CNN-based image classifier is computed through a backtracking strategy to
produce top-down saliency. From a set of saliency maps of an image produced by
fast bottom-up saliency approaches, we select the best saliency map suitable
for the top-down task. The selected bottom-up saliency map is combined with the
top-down saliency map. Features having high combined saliency are used to train
a linear SVM classifier to estimate feature saliency. This is integrated with
combined saliency and further refined through multi-scale superpixel
averaging of the saliency map. We evaluate the proposed weakly supervised
top-down saliency framework and achieve performance comparable to fully
supervised approaches. Experiments are carried out on seven challenging
datasets, and quantitative results are compared with 40 closely related
approaches across 4 different applications.
Comment: 14 pages, 7 figures
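The multi-scale superpixel-averaging step can be sketched in numpy, assuming precomputed superpixel label maps at each scale (the segmentation algorithm itself is not specified in the abstract):

```python
import numpy as np

def superpixel_average(sal, labels):
    """Replace each pixel's saliency with the mean over its superpixel."""
    out = np.zeros_like(sal, dtype=float)
    for lab in np.unique(labels):
        m = labels == lab
        out[m] = sal[m].mean()
    return out

def multiscale_smooth(sal, label_maps):
    """Average the superpixel-averaged maps across scales."""
    return np.mean([superpixel_average(sal, lm) for lm in label_maps], axis=0)
```

Averaging within superpixels snaps saliency to region boundaries at each scale; averaging across scales then suppresses segmentation artifacts of any single scale.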