In this paper, we address co-salient object detection (CoSOD), the detection
of co-occurring salient objects in an image group, using frequency statistics in
an unsupervised manner, which in turn enables us to develop a semi-supervised method. While previous
works have mostly focused on fully supervised CoSOD, less attention has been
paid to detecting co-salient objects when only limited segmentation annotations
are available for training. Our simple yet effective unsupervised method
US-CoSOD combines the object co-occurrence frequency statistics of unsupervised
single-image semantic segmentations with salient foreground detections using
self-supervised feature learning. For the first time, we show that a large
unlabeled dataset (e.g., ImageNet-1k) can be effectively leveraged to
significantly improve unsupervised CoSOD performance. Our unsupervised model provides
a strong pre-training initialization for our semi-supervised model SS-CoSOD,
especially when very limited labeled data is available for training. To avoid
propagating erroneous signals from predictions on unlabeled data, we propose a
confidence estimation module to guide our semi-supervised training. Extensive
experiments on three CoSOD benchmark datasets show that both our
unsupervised and semi-supervised models outperform the corresponding
state-of-the-art models by a significant margin (e.g., on the Cosal2015
dataset, our US-CoSOD model has an 8.8% F-measure gain over a SOTA unsupervised
co-segmentation model and our SS-CoSOD model has an 11.81% F-measure gain over
a SOTA semi-supervised CoSOD model).

Comment: Accepted at IEEE WACV 202