Unsupervised and semi-supervised co-salient object detection via segmentation frequency statistics
In this paper, we address the detection of co-occurring salient objects (CoSOD) in an image group using frequency statistics in an unsupervised manner, which further enables us to develop a semi-supervised method. While previous works have mostly focused on fully supervised CoSOD, less attention has been paid to detecting co-salient objects when only limited segmentation annotations are available for training. Our simple yet effective unsupervised method, US-CoSOD, combines the object co-occurrence frequency statistics of unsupervised single-image semantic segmentations with salient foreground detections obtained through self-supervised feature learning. For the first time, we show that a large unlabeled dataset, e.g., ImageNet-1k, can be effectively leveraged to significantly improve unsupervised CoSOD performance. Our unsupervised model also provides a strong pre-training initialization for our semi-supervised model, SS-CoSOD, especially when very limited labeled data is available for training. To avoid propagating erroneous signals from predictions on unlabeled data, we propose a confidence estimation module to guide our semi-supervised training. Extensive experiments on three CoSOD benchmark datasets show that both our unsupervised and semi-supervised models outperform the corresponding state-of-the-art models by a significant margin (e.g., on the Cosal2015 dataset, our US-CoSOD model achieves an 8.8% F-measure gain over
a SOTA unsupervised co-segmentation model, and our SS-CoSOD model achieves an 11.81% F-measure gain over a SOTA semi-supervised CoSOD model).
Comment: Accepted at IEEE WACV 2024
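For intuition, the sketch below implements the kind of combination the abstract describes: unsupervised per-image segment labels are scored by how often they co-occur across the image group, and segments that are both frequent and salient are kept as co-salient. This is a minimal illustration under stated assumptions, not the authors' implementation; in particular, it assumes segment labels are already comparable across images (e.g., cluster IDs from shared self-supervised features), and the function name and thresholds are invented for the example.

```python
# Hedged sketch: combine cross-image label co-occurrence frequency with
# per-image saliency to obtain co-salient masks. Illustrative only.
import numpy as np

def co_salient_masks(seg_maps, sal_maps, freq_thresh=0.5, sal_thresh=0.5):
    """seg_maps: HxW integer label maps (assumed to share one label space
    across the group); sal_maps: HxW saliency scores in [0, 1].
    Returns one boolean HxW co-salient mask per image."""
    n = len(seg_maps)
    label_sets = [set(np.unique(s).tolist()) for s in seg_maps]
    # Co-occurrence frequency: fraction of group images containing a label.
    freq = {l: sum(l in ls for ls in label_sets) / n
            for l in set().union(*label_sets)}
    masks = []
    for seg, sal in zip(seg_maps, sal_maps):
        mask = np.zeros(seg.shape, dtype=bool)
        for l in np.unique(seg):
            region = seg == l
            # Keep a segment iff its label recurs across the group AND
            # its pixels are, on average, salient in this image.
            if freq[int(l)] >= freq_thresh and sal[region].mean() >= sal_thresh:
                mask |= region
        masks.append(mask)
    return masks

# Toy usage: three 8x8 "images" with 4 pseudo-labels each.
segs = [np.random.randint(0, 4, (8, 8)) for _ in range(3)]
sals = [np.random.rand(8, 8) for _ in range(3)]
masks = co_salient_masks(segs, sals)
```

A full pipeline would additionally need the confidence estimation the abstract mentions for the semi-supervised stage, i.e., down-weighting pseudo-labels on unlabeled images where the model's predictions are unreliable.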
Modeling joint attention from egocentric vision
Numerous studies in cognitive development have provided converging evidence that Joint Attention (JA) is crucial for children to learn about the world together with their parents. However, a closer look reveals that, in the literature, JA has been operationally defined in different ways. For example, some definitions require explicit signals of “awareness” of being in JA, such as gaze following, while others simply define JA as shared gaze to an object or activity. But what if “awareness” is possible without gaze following? The present study examines egocentric images collected via head-mounted eye-trackers during parent-child toy play. A Convolutional Neural Network model was used to process and learn to classify raw egocentric images as JA vs. not JA. We demonstrate that individual child and parent egocentric views can be classified as being part of a JA bout at above-chance levels. This provides new evidence that an individual can be “aware” they are in JA based solely on in-the-moment visual information. Moreover, both models trained on child views and those trained on parent views leveraged the visual properties associated with visual object holding to improve classification accuracy, suggesting a critical role for object handling not only in establishing JA, as shown in previous research, but also in inferring the social partner’s attentional state during JA.
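To make the classification setup concrete, here is a minimal sketch of a frame-level JA classifier. The abstract only states that a Convolutional Neural Network was used; the ResNet-18 backbone, ImageNet preprocessing, and two-way head below are assumptions chosen to keep the example short and runnable, not details from the paper.

```python
# Hedged sketch: binary JA / not-JA classifier over single egocentric
# frames. Backbone and preprocessing are assumptions; the paper says "CNN".
import torch
import torch.nn as nn
from torchvision import models, transforms

class JAClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        # Swap the 1000-way ImageNet head for a 2-way JA / not-JA head.
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 2)

    def forward(self, x):        # x: (B, 3, 224, 224) egocentric frames
        return self.backbone(x)  # (B, 2) logits

# Standard ImageNet preprocessing applied to each raw egocentric image.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = JAClassifier().eval()
frame = torch.rand(1, 3, 224, 224)  # stand-in for one preprocessed frame
with torch.no_grad():
    p_ja = torch.softmax(model(frame), dim=1)[0, 1].item()  # P(JA)
```

Training would then be standard supervised fine-tuning on frames labeled by whether they fall inside an annotated JA bout.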