In recent years, several Weakly Supervised Semantic Segmentation (WS3)
methods have been proposed that use class activation maps (CAMs) generated by a
classifier to produce pseudo-ground truths for training segmentation models.
While CAMs are good at highlighting discriminative regions (DR) of an image,
they are known to disregard regions of the object that do not contribute to the
classifier's prediction, termed non-discriminative regions (NDR). In contrast,
attribution methods such as saliency maps offer an alternative that assigns a
score to every pixel based on its contribution to the classification
prediction. This paper provides a comprehensive comparison between saliencies
and CAMs for WS3, analyzing their similarities and differences from multiple
perspectives. Moreover, we propose new evaluation metrics that assess the WS3
performance of alternative methods relative to CAMs. Through empirical studies
on benchmark datasets, we demonstrate that saliencies are effective in
addressing this limitation of CAMs, namely their neglect of NDRs. Furthermore,
we propose random cropping as a stochastic aggregation technique that improves
the performance of saliency maps, making them a strong alternative to CAMs for
WS3.

Comment: 24 pages, 13 figures, 4 tables
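To make the two kinds of maps concrete, the following NumPy sketch contrasts a CAM with a randomly cropped, aggregated saliency map. The CAM follows the standard global-average-pooling formulation: the last convolutional feature maps are weighted by one class's classifier weights, summed over channels, passed through a ReLU, and scaled to [0, 1]. The random-crop aggregation is only one plausible reading of "random cropping as a stochastic aggregation technique". The crop fraction, number of crops, paste-back-and-average rule, and the `toy_saliency` stand-in (used in place of a real gradient-based saliency method) are hypothetical assumptions, not the paper's procedure.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """Standard CAM: channel-weighted sum of feature maps, ReLU, scale to [0, 1].

    features:   (K, H, W) feature maps from the last conv layer
    fc_weights: (C, K) weights of the linear classifier after global pooling
    """
    cam = np.tensordot(fc_weights[class_idx], features, axes=(0, 0))  # (H, W)
    cam = np.maximum(cam, 0.0)  # keep only positive evidence for the class
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def random_crop_saliency(image, saliency_fn, num_crops=32, crop_frac=0.7, seed=0):
    """Hypothetical aggregation: average a saliency map over random crops.

    Each crop is scored by saliency_fn, its map is pasted back at the crop
    location, and every pixel is averaged over the crops that covered it.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    ch, cw = max(1, int(h * crop_frac)), max(1, int(w * crop_frac))
    total = np.zeros((h, w))
    count = np.zeros((h, w))
    for _ in range(num_crops):
        top = rng.integers(0, h - ch + 1)  # upper bound is exclusive
        left = rng.integers(0, w - cw + 1)
        crop = image[top:top + ch, left:left + cw]
        total[top:top + ch, left:left + cw] += saliency_fn(crop)
        count[top:top + ch, left:left + cw] += 1
    # Pixels never covered by any crop keep a score of 0.
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)


def toy_saliency(crop):
    # Hypothetical stand-in for a real saliency method (e.g. input gradients);
    # it depends on crop context, so different crops give different maps.
    return np.abs(crop - crop.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.random((8, 7, 7))
    weights = rng.standard_normal((3, 8))
    cam = class_activation_map(feats, weights, class_idx=2)
    image = rng.random((32, 32))
    sal = random_crop_saliency(image, toy_saliency, num_crops=64)
    print(cam.shape, sal.shape)  # (7, 7) (32, 32)
```

Averaging over many crops means each pixel's score reflects several different contexts rather than a single forward pass, which is the intuition behind using stochastic aggregation to smooth and spread saliency over more of the object.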