Affinity Attention Graph Neural Network for Weakly Supervised Semantic Segmentation
Weakly supervised semantic segmentation is receiving great attention due to
its low human annotation cost. In this paper, we aim to tackle bounding box
supervised semantic segmentation, i.e., training accurate semantic segmentation
models using bounding box annotations as supervision. To this end, we propose
an Affinity Attention Graph Neural Network (A2GNN). Following previous
practices, we first generate pseudo semantic-aware seeds, which are then formed
into semantic graphs based on our newly proposed affinity Convolutional Neural
Network (CNN). Then the built graphs are input to our GNN, in which an
affinity attention layer is designed to acquire short- and long-distance
information from soft graph edges to accurately propagate semantic labels from
the confident seeds to the unlabeled pixels. However, to guarantee the
precision of the seeds, we only adopt a limited number of confident pixel seed
labels for GNN, which may lead to insufficient supervision for training.
To alleviate this issue, we further introduce a new loss function and a
consistency-checking mechanism to leverage the bounding box constraint, so that
more reliable guidance can be included for the model optimization. Experiments
show that our approach achieves new state-of-the-art performance on the PASCAL
VOC 2012 dataset (val: 76.5%, test: 75.2%). More importantly, our approach can
be readily applied to the bounding box supervised instance segmentation task and
other weakly supervised semantic segmentation tasks, with state-of-the-art or
comparable performance on almost all weakly supervised tasks on the PASCAL VOC
and COCO datasets. Our source code will be available at
https://github.com/zbf1991/A2GNN.
Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine
Intelligence (TPAMI 2021)
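The seed-to-pixel propagation described in the abstract can be illustrated with a generic semi-supervised label-propagation sketch over soft graph edges. This is not the paper's A2GNN; the affinity matrix, seed encoding, and hyperparameters below are illustrative assumptions:

```python
import numpy as np

def propagate_labels(affinity, seed_labels, num_classes, iters=50, alpha=0.9):
    """Generic label propagation over soft graph edges: confident seed
    labels spread to unlabeled nodes, weighted by edge affinity.
    seed_labels[i] == -1 marks an unlabeled node."""
    n = affinity.shape[0]
    # Row-normalize affinities so each node averages over its neighbors.
    P = affinity / affinity.sum(axis=1, keepdims=True)
    # One-hot matrix for seeds; unlabeled rows start uniform.
    Y = np.full((n, num_classes), 1.0 / num_classes)
    seeded = seed_labels >= 0
    Y[seeded] = np.eye(num_classes)[seed_labels[seeded]]
    F = Y.copy()
    for _ in range(iters):
        F = alpha * (P @ F) + (1 - alpha) * Y  # blend propagation with seed prior
        F[seeded] = Y[seeded]                  # clamp the confident seeds
    return F.argmax(axis=1)
```

On a toy four-node graph with two strongly connected pairs and one seed per pair, the unlabeled node in each pair inherits its neighbor's label.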
Weakly supervised deep semantic segmentation using CNN and ELM with semantic candidate regions
The task of semantic segmentation is to assign a semantic label to every pixel in an image. In the fully supervised setting, this is achieved by a segmentation model trained with pixel-level annotations. However, pixel-level annotation is very expensive and time-consuming. To reduce this cost, the paper proposes a semantic-candidate-region-trained extreme learning machine (ELM) method that uses only image-level labels to obtain pixel-level label mappings. The paper casts the pixel-mapping problem as a candidate-region semantic inference problem. Specifically, after segmenting each image into a set of superpixels, the superpixels are automatically combined into candidate regions according to the number of image-level labels. Semantic inference over the candidate regions is realized using the relationships and the neighborhood rough set associated with the semantic labels. Finally, the paper trains the ELM on the candidate regions with inferred labels to classify the test candidate regions. The method is evaluated on the MSRC and PASCAL VOC 2012 datasets, which are widely used for semantic segmentation. The experimental results show that the proposed method outperforms several state-of-the-art approaches to deep semantic segmentation.
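The ELM component can be sketched generically: a random, fixed hidden layer followed by output weights solved in closed form by least squares. This is a minimal illustration of the ELM idea, not the paper's implementation; the feature dimensions, hidden size, and activation are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

class ELM:
    """Minimal extreme learning machine: random hidden projection plus a
    closed-form least-squares output layer. Illustrative sketch only; the
    paper's candidate-region features are not reproduced here."""

    def __init__(self, n_hidden=64):
        self.n_hidden = n_hidden

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        # Hidden weights are sampled once and never trained.
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)   # random hidden features
        T = np.eye(n_classes)[y]           # one-hot targets
        # Output weights solved in closed form (least squares).
        self.beta, *_ = np.linalg.lstsq(H, T, rcond=None)
        return self

    def predict(self, X):
        return (np.tanh(X @ self.W + self.b) @ self.beta).argmax(axis=1)
```

Because only the output layer is solved, training reduces to a single least-squares problem, which is what makes ELMs cheap compared with end-to-end training.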
Weakly supervised segment annotation via expectation kernel density estimation
Since the labelling for the positive images/videos is ambiguous in weakly
supervised segment annotation, negative mining based methods that only use the
intra-class information emerge. In these methods, negative instances are
utilized to penalize unknown instances to rank their likelihood of being an
object, which can be considered as a voting in terms of similarity. However,
these methods 1) ignore the information contained in positive bags, 2) only
rank the likelihood but cannot generate an explicit decision function. In this
paper, we propose a voting scheme involving not only the definite negative
instances but also the ambiguous positive instances to make use of the extra
useful information in the weakly labelled positive bags. In the scheme, each
instance votes for its label with a magnitude arising from the similarity, and
the ambiguous positive instances are assigned soft labels that are iteratively
updated during the voting. It overcomes the limitations of voting using only
the negative bags. We also propose an expectation kernel density estimation
(eKDE) algorithm to gain further insight into the voting mechanism.
Experimental results demonstrate the superiority of our scheme over the
baselines.
Comment: 9 pages, 2 figures
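The voting scheme described above can be sketched as similarity-weighted voting with iteratively updated soft labels. This is a loose illustration in the spirit of the mechanism, not the paper's exact eKDE algorithm; the kernel, bandwidth, and update rule are assumptions:

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Pairwise Gaussian kernel values between rows of X and rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def soft_label_voting(negatives, ambiguous, iters=10, bandwidth=1.0):
    """Sketch of similarity-weighted voting: definite negatives vote
    against each ambiguous instance, while ambiguous instances vote for
    each other with soft labels that are re-estimated every round."""
    m = ambiguous.shape[0]
    soft = np.full(m, 0.5)  # initial soft positive labels
    for _ in range(iters):
        K_neg = gaussian_kernel(ambiguous, negatives, bandwidth)
        K_amb = gaussian_kernel(ambiguous, ambiguous, bandwidth)
        np.fill_diagonal(K_amb, 0.0)  # no self-votes
        pos_vote = K_amb @ soft                        # soft positive mass
        neg_vote = K_neg.sum(1) + K_amb @ (1 - soft)   # negatives + soft negative mass
        soft = pos_vote / (pos_vote + neg_vote + 1e-12)
    return soft
```

Ambiguous instances that sit close to the definite negatives are voted down toward zero, while instances far from the negatives retain a higher soft-positive score.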
Copy-Pasting Coherent Depth Regions Improves Contrastive Learning for Urban-Scene Segmentation
In this work, we leverage estimated depth to boost self-supervised
contrastive learning for segmentation of urban scenes, where unlabeled videos
are readily available for training self-supervised depth estimation. We argue
that the semantics of a coherent group of pixels in 3D space is self-contained
and invariant to the contexts in which they appear. We group coherent,
semantically related pixels into coherent depth regions given their estimated
depth and use copy-paste to synthetically vary their contexts. In this way,
cross-context correspondences are built in contrastive learning and a
context-invariant representation is learned. For unsupervised semantic
segmentation of urban scenes, our method surpasses the previous
state-of-the-art baseline by +7.14% in mIoU on Cityscapes and +6.65% on KITTI.
For fine-tuning on Cityscapes and KITTI segmentation, our method is competitive
with existing models, yet we do not need to pre-train on ImageNet or COCO, and
we are also more computationally efficient. Our code is available at
https://github.com/LeungTsang/CPCDR
Comment: BMVC 2022 Best Student Paper Award (Honourable Mention)
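The copy-paste step that varies a region's context can be sketched as a simple mask-based paste. Depth estimation and pixel grouping are assumed to have already produced the region mask; this is an illustrative sketch, not the authors' implementation:

```python
import numpy as np

def copy_paste_region(src_img, src_mask, dst_img, offset=(0, 0)):
    """Paste the pixels of a coherent region (boolean mask) from a source
    image into a destination image at an offset, synthetically placing the
    region in a new context."""
    out = dst_img.copy()
    ys, xs = np.nonzero(src_mask)
    dy, dx = offset
    ty, tx = ys + dy, xs + dx
    # Keep only target coordinates inside the destination image bounds.
    valid = (ty >= 0) & (ty < out.shape[0]) & (tx >= 0) & (tx < out.shape[1])
    out[ty[valid], tx[valid]] = src_img[ys[valid], xs[valid]]
    return out
```

In a contrastive setup, the original region and its pasted copy form a positive cross-context pair, encouraging a representation that is invariant to the surrounding context.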
- …