Self-training with dual uncertainty for semi-supervised medical image segmentation
In the field of semi-supervised medical image segmentation, the shortage of
labeled data is the fundamental problem. How to effectively learn image
features from unlabeled images to improve segmentation accuracy is the main
research direction in this field. Traditional self-training methods can
partially solve the problem of insufficient labeled data by generating pseudo
labels for iterative training. However, noise generated due to the model's
uncertainty during training directly affects the segmentation results.
Therefore, we added sample-level and pixel-level uncertainty on top of the
self-training framework to stabilize the training process. Specifically, we
saved several snapshots of the model during pre-training and used the
disagreement among their predictions on an unlabeled sample as that sample's
sample-level uncertainty estimate. We then gradually added unlabeled samples
from easy to hard during training. At the same time, we added a second
decoder with a different upsampling method to the segmentation network and
used the difference between the two decoders' outputs as pixel-level
uncertainty. In short, we selectively retrained unlabeled samples and
assigned pixel-level uncertainty to pseudo labels to optimize the
self-training process. We compared the
segmentation results of our model with five semi-supervised approaches on the
public 2017 ACDC dataset and 2018 Prostate dataset. Our proposed method
achieves better segmentation performance on both datasets under the same
settings, demonstrating its effectiveness, robustness, and potential
transferability to other medical image segmentation tasks.
Keywords: Medical image segmentation, semi-supervised learning, self-training,
uncertainty estimation
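The checkpoint-disagreement idea above can be illustrated with a small sketch (all names are hypothetical, not the authors' code): the variance among predictions from several saved model snapshots scores each unlabeled sample, and samples are then ordered from easy (low disagreement) to hard.

```python
import numpy as np

def sample_uncertainty(checkpoint_probs):
    """Sample-level uncertainty as the mean pixel-wise variance across
    predictions from several saved checkpoints.

    checkpoint_probs: array of shape (num_checkpoints, H, W) holding the
    foreground probability each checkpoint predicts for one sample.
    """
    return float(np.var(checkpoint_probs, axis=0).mean())

def easy_to_hard_order(all_checkpoint_probs):
    """Order unlabeled sample indices from lowest to highest uncertainty,
    so training can add easy samples first."""
    scores = [sample_uncertainty(p) for p in all_checkpoint_probs]
    return sorted(range(len(scores)), key=lambda i: scores[i])

# Toy example: for one sample the checkpoints agree, for the other they
# disagree strongly, so the agreeing sample is scheduled first.
agree = np.stack([np.full((4, 4), 0.9)] * 3)
disagree = np.stack([np.full((4, 4), v) for v in (0.1, 0.5, 0.9)])
order = easy_to_hard_order([disagree, agree])  # -> [1, 0]
```

The pixel-level counterpart would analogously compare the two decoders' output maps per pixel rather than whole-sample statistics.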
DMT: Dynamic Mutual Training for Semi-Supervised Learning
Recent semi-supervised learning methods use pseudo supervision as their core
idea, especially self-training methods that generate pseudo labels. However,
pseudo labels are unreliable. Self-training methods usually rely on a single
model's prediction confidence to filter out low-confidence pseudo labels,
thus retaining high-confidence errors and discarding many low-confidence
correct labels. In this paper, we point out that it is difficult for a model
to counter its own errors. Instead, leveraging inter-model disagreement
between different models is the key to locating pseudo-label errors. With
this new viewpoint, we propose mutual
training between two different models by a dynamically re-weighted loss
function, called Dynamic Mutual Training (DMT). We quantify inter-model
disagreement by comparing predictions from two different models to dynamically
re-weight loss in training, where a larger disagreement indicates a possible
error and corresponds to a lower loss value. Extensive experiments show that
DMT achieves state-of-the-art performance in both image classification and
semantic segmentation. Our codes are released at
https://github.com/voldemortX/DST-CBC
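The dynamically re-weighted loss described above can be sketched as follows (a simplified reading of the idea; the exponent `gamma` and the exact functional form are assumptions, not DMT's precise formulation):

```python
import numpy as np

def dmt_loss(probs, pseudo_labels, gamma=2.0, eps=1e-12):
    """Cross-entropy against another model's pseudo labels, down-weighted
    where the trained model disagrees with them.

    probs: class probabilities of the model being trained, shape (N, C).
    pseudo_labels: hard labels produced by the *other* model, shape (N,).
    """
    idx = np.arange(len(pseudo_labels))
    p_agree = probs[idx, pseudo_labels]   # this model's belief in the label
    weight = p_agree ** gamma             # large disagreement -> weight ~ 0
    ce = -np.log(p_agree + eps)           # CE against the pseudo label
    # In a real framework the weight would be detached from the gradient.
    return float((weight * ce).mean())

# Sample 0: strong agreement (0.9); sample 1: strong disagreement (0.2),
# so sample 1's likely-wrong pseudo label contributes little to the loss.
probs = np.array([[0.9, 0.1], [0.2, 0.8]])
loss = dmt_loss(probs, np.array([0, 0]))
```

In DMT proper the two models are trained mutually, each supervising the other with such a re-weighted objective.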
Cycle Self-Training for Semi-Supervised Object Detection with Distribution Consistency Reweighting
Recently, many semi-supervised object detection (SSOD) methods adopt
teacher-student framework and have achieved state-of-the-art results. However,
the teacher network is tightly coupled with the student network since the
teacher is an exponential moving average (EMA) of the student, which causes a
performance bottleneck. To address the coupling problem, we propose a Cycle
Self-Training (CST) framework for SSOD, which consists of two teachers T1 and
T2, two students S1 and S2. Based on these networks, a cycle self-training
mechanism is built, i.e., S1 → T1 → S2 → T2 → S1. For S → T, we also utilize
the EMA weights of the students to update the teachers. For T → S, instead of
providing supervision for its own student S1 (S2) directly, the teacher T1
(T2) generates pseudo-labels for the student S2 (S1), which loosens the
coupling effect. Moreover, owing to the
property of EMA, the teacher is most likely to accumulate the biases from the
student and make the mistakes irreversible. To mitigate the problem, we also
propose a distribution consistency reweighting strategy, where pseudo-labels
are reweighted based on distribution consistency across the teachers T1 and T2.
With the strategy, the two students S2 and S1 can be trained robustly with
noisy pseudo labels to avoid confirmation biases. Extensive experiments prove
the superiority of CST by consistently improving the AP over the baseline and
outperforming state-of-the-art methods by 2.1% absolute AP improvements with
scarce labeled data. Comment: ACM Multimedia 202
Inherent Consistent Learning for Accurate Semi-supervised Medical Image Segmentation
Semi-supervised medical image segmentation has attracted much attention in
recent years because of the high cost of medical image annotations. In this
paper, we propose a novel Inherent Consistent Learning (ICL) method, which
aims to learn robust semantic category representations through the semantic
consistency guidance of labeled and unlabeled data to help segmentation. In
practice, we introduce two external modules, namely the Supervised Semantic
Proxy Adaptor (SSPA) and the Unsupervised Semantic Consistent Learner (USCL),
both based on the attention mechanism, to align the semantic category
representations of labeled
and unlabeled data, as well as update the global semantic representations over
the entire training set. The proposed ICL is a plug-and-play scheme for various
network architectures, and the two modules are not involved in the testing
stage. Experimental results on three public benchmarks show that the proposed
method can outperform the state of the art, especially when the amount of
annotated data is extremely limited. Code is available at:
https://github.com/zhuye98/ICL.git
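One ingredient mentioned above, maintaining global semantic representations over the entire training set, can be sketched as an EMA over per-class features (this is only a rough guess at the spirit of the idea; the actual SSPA/USCL modules are attention-based and considerably more involved):

```python
import numpy as np

def update_global_prototypes(prototypes, batch_features, momentum=0.99):
    """EMA update of global per-class semantic representations.

    prototypes: dict class_id -> global feature vector (np.ndarray).
    batch_features: dict class_id -> mean feature of that class in the
    current batch; classes absent from the batch keep their prototype.
    """
    updated = {}
    for c, proto in prototypes.items():
        if c in batch_features:
            updated[c] = momentum * proto + (1 - momentum) * batch_features[c]
        else:
            updated[c] = proto  # class not seen in this batch
    return updated

protos = {0: np.ones(2), 1: np.zeros(2)}
new = update_global_prototypes(protos, {0: np.zeros(2)}, momentum=0.9)
```

Such prototypes give labeled and unlabeled features a shared target to be aligned against, which is the consistency the method exploits.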
ConsInstancy: learning instance representations for semi-supervised panoptic segmentation of concrete aggregate particles
We present a semi-supervised method for panoptic segmentation based on ConsInstancy regularisation, a novel strategy for semi-supervised learning. It leverages completely unlabelled data by enforcing consistency between predicted instance representations and semantic segmentations during training in order to improve segmentation performance. To this end, we also propose new types of instance representations that can be predicted in one simple forward pass through a fully convolutional network (FCN), delivering a convenient and simple-to-train framework for panoptic segmentation. More specifically, we propose the prediction of a three-dimensional instance orientation map as an intermediate representation and two complementary distance transform maps as the final representation, providing unique instance representations for panoptic segmentation. We test our method on two challenging data sets of hardened and fresh concrete, the latter proposed by the authors in this paper, demonstrating the effectiveness of our approach and outperforming state-of-the-art methods for semi-supervised segmentation. In particular, we show that by leveraging completely unlabelled data in our semi-supervised approach, the achieved overall accuracy (OA) is increased by up to 5% compared to an entirely supervised training using only labelled data. Furthermore, we exceed the OA achieved by state-of-the-art semi-supervised methods by up to 1.5%.
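A distance-transform instance representation of the kind mentioned above can be illustrated with a tiny brute-force sketch (in practice one would use an optimized transform such as `scipy.ndimage.distance_transform_edt`; this simplified stand-in is for illustration only and does not reproduce the paper's exact pair of maps):

```python
import numpy as np

def inward_distance_map(instance_mask):
    """For every foreground pixel, the Chebyshev distance to the nearest
    pixel outside its own instance (the image border counts as outside).

    instance_mask: int array, 0 = background, k > 0 = instance id.
    Brute force, O(n^2): suitable only for tiny illustrative arrays.
    """
    h, w = instance_mask.shape
    dist = np.zeros((h, w), dtype=float)
    for y, x in zip(*np.nonzero(instance_mask)):
        inst = instance_mask[y, x]
        outside = np.argwhere(instance_mask != inst)
        d_edge = min(y + 1, x + 1, h - y, w - x)  # steps to leave the image
        if len(outside):
            d_pix = np.abs(outside - [y, x]).max(axis=1).min()
            dist[y, x] = min(int(d_pix), d_edge)
        else:
            dist[y, x] = d_edge
    return dist

# A 3x3 instance centred in a 5x5 image: the centre pixel is 2 steps from
# any non-instance pixel, while the instance's edge pixels are only 1 away.
mask = np.zeros((5, 5), dtype=int)
mask[1:4, 1:4] = 1
dmap = inward_distance_map(mask)
```

Because each instance's map peaks at its interior and falls to zero at its boundary, such maps give every instance a unique, FCN-predictable signature from which instances can be recovered.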
Relieving Triplet Ambiguity: Consensus Network for Language-Guided Image Retrieval
Language-guided image retrieval enables users to search for images and
interact with the retrieval system more naturally and expressively by using a
reference image and a relative caption as a query. Most existing studies mainly
focus on designing image-text composition architecture to extract
discriminative visual-linguistic relations. Despite great success, we identify
an inherent problem that obstructs the extraction of discriminative features
and considerably compromises model training: triplet ambiguity. This
problem stems from the annotation process wherein annotators view only one
triplet at a time. As a result, they often describe simple attributes, such as
color, while neglecting fine-grained details like location and style. This
leads to multiple false-negative candidates matching the same modification
text. We propose a novel Consensus Network (Css-Net) that self-adaptively
learns from noisy triplets to minimize the negative effects of triplet
ambiguity. Inspired by the psychological finding that groups perform better
than individuals, Css-Net comprises 1) a consensus module featuring four
distinct compositors that generate diverse fused image-text embeddings and 2) a
Kullback-Leibler divergence loss, which fosters learning among the compositors,
enabling them to reduce biases learned from noisy triplets and reach a
consensus. The decisions from four compositors are weighted during evaluation
to further achieve consensus. Comprehensive experiments on three datasets
demonstrate that Css-Net can alleviate triplet ambiguity, achieving competitive
performance on benchmarks such as R@10 and R@50 on FashionIQ.
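The KL-based consensus objective described above can be sketched as pulling each compositor's retrieval distribution toward their mean (one plausible reading of the loss; the exact pairing scheme among Css-Net's four compositors is an assumption):

```python
import numpy as np

def kl_div(p, q, eps=1e-12):
    """KL(p || q) for discrete probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def consensus_kl_loss(compositor_probs):
    """Average KL divergence of each compositor's retrieval distribution
    from the mean distribution: agreement costs nothing, disagreement is
    penalized, nudging the compositors toward a consensus."""
    probs = np.asarray(compositor_probs, dtype=float)
    mean = probs.mean(axis=0)
    return float(np.mean([kl_div(p, mean) for p in probs]))

# Four compositors that agree incur (near) zero loss; disagreement raises it.
same = consensus_kl_loss([[0.7, 0.3]] * 4)
diff = consensus_kl_loss([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5], [0.5, 0.5]])
```

At evaluation time the abstract's weighted combination of the four compositors' decisions plays the analogous consensus role.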
Semi-Supervised Panoptic Narrative Grounding
Despite considerable progress, the advancement of Panoptic Narrative
Grounding (PNG) remains hindered by costly annotations. In this paper, we
introduce a novel Semi-Supervised Panoptic Narrative Grounding (SS-PNG)
learning scheme, capitalizing on a smaller set of labeled image-text pairs and
a larger set of unlabeled pairs to achieve competitive performance. Unlike
visual segmentation tasks, PNG involves one pixel belonging to multiple
open-ended nouns. As a result, existing multi-class based semi-supervised
segmentation frameworks cannot be directly applied to this task. To address
this challenge, we first develop a novel SS-PNG Network (SS-PNG-NW) tailored to
the SS-PNG setting. We thoroughly investigate strategies such as Burn-In and
data augmentation to determine the optimal generic configuration for the
SS-PNG-NW. Additionally, to tackle the issue of imbalanced pseudo-label
quality, we propose a Quality-Based Loss Adjustment (QLA) approach to adjust
the semi-supervised objective, resulting in an enhanced SS-PNG-NW+. Employing
our proposed QLA, we improve BCE Loss and Dice loss at pixel and mask levels,
respectively. We conduct extensive experiments on PNG datasets, with our
SS-PNG-NW+ demonstrating promising results comparable to fully-supervised
models across all data ratios. Remarkably, our SS-PNG-NW+ outperforms
fully-supervised models with only 30% and 50% supervision data, exceeding their
performance by 0.8% and 1.1% respectively. This highlights the effectiveness of
our proposed SS-PNG-NW+ in overcoming the challenges posed by limited
annotations and enhancing the applicability of PNG tasks. The source code is
available at https://github.com/nini0919/SSPNG. Comment: ACM MM 202
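The quality-based loss adjustment idea above can be sketched for the pixel-level BCE term (an illustrative form only; the paper's exact QLA formulation, and its Dice-loss counterpart at the mask level, are not reproduced here):

```python
import numpy as np

def quality_weighted_bce(pred, pseudo, quality, eps=1e-12):
    """Pixel-level binary cross-entropy against pseudo labels, scaled by
    per-pixel quality scores in [0, 1]: low-quality pseudo labels are
    suppressed instead of being trusted at full strength.

    pred, pseudo, quality: float arrays of identical shape.
    """
    bce = -(pseudo * np.log(pred + eps)
            + (1.0 - pseudo) * np.log(1.0 - pred + eps))
    return float((quality * bce).mean())

# The poorly predicted second pixel dominates the unweighted loss; with a
# low quality score its unreliable pseudo label is largely ignored.
pred = np.array([0.9, 0.2])
pseudo = np.array([1.0, 1.0])
full = quality_weighted_bce(pred, pseudo, np.ones(2))
damped = quality_weighted_bce(pred, pseudo, np.array([1.0, 0.1]))
```

Balancing the objective this way addresses exactly the imbalanced pseudo-label quality the abstract identifies as the obstacle in the SS-PNG setting.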