Prompting classes: Exploring the Power of Prompt Class Learning in Weakly Supervised Semantic Segmentation
Recently, CLIP-based approaches have exhibited remarkable performance on
generalization and few-shot learning tasks, fueled by the power of contrastive
language-vision pre-training. In particular, prompt tuning has emerged as an
effective strategy to adapt the pre-trained language-vision models to
downstream tasks by employing task-related textual tokens. Motivated by this
progress, in this work we question whether other fundamental problems, such as
weakly supervised semantic segmentation (WSSS), can benefit from prompt tuning.
Our findings reveal two interesting observations that shed light on the impact
of prompt tuning on WSSS. First, modifying only the class token of the text
prompt results in a greater impact on the Class Activation Map (CAM), compared
to arguably more complex strategies that optimize the context. And second, the
class token associated with the image ground truth does not necessarily
correspond to the category that yields the best CAM. Motivated by these
observations, we introduce a novel approach based on a PrOmpt cLass lEarning
(POLE) strategy. Through extensive experiments, we demonstrate that our simple
yet efficient approach achieves state-of-the-art (SOTA) performance on a well-known WSSS
benchmark. These results highlight not only the benefits of language-vision
models in WSSS but also the potential of prompt learning for this problem. The
code is available at https://github.com/rB080/WSS_POLE.
Comment: Under review
MEDICAL MACHINE INTELLIGENCE: DATA-EFFICIENCY AND KNOWLEDGE-AWARENESS
Traditional clinical diagnosis requires extensive manual labor from experienced doctors, which is time-consuming and costly. Computer-aided systems have therefore been proposed to reduce doctors’ workload by using machines to automatically make diagnoses and treatment recommendations. The recent success of deep learning has greatly advanced the field of computer-aided diagnosis by offering an avenue to automated medical image analysis. Despite such progress, several challenges remain for medical machine intelligence, such as unsatisfactory performance on small, challenging targets, insufficient training data, high annotation cost, and the lack of domain-specific knowledge. These challenges motivate the development of data-efficient and knowledge-aware deep learning techniques that can generalize to different medical tasks without requiring intensive manual labeling and that incorporate domain-specific knowledge into the learning process.
In this thesis, we rethink the current progress of deep learning in medical image analysis, with a focus on the aforementioned challenges, and present data-efficient and knowledge-aware deep learning approaches to address them. Firstly, we introduce coarse-to-fine mechanisms that use the prediction from the first (coarse) stage to shrink the input region for the second (fine) stage, improving model performance, especially when segmenting small, challenging structures such as the pancreas, which occupies only a very small fraction (e.g., < 0.5%) of the entire CT volume. The method achieved state-of-the-art results on the NIH pancreas segmentation dataset. Further extensions also proved effective for segmenting neoplasms such as pancreatic cysts, as well as multiple organs.
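The coarse-to-fine idea described above can be illustrated with a minimal sketch: a coarse binary mask determines a padded bounding box, and only that region is passed to the fine stage. This is an illustration under stated assumptions, not the thesis implementation; the function names `bounding_box` and `crop`, the 2D list representation, and the `margin` parameter are all hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]
Box = Tuple[int, int, int, int]  # (row_min, row_max, col_min, col_max), inclusive


def bounding_box(coarse_mask: Grid, margin: int = 1) -> Box:
    """Box around the coarse foreground, padded by `margin` and clipped to the image."""
    height, width = len(coarse_mask), len(coarse_mask[0])
    rows = [r for r, row in enumerate(coarse_mask) if any(row)]
    cols = [c for c in range(width) if any(row[c] for row in coarse_mask)]
    if not rows:
        # Assumption: with no coarse foreground, fall back to the full image.
        return 0, height - 1, 0, width - 1
    return (
        max(rows[0] - margin, 0),
        min(rows[-1] + margin, height - 1),
        max(cols[0] - margin, 0),
        min(cols[-1] + margin, width - 1),
    )


def crop(image: Grid, box: Box) -> Grid:
    """Cut out the region that the fine stage would receive as input."""
    r0, r1, c0, c1 = box
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]


# Toy 6x6 coarse prediction with a small foreground blob.
coarse = [[0] * 6 for _ in range(6)]
coarse[2][3] = coarse[3][3] = coarse[3][4] = 1
image = [[r * 6 + c for c in range(6)] for r in range(6)]

box = bounding_box(coarse, margin=1)
fine_input = crop(image, box)
```

In this toy case the fine stage sees a 4x4 patch instead of the full 6x6 image, so the small target occupies a larger share of its input.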
Secondly, we present a semi-supervised learning framework for medical image segmentation that leverages both limited labeled data and abundant unlabeled data. Our learning method encourages the segmentation output to be consistent for the same input under different viewing conditions. More importantly, the outputs from different viewing directions are fused together to improve the quality of the target, which further enhances the overall performance. The comparison with fully-supervised methods on multi-organ segmentation confirms the effectiveness of this method.
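As a rough sketch of the multi-view idea above, the snippet below fuses per-view foreground probabilities and measures how far each view deviates from the fused result. The abstract does not state how fusion or consistency is computed, so plain averaging and a mean squared deviation are assumptions made here for illustration; `fuse_views` and `consistency_loss` are hypothetical names.

```python
from typing import List

Probs = List[float]  # per-voxel foreground probabilities, flattened


def fuse_views(view_probs: List[Probs]) -> Probs:
    """Fuse predictions from several views by averaging (assumed fusion rule)."""
    n_views = len(view_probs)
    return [sum(values) / n_views for values in zip(*view_probs)]


def consistency_loss(view_probs: List[Probs]) -> float:
    """Mean squared deviation of each view from the fused prediction."""
    fused = fuse_views(view_probs)
    total, count = 0.0, 0
    for probs in view_probs:
        for p, f in zip(probs, fused):
            total += (p - f) ** 2
            count += 1
    return total / count


# Three hypothetical views of the same two-voxel input.
views = [[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]]
fused = fuse_views(views)
loss = consistency_loss(views)
```

On unlabeled data, the fused prediction could serve as the training target for each individual view, while the loss term penalizes disagreement between views.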
Thirdly, we discuss how to incorporate knowledge priors for multi-organ segmentation. Noticing that abdominal organ sizes exhibit similar distributions across different cohorts, we propose to explicitly incorporate anatomical priors on abdominal organ sizes, guiding the training process with domain-specific knowledge. The approach achieves 84.97% on the MICCAI 2015 challenge “Multi-Atlas Labeling Beyond the Cranial Vault”, significantly outperforming the previous state of the art even while using fewer annotations.
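One way to picture a size prior is as a penalty on the gap between the average predicted class proportions and a prior proportion vector estimated from labeled cohorts. The sketch below uses a KL divergence for that penalty; this choice, the `eps` smoothing term, and the names `average_class_distribution` and `size_prior_penalty` are assumptions for illustration and are not taken from the abstract.

```python
import math
from typing import List


def average_class_distribution(pixel_probs: List[List[float]]) -> List[float]:
    """Average the per-pixel class probabilities into one size distribution."""
    n_pixels = len(pixel_probs)
    n_classes = len(pixel_probs[0])
    return [sum(p[c] for p in pixel_probs) / n_pixels for c in range(n_classes)]


def size_prior_penalty(
    pixel_probs: List[List[float]], prior: List[float], eps: float = 1e-8
) -> float:
    """KL(prior || average prediction); zero when predicted sizes match the prior."""
    avg = average_class_distribution(pixel_probs)
    return sum(q * math.log((q + eps) / (a + eps)) for q, a in zip(prior, avg))


# Four pixels, two classes (background, organ); the organ covers 25% of pixels.
predictions = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

matching = size_prior_penalty(predictions, prior=[0.75, 0.25])
mismatched = size_prior_penalty(predictions, prior=[0.5, 0.5])
```

Added to a segmentation loss, a term like this would push predicted organ sizes toward the prior even on images without full annotations.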
Lastly, by rethinking how radiologists interpret medical images, we identify that one limitation of existing deep-learning-based work on detecting pancreatic ductal adenocarcinoma is the lack of knowledge integration from multi-phase images. We therefore introduce a dual-path network in which the paths are connected for multi-phase information exchange, and an additional loss is added to remove view divergence. By effectively incorporating multi-phase information, the presented method outperforms prior art on this task.