Boosting Few-shot 3D Point Cloud Segmentation via Query-Guided Enhancement
Although extensive research has been conducted on 3D point cloud
segmentation, effectively adapting generic models to novel categories remains a
formidable challenge. This paper proposes a novel approach to improve point
cloud few-shot segmentation (PC-FSS) models. Unlike existing PC-FSS methods
that directly utilize categorical information from support prototypes to
recognize novel classes in query samples, our method identifies two critical
aspects that substantially enhance model performance by reducing contextual
gaps between support prototypes and query features. Specifically, we (1) adapt
support background prototypes to match query context while removing extraneous
cues that may obscure foreground and background in query samples, and (2)
holistically rectify support prototypes under the guidance of query features so
that they exhibit no semantic gap to the query targets. Our proposed
designs are agnostic to the feature extractor, rendering them readily
applicable to any prototype-based methods. The experimental results on S3DIS
and ScanNet demonstrate notable practical benefits, as our approach achieves
significant improvements while still maintaining high efficiency. The code for
our approach is available at
https://github.com/AaronNZH/Boosting-Few-shot-3D-Point-Cloud-Segmentation-via-Query-Guided-Enhancement
Comment: Accepted to ACM MM 202
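The prototype-based pipeline this abstract builds on can be sketched as below. The rectification step and its `alpha` mixing weight are illustrative assumptions standing in for the paper's query-guided design, not its actual implementation:

```python
import numpy as np

def masked_average_prototype(features, mask):
    """Masked average pooling: mean of the (N, C) features where mask == 1."""
    denom = max(mask.sum(), 1)
    return (features * mask[:, None]).sum(axis=0) / denom

def rectify_prototype(prototype, query_features, alpha=0.5):
    """Hypothetical query-guided rectification: blend the support prototype
    toward the mean query feature to shrink the contextual gap.
    alpha is an assumed mixing weight, not a value from the paper."""
    return (1.0 - alpha) * prototype + alpha * query_features.mean(axis=0)

def score_query(query_features, prototype):
    """Per-point cosine similarity between query features and a prototype."""
    q = query_features / np.linalg.norm(query_features, axis=1, keepdims=True)
    p = prototype / np.linalg.norm(prototype)
    return q @ p
```

Because the rectification operates only on prototypes and query features, it slots in after any feature extractor, which is the sense in which the designs are extractor-agnostic.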
ResLT: Residual Learning for Long-tailed Recognition
Deep learning algorithms face great challenges with long-tailed data
distribution which, however, is quite a common case in real-world scenarios.
Previous methods tackle the problem from either the aspect of input space
(re-sampling classes with different frequencies) or loss space (re-weighting
classes with different weights), suffering from heavy over-fitting to tail
classes or hard optimization during training. To alleviate these issues, we
propose a more fundamental perspective for long-tailed recognition, i.e., from
the aspect of parameter space, and aim to preserve specific capacity for
classes with low frequencies. From this perspective, the trivial solution of
utilizing different branches for the head, medium, and tail classes
respectively, and then summing their outputs as the final result, is not
feasible. Instead, we
design an effective residual fusion mechanism: with one main branch optimized
to recognize images from all classes, two residual branches are gradually fused
and optimized to enhance images from the medium+tail classes and the tail
classes, respectively. The branches are then aggregated into the final results
by additive shortcuts. We test our method on several benchmarks, i.e., the
long-tailed version of CIFAR-10, CIFAR-100, Places, ImageNet, and iNaturalist
2018. Experimental results manifest that our method achieves new
state-of-the-art for long-tailed recognition. Code will be available at
https://github.com/FPNAS/ResLT
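The additive-shortcut fusion described above can be sketched as follows; the shapes, class masks, and function name are illustrative placeholders, not the paper's exact implementation:

```python
import numpy as np

def reslt_fuse(x, w_all, w_mt, w_tail, mask_mt, mask_tail):
    """Sketch of ResLT-style residual fusion over three branches.

    x:         (feat_dim,) input feature
    w_all:     (feat_dim, C) main-branch weights covering all classes
    w_mt:      (feat_dim, C) residual branch for medium+tail classes
    w_tail:    (feat_dim, C) residual branch for tail classes
    mask_mt:   (C,) 1 for medium+tail classes, else 0
    mask_tail: (C,) 1 for tail classes, else 0
    """
    logits = x @ w_all                          # main branch: all classes
    logits = logits + (x @ w_mt) * mask_mt      # residual shortcut: medium+tail
    logits = logits + (x @ w_tail) * mask_tail  # residual shortcut: tail
    return logits
```

Note that head classes receive only the main-branch logits, so the residual branches reserve extra capacity exclusively for the low-frequency classes.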
Generalized Parametric Contrastive Learning
In this paper, we propose the Generalized Parametric Contrastive Learning
(GPaCo/PaCo) which works well on both imbalanced and balanced data. Based on
theoretical analysis, we observe that the supervised contrastive loss tends to
be biased toward high-frequency classes, which increases the difficulty of imbalanced
learning. We introduce a set of parametric class-wise learnable centers to
rebalance from an optimization perspective. Further, we analyze our GPaCo/PaCo
loss under a balanced setting. Our analysis demonstrates that GPaCo/PaCo can
adaptively strengthen the pull between samples of the same class as more
samples are drawn toward their corresponding centers, which benefits
hard-example learning. Experiments on long-tailed benchmarks manifest the new
state-of-the-art for long-tailed recognition. On full ImageNet, models from
CNNs to vision transformers trained with GPaCo loss show better generalization
performance and stronger robustness compared with MAE models. Moreover, GPaCo
can be applied to the semantic segmentation task, and clear improvements are
observed on the four most popular benchmarks. Our code is available at
https://github.com/dvlab-research/Parametric-Contrastive-Learning
Comment: TPAMI 2023. arXiv admin note: substantial text overlap with arXiv:2107.1202
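The core idea of adding learnable class-wise centers to a contrastive comparison can be sketched as below. The combined scoring and the `alpha` rescaling of the sample-to-sample term mirror the rebalancing intuition but are assumptions, not the published loss:

```python
import numpy as np

def paco_logits(feature, batch_features, centers, alpha=0.05):
    """Sketch of a parametric-contrastive scoring step: a query feature is
    compared both with other samples in the batch (contrastive part) and
    with learnable class-wise centers (parametric part). alpha down-weights
    the sample-to-sample term; the exact weighting is an assumption."""
    sample_logits = alpha * (batch_features @ feature)  # vs. other samples
    center_logits = centers @ feature                   # vs. class centers
    return np.concatenate([center_logits, sample_logits])

def softmax(z):
    """Numerically stable softmax over the combined logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()
```

As more same-class samples (and the class center) attract the feature, the softmax mass concentrates on that class, which is the rebalancing effect the abstract describes.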
Generalized Few-shot Semantic Segmentation
Training semantic segmentation models requires a large amount of finely
annotated data, making it hard to quickly adapt to novel classes not satisfying
this condition. Few-Shot Segmentation (FS-Seg) tackles this problem with many
constraints. In this paper, we introduce a new benchmark, called Generalized
Few-Shot Semantic Segmentation (GFS-Seg), to analyze the generalization ability
of simultaneously segmenting the novel categories with very few examples and
the base categories with sufficient examples. It is the first study showing
that previous representative state-of-the-art FS-Seg methods fall short in
GFS-Seg and the performance discrepancy mainly comes from the constrained
setting of FS-Seg. To make GFS-Seg tractable, we set up a GFS-Seg baseline that
achieves decent performance without structural change on the original model.
Then, since context is essential for semantic segmentation, we propose the
Context-Aware Prototype Learning (CAPL) that significantly improves performance
by 1) leveraging the co-occurrence prior knowledge from support samples, and 2)
dynamically enriching contextual information to the classifier, conditioned on
the content of each query image. Both contributions are experimentally
shown to have substantial practical merit. Extensive experiments on Pascal-VOC
and COCO manifest the effectiveness of CAPL, and CAPL generalizes well to
FS-Seg by achieving competitive performance. Code will be made publicly
available.
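The basic GFS-Seg classifier construction sketched below keeps base-class weights and appends novel-class weights built from support prototypes. Function and argument names are illustrative, and CAPL's dynamic contextual enrichment of the base weights is omitted:

```python
import numpy as np

def masked_average(features, mask):
    """Masked average pooling over support features (N, C) with mask (N,)."""
    denom = max(mask.sum(), 1)
    return (features * mask[:, None]).sum(axis=0) / denom

def build_gfs_classifier(base_weights, support_feats, support_masks):
    """Sketch of a GFS-Seg style classifier: keep the (B, C) base-class
    weights and append one prototype row per novel class, formed from its
    few support samples. This requires no structural change to the model,
    only an extended classifier."""
    novel = [masked_average(f, m) for f, m in zip(support_feats, support_masks)]
    return np.concatenate([base_weights, np.stack(novel)], axis=0)
```

Segmenting then scores every pixel against all rows of the extended classifier, so base and novel categories are predicted simultaneously, which is what distinguishes GFS-Seg from the constrained FS-Seg setting.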
Region Refinement Network for Salient Object Detection
Although salient object detection has been intensively studied, false
predictions and unclear boundaries remain major issues. In this paper, we propose a Region
Refinement Network (RRN), which recurrently filters redundant information and
explicitly models boundary information for saliency detection. Different from
existing refinement methods, we propose a Region Refinement Module (RRM) that
optimizes salient region prediction by incorporating supervised attention masks
in the intermediate refinement stages. The module only brings a minor increase
in model size and yet significantly reduces false predictions from the
background. To further refine boundary areas, we propose a Boundary Refinement
Loss (BRL) that adds extra supervision for better distinguishing foreground
from the background. BRL is parameter-free and easy to train. We further observe
that BRL helps retain the integrity in prediction by refining the boundary.
Extensive experiments on saliency detection datasets show that our refinement
module and loss bring significant improvement to the baseline and can be easily
applied to different frameworks. We also demonstrate that our proposed model
generalizes well to portrait segmentation and shadow detection tasks.
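A loss like BRL needs to know which pixels lie near the object contour. A simple stand-in for that boundary extraction is the morphological gradient of the ground-truth mask, sketched here; the window size and the extraction method are assumptions, not the paper's exact procedure:

```python
import numpy as np

def boundary_band(gt, width=1):
    """Approximate boundary band of a binary mask as the morphological
    gradient (dilation minus erosion) over a (2*width+1) square window.
    Extra supervision can then be restricted to pixels where the band is 1,
    i.e., near the foreground contour."""
    padded = np.pad(gt, width, mode="edge")
    h, w = gt.shape
    dil = np.zeros_like(gt)
    ero = np.ones_like(gt)
    for dy in range(2 * width + 1):
        for dx in range(2 * width + 1):
            win = padded[dy:dy + h, dx:dx + w]
            dil = np.maximum(dil, win)  # local max: dilation
            ero = np.minimum(ero, win)  # local min: erosion
    return dil - ero  # 1 on the band around the contour, 0 elsewhere
```

Weighting a pixel-wise loss by this band adds supervision exactly where foreground and background are hardest to distinguish, while costing no extra parameters.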
Hierarchical Dense Correlation Distillation for Few-Shot Segmentation-Extended Abstract
Few-shot semantic segmentation (FSS) aims to form class-agnostic models
segmenting unseen classes with only a handful of annotations. Previous methods,
limited to semantic features and prototype representations, suffer from coarse
segmentation granularity and train-set overfitting. In this work, we design
Hierarchically Decoupled Matching Network (HDMNet) mining pixel-level support
correlation based on the transformer architecture. The self-attention modules
are used to assist in establishing hierarchical dense features, as a means to
accomplish the cascade matching between query and support features. Moreover,
we propose a matching module to reduce train-set overfitting and introduce
correlation distillation leveraging semantic correspondence from coarse
resolution to boost fine-grained segmentation. Our method performs decently in
experiments, achieving 50.0% mIoU on the COCO dataset under the one-shot
setting and 56.0% under the five-shot setting. The code will be available on the
project website. We hope our work can benefit broader industrial applications
where novel classes with limited annotations are required to be decently
identified.
Comment: Accepted to CVPR 2023 VISION Workshop, Oral. The extended abstract of Hierarchical Dense Correlation Distillation for Few-Shot Segmentation. arXiv admin note: substantial text overlap with arXiv:2303.1465
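The pixel-level support correlation that HDMNet builds hierarchically reduces, at a single level, to a dense cosine-similarity matrix between query and support features; this minimal sketch shows only that single-level correlation, not the cascade or the distillation:

```python
import numpy as np

def dense_correlation(query_feats, support_feats, eps=1e-8):
    """Pixel-level dense correlation: cosine similarity between every query
    feature and every support feature, shapes (Nq, C) and (Ns, C)."""
    q = query_feats / (np.linalg.norm(query_feats, axis=1, keepdims=True) + eps)
    s = support_feats / (np.linalg.norm(support_feats, axis=1, keepdims=True) + eps)
    return q @ s.T  # (Nq, Ns) correlation matrix
```

Computing such matrices at several feature resolutions and matching coarse-to-fine is what the abstract's "cascade matching" and "correlation distillation from coarse resolution" refer to.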
Understanding the Tricks of Deep Learning in Medical Image Segmentation: Challenges and Future Directions
Over the past few years, the rapid development of deep learning technologies
for computer vision has significantly improved the performance of medical image
segmentation (MedISeg). However, the diverse implementation strategies of
various models have led to an extremely complex MedISeg system, resulting in a
potential problem of unfair result comparisons. In this paper, we collect a
series of MedISeg tricks for different model implementation phases (i.e.,
pre-training model, data pre-processing, data augmentation, model
implementation, model inference, and result post-processing), and
experimentally explore the effectiveness of these tricks on consistent
baselines. With the extensive experimental results on both the representative
2D and 3D medical image datasets, we explicitly clarify the effect of these
tricks. Moreover, based on the surveyed tricks, we have also open-sourced a strong
MedISeg repository, where each component is plug-and-play. We
believe that this milestone work not only completes a comprehensive and
complementary survey of the state-of-the-art MedISeg approaches, but also
offers a practical guide for addressing the future medical image processing
challenges, including but not limited to small datasets, class-imbalanced
learning, multi-modality learning, and domain adaptation. The code and training
weights have been released at: https://github.com/hust-linyi/seg_trick
Comment: Under submission