Prototypical Contrast Adaptation for Domain Adaptive Semantic Segmentation
Unsupervised Domain Adaptation (UDA) aims to adapt the model trained on the
labeled source domain to an unlabeled target domain. In this paper, we present
Prototypical Contrast Adaptation (ProCA), a simple and efficient contrastive
learning method for unsupervised domain adaptive semantic segmentation.
Previous domain adaptation methods merely consider the alignment of the
intra-class representational distributions across domains, while the
inter-class structural relationship is insufficiently explored; as a result,
the aligned representations on the target domain may no longer be
discriminated as easily as those on the source domain. Instead, ProCA incorporates
inter-class information into class-wise prototypes, and adopts the
class-centered distribution alignment for adaptation. By considering the same
class prototypes as positives and other class prototypes as negatives to
achieve class-centered distribution alignment, ProCA achieves state-of-the-art
performance on classical domain adaptation tasks, i.e., GTA5 → Cityscapes and
SYNTHIA → Cityscapes. Code is available at
https://github.com/jiangzhengkai/ProCA.
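As an illustration of the class-centered contrastive idea described above, the following is a minimal PyTorch sketch rather than the authors' implementation: the feature shapes, temperature, and EMA prototype update are assumptions for exposition.

```python
# Sketch of a prototype-based contrastive loss in the spirit of ProCA.
# Shapes, temperature, and the EMA update rule are illustrative assumptions.
import torch
import torch.nn.functional as F

def prototypical_contrast_loss(features, labels, prototypes, temperature=0.1):
    """features: (N, D) pixel embeddings; labels: (N,) class ids;
    prototypes: (C, D) running class prototypes."""
    features = F.normalize(features, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    logits = features @ prototypes.t() / temperature  # (N, C) similarities
    # The same-class prototype is the positive and all other prototypes are
    # negatives, so the loss reduces to cross-entropy over the similarities.
    return F.cross_entropy(logits, labels)

@torch.no_grad()
def update_prototypes(prototypes, features, labels, momentum=0.99):
    # EMA update of each class prototype with the mean feature of that class.
    for c in labels.unique():
        mask = labels == c
        prototypes[c] = momentum * prototypes[c] \
            + (1 - momentum) * features[mask].mean(0)
    return prototypes
```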
You Only Need 90K Parameters to Adapt Light: A Light Weight Transformer for Image Enhancement and Exposure Correction
Challenging illumination conditions (low light, under-exposure, and
over-exposure) in the real world not only produce an unpleasant visual
appearance but also degrade computer vision tasks. After the camera captures
raw-RGB data, the image signal processor (ISP) renders standard sRGB images. By
decomposing the ISP pipeline into local and global image components, we propose a
lightweight fast Illumination Adaptive Transformer (IAT) to restore the normal
lit sRGB image from either low-light or under/over-exposure conditions.
Specifically, IAT uses attention queries to represent and adjust the
ISP-related parameters such as colour correction and gamma correction. With
only ~90k parameters and ~0.004s processing time per image, our IAT
consistently achieves superior performance over state-of-the-art methods on
current benchmark low-light enhancement and exposure correction datasets.
Complementary experiments also demonstrate that our IAT significantly improves
object detection and semantic segmentation under various lighting conditions.
Training code and a pretrained model are available at
https://github.com/cuiziteng/Illumination-Adaptive-Transformer.
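To illustrate how attention queries might regress global ISP parameters such as a colour-correction matrix and a gamma value, here is a minimal PyTorch sketch. The single-head attention, layer sizes, and parameter ranges are illustrative assumptions, not the released IAT architecture.

```python
# Sketch: learnable queries attend to image features and regress a 3x3 colour
# matrix plus a gamma value, which are then applied to the image. All layer
# choices here are assumptions for illustration.
import torch
import torch.nn as nn

class GlobalISPHead(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(10, dim))  # 9 colour + 1 gamma
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.to_param = nn.Linear(dim, 1)

    def forward(self, feats):                     # feats: (B, HW, dim)
        q = self.queries.unsqueeze(0).expand(feats.size(0), -1, -1)
        out, _ = self.attn(q, feats, feats)       # queries attend to features
        p = self.to_param(out).squeeze(-1)        # (B, 10)
        color = p[:, :9].view(-1, 3, 3) + torch.eye(3, device=p.device)
        gamma = torch.sigmoid(p[:, 9]) * 2 + 0.5  # keep gamma in a sane range
        return color, gamma

def apply_isp(img, color, gamma):                 # img: (B, 3, H, W) in [0, 1]
    b, c, h, w = img.shape
    flat = img.flatten(2)                         # (B, 3, HW)
    out = torch.bmm(color, flat).view(b, c, h, w).clamp(1e-6, 1)
    return out ** gamma.view(-1, 1, 1, 1)         # per-image gamma correction
```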
Dynamic Fusion with Intra- and Inter-modality Attention Flow for Visual Question Answering
Learning effective fusion of multi-modality features is at the heart of
visual question answering. We propose a novel method of dynamically fusing
multi-modal features with intra- and inter-modality information flow, which
alternately passes dynamic information within and across the visual and
language modalities. It robustly captures high-level interactions between the
language and vision domains, thus significantly improving the performance of
visual question answering. We also show that the proposed
dynamic intra-modality attention flow conditioned on the other modality can
dynamically modulate the intra-modality attention of the target modality, which
is vital for multimodality feature fusion. Experimental evaluations on the VQA
2.0 dataset show that the proposed method achieves state-of-the-art VQA
performance. Extensive ablation studies are carried out for the comprehensive
analysis of the proposed method. (CVPR 2019 Oral)
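Below is a minimal PyTorch sketch of one inter-/intra-modality attention block in the spirit of the described dynamic fusion. The sigmoid gate that conditions intra-modality attention on the other modality's pooled feature is a simplified assumption, not the paper's exact mechanism.

```python
# Sketch of one fusion block: inter-modality co-attention followed by
# intra-modality self-attention modulated by the other modality.
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.v_from_l = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.l_from_v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.l_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v_gate = nn.Linear(dim, dim)
        self.l_gate = nn.Linear(dim, dim)

    def forward(self, vis, lang):  # vis: (B, Nv, D), lang: (B, Nl, D)
        # Inter-modality flow: each modality attends to the other.
        vis = vis + self.v_from_l(vis, lang, lang)[0]
        lang = lang + self.l_from_v(lang, vis, vis)[0]
        # Intra-modality flow, dynamically modulated by the other modality:
        # a gate computed from the other modality's pooled feature scales the
        # self-attention input (one simple way to "condition" it).
        v_in = vis * torch.sigmoid(self.v_gate(lang.mean(1, keepdim=True)))
        l_in = lang * torch.sigmoid(self.l_gate(vis.mean(1, keepdim=True)))
        vis = vis + self.v_self(v_in, v_in, v_in)[0]
        lang = lang + self.l_self(l_in, l_in, l_in)[0]
        return vis, lang
```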
Rethinking Mobile Block for Efficient Attention-based Models
This paper focuses on developing modern, efficient, lightweight models for
dense predictions while trading off parameters, FLOPs, and performance.
The Inverted Residual Block (IRB) serves as the infrastructure of lightweight
CNNs, but no attention-based counterpart has been recognized. This work
rethinks the lightweight infrastructure of the efficient IRB and the effective
components of the Transformer from a unified perspective, extending the
CNN-based IRB to attention-based models and abstracting a one-residual Meta
Mobile Block (MMB) for lightweight model design. Following a simple but
effective design criterion, we deduce a modern Inverted Residual Mobile Block
(iRMB) and build a ResNet-like Efficient MOdel (EMO) with only iRMBs for
downstream tasks.
Extensive experiments on ImageNet-1K, COCO2017, and ADE20K benchmarks
demonstrate the superiority of our EMO over state-of-the-art methods, e.g.,
EMO-1M/2M/5M achieve 71.5%, 75.1%, and 78.4% Top-1 accuracy, surpassing
equal-order CNN- and attention-based models while trading off parameters,
efficiency, and accuracy well, running 2.8-4.0x faster than EdgeNeXt on an
iPhone 14.
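The following is a minimal PyTorch sketch of a one-residual mobile block whose spatial mixer can be either a depthwise convolution (CNN-style IRB) or self-attention, illustrating the Meta Mobile Block abstraction. The expansion ratio and layer choices are assumptions for exposition, not the released EMO code.

```python
# Sketch of a Meta-Mobile-Block-style unit: expand -> mix (depthwise conv or
# self-attention) -> project, with a single residual connection.
import torch
import torch.nn as nn

class MetaMobileBlock(nn.Module):
    def __init__(self, dim, expand=4, use_attention=False, heads=4):
        super().__init__()
        hidden = dim * expand
        self.expand = nn.Sequential(nn.Conv2d(dim, hidden, 1), nn.GELU())
        if use_attention:
            self.mixer = nn.MultiheadAttention(hidden, heads, batch_first=True)
        else:
            self.mixer = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.use_attention = use_attention
        self.project = nn.Conv2d(hidden, dim, 1)

    def forward(self, x):                        # x: (B, C, H, W)
        y = self.expand(x)
        if self.use_attention:                   # tokens = flattened pixels
            b, c, h, w = y.shape
            t = y.flatten(2).transpose(1, 2)     # (B, HW, C')
            t = self.mixer(t, t, t)[0]
            y = t.transpose(1, 2).view(b, c, h, w)
        else:                                    # depthwise conv mixing
            y = self.mixer(y)
        return x + self.project(y)               # one-residual design
```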