Make the U in UDA Matter: Invariant Consistency Learning for Unsupervised Domain Adaptation
Domain Adaptation (DA) is always challenged by the spurious correlation
between domain-invariant features (e.g., class identity) and domain-specific
features (e.g., environment) that does not generalize to the target domain.
Unfortunately, even enriched with additional unsupervised target domains,
existing Unsupervised DA (UDA) methods still suffer from it. This is because
the source domain supervision only considers the target domain samples as
auxiliary data (e.g., by pseudo-labeling), yet the inherent distribution in the
target domain -- where the valuable de-correlation clues hide -- is
disregarded. We propose to make the U in UDA matter by giving equal status to
the two domains. Specifically, we learn an invariant classifier whose
predictions are simultaneously consistent with the labels in the source domain
and the clusters in the target domain; hence the spurious correlation, being
inconsistent in the target domain, is removed. We dub our approach "Invariant CONsistency
learning" (ICON). Extensive experiments show that ICON achieves the
state-of-the-art performance on the classic UDA benchmarks: Office-Home and
VisDA-2017, and outperforms all the conventional methods on the challenging
WILDS 2.0 benchmark. Code is available at https://github.com/yue-zhongqi/ICON.
Comment: Accepted by NeurIPS 202
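In spirit, the objective couples a supervised loss on the source domain with a consistency loss against cluster assignments in the target domain. A minimal NumPy sketch of that idea (the function name, the use of precomputed cluster pseudo-labels, and the unweighted sum are illustrative assumptions, not the paper's actual objective):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def icon_style_loss(src_logits, src_labels, tgt_logits, tgt_clusters):
    """Toy ICON-style objective: a single classifier must agree with
    source labels AND with cluster assignments found in the target."""
    src_p = softmax(src_logits)
    tgt_p = softmax(tgt_logits)
    # supervised cross-entropy on the labeled source domain
    src_ce = -np.log(src_p[np.arange(len(src_labels)), src_labels] + 1e-12).mean()
    # consistency with (precomputed) target cluster pseudo-labels
    tgt_ce = -np.log(tgt_p[np.arange(len(tgt_clusters)), tgt_clusters] + 1e-12).mean()
    return src_ce + tgt_ce
```

In the actual method the two terms are balanced and the target clusters are learned jointly with the classifier; here they are fixed inputs for brevity.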
Visual Commonsense R-CNN
We present a novel unsupervised feature representation learning method,
Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to
serve as an improved visual region encoder for high-level tasks such as
captioning and VQA. Given a set of detected object regions in an image (e.g.,
using Faster R-CNN), the proxy training objective of VC R-CNN, like that of
other unsupervised feature learning methods (e.g., word2vec), is to predict the
contextual objects of a region. However, they are fundamentally different: VC
R-CNN predicts by using causal intervention, P(Y|do(X)), while the others use
the conventional likelihood, P(Y|X). This is also the core
reason why VC R-CNN can learn "sense-making" knowledge, such as a chair can be
sat on -- not just "common" co-occurrences, such as a chair is likely to exist
if a table is observed. We extensively apply VC R-CNN features in prevailing models
of three popular tasks: Image Captioning, VQA, and VCR, and observe consistent
performance boosts across them, achieving many new state-of-the-art results.
Code and features are available at https://github.com/Wangt-CN/VC-R-CNN.
Comment: Accepted by CVPR 202
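The gap between P(Y|do(X)) and P(Y|X) can be made concrete with a toy backdoor adjustment over a single binary confounder; all numbers below are illustrative, not from the paper:

```python
import numpy as np

# Toy backdoor adjustment with one binary confounder z (e.g., "table present"):
# conditioning lets z leak in via P(z|x); intervening fixes z at its prior P(z).
p_z = np.array([0.7, 0.3])               # prior over the confounder
p_z_given_x = np.array([0.9, 0.1])       # observing x skews belief about z
p_y_given_xz = np.array([0.2, 0.8])      # P(y=1 | x, z) for z = 0, 1

# conventional likelihood: P(y|x) = sum_z P(y|x,z) P(z|x)
p_y_cond_x = float((p_y_given_xz * p_z_given_x).sum())   # 0.26
# causal intervention:     P(y|do(x)) = sum_z P(y|x,z) P(z)
p_y_do_x = float((p_y_given_xz * p_z).sum())             # 0.38
```

Because the intervention weights each P(y|x,z) by the confounder's prior rather than by its x-skewed posterior, the two quantities differ whenever z and x are correlated, which is exactly the "common co-occurrence" effect VC R-CNN aims to remove.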
Attention-based Class Activation Diffusion for Weakly-Supervised Semantic Segmentation
Extracting class activation maps (CAM) is a key step for weakly-supervised
semantic segmentation (WSSS). The CAM of convolutional neural networks fails to
capture long-range feature dependencies in the image and covers only parts of
the foreground object, i.e., it produces many false negatives. An intuitive
solution is ``coupling'' the CAM with the long-range attention matrix of vision
transformers (ViT). We find that direct ``coupling'', e.g., pixel-wise
multiplication of attention and activation, achieves more global coverage (of
the foreground), but unfortunately comes with a large increase in false
positives, i.e., background pixels are mistakenly included. This paper aims to
tackle this issue. It proposes a new method that couples the CAM and the
attention matrix in a probabilistic diffusion fashion, dubbed AD-CAM.
Intuitively, it integrates ViT attention and CAM activation in a conservative
and convincing way.
The conservative part refines the attention between a pair
of pixels based on their respective attentions to common neighbors, the
intuition being that two pixels with very different neighborhoods are rarely
dependent, i.e., their attention should be reduced. The convincing part
diffuses a pixel's activation to its neighbors (on the CAM) in proportion to
the corresponding attentions (on the AM). In experiments, our results on two
challenging WSSS benchmarks, PASCAL VOC and MS COCO, show that AD-CAM used as
pseudo labels yields stronger WSSS models than state-of-the-art variants of
CAM.
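The two steps can be caricatured in a few lines of NumPy: attention rows are compared via their common neighbors, and activations are then diffused along the refined, row-normalized attention. A toy sketch under assumed names, not the authors' implementation:

```python
import numpy as np

def ad_cam_style_diffusion(attn, cam):
    """Toy sketch of AD-CAM-style coupling.

    attn: (n, n) ViT attention over n pixels/tokens (rows sum to 1)
    cam:  (n, c) class activation values per pixel
    """
    # conservative step: attention between i and j is re-weighted by the
    # dot product of their attention rows, i.e., their attention to common
    # neighbors; pairs with very different neighborhoods get suppressed
    refined = attn @ attn.T
    refined = refined / refined.sum(axis=1, keepdims=True)  # row-stochastic
    # convincing step: each pixel's activation is diffused to its neighbors
    # in proportion to the refined attention
    return refined @ cam
```

For example, with a near-diagonal 3x3 attention and activation concentrated on one pixel, the output spreads some activation to neighbors while keeping most of it at the source pixel.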
Interventional Few-Shot Learning
Ministry of Education, Singapore under its Academic Research Funding Tier 1 and 2; Alibaba Innovative Research (AIR) programme
Effect of ultrasound on physicochemical properties of emulsion stabilized by fish myofibrillar protein and xanthan gum
Peer-reviewed. To investigate the effects of ultrasound (20 kHz, 150–600 W) on the physicochemical properties of emulsions stabilized by myofibrillar protein (MP) and xanthan gum (XG), the emulsions were characterized by Fourier transform infrared (FT-IR) spectroscopy, ζ-potential, particle size, rheology, surface tension, and confocal laser scanning microscopy (CLSM). FT-IR spectra confirmed the complexation of MP and XG, and ultrasound did not change the functional groups in the complexes. The emulsion treated at 300 W showed the best stability, with the lowest particle size, the lowest surface tension (26.7 mN m−1), and the largest absolute ζ-potential (25.4 mV), which were confirmed in the CLSM images. Ultrasound reduced the apparent viscosity of the MP-XG emulsions, and the changes in particle size were manifested in the flow properties. Overall, ultrasound was successfully applied to improve the physical stability of the MP-XG emulsion, which could be used as a novel delivery system for functional material