Make the U in UDA Matter: Invariant Consistency Learning for Unsupervised Domain Adaptation
Domain Adaptation (DA) is always challenged by the spurious correlation
between domain-invariant features (e.g., class identity) and domain-specific
features (e.g., environment) that does not generalize to the target domain.
Unfortunately, even enriched with additional unsupervised target domains,
existing Unsupervised DA (UDA) methods still suffer from it. This is because
the source domain supervision only considers the target domain samples as
auxiliary data (e.g., by pseudo-labeling), yet the inherent distribution in the
target domain -- where the valuable de-correlation clues hide -- is
disregarded. We propose to make the U in UDA matter by giving equal status to
the two domains. Specifically, we learn an invariant classifier whose
predictions are simultaneously consistent with the labels in the source domain
and the clusters in the target domain; hence the spurious correlation, being
inconsistent in the target domain, is removed. We dub our approach "Invariant CONsistency
learning" (ICON). Extensive experiments show that ICON achieves the
state-of-the-art performance on the classic UDA benchmarks: Office-Home and
VisDA-2017, and outperforms all the conventional methods on the challenging
WILDS 2.0 benchmark. Code is available at https://github.com/yue-zhongqi/ICON.
Comment: Accepted by NeurIPS 202
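In spirit, the objective couples a supervised loss on the source domain with a consistency loss against cluster assignments in the target domain. A minimal NumPy sketch of that idea (the function name, the use of precomputed cluster pseudo-labels, and the unweighted sum are illustrative assumptions, not the paper's actual objective):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def icon_style_loss(src_logits, src_labels, tgt_logits, tgt_clusters):
    """Toy ICON-style objective: a single classifier must agree with
    source labels AND with cluster assignments found in the target."""
    src_p = softmax(src_logits)
    tgt_p = softmax(tgt_logits)
    # supervised cross-entropy on the labeled source domain
    src_ce = -np.log(src_p[np.arange(len(src_labels)), src_labels] + 1e-12).mean()
    # consistency with (precomputed) target cluster pseudo-labels
    tgt_ce = -np.log(tgt_p[np.arange(len(tgt_clusters)), tgt_clusters] + 1e-12).mean()
    return src_ce + tgt_ce
```

In the actual method the two terms are balanced and the target clusters are learned jointly with the classifier; here they are fixed inputs for brevity.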
Visual Commonsense R-CNN
We present a novel unsupervised feature representation learning method,
Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to
serve as an improved visual region encoder for high-level tasks such as
captioning and VQA. Given a set of detected object regions in an image (e.g.,
using Faster R-CNN), the proxy training objective of VC R-CNN, like that of
other unsupervised feature learning methods (e.g., word2vec), is to predict the
contextual objects of a region. However, they are fundamentally different: VC
R-CNN predicts by using causal intervention, P(Y|do(X)), while the others use
the conventional likelihood, P(Y|X). This is also the core
reason why VC R-CNN can learn "sense-making" knowledge, such as a chair can be
sat on -- not just "common" co-occurrences, such as a chair is likely to exist
if a table is observed. We extensively apply VC R-CNN features in prevailing models
of three popular tasks: Image Captioning, VQA, and VCR, and observe consistent
performance boosts across them, achieving many new state-of-the-art results.
Code and features are available at https://github.com/Wangt-CN/VC-R-CNN.
Comment: Accepted by CVPR 202
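The gap between P(Y|do(X)) and P(Y|X) can be made concrete with a toy backdoor adjustment over a single binary confounder; all numbers below are illustrative, not from the paper:

```python
import numpy as np

# Toy backdoor adjustment with one binary confounder z (e.g., "table present"):
# conditioning lets z leak in via P(z|x); intervening fixes z at its prior P(z).
p_z = np.array([0.7, 0.3])               # prior over the confounder
p_z_given_x = np.array([0.9, 0.1])       # observing x skews belief about z
p_y_given_xz = np.array([0.2, 0.8])      # P(y=1 | x, z) for z = 0, 1

# conventional likelihood: P(y|x) = sum_z P(y|x,z) P(z|x)
p_y_cond_x = float((p_y_given_xz * p_z_given_x).sum())   # 0.26
# causal intervention:     P(y|do(x)) = sum_z P(y|x,z) P(z)
p_y_do_x = float((p_y_given_xz * p_z).sum())             # 0.38
```

Because the intervention weights each P(y|x,z) by the confounder's prior rather than by its x-skewed posterior, the two quantities differ whenever z and x are correlated, which is exactly the "common co-occurrence" effect VC R-CNN aims to remove.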
Attention-based Class Activation Diffusion for Weakly-Supervised Semantic Segmentation
Extracting class activation maps (CAM) is a key step for weakly-supervised
semantic segmentation (WSSS). The CAM of convolutional neural networks fails to
capture long-range feature dependencies in the image and covers only parts of
the foreground object, i.e., it produces many false negatives. An intuitive
solution is ``coupling'' the CAM with the long-range attention matrix of vision
transformers (ViT). We find that direct ``coupling'', e.g., pixel-wise
multiplication of attention and activation, achieves more global coverage (of
the foreground), but unfortunately comes with a large increase in false
positives, i.e., background pixels are mistakenly included. This paper aims to
tackle this issue. It proposes a new method that couples the CAM and the
attention matrix in a probabilistic diffusion fashion, dubbed AD-CAM.
Intuitively, it integrates ViT attention and CAM activation in a conservative
and convincing way.
The conservative part refines the attention between a pair
of pixels based on their respective attentions to common neighbors, the
intuition being that two pixels with very different neighborhoods are rarely
dependent, i.e., their attention should be reduced. The convincing part
diffuses a pixel's activation to its neighbors (on the CAM) in proportion to
the corresponding attentions (on the AM). In experiments, our results on two
challenging WSSS benchmarks, PASCAL VOC and MS COCO, show that AD-CAM used as
pseudo labels yields stronger WSSS models than state-of-the-art variants of
CAM.
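The two steps can be caricatured in a few lines of NumPy: attention rows are compared via their common neighbors, and activations are then diffused along the refined, row-normalized attention. A toy sketch under assumed names, not the authors' implementation:

```python
import numpy as np

def ad_cam_style_diffusion(attn, cam):
    """Toy sketch of AD-CAM-style coupling.

    attn: (n, n) ViT attention over n pixels/tokens (rows sum to 1)
    cam:  (n, c) class activation values per pixel
    """
    # conservative step: attention between i and j is re-weighted by the
    # dot product of their attention rows, i.e., their attention to common
    # neighbors; pairs with very different neighborhoods get suppressed
    refined = attn @ attn.T
    refined = refined / refined.sum(axis=1, keepdims=True)  # row-stochastic
    # convincing step: each pixel's activation is diffused to its neighbors
    # in proportion to the refined attention
    return refined @ cam
```

For example, with a near-diagonal 3x3 attention and activation concentrated on one pixel, the output spreads some activation to neighbors while keeping most of it at the source pixel.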
Interventional Few-Shot Learning
Ministry of Education, Singapore under its Academic Research Funding Tier 1 and 2; Alibaba Innovative Research (AIR) programme
Effect of ultrasound on physicochemical properties of emulsion stabilized by fish myofibrillar protein and xanthan gum
Peer-reviewed. To investigate the effects of ultrasound (20 kHz, 150–600 W) on the physicochemical properties of emulsions stabilized by myofibrillar protein (MP) and xanthan gum (XG), the emulsions were characterized by Fourier transform infrared (FT-IR) spectroscopy, ζ-potential, particle size, rheology, surface tension, and confocal laser scanning microscopy (CLSM). FT-IR spectra confirmed the complexation of MP and XG, and ultrasound did not change the functional groups in the complexes. The emulsion treated at 300 W showed the best stability, with the lowest particle size, the lowest surface tension (26.7 mN m−1), and the largest absolute ζ-potential (25.4 mV), which were confirmed in the CLSM images. Ultrasound reduced the apparent viscosity of the MP-XG emulsions, and the changes in particle size were manifested in the flow properties. Overall, ultrasound was successfully applied to improve the physical stability of the MP-XG emulsion, which could be used as a novel delivery system for functional material