
    Make the U in UDA Matter: Invariant Consistency Learning for Unsupervised Domain Adaptation

    Domain Adaptation (DA) is always challenged by the spurious correlation between domain-invariant features (e.g., class identity) and domain-specific features (e.g., environment), which does not generalize to the target domain. Unfortunately, even when enriched with additional unlabeled target-domain data, existing Unsupervised DA (UDA) methods still suffer from it. This is because the source-domain supervision treats the target-domain samples only as auxiliary data (e.g., via pseudo-labeling), while the inherent distribution of the target domain -- where the valuable de-correlation clues hide -- is disregarded. We propose to make the U in UDA matter by giving equal status to the two domains. Specifically, we learn an invariant classifier whose prediction is simultaneously consistent with the labels in the source domain and the clusters in the target domain; hence the spurious correlation, which is inconsistent in the target domain, is removed. We dub our approach "Invariant CONsistency learning" (ICON). Extensive experiments show that ICON achieves state-of-the-art performance on the classic UDA benchmarks Office-Home and VisDA-2017, and outperforms all conventional methods on the challenging WILDS 2.0 benchmark. Code is available at https://github.com/yue-zhongqi/ICON. Comment: Accepted by NeurIPS 202
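
    The objective described above can be sketched as a two-term consistency loss. The following is a minimal sketch, assuming a PyTorch classifier that returns logits and precomputed target cluster assignments; the function name and the use of plain cross-entropy for both terms are illustrative assumptions, not the paper's exact formulation (see the repository above).

    ```python
    import torch.nn.functional as F

    def icon_loss(classifier, src_x, src_y, tgt_x, tgt_clusters):
        # Supervised consistency: agree with the ground-truth labels
        # in the source domain.
        src_loss = F.cross_entropy(classifier(src_x), src_y)
        # Unsupervised consistency: agree equally with the cluster
        # assignments discovered in the target domain, giving the
        # target distribution equal status.
        tgt_loss = F.cross_entropy(classifier(tgt_x), tgt_clusters)
        return src_loss + tgt_loss
    ```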

    Attention-based Class Activation Diffusion for Weakly-Supervised Semantic Segmentation

    Extracting class activation maps (CAM) is a key step in weakly-supervised semantic segmentation (WSSS). The CAM of convolutional neural networks fails to capture long-range feature dependencies in the image, resulting in coverage of only parts of the foreground object, i.e., many false negatives. An intuitive solution is to "couple" the CAM with the long-range attention matrix of a vision transformer (ViT). We find that direct "coupling", e.g., pixel-wise multiplication of attention and activation, achieves more global coverage (of the foreground), but unfortunately comes with a large increase in false positives, i.e., background pixels are mistakenly included. This paper aims to tackle this issue. It proposes a new method that couples the CAM and the attention matrix in a probabilistic diffusion way, which we dub AD-CAM. Intuitively, it integrates ViT attention and CAM activation in a conservative and convincing way. Conservativeness is achieved by refining the attention between a pair of pixels based on their respective attentions to common neighbors, the intuition being that two pixels with very different neighborhoods are rarely dependent, i.e., their attention should be reduced. Convincingness is achieved by diffusing a pixel's activation to its neighbors (on the CAM) in proportion to the corresponding attentions (on the attention matrix). In experiments, our results on two challenging WSSS benchmarks, PASCAL VOC and MS COCO, show that AD-CAM pseudo-labels yield stronger WSSS models than the state-of-the-art variants of CAM.
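
    Both steps reduce to simple matrix operations on the attention matrix and the CAM. Below is a minimal sketch, assuming attention over N patches and per-patch class activations; the function name, the common-neighbor refinement, and the row normalization are illustrative assumptions rather than the paper's implementation.

    ```python
    import torch

    def ad_cam(attn, cam):
        # attn: (N, N) ViT attention among N patches; cam: (N, C) activations.
        # Conservative step: re-score each pixel pair by the attention they
        # place on common neighbors; pairs with very different neighborhoods
        # end up with reduced affinity.
        refined = attn @ attn.transpose(-1, -2)
        refined = refined / refined.sum(dim=-1, keepdim=True).clamp(min=1e-8)
        # Convincing step: diffuse each pixel's activation to its neighbors in
        # proportion to the refined attention, rather than taking a raw
        # pixel-wise product of attention and activation.
        return refined @ cam
    ```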

    Visual Commonsense R-CNN

    We present a novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for high-level tasks such as captioning and VQA. Given a set of detected object regions in an image (e.g., from Faster R-CNN), like any other unsupervised feature learning method (e.g., word2vec), the proxy training objective of VC R-CNN is to predict the contextual objects of a region. However, they are fundamentally different: VC R-CNN predicts using the causal intervention P(Y|do(X)), while the others use the conventional likelihood P(Y|X). This is also the core reason why VC R-CNN can learn "sense-making" knowledge, such as that a chair can be sat on, rather than just "common" co-occurrences, such as that a chair is likely to exist if a table is observed. We extensively apply VC R-CNN features in prevailing models for three popular tasks, image captioning, VQA, and VCR, and observe consistent performance boosts across them, achieving many new state-of-the-art results. Code and features are available at https://github.com/Wangt-CN/VC-R-CNN. Comment: Accepted by CVPR 202
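
    The distinction between intervention and likelihood can be illustrated with the standard backdoor adjustment, P(Y|do(X)) = sum_z P(Y|X,z) P(z), which averages over confounders z under their prior rather than under the co-occurrence-driven P(z|X). A minimal sketch, assuming a hypothetical predictor that scores a region feature x jointly with a confounder z and a precomputed confounder dictionary with prior P(z); VC R-CNN's actual dictionary construction and adjustment live in the repository above.

    ```python
    import torch

    def backdoor_prediction(predictor, x, confounders, prior):
        # P(Y|do(X)) = sum_z P(Y | X, z) P(z): score x jointly with every
        # confounder z, then average the class distributions under the
        # prior P(z) instead of the co-occurrence-driven P(z | X).
        probs = torch.stack(
            [predictor(x, z).softmax(dim=-1) for z in confounders]
        )  # (K, C)
        return (prior.unsqueeze(-1) * probs).sum(dim=0)  # (C,) distribution
    ```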

    Interventional few-shot learning

    Supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 1 and 2, and by the Alibaba Innovative Research (AIR) programme.

    Effect of ultrasound on physicochemical properties of emulsion stabilized by fish myofibrillar protein and xanthan gum

    To investigate the effects of ultrasound (20 kHz, 150–600 W) on the physicochemical properties of emulsions stabilized by fish myofibrillar protein (MP) and xanthan gum (XG), the emulsions were characterized by Fourier-transform infrared (FT-IR) spectroscopy, ζ-potential, particle size, rheology, surface tension, and confocal laser scanning microscopy (CLSM). FT-IR spectra confirmed the complexation of MP and XG, and ultrasound did not change the functional groups in the complexes. The emulsion treated at 300 W showed the best stability, with the smallest particle size, the lowest surface tension (26.7 mN m−1), and the largest absolute ζ-potential (25.4 mV), which was confirmed by the CLSM images. Ultrasound reduced the apparent viscosity of the MP-XG emulsions, and the changes in particle size were reflected in the flow properties. Overall, ultrasound was successfully applied to improve the physical stability of the MP-XG emulsion, which could be used as a novel delivery system for functional materials.