Using Language to Extend to Unseen Domains
It is expensive to collect training data for every possible domain that a
vision model may encounter when deployed. We instead consider how simply
verbalizing the training domain (e.g. "photos of birds") as well as domains we
want to extend to but do not have data for (e.g. "paintings of birds") can
improve robustness. Using a multimodal model with a joint image and language
embedding space, our method LADS learns a transformation of the image
embeddings from the training domain to each unseen test domain, while
preserving task-relevant information. Without using any images from the unseen
test domain, we show that over the extended domain containing both training and
unseen test domains, LADS outperforms standard fine-tuning and ensemble
approaches on a suite of four benchmarks targeting domain adaptation and
dataset bias.
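
To make the mechanism concrete, here is a minimal PyTorch sketch of the core idea, assuming CLIP-style normalized embeddings. The names (DomainTransform, lads_losses, logits_fn) and the unweighted loss combination are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DomainTransform(nn.Module):
        # Maps training-domain image embeddings toward an unseen domain
        # described only in language (illustrative, not the paper's code).
        def __init__(self, dim=512):
            super().__init__()
            self.linear = nn.Linear(dim, dim)

        def forward(self, z):
            return F.normalize(self.linear(z), dim=-1)

    def lads_losses(f, z_img, t_src, t_tgt, logits_fn):
        # z_img: (B, D) normalized training-domain image embeddings.
        # t_src / t_tgt: (D,) normalized text embeddings of the source and
        # target domain descriptions, e.g. "photos of birds" vs.
        # "paintings of birds". logits_fn: a frozen task classifier head.
        z_aug = f(z_img)
        # Domain alignment: transformed embeddings should sit closer to
        # the target-domain text than to the source-domain text.
        align = ((z_aug @ t_src) - (z_aug @ t_tgt)).mean()
        # Class consistency: the frozen classifier should predict the same
        # label before and after the transform (task info is preserved).
        consist = F.kl_div(F.log_softmax(logits_fn(z_aug), dim=-1),
                           F.softmax(logits_fn(z_img), dim=-1),
                           reduction="batchmean")
        return align + consist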
Contrastive Adapters for Foundation Model Group Robustness
While large pretrained foundation models (FMs) have shown remarkable
zero-shot classification robustness to dataset-level distribution shifts, their
robustness to subpopulation or group shifts is relatively underexplored. We
study this problem, and find that FMs such as CLIP may not be robust to various
group shifts. Across 9 robustness benchmarks, zero-shot classification with
their embeddings results in gaps of up to 80.7 percentage points (pp) between
average and worst-group accuracy. Unfortunately, existing methods to improve
robustness require retraining, which can be prohibitively expensive on large
foundation models. We also find that efficient ways to improve model inference
(e.g., via adapters, lightweight networks that take FM embeddings as inputs) do
not consistently improve group robustness and can sometimes hurt it compared to
zero-shot (e.g., increasing the accuracy gap by 50.1 pp on CelebA). We thus
develop an adapter training strategy to effectively and efficiently improve FM
group robustness. Our motivating observation is that while poor robustness
results from groups in the same class being embedded far apart in the
foundation model "embedding space," standard adapter training may not bring
these points closer together. We thus propose contrastive adapting, which
trains adapters with contrastive learning to bring sample embeddings close to
both their ground-truth class embeddings and other sample embeddings in the
same class. Across the 9 benchmarks, our approach consistently improves group
robustness, raising worst-group accuracy by 8.5 to 56.0 pp over zero-shot. Our
approach is also efficient, requiring no FM finetuning and using only a fixed
set of frozen FM embeddings. On benchmarks such as Waterbirds and CelebA, this
leads to worst-group accuracy comparable to state-of-the-art methods that
retrain entire models, while training only 1% of the model parameters.
Comment: 28 pages, 9 figures. Preprint. Short version accepted to the ICML 2022
Workshop on Spurious Correlations, Invariance, and Stability.
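
As a concrete illustration of the training strategy, below is a minimal PyTorch sketch of a supervised-contrastive adapter loss over frozen FM embeddings. The function names, temperature, and batch construction are assumptions, not the paper's released code.

    import torch
    import torch.nn.functional as F

    def contrastive_adapter_loss(adapter, z, y, class_emb, tau=0.1):
        # z: (B, D) frozen FM image embeddings; y: (B,) integer labels;
        # class_emb: (C, D) frozen class embeddings (e.g. CLIP text).
        a = F.normalize(adapter(z), dim=-1)             # adapted samples
        c = F.normalize(class_emb, dim=-1)              # class anchors
        keys = torch.cat([c, a])                        # (C + B, D)
        sims = a @ keys.T / tau                         # (B, C + B)
        B, C = a.size(0), c.size(0)
        # Positives: the ground-truth class embedding and all other
        # same-class samples in the batch (never the sample itself).
        pos = torch.cat([F.one_hot(y, C).bool(),
                         y[:, None] == y[None, :]], dim=1)
        self_mask = torch.cat([torch.zeros(B, C, dtype=torch.bool),
                               torch.eye(B, dtype=torch.bool)], dim=1)
        pos = pos & ~self_mask
        sims = sims.masked_fill(self_mask, float("-inf"))
        log_prob = sims - sims.logsumexp(dim=1, keepdim=True)
        # Pull each adapted sample toward all of its positives.
        return -log_prob[pos].mean()

Training only the adapter over a fixed set of precomputed embeddings is what keeps this cheap relative to retraining the foundation model.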
In Search for a Generalizable Method for Source Free Domain Adaptation
Source-free domain adaptation (SFDA) is compelling because it allows adapting
an off-the-shelf model to a new domain using only unlabelled data. In this
work, we apply existing SFDA techniques to a challenging set of
naturally-occurring distribution shifts in bioacoustics, which are very
different from the shifts commonly studied in computer vision. We find that
existing methods perform differently relative to one another than they do on
vision benchmarks, and sometimes perform worse than no adaptation at all. We
propose a simple new method that outperforms existing methods on our new shifts
while exhibiting strong performance on a range of vision datasets. Our findings
suggest that existing SFDA methods are not as generalizable as previously
thought, and that considering diverse modalities can be a useful avenue for
designing more robust models.
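
The abstract leaves the proposed method unspecified, so as background on the SFDA setting, here is a minimal entropy-minimization baseline in the spirit of Tent (Wang et al., 2021): adapt only the normalization parameters of an off-the-shelf model on unlabelled target data. All names are illustrative.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def adapt_on_target(model, target_loader, steps=100, lr=1e-4):
        # Update only the affine parameters of normalization layers,
        # which keeps adaptation cheap and relatively stable.
        norm_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm)
        params = [p for m in model.modules() if isinstance(m, norm_types)
                  for p in m.parameters()]
        opt = torch.optim.Adam(params, lr=lr)
        model.train()
        for _, x in zip(range(steps), target_loader):
            probs = F.softmax(model(x), dim=-1)
            # Minimize prediction entropy on unlabelled target batches.
            entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
            opt.zero_grad()
            entropy.backward()
            opt.step()
        return model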
From Global to Local: Multi-scale Out-of-distribution Detection
Out-of-distribution (OOD) detection aims to detect "unknown" data whose
labels have not been seen during the in-distribution (ID) training process.
Recent progress in representation learning gives rise to distance-based OOD
detection that recognizes inputs as ID/OOD according to their relative
distances to the training data of ID classes. Previous approaches calculate
pairwise distances relying only on global image representations, which can be
sub-optimal as the inevitable background clutter and intra-class variation may
drive image-level representations from the same ID class far apart in a given
representation space. In this work, we overcome this challenge by proposing
Multi-scale OOD DEtection (MODE), the first framework to leverage both global
visual information and local region details of images to maximally benefit OOD
detection. Specifically, we first find that models pretrained with
off-the-shelf cross-entropy or contrastive losses fail to capture valuable
local representations for MODE, owing to the scale discrepancy between the ID
training and OOD detection processes. To mitigate this issue and
encourage locally discriminative representations in ID training, we propose
Attention-based Local PropAgation (ALPA), a trainable objective that exploits a
cross-attention mechanism to align and highlight the local regions of the
target objects for pairwise examples. During test-time OOD detection, a
Cross-Scale Decision (CSD) function is further devised on the most
discriminative multi-scale representations to distinguish ID/OOD data more
faithfully. We demonstrate the effectiveness and flexibility of MODE on several
benchmarks -- on average, MODE outperforms the previous state-of-the-art by up
to 19.24% in FPR and 2.77% in AUROC. Code is available at
https://github.com/JimZAI/MODE-OOD.
Comment: 13 pages.
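
As a rough picture of multi-scale, distance-based scoring, here is a PyTorch sketch that combines a global-embedding similarity with a best-matching local-region similarity. The simple weighted combination stands in for the paper's CSD function, and all names and shapes are assumptions.

    import torch
    import torch.nn.functional as F

    def mode_style_score(z_global, z_local, proto_global, proto_local,
                         alpha=0.5):
        # z_global: (D,) global feature of a test image.
        # z_local: (R, D) local region features of the same image.
        # proto_global: (C, D) per-class global prototypes from ID data.
        # proto_local: (C, S, D) per-class local prototypes.
        zg = F.normalize(z_global, dim=-1)
        pg = F.normalize(proto_global, dim=-1)
        s_global = (pg @ zg).max()              # best global class match
        zl = F.normalize(z_local, dim=-1)
        pl = F.normalize(proto_local, dim=-1)
        sims = torch.einsum("rd,csd->crs", zl, pl)  # region-to-prototype
        # For each class: average, over regions, of the best local match.
        s_local = sims.max(dim=-1).values.mean(dim=-1).max()
        # Higher similarity means more ID-like; negate for an OOD score.
        return -(alpha * s_global + (1 - alpha) * s_local)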
Domain Generalization for Medical Image Analysis: A Survey
Medical Image Analysis (MedIA) has become an essential tool in medicine and
healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and
recent successes in deep learning (DL) have made significant contributions to
its advances. However, DL models for MedIA remain challenging to deploy in
real-world situations, failing to generalize under the distributional gap
between training and testing samples, a problem known as distribution shift.
Researchers have dedicated significant effort to developing DL methods that
adapt to and perform robustly on unknown, out-of-distribution data. This paper
comprehensively reviews domain generalization studies
specifically tailored for MedIA. We provide a holistic view of how domain
generalization techniques interact within the broader MedIA system, going
beyond methodologies to consider the operational implications on the entire
MedIA workflow. Specifically, we categorize domain generalization methods into
data-level, feature-level, model-level, and analysis-level methods. We show how
these methods can be used at various stages of the DL-equipped MedIA workflow,
from data acquisition to model prediction and analysis. Furthermore, we include
benchmark datasets and applications used to evaluate these approaches, and we
analyze the strengths and weaknesses of various methods, unveiling future
research opportunities.
Dream the Impossible: Outlier Imagination with Diffusion Models
Utilizing auxiliary outlier datasets to regularize the machine learning model
has demonstrated promise for out-of-distribution (OOD) detection and safe
prediction. Because data collection and cleaning are labor-intensive,
automating outlier data generation has been a long-desired alternative. Despite
the appeal, generating photo-realistic outliers in the high dimensional pixel
space has been an open challenge for the field. To tackle the problem, this
paper proposes a new framework DREAM-OOD, which enables imagining
photo-realistic outliers by way of diffusion models, provided with only the
in-distribution (ID) data and classes. Specifically, DREAM-OOD learns a
text-conditioned latent space based on ID data and then samples outliers from
the low-likelihood region of that latent space, which can be decoded into
images by the diffusion model. Unlike prior works, DREAM-OOD enables visualizing
and understanding the imagined outliers, directly in the pixel space. We
conduct comprehensive quantitative and qualitative studies to understand the
efficacy of DREAM-OOD, and show that training with the samples generated by
DREAM-OOD can benefit OOD detection performance. Code is publicly available at
https://github.com/deeplearning-wisc/dream-ood.
Comment: NeurIPS 2023.
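
To make the sampling step concrete, here is a small PyTorch sketch of drawing low-likelihood latents under a Gaussian fit to ID embeddings. Fitting a single Gaussian per class, the widening factor, and the threshold quantile are all simplifying assumptions, and the diffusion-decoding step is elided.

    import torch

    def sample_outlier_latents(id_embs, n=1000, q=0.01, widen=2.0):
        # id_embs: (N, D) text-conditioned embeddings of one ID class.
        mu = id_embs.mean(dim=0)
        cov = torch.cov(id_embs.T) + 1e-4 * torch.eye(id_embs.size(1))
        id_dist = torch.distributions.MultivariateNormal(mu, cov)
        # Threshold at the low-likelihood tail of the ID embeddings.
        thresh = torch.quantile(id_dist.log_prob(id_embs), q)
        # Draw from a widened Gaussian and keep only low-likelihood
        # latents; these would then be decoded by the diffusion model.
        cand = torch.distributions.MultivariateNormal(
            mu, widen * cov).sample((n,))
        return cand[id_dist.log_prob(cand) < thresh]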
Topological Structure Learning for Weakly-Supervised Out-of-Distribution Detection
Out-of-distribution (OOD) detection is the key to deploying models safely in
the open world. For OOD detection, collecting sufficient in-distribution (ID)
labeled data is usually more time-consuming and costly than unlabeled data.
When ID labeled data is limited, previous OOD detection methods lose their
advantage because they depend heavily on the amount of ID labeled data.
Based on limited ID labeled data and sufficient unlabeled data, we define a new
setting called Weakly-Supervised Out-of-Distribution Detection (WSOOD). To
solve the new problem, we propose an effective method called Topological
Structure Learning (TSL). Firstly, TSL uses a contrastive learning method to
build the initial topological structure space for ID and OOD data. Secondly,
TSL mines effective topological connections in the initial topological space.
Finally, based on limited ID labeled data and mined topological connections,
TSL reconstructs the topological structure in a new topological space to
increase the separability of ID and OOD instances. Extensive studies on several
representative datasets show that TSL remarkably outperforms the
state-of-the-art, verifying the validity and robustness of our method in the
new setting of WSOOD.
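
To give a feel for the setting, here is a small PyTorch sketch that builds a k-NN graph over embeddings and propagates the few ID labels through it. This illustrates topological-structure-style reasoning in general, not the authors' specific mining and reconstruction steps.

    import torch
    import torch.nn.functional as F

    def knn_graph(z, k=10):
        # z: (N, D) embeddings from contrastive pretraining.
        # Returns a symmetric binary adjacency matrix (the "topology").
        zn = F.normalize(z, dim=-1)
        sims = zn @ zn.T
        sims.fill_diagonal_(float("-inf"))          # forbid self-edges
        idx = sims.topk(k, dim=-1).indices
        adj = torch.zeros_like(sims).scatter_(1, idx, 1.0)
        return ((adj + adj.T) > 0).float()

    def propagate_id_scores(adj, id_mask, iters=20, alpha=0.9):
        # id_mask: (N,) with 1.0 at the few labeled ID points, else 0.0.
        P = adj / adj.sum(dim=1, keepdim=True).clamp_min(1.0)
        s = id_mask.clone()
        for _ in range(iters):
            # Random-walk propagation with restarts on the labeled seeds.
            s = alpha * (P @ s) + (1 - alpha) * id_mask
        return s                                    # higher => more ID-like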
PV2TEA: Patching Visual Modality to Textual-Established Information Extraction
Information extraction, e.g., attribute value extraction, has been
extensively studied and formulated based only on text. However, many
attributes, such as color, shape, and pattern, can benefit from image-based
extraction. The visual modality has long been underutilized, mainly due to the
difficulty of multimodal annotation. In this paper, we aim to patch the visual
modality to the textual-established attribute information extractor. The
cross-modality integration faces several unique challenges: (C1) images and
textual descriptions are loosely paired, both intra-sample and inter-sample; (C2)
images usually contain rich backgrounds that can mislead the prediction; (C3)
weakly supervised labels from textual-established extractors are biased for
multimodal training. We present PV2TEA, an encoder-decoder architecture
equipped with three bias reduction schemes: (S1) Augmented label-smoothed
contrast to improve the cross-modality alignment for loosely-paired image and
text; (S2) Attention-pruning that adaptively distinguishes the visual
foreground; (S3) Two-level neighborhood regularization that mitigates the label
textual bias via reliability estimation. Empirical results on real-world
e-Commerce datasets demonstrate up to an 11.74% absolute (20.97% relative) F1
increase over unimodal baselines.
Comment: ACL 2023 Findings.
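
As one concrete instance of scheme (S1), here is a PyTorch sketch of a label-smoothed InfoNCE loss for loosely paired image/text embeddings. The smoothing scheme, temperature, and names are assumptions rather than the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def smoothed_clip_loss(img, txt, tau=0.07, eps=0.1):
        # img, txt: (B, D) embeddings of loosely paired image/text pairs.
        logits = F.normalize(img, dim=-1) @ F.normalize(txt, dim=-1).T / tau
        B = logits.size(0)
        # Smoothed targets: most mass on the nominal pair, a little mass
        # spread elsewhere, so noisy pairs are not forced into hard
        # one-to-one matches.
        targets = torch.full((B, B), eps / max(B - 1, 1))
        targets.fill_diagonal_(1.0 - eps)
        loss_i2t = -(targets * F.log_softmax(logits, dim=1)).sum(1).mean()
        loss_t2i = -(targets * F.log_softmax(logits.T, dim=1)).sum(1).mean()
        return (loss_i2t + loss_t2i) / 2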
Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need
The core of out-of-distribution (OOD) detection is to learn the
in-distribution (ID) representation, which is distinguishable from OOD samples.
Previous work applied recognition-based methods to learn the ID features, which
tend to learn shortcuts instead of comprehensive representations. In this work,
we find, surprisingly, that simply using reconstruction-based methods can
significantly boost OOD detection performance. We explore the main contributors
to OOD detection and find that reconstruction-based pretext tasks can provide a
generally applicable and effective prior that helps the model learn the
intrinsic data distribution of the ID dataset. Specifically, we take Masked
Image Modeling as a pretext task for our
OOD detection framework (MOOD). Without bells and whistles, MOOD outperforms
the previous SOTA on one-class OOD detection by 5.7%, multi-class OOD detection
by 3.0%, and near-distribution OOD detection by 2.1%. It even surpasses
10-shot-per-class outlier-exposure OOD detection, even though we include no OOD
samples in our detection.
Comment: Accepted by CVPR 2023; code is released at
https://github.com/JulietLJY/MOOD.
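
For intuition about how such features plug into detection, here is a minimal distance-based scoring sketch on top of a masked-image-modeling backbone. The nearest-class-mean Euclidean metric is a common simple choice and an assumption here, not necessarily the paper's exact scorer.

    import torch

    @torch.no_grad()
    def class_means(features, labels, num_classes):
        # features: (N, D) ID features from a masked-image-modeling
        # backbone (e.g. a ViT pretrained with MAE-style reconstruction).
        return torch.stack([features[labels == c].mean(dim=0)
                            for c in range(num_classes)])

    @torch.no_grad()
    def ood_score(feat, means):
        # feat: (D,) test feature; a larger score => more likely OOD.
        return (means - feat).norm(dim=-1).min()  # nearest-class distance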