18 research outputs found
Channel selection for test-time adaptation under distribution shift
To ensure robustness and generalization to real-world scenarios, test-time adaptation has been recently studied as an approach to adjust models to a new data
distribution during inference. Test-time batch normalization is a simple and popular
method that achieved compelling performance on domain shift benchmarks by
recalculating batch normalization statistics on test batches. However, in many
practical applications this technique is vulnerable to label distribution shifts. We
propose to tackle this challenge by only selectively adapting channels in a deep
network, minimizing drastic adaptation that is sensitive to label shifts. We find that
the adapted models significantly improve performance compared to the baseline models and counteract unknown label shifts.
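As a rough illustration of the idea described in the abstract, the sketch below recomputes batch-normalization statistics on a test batch but overwrites the running statistics only for a selected subset of channels. The selection rule (adapting the least-shifted channels) and the adapted fraction are illustrative assumptions, not the paper's actual criterion.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def selectively_adapt_bn(bn: nn.BatchNorm2d, x: torch.Tensor, frac: float = 0.25) -> None:
    """Recompute BN statistics on a test batch, but replace the running
    statistics only for a subset of channels (here: the channels whose test
    statistics deviate least from the source statistics, an assumed rule)."""
    # Per-channel statistics of the current test batch (N, C, H, W input).
    test_mean = x.mean(dim=(0, 2, 3))
    test_var = x.var(dim=(0, 2, 3), unbiased=False)

    # Normalized distance between source (running) and test-batch statistics.
    shift = (bn.running_mean - test_mean).abs() / (bn.running_var + 1e-5).sqrt()

    # Adapt only the least-shifted channels to avoid drastic adaptation.
    k = max(1, int(frac * shift.numel()))
    idx = torch.topk(shift, k, largest=False).indices

    bn.running_mean[idx] = test_mean[idx]
    bn.running_var[idx] = test_var[idx]
```

In practice, x here is the input activation of each BatchNorm layer, which can be captured with a forward hook while running the test batch through the model.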
Fairness in AI and Its Long-Term Implications on Society
Successful deployment of artificial intelligence (AI) in various settings has
led to numerous positive outcomes for individuals and society. However, AI
systems have also been shown to harm parts of the population due to biased
predictions. We take a closer look at AI fairness and analyse how lack of AI
fairness can lead to deepening of biases over time and act as a social
stressor. If the issues persist, it could have undesirable long-term
implications on society, reinforced by interactions with other risks. We
examine current strategies for improving AI fairness, assess their limitations
in terms of real-world deployment, and explore potential paths forward to
ensure we reap AI's benefits without harming significant parts of society. Comment: Presented at the 3rd Annual Stanford Existential Risks Conference, 202
On Sensitivity and Robustness of Normalization Schemes to Input Distribution Shifts in Automatic MR Image Diagnosis
Magnetic Resonance Imaging (MRI) is considered the gold standard of medical
imaging because of the excellent soft-tissue contrast exhibited in the images
reconstructed by the MRI pipeline, which in turn enables the human radiologist
to discern many pathologies easily. More recently, Deep Learning (DL) models
have also achieved state-of-the-art performance in diagnosing multiple diseases
using these reconstructed images as input. However, the image reconstruction
process within the MRI pipeline, which requires the use of complex hardware and
adjustment of a large number of scanner parameters, is highly susceptible to
noise of various forms, resulting in arbitrary artifacts within the images.
Furthermore, the noise distribution is not stationary and varies within a machine, across machines, and across patients, leading to varying artifacts within the images. Unfortunately, DL models are quite sensitive to these varying artifacts, as they lead to changes in the input data distribution between the training and
testing phases. The lack of robustness of these models against varying
artifacts impedes their use in medical applications where safety is critical.
In this work, we focus on improving the generalization performance of these
models in the presence of multiple varying artifacts that manifest due to the
complexity of the MR data acquisition. In our experiments, we observe that
Batch Normalization, a widely used technique during the training of DL models
for medical image analysis, is a significant cause of performance degradation
in these changing environments. As a solution, we propose to use other normalization techniques, such as Group Normalization (GN) and Layer Normalization (LN), to make model performance robust to varying image artifacts. Through a systematic set of experiments, we show that GN and LN provide better accuracy for various MR artifacts and distribution shifts. Comment: Accepted at MIDL 202
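A minimal sketch of the kind of change the abstract argues for: swapping BatchNorm layers for GroupNorm so that normalization statistics are computed per sample rather than per batch. The recursive replacement helper and the group count are assumptions for illustration, not the authors' exact configuration.

```python
import torch.nn as nn

def replace_bn_with_gn(module: nn.Module, num_groups: int = 32) -> nn.Module:
    """Recursively swap BatchNorm2d layers for GroupNorm, which normalizes
    within each sample and therefore does not depend on the test batch.
    num_groups=32 is a common default, not a value taken from the paper;
    the channel count must be divisible by the chosen group count."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            # Fall back to a single group if the channel count is not divisible.
            groups = num_groups if child.num_features % num_groups == 0 else 1
            setattr(module, name, nn.GroupNorm(groups, child.num_features))
        else:
            replace_bn_with_gn(child, num_groups)
    return module
```

With a single group, the same layer normalizes over all channels and spatial positions of each sample, which roughly corresponds to the LN variant discussed in the abstract.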
Test-Time Training for Semantic Segmentation with Output Contrastive Loss
Although deep learning-based segmentation models have achieved impressive
performance on public benchmarks, generalizing well to unseen environments
remains a major challenge. Test-time training (TTT) is a challenging paradigm that adapts the source-pretrained model in an online fashion to improve its generalization to the new domain during evaluation. Early efforts on TTT mainly focus on the image classification task. Directly extending these methods to semantic segmentation easily leads to unstable adaptation due to segmentation's inherent characteristics, such as extreme class
imbalance and complex decision spaces. To stabilize the adaptation process, we
introduce contrastive loss (CL), known for its capability to learn robust and
generalized representations. Nevertheless, the traditional CL operates in the
representation space and cannot directly enhance predictions. In this paper, we
resolve this limitation by adapting the CL to the output space, employing a
high temperature, and simplifying the formulation, resulting in a
straightforward yet effective loss function called Output Contrastive Loss
(OCL). Our comprehensive experiments validate the efficacy of our approach
across diverse evaluation scenarios. Notably, our method excels even when
applied to models initially pre-trained using domain adaptation methods on test
domain data, showcasing its resilience and adaptability. Code and more information can be found at https://github.com/dazhangyu123/OCL.
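A minimal sketch of a contrastive loss moved to the output space, assuming an InfoNCE-style pairing between two augmented views of the same batch and a high temperature; the exact OCL formulation is in the linked repository and may differ.

```python
import torch
import torch.nn.functional as F

def output_contrastive_loss(logits_a: torch.Tensor,
                            logits_b: torch.Tensor,
                            temperature: float = 5.0) -> torch.Tensor:
    """Contrastive (InfoNCE-style) loss computed on softmax outputs of two
    augmented views of the same N samples, rather than on intermediate
    features. The pairing scheme and temperature value are illustrative.

    logits_a, logits_b: (N, C) logits for the two views."""
    p_a = F.normalize(F.softmax(logits_a, dim=1), dim=1)
    p_b = F.normalize(F.softmax(logits_b, dim=1), dim=1)

    # Pairwise similarities between the outputs of view A and view B.
    sim = p_a @ p_b.t() / temperature  # (N, N)

    # Diagonal entries are the positive pairs; all others act as negatives.
    targets = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, targets)
```

The high temperature softens the similarity distribution, which is one way to keep the adaptation signal from being dominated by a few confident predictions.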
Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering
Prompting and in-context learning (ICL) have become efficient learning
paradigms for large language models (LLMs). However, LLMs suffer from prompt
brittleness and various bias factors in the prompt, including but not limited
to the formatting, the choice of verbalizers, and the ICL examples. To address
this problem that results in unexpected performance degradation, calibration
methods have been developed to mitigate the effects of these biases while
recovering LLM performance. In this work, we first conduct a systematic
analysis of the existing calibration methods, where we both provide a unified
view and reveal the failure cases. Inspired by these analyses, we propose Batch
Calibration (BC), a simple yet intuitive method that controls the contextual
bias from the batched input, unifies various prior approaches, and effectively
addresses the aforementioned issues. BC is zero-shot, inference-only, and
incurs negligible additional costs. In the few-shot setup, we further extend BC
to allow it to learn the contextual bias from labeled data. We validate the
effectiveness of BC with PaLM 2-(S, M, L) and CLIP models and demonstrate
state-of-the-art performance over previous calibration baselines across more
than 10 natural language understanding and image classification tasks. Comment: ICLR 2024. 9 pages, 9 figures, 3 tables (22 pages, 11 figures, 11 tables including references and appendices).
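A minimal sketch of the calibration step as described in the abstract: the contextual bias is estimated from the batched inputs themselves and subtracted from each example's class scores, so the procedure stays zero-shot and inference-only. Using the mean log-probability as the bias estimate is an assumption for illustration; the paper's exact estimator may differ.

```python
import numpy as np

def batch_calibrate(log_probs: np.ndarray) -> np.ndarray:
    """Subtract a contextual bias estimated from the batch itself from each
    example's class scores, then predict. The mean log-probability over the
    batch is used as the bias estimate here for illustration.

    log_probs: (N, C) per-example class log-probabilities from the LLM."""
    # Contextual prior estimated from the batched input (no labels needed).
    contextual_bias = log_probs.mean(axis=0, keepdims=True)  # (1, C)
    calibrated = log_probs - contextual_bias
    return calibrated.argmax(axis=1)
```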