ClusT3: Information Invariant Test-Time Training
Deep Learning models have shown remarkable performance in a broad range of
vision tasks. However, they are often vulnerable to domain shifts at
test-time. Test-time training (TTT) methods attempt to mitigate these
vulnerabilities by solving a secondary task during training, simultaneously
with the main task, which is later used as a self-supervised proxy task at
test-time. In this work, we propose a novel unsupervised TTT technique based
on maximizing the Mutual Information between multi-scale feature maps and a
discrete latent representation, which can be integrated into standard training
as an auxiliary clustering task. Experimental results demonstrate competitive
classification performance on several popular test-time adaptation benchmarks.
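The mutual-information objective described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common information-maximization decomposition I(X; Z) = H(E[p]) - E[H(p)] over per-location cluster probabilities, and the `ClusterHead` module is a hypothetical stand-in for the paper's clustering heads:

```python
import torch
import torch.nn as nn

class ClusterHead(nn.Module):
    """Hypothetical head: projects a feature map to K cluster probabilities
    per spatial location (one such head per feature scale)."""
    def __init__(self, in_channels, num_clusters):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, num_clusters, kernel_size=1)

    def forward(self, feats):
        # feats: (B, C, H, W) -> per-location cluster distributions (B*H*W, K)
        logits = self.proj(feats)
        k = logits.shape[1]
        return logits.permute(0, 2, 3, 1).reshape(-1, k).softmax(dim=-1)

def mutual_information(p, eps=1e-8):
    """I(Z; X) = H(E[p]) - E[H(p)] for soft assignments p of shape (N, K).
    High when clusters are used evenly (high marginal entropy) and each
    sample is assigned confidently (low conditional entropy)."""
    marginal = p.mean(dim=0)
    h_marginal = -(marginal * (marginal + eps).log()).sum()
    h_conditional = -(p * (p + eps).log()).sum(dim=1).mean()
    return h_marginal - h_conditional

# Maximizing MI == minimizing its negative as an auxiliary loss:
head = ClusterHead(in_channels=64, num_clusters=10)
feats = torch.randn(4, 64, 8, 8)  # dummy feature map at one scale
loss = -mutual_information(head(feats))
loss.backward()
```

At test-time, the same loss can be minimized on unlabeled target data to adapt the encoder, since it requires no labels.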
TFS-ViT: Token-Level Feature Stylization for Domain Generalization
Standard deep learning models such as convolutional neural networks (CNNs)
lack the ability to generalize to domains that have not been seen during
training. This problem stems mainly from the common but often incorrect
assumption that the source and target data come from the same i.i.d.
distribution. Recently, Vision Transformers (ViTs) have shown outstanding
performance for a broad range of computer vision tasks. However, very few
studies have investigated their ability to generalize to new domains. This
paper presents a novel Token-level Feature Stylization (TFS-ViT) approach for
domain generalization, which improves the performance of ViTs on unseen data by
synthesizing new domains. Our approach transforms token features by mixing the
normalization statistics of images from different domains. We further improve
this approach with a novel strategy for attention-aware stylization, which uses
the attention maps of class (CLS) tokens to compute and mix normalization
statistics of tokens corresponding to different image regions. The proposed
method is agnostic to the choice of backbone model and can be easily applied to
any ViT-based architecture with a negligible increase in computational
complexity. Comprehensive experiments show that our approach achieves
state-of-the-art performance on five challenging domain generalization
benchmarks and demonstrate its ability to handle different types of
domain shifts. The implementation is available at:
https://github.com/Mehrdad-Noori/TFS-ViT_Token-level_Feature_Stylization
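The stylization step described above, mixing normalization statistics of token features across samples from different domains, can be sketched as below. This is a simplified illustration under stated assumptions, not the repository's code: it assumes per-sample statistics computed over the token dimension and a MixStyle-like Beta-distributed mixing coefficient; the function name and `alpha` parameter are hypothetical:

```python
import torch

def token_feature_stylization(tokens, alpha=0.1, eps=1e-6):
    """Synthesize new 'styles' for ViT token features.

    tokens: (B, N, D) token features from a transformer block.
    Per-sample mean/std over the N tokens are treated as style statistics,
    mixed with those of a randomly permuted sample in the batch (ideally
    from another domain), and re-applied to the normalized features.
    """
    B, N, D = tokens.shape
    mu = tokens.mean(dim=1, keepdim=True)           # (B, 1, D) style mean
    sigma = tokens.std(dim=1, keepdim=True) + eps   # (B, 1, D) style std
    normed = (tokens - mu) / sigma                  # content, style removed

    perm = torch.randperm(B)                        # pair with another sample
    lam = torch.distributions.Beta(alpha, alpha).sample((B, 1, 1))
    mu_mix = lam * mu + (1 - lam) * mu[perm]        # interpolated statistics
    sigma_mix = lam * sigma + (1 - lam) * sigma[perm]
    return normed * sigma_mix + mu_mix              # re-stylized tokens
```

The attention-aware variant would additionally weight the statistics by the CLS-token attention maps so that tokens from different image regions contribute region-specific statistics; that refinement is omitted here for brevity.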