StyLIP: Multi-Scale Style-Conditioned Prompt Learning for CLIP-based Domain Generalization

Abstract

Large-scale foundation models (e.g., CLIP) have shown promising zero-shot generalization performance on downstream tasks by leveraging carefully designed language prompts. However, despite their success, most prompt learning techniques tend to underperform in the presence of domain shift. Our study addresses this problem and, to improve CLIP's generalization ability across domains, proposes StyLIP, a novel approach to Domain Generalization (DG) based on a domain-agnostic prompt learning strategy. In the absence of explicit domain knowledge, we aim to disentangle the visual style and content information extracted from the pre-trained CLIP within the prompts, so that they can be effortlessly adapted to novel domains at inference time. Furthermore, we employ a set of style projectors to learn the prompt tokens directly from multi-scale style features, and the generated prompt embeddings are later fused with the multi-scale visual features learned through a content projector. The projectors are trained contrastively, keeping CLIP's vision and text encoders frozen. We present extensive experiments in five different DG settings on multiple benchmarks, demonstrating that StyLIP consistently outperforms the relevant state-of-the-art methods.

Comment: 23 pages, 7 figures, 9 tables
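As a rough illustration of the style/content projector idea sketched in the abstract, the snippet below derives prompt tokens from the mean and standard deviation (style statistics) of multi-scale visual features and fuses them with a pooled content token. This is a minimal sketch under stated assumptions: the class names, feature dimensions, pooling choices, and linear projections are illustrative placeholders, not the authors' reference implementation, and in practice the features would come from the intermediate layers of a frozen CLIP vision encoder.

```python
# Hypothetical sketch of style- and content-conditioned prompt tokens.
# All names and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn


class StyleProjector(nn.Module):
    """Maps per-layer style statistics (mean, std of features) to one prompt token."""
    def __init__(self, feat_dim: int, embed_dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * feat_dim, embed_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, tokens, feat_dim) features from one vision-encoder layer
        mu = feats.mean(dim=1)
        sigma = feats.std(dim=1)
        return self.proj(torch.cat([mu, sigma], dim=-1))  # (batch, embed_dim)


class ContentProjector(nn.Module):
    """Fuses pooled multi-scale visual features into a single content token."""
    def __init__(self, feat_dims, embed_dim: int):
        super().__init__()
        self.proj = nn.Linear(sum(feat_dims), embed_dim)

    def forward(self, multi_scale_feats) -> torch.Tensor:
        pooled = [f.mean(dim=1) for f in multi_scale_feats]      # pool each scale
        return self.proj(torch.cat(pooled, dim=-1))              # (batch, embed_dim)


# Toy usage with random features standing in for frozen-CLIP intermediate layers.
feat_dims = [768, 768, 768]   # assumed widths of three vision-encoder layers
embed_dim = 512               # assumed text-embedding width
style_projs = nn.ModuleList([StyleProjector(d, embed_dim) for d in feat_dims])
content_proj = ContentProjector(feat_dims, embed_dim)

batch = 4
feats = [torch.randn(batch, 50, d) for d in feat_dims]           # fake multi-scale features
style_tokens = torch.stack([p(f) for p, f in zip(style_projs, feats)], dim=1)
content_token = content_proj(feats).unsqueeze(1)
prompt = torch.cat([style_tokens, content_token], dim=1)         # (batch, 4, embed_dim)
print(prompt.shape)
```

In this sketch only the projector weights would be trainable; the prompt tokens they produce would then be fed, together with class-name embeddings, to the frozen CLIP text encoder and optimized with a contrastive image-text objective.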
