This work presents a new method for unsupervised thermal image classification
and semantic segmentation by transferring knowledge from the RGB domain using a
multi-domain attention network. Our method does not require any thermal
annotations or co-registered RGB-thermal pairs, enabling robots to perform
visual tasks at night and in adverse weather conditions without incurring
additional costs of data labeling and registration. Existing unsupervised domain
adaptation methods seek to globally align images or features across domains.
However, for cross-modal data the domain shift is significantly larger, and
not all features can be transferred. We address this problem with a shared
backbone network that promotes generalization, and domain-specific attention
that reduces negative transfer by attending to domain-invariant and
easily-transferable features. Our approach outperforms the state-of-the-art
RGB-to-thermal adaptation method on classification benchmarks, and is
successfully applied to thermal river scene segmentation using only synthetic
RGB images. Our code is publicly available at
https://github.com/ganlumomo/thermal-uda-attention
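
To make the shared-backbone, domain-specific-attention idea concrete, below is a minimal PyTorch sketch. The class names, the choice of a ResNet-18 backbone, and the squeeze-and-excitation-style channel gating are illustrative assumptions for exposition, not the paper's exact implementation; see the linked repository for the actual code.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class DomainAttention(nn.Module):
    """Hypothetical channel-attention block (squeeze-and-excitation style).
    One instance per domain gates features toward transferable channels."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reweight channels per domain; down-weight hard-to-transfer features.
        return x * self.gate(x)

class MultiDomainAttentionNet(nn.Module):
    """Shared backbone + per-domain attention heads (illustrative sketch)."""
    def __init__(self, num_classes: int, domains=("rgb", "thermal")):
        super().__init__()
        backbone = resnet18(weights=None)
        # Shared feature extractor promotes generalization across domains.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        # Domain-specific attention reduces negative transfer.
        self.attention = nn.ModuleDict({d: DomainAttention(512) for d in domains})
        self.classifier = nn.Linear(512, num_classes)  # shared classifier

    def forward(self, x: torch.Tensor, domain: str) -> torch.Tensor:
        f = self.attention[domain](self.features(x))
        f = torch.flatten(nn.functional.adaptive_avg_pool2d(f, 1), 1)
        return self.classifier(f)

# Usage: route each batch through its own domain's attention branch.
model = MultiDomainAttentionNet(num_classes=10)
rgb = torch.randn(2, 3, 224, 224)
thermal = torch.randn(2, 3, 224, 224)  # thermal replicated to 3 channels
logits_rgb = model(rgb, domain="rgb")
logits_thm = model(thermal, domain="thermal")
```

Keeping the backbone and classifier shared while duplicating only the lightweight attention modules keeps the parameter overhead per domain small and lets the adaptation objective focus on domain-invariant features.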