Most existing domain adaptation (DA) methods align features based on the
overall domain feature distributions, ignoring cues from fog, background, and
target objects, which renders their performance suboptimal. In our DA
framework, we
retain the depth and background information during the domain feature
alignment. A consistency loss between the generated depth and fog transmission
map is introduced to strengthen the retention of the depth information in the
aligned features. To address false object features potentially generated during
the DA process, we propose an encoder-decoder framework to reconstruct the
fog-free background image. This reconstruction loss also reinforces the
encoder, i.e., our DA backbone, to minimize false object features. Moreover, we
involve our target data in training both our DA module and our detection module
in a semi-supervised manner, so that our detection module is also exposed to
the unlabeled target data, the type of data used in the testing stage. Using
these ideas, our method significantly outperforms the state-of-the-art method
(47.6 mAP versus 44.3 mAP on the Foggy Cityscapes dataset), and obtains
the best performance on multiple real-image public datasets. Code is available
at: https://github.com/VIML-CVDL/Object-Detection-in-Foggy-Scenes

Comment: Accepted by ACC
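The two auxiliary losses described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the L1 form of both losses and the scattering coefficient `beta` are assumptions, since the abstract does not specify them.

```python
import numpy as np

def transmission_from_depth(depth, beta=1.0):
    """Atmospheric scattering model: t(x) = exp(-beta * d(x)).
    beta (fog density) is an assumed free parameter here."""
    return np.exp(-beta * depth)

def depth_consistency_loss(pred_depth, pred_transmission, beta=1.0):
    """L1 gap between the transmission implied by the predicted depth
    and the separately estimated fog transmission map; minimizing it
    encourages the aligned features to retain depth information."""
    implied_t = transmission_from_depth(pred_depth, beta)
    return np.abs(implied_t - pred_transmission).mean()

def reconstruction_loss(decoded_background, fog_free_image):
    """Pixel-wise L1 between the decoder's fog-free reconstruction and
    the reference background image; back-propagating it through the
    shared encoder penalizes hallucinated (false) object features."""
    return np.abs(decoded_background - fog_free_image).mean()
```

In a full training loop these terms would be weighted and added to the detection and adversarial alignment objectives; the weighting scheme is not given in the abstract.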