Ensuring the realism of computer-generated synthetic images is crucial to
deep neural network (DNN) training. Due to different semantic distributions
between synthetic and real-world captured datasets, there exists semantic
mismatch between synthetic and refined images, which in turn results in the
semantic distortion. Recently, contrastive learning (CL) has been successfully
used to pull correlated patches together and push uncorrelated ones apart. In
this work, we exploit semantic and structural consistency between synthetic and
refined images and adopt CL to reduce the semantic distortion. Besides, we
incorporate hard negative mining to improve the performance furthermore. We
compare the performance of our method with several other benchmarking methods
using qualitative and quantitative measures and show that our method offers the
state-of-the-art performance