Disentangled Contrastive Image Translation for Nighttime Surveillance
Nighttime surveillance suffers from degradation caused by poor illumination,
and human annotation of nighttime scenes is arduous. The task remains
challenging and poses a security risk at night. Existing methods rely on
multi-spectral images to perceive objects in the dark, but these are hampered
by low resolution and the absence of color. We argue that
the ultimate solution for nighttime surveillance is night-to-day translation,
or Night2Day, which aims to translate a surveillance scene from nighttime to
daytime while maintaining semantic consistency. To this end, this paper
presents a Disentangled Contrastive (DiCo) learning method. Specifically, to
address the poor and complex illumination in the nighttime scenes, we propose a
learnable physical prior, i.e., the color invariant, which provides a stable
perception of a highly dynamic night environment and can be incorporated into
the learning pipeline of neural networks. Targeting surveillance scenes, we
develop a disentangled representation via an auxiliary pretext task that
separates scenes into foreground and background with
contrastive learning. Such a strategy can extract the semantics without
supervision and boost our model to achieve instance-aware translation. Finally,
we incorporate all the modules above into generative adversarial networks and
achieve high-fidelity translation. This paper also contributes a new
surveillance dataset called NightSuR. It includes six scenes to support the
study of nighttime surveillance. The dataset contains nighttime images with
different properties of nighttime environments, such as flare and extreme
darkness. Extensive experiments demonstrate that our method outperforms
existing works significantly. The dataset and source code will be released on
GitHub soon.
Comment: Submitted to TI
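The abstract does not spell out the contrastive objective, but foreground/background separation of this kind is typically trained with an InfoNCE-style loss. A minimal PyTorch sketch follows; the function name, tensor shapes, and temperature are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of an InfoNCE-style contrastive loss (illustrative only;
# the paper's exact objective is not given in the abstract).
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.07):
    """anchor, positive: (B, D); negatives: (B, K, D)."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    # Similarity of each anchor to its positive: (B, 1).
    pos_sim = (anchor * positive).sum(dim=-1, keepdim=True)
    # Similarity of each anchor to its K negatives: (B, K).
    neg_sim = torch.einsum('bd,bkd->bk', anchor, negatives)
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    # The positive always sits at index 0 of the logits.
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)
```

In a foreground/background pretext task of the kind described, anchors and positives would be features of the same region (e.g., foreground) and negatives features of the other region, pulling the two apart without supervision.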
MOGAN: Morphologic-structure-aware Generative Learning from a Single Image
In most interactive image generation tasks, given regions of interest (ROI)
by users, the generated results are expected to have adequate diversities in
appearance while maintaining correct and reasonable structures in original
images. Such tasks become more challenging if only limited data is available.
Recently proposed generative models complete training based on only one image.
They pay much attention to the monolithic feature of the sample while ignoring
the actual semantic information of different objects inside the sample. As a
result, for ROI-based generation tasks, they may produce inappropriate samples
with excessive randomness that fail to maintain the correct structures of the
related objects. To address this issue, this work introduces a
MOrphologic-structure-aware Generative Adversarial Network named MOGAN that
produces random samples with diverse appearances and reliable structures based
on only one image. For training on the ROI, we propose to utilize augmented
data derived from the original image and introduce a novel module that
transforms such augmented data into knowledge containing both structures and
appearances, thus enhancing the model's comprehension of the sample. To learn
the areas outside the ROI, we employ binary masks to ensure that their
generation is isolated from the ROI. Finally, we set up parallel and
hierarchical branches of the aforementioned learning processes. Compared with
other single-image GAN schemes, our
approach focuses on internal features including the maintenance of rational
structures and variation in appearance. Experiments confirm that our model has
a better capacity for ROI-based image generation tasks than its competitive
peers.
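To illustrate how binary masks can isolate generation outside the ROI, here is a minimal PyTorch sketch under assumed conventions (mask value 1 inside the ROI, 0 outside); MOGAN's actual compositing and loss formulations are not specified in the abstract.

```python
# Illustrative use of a binary ROI mask to isolate the two generation
# branches; not MOGAN's actual formulation.
import torch

def composite_with_mask(generated, original, roi_mask):
    """generated, original: (B, C, H, W); roi_mask: (B, 1, H, W).
    Keep generated pixels inside the ROI, original pixels outside it."""
    return roi_mask * generated + (1.0 - roi_mask) * original

def outside_roi_loss(generated, original, roi_mask):
    # L1 penalty on deviation from the original image outside the ROI only,
    # so that ROI generation stays isolated from the rest of the image.
    return ((generated - original) * (1.0 - roi_mask)).abs().mean()
```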
Object-based attention mechanism for color calibration of UAV remote sensing images in precision agriculture
Color calibration is a critical step for unmanned aerial vehicle (UAV) remote sensing, especially in precision agriculture, which relies mainly on correlating color changes with specific quality attributes, e.g., plant health, disease, and pest stresses. In UAV remote sensing, exemplar-based color transfer is widely used for color calibration, where the automatic search for semantic correspondences is key to ensuring color transfer accuracy. However, existing attention mechanisms have difficulty building precise semantic correspondences between the reference image and the target one, in which normalized cross-correlation is often computed for feature reassembly. As a result, color transfer accuracy is inevitably decreased by disturbance from semantically unrelated pixels, and semantic mismatch arises where semantic correspondences are absent. In this article, we propose an unsupervised object-based attention mechanism (OBAM) to suppress the disturbance of semantically unrelated pixels, along with a weight-adjusted Adaptive Instance Normalization (AdaIN) method, termed WAA, to tackle the challenges caused by the absence of semantic correspondences. By embedding the proposed modules into a photorealistic style transfer method with progressive stylization, color transfer accuracy can be improved while better preserving structural details. We evaluated our approach on UAV data of different crop types, including rice, beans, and cotton. Extensive experiments demonstrate that our proposed method outperforms several state-of-the-art methods. As our approach requires no annotated labels, it can be easily embedded into off-the-shelf color transfer approaches. Relevant codes and configurations will be available at https://github.com/huanghsheng/object-based-attention-mechanis
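For reference, standard AdaIN aligns the channel-wise mean and standard deviation of target features to those of a reference. The PyTorch sketch below adds a simple blending weight `alpha` as a stand-in for the weight adjustment; the paper's actual WAA scheme is not described in the abstract.

```python
# Standard AdaIN plus a blending weight `alpha`; `alpha` is an assumption
# standing in for the paper's unspecified weight-adjustment (WAA) scheme.
import torch

def adain(content, style, eps=1e-5):
    """content, style: (B, C, H, W). Shift and scale the channel-wise
    statistics of the content features to match the style (reference)."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean

def weighted_adain(content, style, alpha=0.7):
    # Blend stylized features with the original content features.
    return alpha * adain(content, style) + (1.0 - alpha) * content
```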
Retrieval Guided Unsupervised Multi-domain Image-to-Image Translation
Image-to-image translation aims to learn a mapping that transforms an image
from one visual domain to another. Recent works assume that image descriptors
can be disentangled into a domain-invariant content representation and a
domain-specific style representation. Thus, translation models seek to preserve
the content of source images while changing the style to a target visual
domain. However, synthesizing new images is extremely challenging, especially in
multi-domain translations, as the network has to compose content and style to
generate reliable and diverse images in multiple domains. In this paper we
propose the use of an image retrieval system to assist the image-to-image
translation task. First, we train an image-to-image translation model to map
images to multiple domains. Then, we train an image retrieval model using real
and generated images to find images similar to a query one in content but in a
different domain. Finally, we exploit the image retrieval system to fine-tune
the image-to-image translation model and generate higher quality images. Our
experiments show the effectiveness of the proposed solution and highlight the
contribution of the retrieval network, which can benefit from additional
unlabeled data and help image-to-image translation models in the presence of
scarce data.
Comment: Submitted to ACM MM '20, October 12-16, 2020, Seattle, WA, US
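As a rough illustration of the retrieval step, the sketch below finds, for a query embedding, the most similar gallery image belonging to a different domain via cosine similarity; the paper's retrieval model and similarity measure are not given in the abstract, so all names here are hypothetical.

```python
# Hypothetical cross-domain retrieval: return the index of the gallery image
# most similar in content to the query but from a different domain.
import torch
import torch.nn.functional as F

def retrieve_cross_domain(query_emb, gallery_embs, gallery_domains, query_domain):
    """query_emb: (D,); gallery_embs: (N, D); gallery_domains: (N,) ints."""
    q = F.normalize(query_emb, dim=0)
    g = F.normalize(gallery_embs, dim=1)
    sims = g @ q  # cosine similarities, shape (N,)
    # Exclude gallery images from the query's own domain.
    sims = sims.masked_fill(gallery_domains == query_domain, float('-inf'))
    return int(sims.argmax())
```

Retrieved neighbors of this kind could then serve as additional targets or guidance when fine-tuning the translation model, in the spirit of the pipeline described above.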