Painterly Image Harmonization via Adversarial Residual Learning
Image compositing plays a vital role in photo editing. After a foreground
object is inserted into another background image, the composite image may look
unnatural and inharmonious. When the foreground is photorealistic and the
background is an artistic painting, painterly image harmonization aims to
transfer the style of the background painting to the foreground object, which is a
challenging task due to the large domain gap between foreground and background.
In this work, we employ adversarial learning to bridge the domain gap between
the foreground and background feature maps. Specifically, we design a
dual-encoder generator, in which the residual encoder produces residual
features that are added to the foreground feature map from the main encoder. Then, a
pixel-wise discriminator plays against the generator, encouraging the refined
foreground feature map to be indistinguishable from the background feature map.
Extensive experiments demonstrate that our method achieves more harmonious
and visually appealing results than previous methods.
Comment: Accepted by WACV202
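The adversarial residual scheme described above can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes the main encoder yields one feature map of the composite, uses a toy 1x1 linear layer as the pixel-wise discriminator, and the function names (`refine_foreground`, `pixel_discriminator`, `adversarial_losses`) are illustrative. Background pixels act as "real" samples and refined foreground pixels as "fake" ones.

```python
import numpy as np

# Hypothetical sketch of adversarial residual learning; not the paper's architecture.


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def refine_foreground(feat, residual, mask):
    """Add residual-encoder features to main-encoder features inside the mask.

    feat, residual: (C, H, W) arrays; mask: (H, W) array, 1 on the foreground.
    Background positions are passed through unchanged.
    """
    return feat + residual * mask[None, :, :]


def pixel_discriminator(feat, weights, bias):
    """Toy pixel-wise discriminator: a 1x1 linear layer followed by a sigmoid.

    Returns an (H, W) map of per-pixel probabilities of being "background".
    """
    logits = np.tensordot(weights, feat, axes=([0], [0])) + bias
    return sigmoid(logits)


def adversarial_losses(refined, mask, weights, bias, eps=1e-8):
    """Binary cross-entropy losses for the discriminator and the generator."""
    prob = pixel_discriminator(refined, weights, bias)
    fg = mask.astype(bool)
    # Discriminator: background pixels are "real", refined foreground pixels are "fake".
    d_loss = -(np.log(prob[~fg] + eps).mean() + np.log(1.0 - prob[fg] + eps).mean())
    # Generator (residual encoder): make refined foreground pixels look like background.
    g_loss = -np.log(prob[fg] + eps).mean()
    return d_loss, g_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W = 4, 8, 8
    feat = rng.normal(size=(C, H, W))
    residual = rng.normal(size=(C, H, W))
    mask = np.zeros((H, W))
    mask[2:6, 2:6] = 1.0
    refined = refine_foreground(feat, residual, mask)
    print(adversarial_losses(refined, mask, rng.normal(size=C), 0.0))
```

With an all-zero discriminator every pixel scores 0.5, so the generator loss equals log 2; training would alternate between lowering `d_loss` (discriminator) and `g_loss` (residual encoder).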
Inharmonious Region Localization by Magnifying Domain Discrepancy
Inharmonious region localization aims to localize the region in a synthetic
image that is incompatible with the surrounding background. The inharmony issue is
mainly attributed to the color and illumination inconsistency produced by image
editing techniques. In this work, we propose to transform the input image to
another color space to magnify the domain discrepancy between the inharmonious
region and the background, so that the model can identify the inharmonious region
more easily. To this end, we present a novel framework consisting of a color
mapping module and an inharmonious region localization network, in which the
former is equipped with a novel domain discrepancy magnification loss and the
latter can be an arbitrary localization network. Extensive experiments on an
image harmonization dataset show the superiority of our framework. Our
code is available at
https://github.com/bcmi/MadisNet-Inharmonious-Region-Localization
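The abstract does not give the exact form of the domain discrepancy magnification loss, so the following is only a hypothetical sketch of the idea: a color mapping (here a fixed 3x3 linear transform standing in for the learned module) is applied to the image, and a Fisher-style ratio of between-region to within-region spread is maximized. Normalizing by the within-region variance is an assumption; it prevents the mapping from "magnifying" the gap simply by scaling every color.

```python
import numpy as np

# Hypothetical loss; MadisNet's actual formulation is not given in the abstract.


def apply_color_mapping(image, matrix, offset):
    """Map every RGB pixel through a 3x3 linear transform plus an offset.

    image: (H, W, 3); matrix: (3, 3); offset: (3,). Stand-in for a learned
    color mapping module.
    """
    return image @ matrix.T + offset


def discrepancy_magnification_loss(mapped, mask, eps=1e-6):
    """Negative ratio of between-region to within-region spread.

    Minimizing it pushes the mean color of the masked (inharmonious) region away
    from the background mean, relative to each region's own variance.
    """
    region = mask.astype(bool)
    fg = mapped[region]   # (N_fg, 3)
    bg = mapped[~region]  # (N_bg, 3)
    between = np.sum((fg.mean(axis=0) - bg.mean(axis=0)) ** 2)
    within = fg.var(axis=0).sum() + bg.var(axis=0).sum()
    return -between / (within + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.uniform(size=(16, 16, 3))
    mask = np.zeros((16, 16))
    mask[4:10, 4:10] = 1
    # Red channel: nearly constant, but the masked region is clearly redder.
    image[..., 0] = 0.2 + 0.01 * rng.normal(size=(16, 16))
    image[mask == 1, 0] += 0.5
    identity = apply_color_mapping(image, np.eye(3), np.zeros(3))
    red_only = apply_color_mapping(image, np.diag([1.0, 0.0, 0.0]), np.zeros(3))
    print(discrepancy_magnification_loss(identity, mask),
          discrepancy_magnification_loss(red_only, mask))
```

Projecting onto the one channel where the inharmonious region actually differs (red, in the example) yields a much lower loss than the identity mapping, while scaling or shifting the whole color space leaves the loss essentially unchanged.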
Learning Global-aware Kernel for Image Harmonization
Image harmonization aims to solve the visual inconsistency problem in
composited images by adaptively adjusting the foreground pixels with the
background as references. Existing methods employ local color transformation or
region matching between foreground and background, which neglects the powerful
proximity prior and treats the foreground and background independently, each as a
whole, for harmonization. As a result, they still show limited performance
across varied foreground objects and scenes. To address this issue, we propose
a novel Global-aware Kernel Network (GKNet) to harmonize local regions with
comprehensive consideration of long-distance background references.
Specifically, GKNet includes two parts, i.e., harmony kernel prediction and
harmony kernel modulation branches. The former includes a Long-distance
Reference Extractor (LRE) to obtain long-distance context and Kernel Prediction
Blocks (KPB) to predict multi-level harmony kernels by fusing global
information with local features. To achieve this goal, a novel Selective
Correlation Fusion (SCF) module is proposed to better select relevant
long-distance background references for local harmonization. The latter employs
the predicted kernels to harmonize foreground regions with both local and
global awareness. Extensive experiments demonstrate the superiority of our
method for image harmonization over state-of-the-art methods, e.g., achieving
a PSNR of 39.53 dB, which surpasses the best counterpart by 0.78 dB, and
decreasing fMSE/MSE by 11.5%/6.7% compared with the
SoTA method. Code will be available at
https://github.com/XintianShen/GKNet.
Comment: 10 pages, 10 figures
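The harmony kernel modulation branch applies predicted kernels to feature maps. Below is a minimal NumPy sketch of such spatially varying (per-pixel) filtering, under assumptions not stated in the abstract: one odd-sized k x k kernel per pixel, shared across channels, with zero padding. Kernel prediction (LRE, KPB, SCF) is not modeled; the kernels are simply passed in, and `apply_pixel_kernels` is an illustrative name.

```python
import numpy as np

# Hypothetical sketch of per-pixel kernel modulation; not GKNet's actual code.


def apply_pixel_kernels(feat, kernels, k):
    """Filter every spatial position with its own k x k kernel (k odd).

    feat: (C, H, W) feature map.
    kernels: (k*k, H, W); kernels[dy * k + dx, y, x] weights the neighbor at
    offset (dy - k//2, dx - k//2) of pixel (y, x). Kernels are shared across
    channels (an assumption). Zero padding keeps the output at (C, H, W).
    """
    _, h, w = feat.shape
    pad = k // 2
    padded = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros(feat.shape, dtype=np.result_type(feat, kernels))
    for dy in range(k):
        for dx in range(k):
            # padded[:, y + dy, x + dx] is the neighbor of (y, x) at this offset.
            shifted = padded[:, dy:dy + h, dx:dx + w]
            out += kernels[dy * k + dx][None, :, :] * shifted
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.normal(size=(2, 5, 6))
    kernels = rng.uniform(size=(9, 5, 6))
    kernels /= kernels.sum(axis=0, keepdims=True)  # each pixel's kernel sums to 1
    print(apply_pixel_kernels(feat, kernels, 3).shape)
```

An identity kernel (1 at the center) returns the input unchanged and a uniform kernel averages each 3x3 neighborhood; a learned predictor would instead output a different kernel at every position, which is what lets the filtering adapt locally.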
LEMaRT: Label-Efficient Masked Region Transform for Image Harmonization
We present a simple yet effective self-supervised pre-training method for
image harmonization which can leverage large-scale unannotated image datasets.
To achieve this goal, we first generate pre-training data online with our
Label-Efficient Masked Region Transform (LEMaRT) pipeline. Given an image,
LEMaRT generates a foreground mask and then applies a set of transformations to
perturb various visual attributes, e.g., defocus blur, contrast, saturation, of
the region specified by the generated mask. We then pre-train image
harmonization models by recovering the original image from the perturbed image.
Secondly, we introduce an image harmonization model, namely SwinIH, by
retrofitting the Swin Transformer [27] with a combination of local and global
self-attention mechanisms. Pre-training SwinIH with LEMaRT results in a new
state of the art for image harmonization, while being label-efficient, i.e.,
consuming less annotated data for fine-tuning than existing methods. Notably,
on the iHarmony4 dataset [8], SwinIH outperforms the state of the art, i.e., SCS-Co
[16], by a margin of 0.4 dB when it is fine-tuned on only 50% of the training
data, and by 1.0 dB when it is trained on the full training dataset.
Comment: Accepted by CVPR'23, 19 pages
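The LEMaRT data-generation step can be approximated as follows. The rectangular mask, the parameter ranges, and the box blur (a crude substitute for true defocus blur) are assumptions for illustration; the abstract only names defocus blur, contrast, and saturation as example perturbations. For simplicity the whole image is perturbed and then pasted back only inside the mask. Each call returns a perturbed input, its mask, and the original image as the reconstruction target.

```python
import numpy as np

# Hypothetical approximation of LEMaRT pre-training data generation.
# Images are float arrays of shape (H, W, 3) with values in [0, 1].


def random_box_mask(h, w, rng, min_frac=0.2, max_frac=0.5):
    """Hypothetical mask generator: one random axis-aligned rectangle."""
    bh = max(1, int(h * rng.uniform(min_frac, max_frac)))
    bw = max(1, int(w * rng.uniform(min_frac, max_frac)))
    top = int(rng.integers(0, h - bh + 1))
    left = int(rng.integers(0, w - bw + 1))
    mask = np.zeros((h, w), dtype=bool)
    mask[top:top + bh, left:left + bw] = True
    return mask


def adjust_contrast(img, factor):
    """Scale pixel deviations from the mean intensity."""
    mean = img.mean()
    return np.clip((img - mean) * factor + mean, 0.0, 1.0)


def adjust_saturation(img, factor):
    """Scale each pixel's deviation from its own gray value (channel mean)."""
    gray = img.mean(axis=2, keepdims=True)
    return np.clip((img - gray) * factor + gray, 0.0, 1.0)


def box_blur(img, k=3):
    """k x k box blur with edge padding (a crude stand-in for defocus blur)."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def make_pretraining_pair(image, rng):
    """Return (perturbed input, mask, target) for self-supervised pre-training."""
    mask = random_box_mask(image.shape[0], image.shape[1], rng)
    perturbed = adjust_contrast(image, rng.uniform(0.6, 1.4))
    perturbed = adjust_saturation(perturbed, rng.uniform(0.5, 1.5))
    perturbed = box_blur(perturbed)
    # Keep the perturbation only inside the mask; the rest is the original image.
    model_input = np.where(mask[:, :, None], perturbed, image)
    return model_input, mask, image


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.uniform(size=(64, 64, 3))
    model_input, mask, target = make_pretraining_pair(image, rng)
    print(mask.sum(), np.abs(model_input - target).max())
```

Because the pairs are generated on the fly from any unannotated image, a model can be pre-trained to map `model_input` (with `mask`) back to `target` before fine-tuning on a labeled harmonization dataset.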
Painterly Image Harmonization using Diffusion Model
Painterly image harmonization aims to insert photographic objects into
paintings and obtain artistically coherent composite images. Previous methods
for this task mainly rely on inference optimization or generative adversarial
networks, but they are either very time-consuming or struggle with fine control
of the foreground objects (e.g., texture and content details). To address these
issues, we propose a novel Painterly Harmonization stable Diffusion model
(PHDiffusion), which includes a lightweight adaptive encoder and a Dual Encoder
Fusion (DEF) module. Specifically, the adaptive encoder and the DEF module
first stylize foreground features within each encoder. Then, the stylized
foreground features from both encoders are combined to guide the harmonization
process. During training, besides the noise loss in diffusion model, we
additionally employ content loss and two style losses, i.e., AdaIN style loss
and contrastive style loss, aiming to balance the trade-off between style
transfer and content preservation. Compared with state-of-the-art models
from related fields, our PHDiffusion stylizes the foreground more
thoroughly while retaining finer content. Our code and model are
available at https://github.com/bcmi/PHDiffusion-Painterly-Image-Harmonization.
Comment: Accepted by ACMMM 202
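The AdaIN style loss mentioned above builds on the AdaIN formulation of Huang and Belongie (2017), which matches channel-wise means and standard deviations of encoder features. The sketch below shows that standard loss on precomputed feature maps. The encoder (typically VGG), the layers PHDiffusion uses, and any restriction to the foreground region are not specified in the abstract and are left out; the function names are illustrative.

```python
import numpy as np

# Standard AdaIN operation and AdaIN-style loss on precomputed (C, H, W) features.
# Encoder choice, layer selection, and foreground masking are omitted (assumptions).


def channel_stats(feat, eps=1e-5):
    """Per-channel mean and std over spatial positions; feat: (C, H, W)."""
    mu = feat.mean(axis=(1, 2))
    sigma = np.sqrt(feat.var(axis=(1, 2)) + eps)
    return mu, sigma


def adain(content, style, eps=1e-5):
    """Re-normalize content features to carry the style's channel statistics."""
    c_mu, c_sigma = channel_stats(content, eps)
    s_mu, s_sigma = channel_stats(style, eps)
    normalized = (content - c_mu[:, None, None]) / c_sigma[:, None, None]
    return normalized * s_sigma[:, None, None] + s_mu[:, None, None]


def adain_style_loss(gen_feats, style_feats):
    """Sum over layers of L2 distances between channel means and between stds."""
    loss = 0.0
    for g, s in zip(gen_feats, style_feats):
        g_mu, g_sigma = channel_stats(g)
        s_mu, s_sigma = channel_stats(s)
        loss += np.linalg.norm(g_mu - s_mu) + np.linalg.norm(g_sigma - s_sigma)
    return loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    content = rng.normal(size=(4, 8, 8))
    style = 2.0 * rng.normal(size=(4, 8, 8)) + 3.0
    stylized = adain(content, style)
    print(adain_style_loss([content], [style]), adain_style_loss([stylized], [style]))
```

Re-normalizing content features with `adain` matches the style statistics, so the style loss of the result is near zero, whereas the raw content features incur a large loss.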
- …