Learning Global-aware Kernel for Image Harmonization
Image harmonization aims to solve the visual inconsistency problem in
composited images by adaptively adjusting the foreground pixels with the
background as references. Existing methods employ local color transformations
or region matching between foreground and background, which neglect the
powerful proximity prior and treat the foreground and background independently
as whole regions during harmonization. As a result, they still show limited
performance across varied foreground objects and scenes. To address this issue,
we propose
a novel Global-aware Kernel Network (GKNet) to harmonize local regions with
comprehensive consideration of long-distance background references.
Specifically, GKNet includes two parts, i.e., harmony kernel prediction and
harmony kernel modulation branches. The former includes a Long-distance
Reference Extractor (LRE) to obtain long-distance context and Kernel Prediction
Blocks (KPB) to predict multi-level harmony kernels by fusing global
information with local features. To achieve this goal, a novel Selective
Correlation Fusion (SCF) module is proposed to better select relevant
long-distance background references for local harmonization. The latter employs
the predicted kernels to harmonize foreground regions with both local and
global awareness. Abundant experiments demonstrate the superiority of our
method for image harmonization over state-of-the-art methods, e.g., achieving
39.53 dB PSNR, surpassing the best counterpart by +0.78 dB, and decreasing
fMSE/MSE by 11.5%/6.7% compared with the SoTA method. Code will be available at
https://github.com/XintianShen/GKNet.
Comment: 10 pages, 10 figures
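The kernel-modulation idea can be illustrated with a minimal sketch: every pixel carries its own predicted k*k kernel (supplied directly below; in GKNet the kernel prediction branch would produce it from fused global/local features), and the output is a local weighted sum. Function and variable names are illustrative, not the authors' code.

```python
def harmony_kernel_modulation(feat, kernels, k=3):
    """Modulate each pixel of a single-channel feature map `feat`
    (H x W list of floats) with its own predicted k*k kernel from
    `kernels` (H x W list of k*k weight lists). Borders are clamped."""
    H, W, r = len(feat), len(feat[0]), k // 2
    offsets = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            acc = 0.0
            for i, (dy, dx) in enumerate(offsets):
                ny = min(max(y + dy, 0), H - 1)  # clamp row to bounds
                nx = min(max(x + dx, 0), W - 1)  # clamp column to bounds
                acc += kernels[y][x][i] * feat[ny][nx]
            out[y][x] = acc
    return out
```

An identity kernel (center weight 1, all else 0) leaves the map unchanged, which makes the sketch easy to sanity-check.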
WMFormer++: Nested Transformer for Visible Watermark Removal via Implicit Joint Learning
Watermarking serves as a widely adopted approach to safeguard media
copyright. In parallel, the research focus has extended to watermark removal
techniques, offering an adversarial means to enhance watermark robustness and
foster advancements in the watermarking field. Existing watermark removal
methods mainly rely on UNet with task-specific decoder branches: one for
watermark localization and the other for background image restoration. However,
watermark localization and background restoration are not isolated tasks;
precise watermark localization inherently implies regions necessitating
restoration, and the background restoration process contributes to more
accurate watermark localization. To holistically integrate information from
both branches, we introduce an implicit joint learning paradigm. This empowers
the network to autonomously navigate the flow of information between implicit
branches through a gate mechanism. Furthermore, we employ cross-channel
attention to facilitate local detail restoration and holistic structural
comprehension, while harnessing nested structures to integrate multi-scale
information. Extensive experiments are conducted on various challenging
benchmarks to validate the effectiveness of our proposed method. The results
demonstrate our approach's remarkable superiority, surpassing existing
state-of-the-art methods by a large margin.
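The gate mechanism described above can be sketched as a sigmoid-weighted blend of the two implicit branches. The function below is a hypothetical scalar version for illustration; a real gate would be a learned layer acting on feature tensors.

```python
import math

def gated_fusion(loc_feat, res_feat, gate_logits):
    """Blend localization-branch and restoration-branch features
    element-wise. A gate value g in (0, 1), derived from `gate_logits`,
    decides how much information flows from each branch."""
    fused = []
    for a, b, z in zip(loc_feat, res_feat, gate_logits):
        g = 1.0 / (1.0 + math.exp(-z))  # sigmoid gate in (0, 1)
        fused.append(g * a + (1.0 - g) * b)
    return fused
```

With a zero logit the gate blends both branches equally; a large positive logit lets the localization branch dominate.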
Painterly Image Harmonization in Dual Domains
Image harmonization aims to produce visually harmonious composite images by
adjusting the foreground appearance to be compatible with the background. When
the composite image has photographic foreground and painterly background, the
task is called painterly image harmonization. There are only a few works on
this task, and they are either time-consuming or weak at generating well-harmonized
results. In this work, we propose a novel painterly harmonization network
consisting of a dual-domain generator and a dual-domain discriminator, which
harmonizes the composite image in both spatial domain and frequency domain. The
dual-domain generator performs harmonization by using AdaIN modules in the
spatial domain and our proposed ResFFT modules in the frequency domain. The
dual-domain discriminator attempts to distinguish the inharmonious patches
based on the spatial feature and frequency feature of each patch, which can
enhance the ability of the generator in an adversarial manner. Extensive
experiments on the benchmark dataset show the effectiveness of our method. Our
code and model are available at
https://github.com/bcmi/PHDNet-Painterly-Image-Harmonization.
Comment: Accepted by AAAI202
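The AdaIN modules used in the spatial domain follow standard adaptive instance normalization: the content (foreground) feature statistics are replaced by the style (background) statistics. A minimal single-channel sketch, not the authors' implementation:

```python
def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization on one channel: normalize the
    content features, then rescale/shift them to the style's std/mean."""
    n = len(content)
    mu_c = sum(content) / n
    var_c = sum((x - mu_c) ** 2 for x in content) / n
    mu_s = sum(style) / len(style)
    var_s = sum((x - mu_s) ** 2 for x in style) / len(style)
    sd_c = (var_c + eps) ** 0.5  # eps avoids division by zero
    sd_s = (var_s + eps) ** 0.5
    return [(x - mu_c) / sd_c * sd_s + mu_s for x in content]
```

After the transform, the output's mean and standard deviation match the style's (up to the eps stabilizer).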
Harmonizer: Learning to Perform White-Box Image and Video Harmonization
Recent works on image harmonization solve the problem as a pixel-wise image
translation task via large autoencoders. They show unsatisfactory performance
and slow inference speed when dealing with high-resolution images. In this
work, we observe that adjusting the input arguments of basic image filters,
e.g., brightness and contrast, is sufficient for humans to produce realistic
images from the composite ones. Hence, we frame image harmonization as an
image-level regression problem to learn the arguments of the filters that
humans use for the task. We present a Harmonizer framework for image
harmonization. Unlike prior methods that are based on black-box autoencoders,
Harmonizer contains a neural network for filter argument prediction and several
white-box filters (based on the predicted arguments) for image harmonization.
We also introduce a cascade regressor and a dynamic loss strategy for
Harmonizer to learn filter arguments more stably and precisely. Since our
network only outputs image-level arguments and the filters we used are
efficient, Harmonizer is much lighter and faster than existing methods.
Comprehensive experiments demonstrate that Harmonizer surpasses existing
methods notably, especially with high-resolution inputs. Finally, we apply
Harmonizer to video harmonization, which achieves consistent results across
frames and 56 fps at 1080P resolution. Code and models are available at:
https://github.com/ZHKKKe/Harmonizer
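The white-box idea is that a regressor predicts only scalar arguments for simple image filters. The sketch below applies hypothetical brightness and contrast filters with given arguments; the value ranges and formulas are illustrative, not the paper's exact filter definitions.

```python
def apply_filters(pixels, brightness, contrast):
    """Harmonize a list of pixel intensities in [0, 1] with two
    white-box filters whose scalar arguments a regressor would predict."""
    out = []
    for p in pixels:
        p = p + brightness                       # brightness: additive shift
        p = (p - 0.5) * (1.0 + contrast) + 0.5   # contrast: scale about mid-gray
        out.append(min(max(p, 0.0), 1.0))        # clamp to the valid range
    return out
```

Because only a handful of scalars are predicted per image, such filters run at the same cost regardless of resolution, which is the source of the speed advantage claimed above.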
Painterly Image Harmonization using Diffusion Model
Painterly image harmonization aims to insert photographic objects into
paintings and obtain artistically coherent composite images. Previous methods
for this task mainly rely on inference-time optimization or generative
adversarial networks, but they are either very time-consuming or struggle with
fine control of the foreground objects (e.g., texture and content details). To
address these
issues, we propose a novel Painterly Harmonization stable Diffusion model
(PHDiffusion), which includes a lightweight adaptive encoder and a Dual Encoder
Fusion (DEF) module. Specifically, the adaptive encoder and the DEF module
first stylize foreground features within each encoder. Then, the stylized
foreground features from both encoders are combined to guide the harmonization
process. During training, besides the noise loss in diffusion model, we
additionally employ content loss and two style losses, i.e., AdaIN style loss
and contrastive style loss, aiming to balance the trade-off between style
migration and content preservation. Compared with the state-of-the-art models
from related fields, our PHDiffusion can stylize the foreground more
sufficiently and simultaneously retain finer content. Our code and model are
available at https://github.com/bcmi/PHDiffusion-Painterly-Image-Harmonization.
Comment: Accepted by ACMMM 202
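Of the extra objectives mentioned, the AdaIN style loss can be sketched as a mean/std-matching penalty between foreground features and painting features. This is a simplified single-channel version; the actual loss operates on deep feature maps.

```python
def adain_style_loss(feat, style_feat):
    """Penalize the squared mismatch between the channel statistics
    (mean and standard deviation) of two feature vectors."""
    def stats(v):
        m = sum(v) / len(v)
        sd = (sum((x - m) ** 2 for x in v) / len(v)) ** 0.5
        return m, sd
    mf, sf = stats(feat)
    ms, ss = stats(style_feat)
    return (mf - ms) ** 2 + (sf - ss) ** 2
```

The loss is zero exactly when the two statistics agree, so minimizing it pushes the foreground toward the painting's style while the separate content loss preserves details.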
WaveNets: Wavelet Channel Attention Networks
Channel Attention reigns supreme as an effective technique in the field of
computer vision. However, the channel attention proposed in SENet suffers from
information loss in feature learning caused by using Global Average Pooling
(GAP) to represent each channel as a scalar. Thus, designing effective channel
attention mechanisms requires a solution that enhances feature preservation
when modeling channel inter-dependencies. In this work, we utilize
Wavelet transform compression as a solution to the channel representation
problem. We first test wavelet transform as an Auto-Encoder model equipped with
conventional channel attention module. Next, we test wavelet transform as a
standalone channel compression method. We prove that global average pooling is
equivalent to the recursive approximate Haar wavelet transform. With this
proof, we generalize channel attention using Wavelet compression and name it
WaveNet. Implementation of our method can be embedded within existing channel
attention methods with a couple of lines of code. We test our proposed method
on the ImageNet dataset for the image classification task. Our method
outperforms the baseline SENet and achieves state-of-the-art results. Our code
implementation is publicly available at https://github.com/hady1011/WaveNet-C.
Comment: IEEE BigData 2022 conference
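The claimed equivalence is easy to verify numerically with the averaging-normalized Haar approximation branch: repeatedly replacing adjacent pairs by their mean collapses a length-2^n vector to exactly its global average. A single-channel sketch (not the paper's code):

```python
def gap(v):
    """Global average pooling of one channel: its mean."""
    return sum(v) / len(v)

def haar_approx(v):
    """One level of the averaging-normalized Haar approximation:
    each adjacent pair is replaced by its mean (len(v) must be even)."""
    return [(v[i] + v[i + 1]) / 2.0 for i in range(0, len(v), 2)]

def recursive_haar(v):
    """Apply the approximation branch until one coefficient remains
    (assumes the length is a power of two)."""
    while len(v) > 1:
        v = haar_approx(v)
    return v[0]
```

Keeping detail coefficients alongside this approximation branch is what lets a wavelet-based descriptor retain information that GAP discards.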