Denoising diffusion models have recently attracted considerable attention in
image generation owing to their powerful generative capability. However,
diffusion models remain underexplored in image fusion. In this article, we introduce
the diffusion model to image fusion, treating the fusion task as image-to-image
translation, and design two conditional injection modulation modules (style
transfer modulation and wavelet modulation) that inject coarse-grained style
information and fine-grained high- and low-frequency information into the
diffusion UNet to generate fused images. We also discuss residual learning and
the choice of training objective for the diffusion model in image fusion.
Extensive quantitative and qualitative experiments against benchmark methods
demonstrate state-of-the-art results and good generalization in image fusion
tasks. We hope our method will inspire further work and offer insight into how
diffusion models can be better applied to image fusion. Code will be released
for reproducibility.
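The abstract does not say how the wavelet modulation separates high- and low-frequency information. The sketch below is illustrative only and assumes a one-level orthonormal 2D Haar transform. It splits each source image into one low-frequency approximation and three high-frequency detail subbands. The function names, including the stacked conditioning tensor built by `wavelet_condition`, are hypothetical and are not the authors' implementation.

```python
import numpy as np


def haar_dwt2(img: np.ndarray):
    """One-level orthonormal 2D Haar transform of an (H, W) array, H and W even.

    Returns (ll, lh, hl, hh), each of shape (H/2, W/2): ll is the
    low-frequency approximation; lh, hl, hh are high-frequency details.
    (LH/HL naming conventions differ between libraries.)
    """
    if img.ndim != 2 or img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    img = img.astype(float)
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # scaled local average (low frequency)
    lh = (a + b - c - d) / 2.0  # top row minus bottom row (vertical variation)
    hl = (a - b + c - d) / 2.0  # left column minus right column (horizontal variation)
    hh = (a - b - c + d) / 2.0  # diagonal variation
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh) -> np.ndarray:
    """Exact inverse of haar_dwt2: rebuilds the (H, W) array from its subbands."""
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w), dtype=float)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out


def wavelet_condition(src_a: np.ndarray, src_b: np.ndarray) -> np.ndarray:
    """Hypothetical conditioning tensor: the 4 subbands of each of two
    source images stacked along a channel axis -> shape (8, H/2, W/2)."""
    return np.stack([*haar_dwt2(src_a), *haar_dwt2(src_b)], axis=0)


if __name__ == "__main__":
    x = np.arange(16, dtype=float).reshape(4, 4)
    subbands = haar_dwt2(x)
    print(np.allclose(haar_idwt2(*subbands), x))  # perfect reconstruction
    print(wavelet_condition(x, np.ones((4, 4))).shape)
```

The Haar transform is chosen here only because it is the simplest wavelet with exact reconstruction. Because it is orthonormal, no information is lost when the subbands are fed to a network in place of the raw pixels. A constant image has all-zero detail subbands, which makes the low- and high-frequency separation easy to check.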