To address temporal gradient instability and inadequate preservation of image detail in existing diffusion models for infrared and visible image fusion, a dynamic temporal diffused and multi-level refined network is proposed. First, in the diffusion fusion module, a graph neural network middle block is employed to capture local features precisely and to model complex relationships between feature regions. Second, a dynamic denoising path planning method is developed to obtain an optimal partition of the reverse denoising process. Third, a segmented decoder is designed on the basis of the optimal dynamic denoising path to perform targeted, optimized denoising of the diffusion process. Finally, a multi-level refinement module with cross-hierarchical features is constructed, which re-extracts features from the source images and the preliminarily denoised fused image to perform further refined fusion. Subjective and objective comparisons with 9 representative fusion methods on the MSRS, M3FD, and RoadScene datasets show that the proposed method achieves improvements in 7 objective evaluation metrics. The proposed method performs well in complex scenes, better preserves image details, and conforms more closely to human visual characteristics. Furthermore, its practical applicability is validated through downstream object detection experiments.
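The abstract does not state the criterion used to partition the reverse denoising process, so the following is only an illustrative sketch, not the proposed path planning method. It assumes a hypothetical per-timestep score (for example, some measure of denoising difficulty) and uses standard dynamic programming to split the timesteps into `k` contiguous segments with minimal within-segment squared deviation. The function name `partition_timesteps` and the `scores` input are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def partition_timesteps(scores: Sequence[float], k: int) -> List[Tuple[int, int]]:
    """Split indices 0..len(scores)-1 into k contiguous segments that minimise
    the summed within-segment squared deviation of `scores`.

    `scores` is a hypothetical per-timestep signal; the source does not
    specify what quantity drives the partition.
    Returns half-open (start, end) index pairs in ascending order.
    """
    n = len(scores)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(scores)")

    # Prefix sums of values and squared values give O(1) segment costs.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(scores):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a: int, b: int) -> float:
        # Sum of squared deviations from the mean over scores[a:b].
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    inf = float("inf")
    # best[j][i]: minimal cost of covering scores[:i] with exactly j segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # cut[j][i]: start index of the last segment in that optimal cover.
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for a in range(j - 1, i):
                c = best[j - 1][a] + cost(a, i)
                if c < best[j][i]:
                    best[j][i] = c
                    cut[j][i] = a

    # Walk the stored cut points backwards to recover the segments.
    segments = []
    end = n
    for j in range(k, 0, -1):
        start = cut[j][end]
        segments.append((start, end))
        end = start
    return segments[::-1]
```

Under these assumptions, each returned `(start, end)` pair would mark one contiguous stage of the reverse process, which a segmented decoder could then handle with its own branch. The cubic-time loop is kept for readability and is adequate for the few hundred to a thousand timesteps typical of diffusion schedules only when `k` is small.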