The concept of photonic modes is a cornerstone of optics and photonics,
as modes describe how light propagates. Maxwell's equations determine
the mode field from the structural information,
but solving them is computationally expensive, especially for
three-dimensional models. To overcome this obstacle, we introduce a
Multi-Modal Diffusion model to predict the photonic modes of a given
structure. A Contrastive Language-Image Pre-training (CLIP) model is used to
build the connection between photonic structures and their corresponding modes.
We then use the Stable Diffusion (SD) model as an example to generate
optical fields from the structural information. Our work introduces
multi-modal deep learning to construct a complex mapping between structural
information and light fields, both represented as high-dimensional vectors, and generates light
field images based on this mapping.